
Francisco Pereira · Penousal Machado

Ernesto Costa · Amílcar Cardoso (Eds.)


LNAI 9273

Progress in
Artificial Intelligence
17th Portuguese Conference
on Artificial Intelligence, EPIA 2015
Coimbra, Portugal, September 8–11, 2015, Proceedings

Lecture Notes in Artificial Intelligence 9273

Subseries of Lecture Notes in Computer Science

LNAI Series Editors


Randy Goebel
University of Alberta, Edmonton, Canada
Yuzuru Tanaka
Hokkaido University, Sapporo, Japan
Wolfgang Wahlster
DFKI and Saarland University, Saarbrücken, Germany

LNAI Founding Series Editor


Joerg Siekmann
DFKI and Saarland University, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/1244
Francisco Pereira · Penousal Machado

Ernesto Costa · Amílcar Cardoso (Eds.)


Progress in
Artificial Intelligence
17th Portuguese Conference
on Artificial Intelligence, EPIA 2015
Coimbra, Portugal, September 8–11, 2015
Proceedings

Editors

Francisco Pereira
ISEC - Coimbra Institute of Engineering
Polytechnic Institute of Coimbra
Coimbra, Portugal

Penousal Machado
CISUC, Department of Informatics Engineering
University of Coimbra
Coimbra, Portugal

Ernesto Costa
CISUC, Department of Informatics Engineering
University of Coimbra
Coimbra, Portugal

Amílcar Cardoso
CISUC, Department of Informatics Engineering
University of Coimbra
Coimbra, Portugal

ISSN 0302-9743 ISSN 1611-3349 (electronic)


Lecture Notes in Artificial Intelligence
ISBN 978-3-319-23484-7 ISBN 978-3-319-23485-4 (eBook)
DOI 10.1007/978-3-319-23485-4

Library of Congress Control Number: 2015947099

LNCS Sublibrary: SL7 – Artificial Intelligence

Springer Cham Heidelberg New York Dordrecht London


© Springer International Publishing Switzerland 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or
omissions that may have been made.

Printed on acid-free paper

Springer International Publishing AG Switzerland is part of Springer Science+Business Media


(www.springer.com)
Preface

The Portuguese Conference on Artificial Intelligence is returning to Coimbra, 18 years after its previous edition in this city. UNESCO recently recognized the University of Coimbra, as well as some areas of the city, as a World Heritage site, acknowledging its relevance in the dissemination of knowledge throughout the fields of arts, sciences, law, architecture, urban planning, and landscape. It is therefore an ideal time to welcome the conference back to Coimbra.
EPIA events have a longstanding tradition in the Portuguese Artificial Intelligence (AI) community. Their purpose is to promote research in AI and the scientific exchange among researchers, practitioners, scientists, and engineers in related disciplines. The first edition took place in 1985 and since 1989 it has occurred biennially as an international conference. Continuing this successful tradition, the 17th Portuguese Conference on Artificial Intelligence (EPIA 2015) took place on the campus of the University of Coimbra, Coimbra, Portugal (http://epia2015.dei.uc.pt), September 8–11, 2015.
Following the organization of recent editions, EPIA 2015 was organized as a series of thematic tracks. Each track was coordinated by an Organizing Committee and included a specific international Program Committee, composed of experts in the corresponding scientific area. Twelve thematic tracks were selected for EPIA 2015: Ambient Intelligence and Affective Environments (AmIA), Artificial Intelligence in Medicine (AIM), Artificial Intelligence in Transportation Systems (AITS), Artificial Life and Evolutionary Algorithms (ALEA), Computational Methods in Bioinformatics and Systems Biology (CMBSB), General Artificial Intelligence (GAI), Intelligent Information Systems (IIS), Intelligent Robotics (IROBOT), Knowledge Discovery and Business Intelligence (KDBI), Multi-Agent Systems: Theory and Applications (MASTA), Social Simulation and Modelling (SSM), and Text Mining and Applications (TEMA).
For this edition, 131 submissions were received. All papers were reviewed in a double-blind process by at least three members of the corresponding track Program Committee, and some submissions were reviewed by up to five reviewers. Following the review process, 45 contributions were accepted as full regular papers, corresponding to a full-paper acceptance rate of approximately 34%. Additionally, 36 other contributions were accepted as short papers. Geographically, the authors of accepted contributions belong to research groups from 18 different countries – Algeria, Austria, Brazil, the Czech Republic, Egypt, France, India, Italy, the Netherlands, Norway, Poland, Portugal, Russia, Spain, Sweden, the UK, and the USA – confirming the attractiveness and international character of the conference.
We express our gratitude for the hard work of the track chairs and the members of the different Program Committees, as they were crucial for ensuring the high scientific quality of the event. This conference was only possible thanks to the joint effort of many people from different institutions. The main contributions came from the
University of Coimbra, the Polytechnic Institute of Coimbra, and the Centre for
Informatics and Systems of the University of Coimbra. We would also like to thank the
members of the Organizing Committee: Anabela Simões, António Leitão, João Correia,
Jorge Ávila, Nuno Lourenço, and Pedro Martins. Acknowledgment is due to SISCOG
– Sistemas Cognitivos S.A., Feedzai S.A., Thinkware S.A., iClio, FBA., and FCT –
Fundação para a Ciência e a Tecnologia for the financial support. A final word goes to
EasyChair, which greatly simplified the management of submissions, reviews, and
proceedings preparation, and to Springer for the assistance in publishing the current
volume.

July 2015 Francisco Pereira


Penousal Machado
Ernesto Costa
Amílcar Cardoso
Organization

The 17th Portuguese Conference on Artificial Intelligence (EPIA 2015) was


co-organized by the University of Coimbra, the Polytechnic Institute of Coimbra,
and the Centre for Informatics and Systems of the University of Coimbra.

Conference Co-chairs
Francisco Pereira Polytechnic Institute of Coimbra, Portugal
Penousal Machado University of Coimbra, Portugal

Program Co-chairs
Ernesto Costa University of Coimbra, Portugal
Francisco Pereira Polytechnic Institute of Coimbra, Portugal

Organization Co-chairs
Amílcar Cardoso University of Coimbra, Portugal
Penousal Machado University of Coimbra, Portugal

Proceedings Co-chairs
João Correia University of Coimbra, Portugal
Pedro Martins University of Coimbra, Portugal

Program Committee
AmIA Track Chairs
Paulo Novais University of Minho, Portugal
Goreti Marreiros Polytechnic of Porto, Portugal
Ana Almeida Polytechnic of Porto, Portugal
Sara Rodriguez Gonzalez University of Salamanca, Spain

AmIA Program Committee


Antonio Fernández-Caballero University of Castilla-La Mancha, Spain
Amílcar Cardoso University of Coimbra, Portugal
Andrew Ortony Northwestern University, USA
Angelo Costa University of Minho, Portugal
Antonio Camurri University of Genoa, Italy
Boon Kiat Quek National University of Singapore
Carlos Bento University of Coimbra, Portugal

Carlos Ramos Polytechnic of Porto, Portugal


César Analide University of Minho, Portugal
Dante Tapia University of Salamanca, Spain
Davide Carneiro University of Minho, Portugal
Diego Gachet European University of Madrid, Spain
Eva Hudlicka Psychometrix Associates Blacksburg, USA
Florentino Fdez-Riverola University of Vigo, Spain
Javier Jaen Polytechnic University of Valencia, Spain
Javier Bajo Polytechnic University of Madrid, Spain
José M. Molina University Carlos III of Madrid, Spain
José Machado University of Minho, Portugal
José Neves University of Minho, Portugal
Juan M. Corchado University of Salamanca, Spain
Laurence Devillers LIMSI-CNRS, France
Lino Figueiredo Polytechnic of Porto, Portugal
Luís Macedo University of Coimbra, Portugal
Ricardo Costa Polytechnic of Porto, Portugal
Rui José University of Minho, Portugal
Vicente Julian Polytechnic University of Valencia, Spain

AIM Track Chairs


Manuel Filipe Santos University of Minho, Portugal
Filipe Portela University of Minho, Portugal

AIM Program Committee


Allan Tucker Brunel University, UK
Álvaro Silva Abel Salazar Biomedical Sciences Institute, Portugal
Andreas Holzinger Medical University Graz, Austria
António Abelha University of Minho, Portugal
Antonio Manuel de Jesus Pereira Polytechnic Institute of Leiria, Portugal
Barna Iantovics Petru Maior University of Tîrgu-Mureș, Romania
Beatriz de la Iglesia University of East Anglia, UK
Cinzia Pizzi Università degli Studi di Padova, Italy
Danielle Mowery University of Utah, USA
Do Kyoon Kim Pennsylvania State University, USA
Giorgio Leonardi University of Piemonte Orientale, Italy
Göran Falkman University of Skövde, Sweden
Hélder Coelho University of Lisbon, Portugal
Helena Lindgren Umeå University, Sweden
Inna Skarga-Bandurova East Ukrainian National University, Ukraine
José Machado University of Minho, Portugal
José Maia Neves University of Minho, Portugal
Luca Anselma University of Turin, Italy

Michael Ignaz Schumacher University of Applied Sciences Western Switzerland


Miguel Angel Mayer Pompeu Fabra University, Spain
Mohd Khanapi Abd Ghani Technical University of Malaysia, Malaysia
Panagiotis Bamidis Aristotelian Univ. of Thessaloniki, Greece
Pedro Gago Polytechnic Institute of Leiria, Portugal
Pedro Pereira Rodrigues University of Porto, Portugal
Rainer Schmidt Institute for Biometrics and Medical Informatics,
Germany
Ricardo Martinho Polytechnic Institute of Leiria, Portugal
Rui Camacho University of Porto, Portugal
Salva Tortajada Polytechnic University of Valencia, Spain
Shabbir Syed-Abdul Taipei Medical University, Taiwan
Shelly Sachdeva Jaypee Institute of Information Technology, India
Szymon Wilk Poznan University of Technology, Poland
Ulf Blanke Swiss Federal Institute of Technology in Zurich,
Switzerland
Werner Ceusters University of New York at Buffalo, USA

AITS Track Chairs


Rui Gomes Universidade de Coimbra, Portugal
Rosaldo Rossetti Universidade do Porto, Portugal

AITS Program Committee


Achille Fonzone Edinburgh Napier University, UK
Agachai Sumalee Hong Kong Polytechnic University, Hong Kong
Alberto Fernandez Universidad Rey Juan Carlos, Spain
Ana Almeida Instituto Politécnico do Porto, Portugal
Ana L.C. Bazzan Universidade Federal do Rio Grande do Sul, Brazil
Constantinos Antoniou National Technical University of Athens, Greece
Cristina Olaverri-Monreal AIT Austrian Institute of Technology GmbH, Austria
Eduardo Camponogara Universidade Federal de Santa Catarina, Brazil
Giovanna Di Marzo Serugendo University of Geneva, Switzerland
Gonçalo Correia Delft University of Technology, Netherlands
Harry Timmermans Eindhoven University of Technology, Netherlands
Hussein Dia Swinburne University of Technology, Australia
José Telhada Universidade do Minho, Portugal
Kai Nagel Technische Universität Berlin, Germany
Luís Moreira Matias NEC Europe Ltd, Germany
Luís Nunes ISCTE Instituto Universitário de Lisboa, Portugal
Oded Cats Delft University of Technology, Netherlands
Sascha Ossowski Universidad Rey Juan Carlos, Spain
Shuming Tang Chinese Academy of Sciences, China
Tânia Fontes Universidade do Porto, Portugal

ALEA Track Chairs


Mauro Castelli NOVA IMS, Universidade Nova de Lisboa, Portugal
Leonardo Vanneschi NOVA IMS, Universidade Nova de Lisboa, Portugal
Sara Silva University of Lisbon, Portugal

ALEA Program Committee


Alberto Moraglio University of Exeter, UK
Alessandro Re NOVA IMS, Universidade Nova de Lisboa, Portugal
Anabela Simões Instituto Politécnico de Coimbra, Portugal
Antonio Della-Cioppa Università degli Studi di Salerno, Italy
António Gaspar-Cunha University of Minho, Portugal
Arnaud Liefooghe Université de Lille, France
Carlos Cotta Universidad de Málaga, Spain
Carlos Fernandes Instituto Superior Técnico, Lisboa, Portugal
Carlos Gershenson Universidad Nacional Autónoma de México, Mexico
Carlotta Orsenigo Politecnico di Milano, Italy
Christian Blum University of the Basque Country, Spain
Ernesto Costa University of Coimbra, Portugal
Fernando Lobo University of Algarve, Portugal
Helio Barbosa Laboratório Nacional de Computação Científica, Brazil
Ivanoe De-Falco National Research Council, Italy
Ivo Gonçalves University of Coimbra, Portugal
James Foster University of Idaho, USA
Jin-Kao Hao University of Angers, France
Leonardo Trujillo Instituto Tecnológico de Tijuana, Mexico
Luca Manzoni University of Milano-Bicocca, Italy
Luís Correia University of Lisbon, Portugal
Luís Paquete University of Coimbra, Portugal
Marc Schoenauer INRIA, France
Mario Giacobini University of Turin, Italy
Pedro Mariano University of Lisbon, Portugal
Penousal Machado University of Coimbra, Portugal
Rui Mendes University of Minho, Portugal
Stefano Beretta University of Milano-Bicocca, Italy
Stefano Cagnoni University of Parma, Italy
Telmo Menezes Centre National de la Recherche Scientifique, France

CMBSB Track Chairs


Rui Camacho Universidade do Porto, Portugal
Miguel Rocha Universidade do Minho, Portugal
Sara Madeira Instituto Superior Técnico, Portugal
José Luís Oliveira Universidade de Aveiro, Portugal

CMBSB Program Committee


Francisco Couto University of Lisbon, Portugal
Susana Vinga IDMEC-LAETA, IST-UL, Portugal
Marie-France Sagot INRIA Grenoble Rhône-Alpes and Université de Lyon 1, France
Alexessander Couto Alves Imperial College London, UK
Alexandre P. Francisco Technical University of Lisbon, Portugal
Vítor Santos Costa Universidade do Porto, Portugal
Mário J. Silva Universidade de Lisboa, Portugal
Fernando Diaz University of Valladolid, Spain
Sérgio Matos Universidade de Aveiro, Portugal
Paulo Azevedo Universidade do Minho, Portugal
Rui Mendes Universidade do Minho, Portugal
Inês Dutra Universidade do Porto, Portugal
Nuno A. Fonseca European Bioinformatics Institute, UK
Florentino Fdez-Riverola University of Vigo, Spain
André Carvalho USP, Brazil
Alexandra Carvalho IT/IST, Portugal
Arlindo Oliveira IST/INESC-ID and Cadence Research Laboratories,
Portugal
Ross King University of Manchester, UK
Luís M. Rocha Indiana University, USA

GAI Track Chairs


Francisco Pereira Polytechnic Institute of Coimbra, Portugal
Penousal Machado University of Coimbra, Portugal

GAI Program Committee


Adriana Giret Universitat Politècnica de València, Spain
Alexandra Carvalho Technical University of Lisbon, Portugal
Amal El Fallah Pierre-and-Marie-Curie University, France
Amílcar Cardoso University of Coimbra, Portugal
Andrea Omicini University of Bologna, Italy
Arlindo Oliveira INESC-ID, Portugal
Carlos Bento University of Coimbra, Portugal
Carlos Ramos Polytechnic Institute of Porto, Portugal
César Analide University of Minho, Portugal
Eric de La Clergerie INRIA, France
Ernesto Costa University of Coimbra, Portugal
Eugénio Oliveira University of Porto, Portugal
Frank Dignum Utrecht University, Netherlands
Gaël Dias University of Caen, France
Hélder Coelho University of Lisbon, Portugal
Irene Rodrigues University of Évora, Portugal

João Balsa University of Lisbon, Portugal


João Gama University of Porto, Portugal
João Leite New University of Lisbon, Portugal
John-Jules Meyer Utrecht University, Netherlands
José Cascalho University of Azores, Portugal
José Neves University of Minho, Portugal
José Gabriel Pereira Lopes New University of Lisbon, Portugal
José Júlio Alferes New University of Lisbon, Portugal
Juan Corchado University of Salamanca, Spain
Luís Antunes University of Lisbon, Portugal
Luís Cavique University Aberta, Portugal
Luís Seabra Lopes University of Aveiro, Portugal
Luís Correia University of Lisbon, Portugal
Luís Macedo University of Coimbra, Portugal
Michael Rovatsos University of Edinburgh, UK
Miguel Calejo APPIA, Portugal
Paulo Cortez University of Minho, Portugal
Paulo Gomes University of Coimbra, Portugal
Paulo Moura Oliveira University of Trás-os-Montes and Alto Douro, Portugal
Paulo Urbano University of Lisbon, Portugal
Pavel Brazdil University of Porto, Portugal
Pedro Barahona New University of Lisbon, Portugal
Pedro Henriques University of Minho, Portugal
Pedro Mariano University of Lisbon, Portugal
Rui Camacho University of Porto, Portugal
Salvador Abreu University of Évora, Portugal

IIS Track Chairs


Álvaro Rocha University of Coimbra, Portugal
Luís Paulo Reis University of Minho, Portugal
Adolfo Lozano University of Extremadura, Spain

IIS Program Committee


Fernando Bobillo University of Zaragoza, Spain
Tossapon Boongoen Royal Thai Air Force Academy, Thailand
Carlos Costa IUL-ISCTE, Portugal
Hironori Washizaki Waseda University, Japan
Vitalyi Talanin Zaporozhye Institute of Economics & Information
Technologies, Ukraine
Mu-Song Chen Da-Yeh University, Taiwan
Garyfallos Arabatzis Democritus University of Thrace, Greece
Khalid Benali Université de Lorraine, France
Salama Mostafa UNITEN, Malaysia
Fernando Ribeiro Polytechnic Institute of Castelo Branco, Portugal
Pedro Henriques Abreu University of Coimbra, Portugal

Radouane Yafia Ibn Zohr University, Morocco


Maria José Sousa Universidade Europeia, Portugal
Mijalce Santa Ss Cyril and Methodius University, Macedonia
Sławomir Żółkiewski Silesian University of Technology, Poland
Kuan Yew Wong Universiti Teknologi Malaysia, Malaysia
Alvaro Prieto University of Extremadura, Spain
Roberto Rodriguez-Echeverria University of Extremadura, Spain
Yair Wiseman Bar-Ilan University, Israel
Babak Rouhani Payame Noor University, Iran
Hing Kai Chan University of Nottingham Ningbo China, China
José Palma University of Murcia, Spain
Manuel Mazzara Innopolis University, Russia
Brígida Mónica Faria Polytechnic Institute of Porto, Portugal

IROBOT Track Chairs


Luís Paulo Reis University of Minho, Portugal
Nuno Lau University of Aveiro, Portugal
Brígida Mónica Faria Polytechnic Institute of Porto, Portugal
Rui P. Rocha University of Coimbra, Portugal

IROBOT Program Committee


António J.R. Neves University of Aveiro, Portugal
Antonio P. Moreira University of Porto, Portugal
Armando Sousa University of Porto, Portugal
Carlos Cardeira University of Lisbon, Portugal
Cristina Santos University of Minho, Portugal
Filipe Santos INESC TEC, Portugal
João Fabro Federal University of Technology-Parana, Brazil
Josémar Rodrigues de Souza Bahia State University, Brazil
Luís Correia University of Lisbon, Portugal
Luís Mota Lisbon University Institute, Portugal
Manuel Fernando Silva Polytechnic Institute of Porto, Portugal
Nicolas Jouandeau Paris 8 University, France
Paulo Goncalves Polytechnic Institute of Castelo Branco, Portugal
Paulo Urbano University of Lisbon, Portugal
Pedro Abreu University of Coimbra, Portugal
Rodrigo Braga Federal University of Santa Catarina, Brazil

KDBI Track Chairs


Paulo Cortez University of Minho, Portugal
Luís Cavique Open University, Portugal
João Gama University of Porto, Portugal

Nuno Marques New University of Lisbon, Portugal


Manuel Filipe Santos University of Minho, Portugal

KDBI Program Committee


Agnès Braud Univ. Robert Schuman, France
Albert Bifet University of Waikato, New Zealand
Aline Villavicencio UFRGS, Brazil
Alípio Jorge University of Porto, Portugal
Amílcar Oliveira Open University, Portugal
André Carvalho University of São Paulo, Brazil
Armando Mendes University of the Azores, Portugal
Bernardete Ribeiro University of Coimbra, Portugal
Carlos Ferreira Gomes Institute of Eng. of Porto, Portugal
Elaine Faria Federal University of Uberlandia, Brazil
Fátima Rodrigues Institute of Eng. of Porto, Portugal
Fernando Bação New University of Lisbon, Portugal
Filipe Pinto Polytechnical Inst. Leiria, Portugal
Gladys Castillo Choose Digital, USA
José Costa UFRN, Brazil
Karin Becker UFRGS, Brazil
Leandro Krug Wives UFRGS, Brazil
Luís Lamb UFRGS, Brazil
Manuel Fernandez Delgado University of Santiago Compostela, Spain
Marcos Domingues University of São Paulo, Brazil
Margarida Cardoso ISCTE-IUL, Portugal
Mark Embrechts Rensselaer Polytechnic Institute, USA
Mohamed Gaber University of Portsmouth, UK
Murate Testik Hacettepe University, Turkey
Ning Chen Institute of Eng. of Porto, Portugal
Orlando Belo University of Minho, Portugal
Paulo Gomes University of Coimbra, Portugal
Pedro Castillo University of Granada, Spain
Philippe Lenca Télécom Bretagne, France
Rita Ribeiro University of Porto, Portugal
Rui Camacho University of Porto, Portugal
Stéphane Lallich University of Lyon 2, France
Yanchang Zhao Australian Government, Australia

MASTA Track Chairs


Ana Paula Rocha Porto University, Portugal
Pedro Henriques Abreu Coimbra University, Portugal
Jomi Fred Hubner Universidade Federal de Santa Catarina, Brazil
Jordi Sabater Mir IIIA-CSIC, Spain
Luís Moniz Lisbon University, Portugal

MASTA Program Committee


Alessandra Alaniz Macedo São Paulo University, Brazil
António Castro LIACC, Portugal
António Carlos da Rocha Costa Universidade Federal do Rio Grande, Brazil
Brigida Mónica Faria Polytechnic Institute of Porto, Portugal
Carlos Carrascosa Universidad Politecnica de Valencia, Spain
César Analide Minho University, Portugal
Daniel Castro Silva LIACC-Porto University, Portugal
Didac Busquets Imperial College London, UK
Eugénio Oliveira LIACC-Porto University, Portugal
Felipe Meneguzzi Pontificia Universidade Católica do Rio Grande do Sul,
Brazil
Francisco Grimaldo Universidad de Valencia, Spain
Frank Dignum Utrecht University, Netherlands
Hélder Coelho Lisbon University, Portugal
Henrique Lopes Cardoso LIACC-Porto University, Portugal
Jaime Sichmann São Paulo University, Brazil
Javier Carbó Universidad Carlos III, Spain
Joana Urbano Instituto Superior Miguel Torga, Portugal
João Balsa Lisbon University, Portugal
Laurent Vercouter Ecole Nationale Supérieure des Mines de
Saint-Etienne, France
Luís Correia Lisbon University, Portugal
Luís Macedo Coimbra University, Portugal
Luís Paulo Reis LIACC-Minho University, Portugal
Manuel Filipe Santos Minho University, Portugal
Maria Fasli University of Essex, UK
Michael Schumacher University of Applied Sciences, Western Switzerland
Márcia Ito São Paulo Faculty of Technology, Brazil
Nicoletta Fornara University of Lugano, Switzerland
Nuno Lau Aveiro University, Portugal
Olivier Boissier ENS Mines Saint-Etienne, France
Pablo Noriega IIIA-CSIC, Spain
Paulo Trigo Superior Institute of Engineering of Lisbon, Portugal
Paulo Urbano Lisbon University, Portugal
Rafael Bordini Pontifícia Universidade Católica do Rio Grande do Sul,
Brazil
Ramón Hermoso University of Zaragoza, Spain
Rosaldo Rossetti Porto University, Portugal
Virginia Dignum Delft University of Technology, Netherlands
Viviane Torres Da Silva Universidade Federal Fluminense, Brazil
Wamberto Vasconcelos University of Aberdeen, UK

SSM Track Chairs


Luís Antunes Universidade de Lisboa, Portugal
Graçaliz Pereira Dimuro Universidade Federal do Rio Grande, Brazil
Pedro Campos Universidade do Porto, Portugal
Juan Pavón Universidad Complutense de Madrid, Spain

SSM Program Committee


Frédéric Amblard Univ. Toulouse 1, France
Pedro Andrade INPE, Brazil
Tânya Araújo ISEG, Portugal
Robert Axtell George Mason Univ., USA
João Balsa Univ. Lisbon, Portugal
Ana Bazzan UFRGS, Brazil
François Bousquet CIRAD/IRRI, Thailand
Amílcar Cardoso University of Coimbra, Portugal
Cristiano Castelfranchi ISTC/CNR, Italia
Shu-Heng Chen National Chengchi Univ., Taiwan
Claudio Cioffi-Revilla George Mason Univ., USA
Hélder Coelho Univ. Lisbon, Portugal
Rosaria Conte ISTC/CNR Rome, Italy
Nuno David ISCTE, Portugal
Paul Davidsson Blekinge Inst. Technology, Sweden
Guillaume Deffuant Cemagref, France
Alexis Drogoul IRD, France
Julie Dugdale Lab. d’Informatique Grenoble, France
Bruce Edmonds Centre for Policy Modelling, UK
Nigel Gilbert Univ. Surrey, UK
Nick Gotts Macaulay Inst., Scotland, UK
David Hales The Open Univ., UK
Samer Hassan Univ. Complutense Madrid, Spain
Rainer Hegselmann Univ. Bayreuth, Germany
Wander Jager Univ. Groningen, Netherlands
Adolfo Lópes Paredes Univ. Valladolid, Spain
Pedro Magalhães ICS, Portugal
Scott Moss Centre for Policy Modelling, UK
Jean-Pierre Muller CIRAD, France
Akira Namatame National Defense Academy, Japan
Fernando Neto Univ. Pernambuco, Brazil
Carlos Ramos GECAD – ISEP, Portugal
Juliette Rouchier Greqam/CNRS, France
David Sallach Univ. Chicago, USA
Keith Sawyer Washington Univ. St. Louis, USA
Carles Sierra IIIA, Spain
Elizabeth Sklar City Univ. New York, USA

Keiki Takadama Univ. Electro-communications, Japan


Oswaldo Teran Univ. Los Andes, Venezuela
Takao Terano Univ. Tsukuba, Japan
Jan Treur Vrije Univ. Amsterdam, The Netherlands
Klaus Troitzsch Univ. Koblenz, Germany
Harko Verhagen Stockholm Univ., Sweden

TEMA Track Chairs


Joaquim F. Ferreira da Silva NOVA LINCS FCT/UNL, Portugal
Gabriel Pereira Lopes NOVA LINCS FCT/UNL, Portugal
Hugo Gonçalo Oliveira CISUC, University of Coimbra, Portugal
Vitor R. Rocio Universidade Aberta, Portugal
Gaël Dias University of Caen Basse-Normandie, France

TEMA Program Committee


Adam Jatowt Kyoto University, Japan
Adeline Nazarenko University of Paris 13, France
Aline Villavicencio Universidade Federal do Rio Grande do Sul, Brazil
Antoine Doucet University of Caen, France
António Branco Universidade de Lisboa, Portugal
Béatrice Daille University of Nantes, France
Belinda Maia Universidade do Porto, Portugal
Brigitte Grau LIMSI, France
Bruno Cremilleux University of Caen, France
Christel Vrain Université d’Orléans, France
Eric de La Clergerie INRIA, France
Gabriel Pereira Lopes NOVA LINCS FCT/UNL, Portugal
Gaël Dias University of Caen Basse-Normandie, France
Gracinda Carvalho Universidade Aberta, Portugal
Gregory Grefenstette CEA, France
Hugo Gonçalo Oliveira CISUC, University of Coimbra, Portugal
Isabelle Tellier University of Orléans, France
Joaquim F. Ferreira da Silva NOVA LINCS FCT/UNL, Portugal
João Balsa Universidade de Lisboa, Portugal
João Magalhães Universidade Nova de Lisboa, Portugal
Lluís Padró Universitat Politècnica de Catalunya, Spain
Lucinda Carvalho Universidade Aberta, Portugal
Manuel Vilares Ferro University of Vigo, Spain
Marc Spaniol University of Caen Basse-Normandie, France
Marcelo Finger Universidade de São Paulo, Brazil
Maria das Graças Volpe Nunes Universidade de São Paulo, Brazil
Mark Lee University of Birmingham, UK
Nuno Mamede Universidade Técnica de Lisboa, Portugal
Nuno Marques Universidade Nova de Lisboa, Portugal

Pablo Gamallo Universidade de Santiago de Compostela, Spain


Paulo Quaresma Universidade de Évora, Portugal
Pavel Brazdil University of Porto, Portugal
Pierre Zweigenbaum CNRS-LIMSI, France
Spela Vintar University of Ljubljana, Slovenia
Vitor R. Rocio Universidade Aberta, Portugal

Additional Reviewers

Aching, Jorge
Adamatti, Diana Francisca
Adedoyin-Olowe, Mariam
Andeadis, Pavlos
André, João
Barbosa, Raquel
Billa, Cleo
Cardoso, Douglas
Ferreira, Carlos
Flavien, Balbo
Francisco, Garijo
Fuentes-Fernández, Rubén
García-Magariño, Iván
Gonçalves, Eder Mateus
Gorgonio, Flavius
Kheiri, Ahmed
Lopes Silva, Maria Amélia
Magessi, Nuno
Mota, Fernanda
Nibau Antunes, Francisco
Nunes, Davide
Paiva, Fábio
Pasa, Leandro
Pinto, Andry
Ramos, Ana
Rocha, Luis
Rodrigues, Filipe
Santos, António Paulo
Sarmento, Rui
Serrano, Emilio
Shatnawi, Safwan
Soulas, Julie
Souza, Jackson
Trigo, Luis
Contents

Ambient Intelligence and Affective Environments

Defining Agents’ Behaviour for Negotiation Contexts . . . . . . . . . . . . . . . . . 3


João Carneiro, Diogo Martinho, Goreti Marreiros, and Paulo Novais

Improving User Privacy and the Accuracy of User Identification


in Behavioral Biometrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
André Pimenta, Davide Carneiro, José Neves, and Paulo Novais

Including Emotion in Learning Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 27


Ana Raquel Faria, Ana Almeida, Constantino Martins,
Ramiro Gonçalves, and Lino Figueiredo

Ambient Intelligence: Experiments on Sustainability Awareness . . . . . . . . . . 33


Fábio Silva and Cesar Analide

Artificial Intelligence in Medicine

Reasoning with Uncertainty in Biomedical Models . . . . . . . . . . . . . . . . . . . 41


Andrea Franco, Marco Correia, and Jorge Cruz

Smart Environments and Context-Awareness for Lifestyle Management


in a Healthy Active Ageing Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Davide Bacciu, Stefano Chessa, Claudio Gallicchio, Alessio Micheli,
Erina Ferro, Luigi Fortunati, Filippo Palumbo, Oberdan Parodi,
Federico Vozzi, Sten Hanke, Johannes Kropf, and Karl Kreiner

Gradient: A User-Centric Lightweight Smartphone Based Standalone


Fall Detection System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Ajay Bhatia, Suman Kumar, and Vijay Kumar Mago

Towards Diet Management with Automatic Reasoning and Persuasive


Natural Language Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Luca Anselma and Alessandro Mazzei

Predicting Within-24h Visualisation of Hospital Clinical Reports


Using Bayesian Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Pedro Pereira Rodrigues, Cristiano Inácio Lemes,
Cláudia Camila Dias, and Ricardo Cruz-Correia

On the Efficient Allocation of Diagnostic Activities in Modern


Imaging Departments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Roberto Gatta, Mauro Vallati, Nicola Mazzini, Diane Kitchin,
Andrea Bonisoli, Alfonso E. Gerevini, and Vincenzo Valentini

Ontology-Based Information Gathering System for Patients with Chronic


Diseases: Lifestyle Questionnaire Design . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Lamine Benmimoune, Amir Hajjam, Parisa Ghodous,
Emmanuel Andres, Samy Talha, and Mohamed Hajjam

Predicting Preterm Birth in Maternity Care by Means of Data Mining . . . . . . 116


Sónia Pereira, Filipe Portela, Manuel F. Santos, José Machado,
and António Abelha

Clustering Barotrauma Patients in ICU–A Data Mining Based


Approach Using Ventilator Variables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
Sérgio Oliveira, Filipe Portela, Manuel F. Santos, José Machado,
António Abelha, Álvaro Silva, and Fernando Rua

Clinical Decision Support for Active and Healthy Ageing: An Intelligent


Monitoring Approach of Daily Living Activities . . . . . . . . . . . . . . . . . . . . . 128
Antonis S. Billis, Nikos Katzouris, Alexander Artikis,
and Panagiotis D. Bamidis

Discovering Interesting Trends in Real Medical Data: A Study


in Diabetic Retinopathy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Vassiliki Somaraki, Mauro Vallati, and Thomas Leo McCluskey

Artificial Intelligence in Transportation Systems

A Column Generation Based Heuristic for a Bus Driver Rostering Problem . . . 143
Vítor Barbosa, Ana Respício, and Filipe Alvelos

A Conceptual MAS Model for Real-Time Traffic Control . . . . . . . . . . . . . . 157


Cristina Vilarinho, José Pedro Tavares, and Rosaldo J.F. Rossetti

Prediction of Journey Destination in Urban Public Transport . . . . . . . . . . . . 169


Vera Costa, Tânia Fontes, Pedro Maurício Costa,
and Teresa Galvão Dias

Demand Modelling for Responsive Transport Systems Using Digital


Footprints. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Paulo Silva, Francisco Antunes, Rui Gomes, and Carlos Bento

Artificial Life and Evolutionary Algorithms

A Case Study on the Scalability of Online Evolution of Robotic Controllers. . . 189


Fernando Silva, Luís Correia, and Anders Lyhne Christensen

Spatial Complexity Measure for Characterising Cellular Automata


Generated 2D Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Mohammad Ali Javaheri Javid, Tim Blackwell, Robert Zimmer,
and Mohammad Majid Al-Rifaie

Electricity Demand Modelling with Genetic Programming . . . . . . . . . . . . . . 213


Mauro Castelli, Matteo De Felice, Luca Manzoni,
and Leonardo Vanneschi

The Optimization Ability of Evolved Strategies . . . . . . . . . . . . . . . . . . . . . 226


Nuno Lourenço, Francisco B. Pereira, and Ernesto Costa

Evolution of a Metaheuristic for Aggregating Wisdom


from Artificial Crowds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
Christopher J. Lowrance, Omar Abdelwahab, and Roman V. Yampolskiy

The Influence of Topology in Coordinating Collective Decision-Making


in Bio-hybrid Societies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
Rob Mills and Luís Correia

A Differential Evolution Algorithm for Optimization Including Linear


Equality Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
Helio J.C. Barbosa, Rodrigo L. Araujo, and Heder S. Bernardino

Multiobjective Firefly Algorithm for Variable Selection


in Multivariate Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
Lauro Cássio Martins de Paula and Anderson da Silva Soares

Semantic Learning Machine: A Feedforward Neural Network Construction


Algorithm Inspired by Geometric Semantic Genetic Programming. . . . . . . . . 280
Ivo Gonçalves, Sara Silva, and Carlos M. Fonseca

Eager Random Search for Differential Evolution


in Continuous Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
Miguel Leon and Ning Xiong

Learning from Play: Facilitating Character Design Through Genetic


Programming and Human Mimicry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
Swen E. Gaudl, Joseph Carter Osborn, and Joanna J. Bryson

Memetic Algorithm for Solving the 0-1 Multidimensional


Knapsack Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
Abdellah Rezoug, Dalila Boughaci, and Mohamed Badr-El-Den

Synthesis of In-Place Iterative Sorting Algorithms Using GP:


A Comparison Between STGP, SFGP, G3P and GE . . . . . . . . . . . . . . . . . . 305
David Pinheiro, Alberto Cano, and Sebastián Ventura

Computational Methods in Bioinformatics and Systems Biology

Variable Elimination Approaches for Data-Noise Reduction


in 3D QSAR Calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
Rafael Dolezal, Agata Bodnarova, Richard Cimler, Martina Husakova,
Lukas Najman, Veronika Racakova, Jiri Krenek, Jan Korabecny,
Kamil Kuca, and Ondrej Krejcar

Pattern-Based Biclustering with Constraints for Gene Expression


Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
Rui Henriques and Sara C. Madeira

A Critical Evaluation of Methods for the Reconstruction


of Tissue-Specific Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
Sara Correia and Miguel Rocha

Fuzzy Clustering for Incomplete Short Time Series Data . . . . . . . . . . . . . . . 353


Lúcia P. Cruz, Susana M. Vieira, and Susana Vinga

General Artificial Intelligence

Allowing Cyclic Dependencies in Modular Logic Programming . . . . . . . . . . 363


João Moura and Carlos Viegas Damásio

Probabilistic Constraint Programming for Parameters Optimisation


of Generative Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
Massimiliano Zanin, Marco Correia, Pedro A.C. Sousa, and Jorge Cruz

Reasoning over Ontologies and Non-monotonic Rules . . . . . . . . . . . . . . . . . 388


Vadim Ivanov, Matthias Knorr, and João Leite

On the Cognitive Surprise in Risk Management: An Analysis


of the Value-at-Risk (VaR) Historical . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
Davi Baccan, Elton Sbruzzi, and Luis Macedo

Logic Programming Applied to Machine Ethics . . . . . . . . . . . . . . . . . . . . . 414


Ari Saptawijaya and Luís Moniz Pereira

Intelligent Information Systems

Are Collaborative Filtering Methods Suitable for Student Performance


Prediction? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
Hana Bydžovská

Intelligent Robotics

A New Approach for Dynamic Strategic Positioning in RoboCup


Middle-Size League. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
António J.R. Neves, Filipe Amaral, Ricardo Dias, João Silva,
and Nuno Lau

Intelligent Wheelchair Driving: Bridging the Gap Between Virtual


and Real Intelligent Wheelchairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
Brígida Mónica Faria, Luís Paulo Reis, Nuno Lau,
António Paulo Moreira, Marcelo Petry, and Luís Miguel Ferreira

A Skill-Based Architecture for Pick and Place Manipulation Tasks . . . . . . . . 457


Eurico Pedrosa, Nuno Lau, Artur Pereira, and Bernardo Cunha

Adaptive Behavior of a Biped Robot Using Dynamic Movement Primitives . . . 469


José Rosado, Filipe Silva, and Vítor Santos

Probabilistic Constraints for Robot Localization . . . . . . . . . . . . . . . . . . . . . 480


Marco Correia, Olga Meshcheryakova, Pedro Sousa, and Jorge Cruz

Detecting Motion Patterns in Dense Flow Fields: Euclidean Versus


Polar Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487
Andry Pinto, Paulo Costa, and Antonio Paulo Moreira

Swarm Robotics Obstacle Avoidance: A Progressive Minimal Criteria


Novelty Search-Based Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
Nesma M. Rezk, Yousra Alkabani, Hassan Bedour, and Sherif Hammad

Knowledge Discovery and Business Intelligence

An Experimental Study on Predictive Models Using Hierarchical


Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
Ana M. Silva, Rita P. Ribeiro, and João Gama

Crime Prediction Using Regression and Resources Optimization . . . . . . . . . . 513


Bruno Cavadas, Paula Branco, and Sérgio Pereira

Distance-Based Decision Tree Algorithms for Label Ranking . . . . . . . . . . . . 525


Cláudio Rebelo de Sá, Carla Rebelo, Carlos Soares, and Arno Knobbe

A Proactive Intelligent Decision Support System for Predicting


the Popularity of Online News . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535
Kelwin Fernandes, Pedro Vinagre, and Paulo Cortez

Periodic Episode Discovery Over Event Streams . . . . . . . . . . . . . . . . . . . . . 547


Julie Soulas and Philippe Lenca

Forecasting the Correct Trading Actions . . . . . . . . . . . . . . . . . . . . . . . . . . 560


Luís Baía and Luís Torgo

CTCHAID: Extending the Application of the Consolidation Methodology . . . 572


Igor Ibarguren, Jesús María Pérez, and Javier Muguerza

Towards Interactive Visualization of Time Series Data to Support


Knowledge Discovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578
Jan Géryk

Ramex-Forum: Sequential Patterns of Prices in the Petroleum


Production Chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584
Pedro Tiple, Luís Cavique, and Nuno Cavalheiro Marques

Geocoding Textual Documents Through a Hierarchy of Linear Classifiers . . . 590


Fernando Melo and Bruno Martins

A Domain-Specific Language for ETL Patterns Specification


in Data Warehousing Systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
Bruno Oliveira and Orlando Belo

Optimized Multi-resolution Indexing and Retrieval Scheme of Time Series . . . . . 603


Muhammad Marwan Muhammad Fuad

Multi-agent Systems: Theory and Applications

Minimal Change in Evolving Multi-Context Systems. . . . . . . . . . . . . . . . . . 611


Ricardo Gonçalves, Matthias Knorr, and João Leite

Bringing Constitutive Dynamics to Situated Artificial Institutions . . . . . . . . . 624


Maiquel de Brito, Jomi F. Hübner, and Olivier Boissier

Checking WECTLK Properties of Timed Real-Weighted Interpreted


Systems via SMT-Based Bounded Model Checking. . . . . . . . . . . . . . . . . . . 638
Agnieszka M. Zbrzezny and Andrzej Zbrzezny

SMT-Based Bounded Model Checking for Weighted Epistemic ECTL . . . . . 651


Agnieszka M. Zbrzezny, Bożena Woźna-Szcześniak,
and Andrzej Zbrzezny

Dynamic Selection of Learning Objects Based on SCORM Communication . . . 658


João de Amorim Junior and Ricardo Azambuja Silveira

Sound Visualization Through a Swarm of Fireflies . . . . . . . . . . . . . . . . . . . 664


Ana Rodrigues, Penousal Machado, Pedro Martins,
and Amílcar Cardoso

Social Simulation and Modelling

Analysing the Influence of the Cultural Aspect in the Self-Regulation


of Social Exchanges in MAS Societies: An Evolutionary Game-Based
Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 673
Andressa Von Laer, Graçaliz P. Dimuro, and Diana Francisca Adamatti

Modelling Agents’ Perception: Issues and Challenges in Multi-agents


Based Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 687
Nuno Trindade Magessi and Luís Antunes

Agent-Based Modelling for a Resource Management Problem


in a Role-Playing Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 696
José Cascalho and Pinto Mabunda

An Agent-Based MicMac Model for Forecasting


of the Portuguese Population . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702
Renato Fernandes, Pedro Campos, and A. Rita Gaio

Text Mining and Applications

Multilingual Open Information Extraction . . . . . . . . . . . . . . . . . . . . . . . . . 711


Pablo Gamallo and Marcos Garcia

Classification and Selection of Translation Candidates for Parallel


Corpora Alignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723
K.M. Kavitha, Luís Gomes, José Aires, and José Gabriel P. Lopes

A SMS Information Extraction Architecture to Face Emergency Situations. . . 735


Douglas Monteiro and Vera Lucia Strube de Lima

Cross-Lingual Word Sense Clustering for Sense Disambiguation. . . . . . . . . . 747


João Casteleiro, Joaquim Ferreira da Silva, and Gabriel Pereira Lopes

Towards the Improvement of a Topic Model with Semantic Knowledge . . . . 759


Adriana Ferrugento, Ana Alves, Hugo Gonçalo Oliveira,
and Filipe Rodrigues

RAPPORT — A Portuguese Question-Answering System . . . . . . . . . . . . . . 771


Ricardo Rodrigues and Paulo Gomes

Automatic Distinction of Fernando Pessoa's Heteronyms . . . . . . . . . . . . . . . 783


João F. Teixeira and Marco Couto

Social Impact - Identifying Quotes of Literary Works in Social Networks . . . 789


Carlos Barata, Mónica Abreu, Pedro Torres, Jorge Teixeira,
Tiago Guerreiro, and Francisco M. Couto

Fractal Beauty in Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 796


João Cordeiro, Pedro R.M. Inácio, and Diogo A.B. Fernandes

How Does Irony Affect Sentiment Analysis Tools? . . . . . . . . . . . . . . . . . . . 803


Leila Weitzel, Raul A. Freire, Paulo Quaresma, Teresa Gonçalves,
and Ronaldo Prati

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 809


Ambient Intelligence
and Affective Environments
Defining Agents’ Behaviour for Negotiation Contexts

João Carneiro1(), Diogo Martinho1, Goreti Marreiros1, and Paulo Novais2


1
GECAD – Knowledge Engineering and Decision Support Group, Institute of Engineering,
Polytechnic of Porto, Porto, Portugal
{jomrc,1090557,mgt}@isep.ipp.pt
2
CCTC – Computer Science and Technology Center, University of Minho, Braga, Portugal
[email protected]

Abstract. Agents who represent participants in the group decision-making context require a certain number of individual traits in order to be successful. By using argumentation models, agents are capable of defending the interests of those they represent, and of justifying and supporting their ideas and actions. However, regardless of how much knowledge they hold, it is essential to define their behaviour. In this paper we (1) present a study of the most important models for inferring different types of behaviour that can be adapted and used in this context, (2) propose rules that must be followed so that defining behaviours affects the system positively, and (3) propose the adaptation of a conflict management model to the context of Group Decision Support Systems. We propose an approach that (a) intends to reflect a natural way of human behaviour in the agents, (b) provides an easier way to reach an agreement between all parties involved and (c) does not impose high configuration costs on the participants. Our approach offers a simple yet perceptible configuration tool that can be used by the participants, contributes to more intelligent communication between agents, and makes it possible for the participants to better understand the types of interactions experienced by the agents belonging to the system.

Keywords: Group decision support systems · Ubiquitous computing · Affective computing · Multi-Agent systems · Automatic negotiation

1 Introduction

Rahwan et al. (2003) defined negotiation as “a form of interaction in which a group of agents, with conflicting interests and a desire to cooperate, try to come to a mutually acceptable agreement on the division of scarce resources” [1]. Hadidi, Dimopoulos, and Moraitis (2011) defined negotiation as “the process of looking for an agreement between two or several agents on one or more issues” [2], and El-Sisi and Mousa (2012) defined it as “a process of reaching an agreement on the terms of a transaction such as price, quantity, for two or more parties in multi-agent systems such as E-Commerce. It tries to maximize the benefits to all parties” [3]. The literature shows a consensus regarding the main approaches to deal with negotiation: game theory, heuristics and argumentation [1-3]. It is a known fact that game-theoretic
and heuristic-based approaches have evolved and become more complex. With this development, they have been used in a wide range of applications. However, they share some limitations. In the majority of game-theoretic and heuristic models, agents exchange proposals, but these proposals are limited: agents are not allowed to exchange any additional information other than what is expressed in the proposal itself. This can be problematic, for example, in situations where agents have limited information about the environment, or where their rational choices depend on those of other agents. Another important limitation is that agents' utilities or preferences are usually assumed to be completely characterized prior to the interaction. To overcome these limitations, argumentation-based negotiation appeared and became one of the most popular approaches to negotiation [4]; it has been extensively investigated and studied, as witnessed by many publications [5-7]. The main idea of argumentation-based negotiation is the ability to support offers with justifications and explanations, which play a key role in negotiation settings. It thus allows the participants in the negotiation not only to exchange offers, but also the reasons and justifications that support those offers, in order to mutually influence each other's preference relations on the set of offers and, consequently, the outcome of the dialogue.
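As a rough illustration of this idea, the sketch below (Python) shows how an offer annotated with supporting justifications might be represented and evaluated. The class and attribute names, the acceptance threshold and the way each justification nudges the perceived utility are illustrative assumptions, not the protocol proposed in this work.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Offer:
    """A plain proposal over a negotiation issue (e.g. which alternative to pick)."""
    alternative: str


@dataclass
class ArgumentativeOffer(Offer):
    """An offer extended with the reasons (arguments) that support it."""
    justifications: List[str] = field(default_factory=list)


class NegotiatingAgent:
    def __init__(self, name: str, preferences: Dict[str, float]):
        self.name = name
        self.preferences = dict(preferences)  # alternative -> utility in [0, 1]

    def evaluate(self, offer: ArgumentativeOffer,
                 persuasiveness: float = 0.1, threshold: float = 0.7) -> bool:
        """Accept the offer if its alternative is, or becomes, good enough.

        Each supporting justification slightly raises the perceived utility,
        mimicking how exchanged arguments can change a preference relation.
        """
        perceived = self.preferences.get(offer.alternative, 0.0)
        perceived = min(1.0, perceived + persuasiveness * len(offer.justifications))
        self.preferences[offer.alternative] = perceived
        return perceived >= threshold


agent = NegotiatingAgent("A1", {"supplier_x": 0.55, "supplier_y": 0.80})
offer = ArgumentativeOffer("supplier_x",
                           justifications=["lower cost", "shorter delivery time"])
print(agent.evaluate(offer))  # True: the two arguments push 0.55 up to 0.75 >= 0.7
```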
The parallelism between this approach and group decision-making is easy to see: a group of agents exchanges arguments in order to achieve, for instance, a consensus, and thereby supports groups in the decision-making process [8]. However, the complexity of this process must not be underestimated when considering a scenario where an agent seeks to defend the interests of whom it represents and, at the same time, is part of a group that aims to reach a collective decision on a problem of their organization [9, 10]. Not only are those agents simultaneously competitive and cooperative, but they also represent human beings. Establishing some sort of dialogue, as well as the different types of arguments that can be exchanged by agents, is only the first step towards solving the problem. An agent that represents a decision-maker involved in a group decision-making process may show different levels of experience and knowledge related to the situation and should behave accordingly. The literature shows that there are works on the subject [10-13]; however, some of these models have flaws in terms of real-world applicability. Some require high configuration costs that do not suit the different types of users they are built for, and others show flaws that, in our opinion, are enough to affect the success of a Group Decision Support System (GDSS).
In this work we present the most relevant models for inferring or configuring a behaviour style in a group decision-making context. We also propose a set of rules that a behaviour model must follow without jeopardizing the entire GDSS and, finally, we propose an approach obtained by adapting an existing model to the context of GDSS.
The rest of the paper is organized as follows: the next section presents the literature review. Section 3 presents our approach, where we identify different types of behaviour, defined with the use of an existing model, and present the set of rules that we believe are the most important for defining types of behaviour for the agents in a way that does not compromise the system. Section 4 discusses how our approach can be applied to the context of GDSS and how it differs
from other existing approaches. Finally, some conclusions are drawn in Section 5, along with the work to be done hereafter.

2 Literature Review
The concern with identifying and understanding particular behavioural attitudes has led to many investigations and studies over the last decades, with emphasis on proposing models and behaviour styles that can be related to the personality of the negotiator. Carl Jung (1921) was the first to specify a model to study different psychological personality types, based on four types of consciousness (sensation, intuition, thinking and feeling) that could in turn be combined with two types of attitudes (extraversion and introversion), thereby identifying eight primary psychological types [14].
In 1962, Myers and Briggs developed a personality indicator (the Myers-Briggs Type Indicator) based on Jung's theories [15]. This indicator is used as a psychometric questionnaire and allows people to understand the world around them and how they behave and make decisions based on their preferences [16]. The model was useful for identifying different styles of leadership, which were later specified in Keirsey and Bates's publication [17], in 1984, as four styles of leadership:
• Stabilizer: tends to be very clear and precise when defining objectives and organizing and planning tasks in order to achieve them. Stabilizer leaders are also reliable and trustworthy, since they show concern for other workers' needs and problems. They are able to increase the motivation of their workers by setting tradition and organization as an example of success;
• Catalyst: the main focus is on developing the quality of their own work and of the work provided by their staff. They play a facilitator's role by bringing out the best in other people, and motivate other workers with their own enthusiasm and potential;
• Trouble-shooter: as the name suggests, trouble-shooters focus on dealing with and solving problems. They show great aptitude for solving urgent problems by being practical and immediate. They bring people together as a team by analysing what needs to be done and stating exactly what to do as quickly as possible;
• Visionary: visionaries make decisions based on their own intuition and perception of the problems. They have a mind projected towards the future and plan idealistic scenarios and objectives which may not always be achievable.

Regarding vocational behaviour, in 1973 Holland [18] proposed a hexagonal model (the RIASEC model) that differentiates six personality types, mainly used in career environments to guide an individual's choice of vocation. Those types are defined as:
• Realistic: realistic individuals value things over people and ideas. They are mechanical and athletic, and prefer working outdoors with tools and objects;
• Investigative: investigative individuals have excellent analytic skills. They prefer
working alone and solving complex problems;
• Artistic: artistic individuals show a deep sense of creativity and imagination. They
prefer working on original projects and value ideas over things;

• Social: social individuals have high social aptitude, preferring social relationships
and helping other people solving their problems. They prefer working with people
over things;
• Enterprising: enterprising individuals show great communication and leadership
skills, and are usually concerned about establishing direct influence on other peo-
ple. They prefer dealing with people and ideas over things;
• Conventional: conventional individuals value order and efficiency. They show
administrative and organization skills. They prefer dealing with numbers and
words over people and ideas.

Fig. 1. Representation of Holland’s hexagon model, adapted from [18]

It is important to note the arrangement of these personalities in Holland's hexagon, where personalities next to each other are the most similar, while personalities opposite each other are the least similar (see Fig. 1).
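For illustration only, this adjacency relation can be captured as a circular distance over the six types; the ordering below follows the usual R-I-A-S-E-C arrangement of Fig. 1, and the function name is ours.

```python
# Order of the types around Holland's hexagon (R-I-A-S-E-C).
RIASEC = ["Realistic", "Investigative", "Artistic",
          "Social", "Enterprising", "Conventional"]


def hexagon_distance(type_a: str, type_b: str) -> int:
    """Steps around the hexagon between two personality types:
    0 = same type, 1 = adjacent (most similar), 3 = opposite (least similar)."""
    i, j = RIASEC.index(type_a), RIASEC.index(type_b)
    diff = abs(i - j)
    return min(diff, len(RIASEC) - diff)


print(hexagon_distance("Realistic", "Investigative"))  # 1 -> very similar
print(hexagon_distance("Realistic", "Social"))         # 3 -> least similar
```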
Conflict management has always been an important area of decision-making, since
it is very rare to find situations in group discussion where conflict is not present. In
1975, Thomas and Kilmann [19], also based on Jung’s studies and a conflict-handling
mode proposed by Blake and Mouton [20], suggested a model for interpersonal con-
flict-handling behaviour, defining five modes: competing, collaborating, compromis-
ing, avoiding and accommodating, according to two dimensions: assertiveness and
cooperativeness. As seen in Fig. 2, both assertiveness and cooperativeness dimensions
are related to integrative and distributive dimensions which were discussed by Walton
and McKersie in 1965 [21]. The integrative dimension refers to the overall satisfaction of
the group involved in the discussion, while the distributive dimension refers to the indi-
vidual satisfaction within the group. It is possible to see that the thinking-feeling di-
mension maps onto the distributive dimension while the introversion-extraversion
dimension maps onto the integrative dimension. It is easy to understand this associa-
tion by looking at competitors as the ones who seek the highest individual satisfac-
tion and collaborators as the ones who prefer the highest satisfaction of the entire group.
On the other hand, avoiders do not worry about group satisfaction and accommodators
do not worry about individual satisfaction. They also concluded that the thinking-
feeling dimension does not map onto the integrative dimension, and that the
introversion-extraversion dimension does not map onto the distributive dimension.

Fig. 2. Thomas and Kilmann’s model for interpersonal conflict-handling behaviour, adapted
from [19]

In 1992, Costa and McCrae [22] proposed a set of thirty traits extending the five-
factor model of personality (OCEAN model) which included six facets for each of the
factors. These traits were used in a study made by Howard and Howard [23] in order
to help them separate different kinds of behaviour styles and identify corresponding
themes. A theme is defined as “a trait which is attributable to the combined effect of
two or more separate traits”. Those styles and themes are based on common sense and
general research, and some of them have already been mentioned before in this litera-
ture review; however, it is also important to refer to other relevant styles that were
suggested, such as the Decision and Learning styles. The Decision style includes the Auto-
cratic, Bureaucratic, Diplomat and Consensus themes while Learning style includes
the Classroom, Tutorial, Correspondence and Independent themes.
In 1995, Rahim and Magner [24] created a meta-model of styles for handling inter-
personal conflict based on two dimensions: concern for self and concern for the other.
This was the base for the five management styles identified as obliging, avoiding, dom-
inating, integrating and compromising, as will be explained in detail in Section 3.

3 Methods

It is very important to define the agent's behaviour correctly in order not to jeopard-
ize the validation of the entire GDSS. Sometimes, in this area of research, there is an
excessive concern with finding a better result and, because of that, other variables may be
forgotten, which can make the use of a certain approach impossible in those situations.
For example: does it make sense for a decision-maker or a manager from a large
company, with his very busy schedule, to have the patience/time to answer (seriously)
a questionnaire of 44 questions such as "the Big Five Inventory" so that he can model his
agent with his personality? For reasons like this we have defined a list of con-
siderations to keep in mind when defining types of behaviours for the agents in the context
presented here. The definition of behaviour should:
1. Enhance the capabilities of the agents, i.e., make the process more intelligent, more
human and less sequential; even though it may not be visible in the conceptual
model, it must not be possible for the programmer to anticipate the sequence of
interactions just by reading the code;
2. Be easy to configure (usability) or not need any configuration at all from the user
(decision-maker);
3. Represent the interests of the decision-makers (strategy used), so that agent’s way
of acting meets the interests defined by the user (whenever possible);
4. Not be the reason for the decision-makers to give up using the application, i.e., in a
hypothetical situation, a decision-maker should not “win” more decisions just
because he knows how to manipulate/configure better the system;
5. Be available for everyone to benefit from it. Obviously, all decision-makers face
meetings in different ways. Their interests and knowledge for each topic are not al-
ways the same. Sometimes it may be in their interest to let others speak first and,
only after gathering all the information, elaborate a final opinion on the matter.
Other times it may be important to control the entire conversation and try to con-
vince the other participants to accept our opinion straightaway.
By taking into account all these points, we propose in this article a behaviour
model for the decision-making context based on conflict styles defined by Rahim and
Magner (1995) [24]. The styles defined are presented in Fig. 3 and have been adapted
to our problem. Rahim and Magner recognize the existence of 5 types of conflict styles:
integrating, obliging, dominating, avoiding and compromising. In their work, they
suggested these styles in particular to describe different ways of behaving in conflict
situations. They defined these styles according to the level of concern a person has for
reaching his own goals and reaching other people's objectives. This definition matches
exactly what we consider the agents that operate in a GDSS context
should be, when we say that they are both cooperative and competitive simultaneous-
ly. Therefore, this model ends up describing 5 conflict styles which support what we
think is required for the agents to have a positive behaviour in this context. It also
has the advantage of being a model that is easy to understand and to use.
In our approach, the configuration of the agent's behaviour made by the decision-
maker will be done through the selection of one conflict style. The main idea is to
define the agent with the participant's interests and strategies. For that, the definition
of each conflict style should be clear and understandable for the decision-maker. The
decision-maker can define different conflict styles in his agent throughout the process.
Consider, for example, a decision-maker who is included in a decision process and has little or
even no knowledge about the problem during the early stage of discussion. In that
situation he may prefer to use an "avoiding" style and learn from what other people
say, gathering arguments and information that will support different options and that way
learning more about the problem. At a later stage, when the decision-maker already
has more information and knowledge about the problem, he may opt to use a more
active and dominating style in order to convince others of his opinion. As
mentioned before, there are many factors that can make the decision-maker face a
meeting in different ways: interest in a topic, lack of knowledge about a topic,
recognition of the participation of more experienced people in the discussion, etc.

Fig. 3. Conflict Style, adapted from [24]

The different types of behaviour defined, which can be used by the agents, are:

• Integrating (IN): This style should be selected every time the decision-maker consid-
ers that satisfying his own objectives is as important as satisfying the other partici-
pants' objectives. By choosing this conflict style, the agent will seek and cooperate
with other agents in order to find a solution that is satisfactory to all the participants;
• Obliging (OB): This style should be selected if the decision-maker prefers to satis-
fy other participants' objectives instead of satisfying his own objectives. For ex-
ample, in a situation where the decision-maker does not have any knowledge about
the discussion topic;
• Dominating (DO): This style should be selected when the decision-maker only
wants to pursue his own objectives and does everything in his power to achieve
them. For example, in a situation where the decision-maker is absolutely sure that
his option to solve the problem is the most beneficial. By using this style, the agent
will be more dominant and will try to persuade as many agents as possible. With
this style the agent will prefer to risk everything to achieve his objectives, even if
that means he might end up at a disadvantage because of that;
• Avoiding (AV): This style should be selected when the decision-maker does not
have any interest in achieving either his own or other participants' objectives. For
example, when the decision-maker has been included in a group discussion for
which he does not have any sort of interest;
• Compromising (CO): This style should be selected when the decision-maker has a
moderate interest in the topic and at the same time he also has a certain interest in
achieving his own and other participants' objectives.
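As an illustration of how these styles could be adapted to the configuration of an agent, the sketch below encodes each style as a pair of concern levels (for the agent's own objectives and for the other participants' objectives); the numeric values and class structure are our own assumptions for exposition, not the system's actual implementation.

# Illustrative sketch (assumed values, not the actual GDSS implementation):
# each conflict style is encoded by the concern for the decision-maker's own
# objectives and the concern for the other participants' objectives (0..1).
CONFLICT_STYLES = {
    "integrating":  (1.0, 1.0),
    "obliging":     (0.2, 1.0),
    "dominating":   (1.0, 0.2),
    "avoiding":     (0.2, 0.2),
    "compromising": (0.6, 0.6),
}

class NegotiationAgent:
    # An agent configured with one conflict style, which the decision-maker
    # may change at any stage of the discussion.
    def __init__(self, participant, style="compromising"):
        self.participant = participant
        self.set_style(style)

    def set_style(self, style):
        if style not in CONFLICT_STYLES:
            raise ValueError("unknown conflict style: " + style)
        self.style = style

    def weights(self):
        # Weights the agent could use when scoring candidate alternatives.
        return CONFLICT_STYLES[self.style]

agent = NegotiationAgent("decision-maker-1", style="avoiding")  # listen first
agent.set_style("dominating")                                   # then push harder
print(agent.weights())  # (1.0, 0.2)
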

4 Discussion
Many approaches have been suggested in the literature which define/model agents
with characteristics that will differentiate them from each other and, as a result, will also
show different ways of operating [11-13, 25-27]. However, even if many of those
publications might be interesting for an academic context, they still show some issues
that must be addressed. These issues, which we will analyze, are related to the context of
support to group decision-making and also to competitive agents that represent
real individuals. There are several approaches in the literature for (1) agents that are mod-
eled according to the personality of the real participant (decision-maker) they repre-
sent and (2) agents modeled with different intelligence levels (abilities) [10, 12]. One of the
most used techniques in the literature is "The Big Five Inventory" questionnaire, which allows
obtaining values for each one of the personality traits defined in the model of "The Big
Five" (openness, conscientiousness, extraversion, agreeableness and neuroticism) [11,
26, 28]. Theoretically, we can think that the way such agents operate is perfect, since they are similar to the
real participants because they are modeled with "the same personality". Howev-
er, defining an agent with a conflict style based on the values of personality traits may
not be the right way to identify the decision-maker. What makes a human act in a
particular way is the result of much more than just his personality; it is a set of factors
such as personality, emotions, mood, knowledge, and body (physical part), and other
factors such as sensations and the spiritual part can also be considered [29]. Another
relevant question is the fact that this type of approach allows certain agents to have an ad-
vantage over other agents. Many may say and think that this is correct
because, close to what happens in real life, there are decision-makers that are more apt
and therefore have an advantage over other decision-makers. However, the questions that
arise are the following: Would a product like this be used by decision-makers who knew
they would be at a disadvantage by using this tool? Would it be possible to sell a prod-
uct that does not guarantee equality between its future users? It is also important to
discuss another relevant analysis point which is the fact that this type of approach, in
some situations, might provide less intelligent and more sequential outputs.
The study of different types of behaviour in agents has been represented in the litera-
ture by a reasonable number of contributions. However, it is a subject that most of the
time poses validation problems. Although there are proposals with case studies
aiming to validate this subject, that validation is somewhat subjective most of the
time. Even when trying to mathematically formulate the problem so that it becomes
scientifically "proven", that proof may often feel forced. A reflection of this problem
is the difference between the approaches practiced in the social and exact sciences. It is
clear to us, as computer science researchers, that it is not our goal to elaborate a mod-
el for behavioural definition to use in specific scenarios. Instead, we will use a model
defined and theoretically validated by others who work in areas that give them
these skills. However, the inclusion of intelligence in certain systems is growing
at a blistering pace and some of these systems would not make sense nor would succeed
without this inclusion. This means that it has become common practice to adapt
certain models that have not been designed specifically for the context in which they
will be used. Because of that, the evolution of the presented approaches will happen in
an empirical way.
Another relevant issue is related to how most of the works are focused on
very specific topics, which may prevent a more pragmatic comparison of the various
approaches. Even if in some situations the use of a specific technique (such as "The Big
Five Inventory”) might make sense, in others, and even though it may scientifically
provide a case of study with brilliant results, it can be responsible for jeopardizing the
success of the system. Our work aims to support each participant (decision-maker) in
the process of group decision-making. It is especially targeted for decision support in
ubiquitous scenarios where participants are considered people with a very fast pace of
life, where every second counts (top managers and executives). In our context the
system will notify the participant whenever he is added in a decision process (for
instance, by email), and after that every participant can access the system and model
his agent according to his preferences (alternatives and attributes classification), as
well as how he plans to face that decision process (informing the agent about the type
of behaviour to have), always knowing that there are no required fields in the agent
setup. This provides more freedom for the user to configure (depending on his
interest and time) his agent in detail or with no detail at all. As can be seen, in this
context (and as referred to previously) the agents must be cooperative and competitive.
They are cooperative because they all seek one solution for the organization they
belong to, and competitive because each agent seeks to defend the interests of its par-
ticipant and persuade other agents to accept his preferred alternative. For us this
means that if an agent is both cooperative and competitive then it cannot exhibit
behaviour where it is only concerned with achieving its own objectives, and vice versa.

5 Conclusions and Future Work

The use of agents to represent/support humans, as well as their intentions, in the negotia-
tion context is a relatively common practice in the literature. There are several approaches
which, based on relationships, allow agents to judge different levels of trust, credibil-
ity, intelligence, etc. Looking specifically at the support to group decision-making
context, a few approaches have appeared and propose modelling agents based on a
number of characteristics that will allow them to operate in a way similar to how the
decision-maker would in real life. If, on the one hand, the modelling of an agent with cer-
tain human characteristics makes sense, since it allows defining different types of
behaviours and strategies according to the objectives of the decision-maker, on the
other hand, even if some of those approaches may seem intellectually interesting and
complex, they affect the system they belong to for many reasons, as for in-
stance: illusory intelligence creation, unbalanced agent capabilities, high configura-
tion costs and weak representation of what in practice the decision-maker would
want the agent operating model to be.
In this paper we presented (1) a study about the most important models that can be
used to infer different types of behaviours that can be adapted and used in this con-
text, (2) a set of rules that must be followed and that will positively affect the system
when defining behaviours and (3) the adaptation of a conflict manage-
ment model to the context of GDSS. Furthermore, we included a new approach for how
to look at this problem, and an alert to the negative impact some other approaches might
have in the system where they are used. Our approach intends to provide a more
perceptible and concrete way for the decision-maker to understand the five types of
behaviour that can be used to model the agent in support to group decision-making
context where each agent represents a decision-maker. We believe that with our
approach it will be simpler for agents to reach or suggest solutions since they are
modeled with behaviours according to what the decision-maker wants. This makes it
easier to reflect in the agent the concern to achieve the decision-maker’s objectives or
the objectives belonging to other participants in the decision process. With this ap-
proach the agents follow one defined type of behaviour that also works as a strategy
that can be adopted by each one of the decision-makers.
As for future work, we will work on the specific definition of each type of behaviour
identified in this work. We intend to describe behaviours according to certain facets
proposed in the Five Factor Model and also study the tendencies of each type of behav-
iour to ask questions and make statements and requests. At a later stage we will integrate this
model into the prototype of a group decision support system which we are developing.

Acknowledgements. This work is part-funded by ERDF - European Regional Development


Fund through the COMPETE Programme (operational programme for competitiveness) within
project FCOMP-01-0124-FEDER-028980 (PTDC/EEISII/1386/2012) and by National Funds
through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science
and Technology) with the João Carneiro PhD grant with the reference SFRH/BD/89697/2012.

References
1. Rahwan, I., Ramchurn, S.D., Jennings, N.R., Mcburney, P., Parsons, S., Sonenberg, L.:
Argumentation-based negotiation. The Knowledge Engineering Review 18, 343–375
(2003)
2. Hadidi, N., Dimopoulos, Y., Moraitis, P.: Argumentative alternating offers. In: McBurney,
P., Rahwan, I., Parsons, S. (eds.) ArgMAS 2010. LNCS, vol. 6614, pp. 105–122. Springer,
Heidelberg (2011)
3. El-Sisi, A.B., Mousa, H.M.: Argumentation based negotiation in multiagent system. In:
2012 Seventh International Conference on Computer Engineering & Systems (ICCES),
pp. 261–266. IEEE (2012)
4. Marey, O., Bentahar, J., Asl, E.K., Mbarki, M., Dssouli, R.: Agents’ Uncertainty in Argu-
mentation-based Negotiation: Classification and Implementation. Procedia Computer Sci-
ence 32, 61–68 (2014)
5. Mbarki, M., Bentahar, J., Moulin, B.: Specification and complexity of strategic-based rea-
soning using argumentation. In: Maudet, N., Parsons, S., Rahwan, I. (eds.) ArgMAS 2006.
LNCS (LNAI), vol. 4766, pp. 142–160. Springer, Heidelberg (2007)
6. Amgoud, L., Vesic, S.: A formal analysis of the outcomes of argumentation-based negotia-
tions. In: The 10th International Conference on Autonomous Agents and Multiagent
Systems, vol. 3, pp. 1237–1238. International Foundation for Autonomous Agents and
Multiagent Systems (2011)
Defining Agents’ Behaviour for Negotiation Contexts 13

7. Bonzon, E., Dimopoulos, Y., Moraitis, P.: Knowing each other in argumentation-based
negotiation. In: Proceedings of the 11th International Conference on Autonomous Agents
and Multiagent Systems, vol. 3, pp. 1413–1414. International Foundation for Autonomous
Agents and Multiagent Systems (2012)
8. Kraus, S., Sycara, K., Evenchik, A.: Reaching agreements through argumentation: a logi-
cal model and implementation. Artificial Intelligence 104, 1–69 (1998)
9. Faratin, P., Sierra, C., Jennings, N.R.: Negotiation decision functions for autonomous
agents. Robotics and Autonomous Systems 24, 159–182 (1998)
10. Rahwan, I., Kowalczyk, R., Pham, H.H.: Intelligent agents for automated one-to-many
e-commerce negotiation. In: Australian Computer Science Communications, pp. 197–204.
Australian Computer Society Inc. (2002)
11. Santos, R., Marreiros, G., Ramos, C., Neves, J., Bulas-Cruz, J.: Personality, emotion, and
mood in agent-based group decision making (2011)
12. Kakas, A., Moraitis, P.: Argumentation based decision making for autonomous agents. In:
Proceedings of the Second International Joint Conference on Autonomous Agents and
Multiagent Systems, pp. 883–890. ACM (2003)
13. Zamfirescu, C.-B.: An agent-oriented approach for supporting Self-facilitation for group
decisions. Studies in Informatics and control 12, 137–148 (2003)
14. Jung, C.G.: Psychological types. The collected works of CG Jung 6(18), 169–170 (1971).
Princeton University Press
15. Myers-Briggs, I.: The Myers-Briggs type indicator manual. Educational Testing Service,
Princeton (1962)
16. Myers, I.B., Myers, P.B.: Gifts differing: Understanding personality type. Davies-Black
Pub. (1980)
17. Bates, M., Keirsey, D.: Please Understand Me: Character and Temperament Types. Prome-
theus Nemesis Book Co., Del Mar (1984)
18. Holland, J.L.: Making vocational choices: A theory of vocational personalities and work
environments. Psychological Assessment Resources (1997)
19. Kilmann, R.H., Thomas, K.W.: Interpersonal conflict-handling behavior as reflections of
Jungian personality dimensions. Psychological reports 37, 971–980 (1975)
20. Blake, R.R., Mouton, J.S.: The new managerial grid: strategic new insights into a proven
system for increasing organization productivity and individual effectiveness, plus a reveal-
ing examination of how your managerial style can affect your mental and physical health.
Gulf Pub. Co. (1964)
21. Walton, R.E., McKersie, R.B.: A behavioral theory of labor negotiations: An analysis of a
social interaction system. Cornell University Press (1991)
22. Costa, P.T., MacCrae, R.R.: Revised NEO Personality Inventory (NEO PI-R) and NEO
Five-Factor Inventory (NEO FFI): Professional Manual. Psychological Assessment Re-
sources (1992)
23. Howard, P.J., Howard, J.M.: The big five quickstart: An introduction to the five-factor
model of personality for human resource professionals. ERIC Clearinghouse (1995)
24. Rahim, M.A., Magner, N.R.: Confirmatory factor analysis of the styles of handling inter-
personal conflict: First-order factor model and its invariance across groups. Journal of
Applied Psychology 80, 122 (1995)
25. Allbeck, J., Badler, N.: Toward representing agent behaviors modified by personality and
emotion. Embodied Conversational Agents at AAMAS 2, 15–19 (2002)
26. Badler, N., Allbeck, J., Zhao, L., Byun, M.: Representing and parameterizing agent behav-
iors. In: Proceedings of Computer Animation, 2002, pp. 133–143. IEEE (2002)
14 J. Carneiro et al.

27. Velásquez, J.D.: Modeling emotions and other motivations in synthetic agents. In:
AAAI/IAAI, pp. 10–15. Citeseer (1997)
28. Durupinar, F., Allbeck, J., Pelechano, N., Badler, N.: Creating crowd variation with the
ocean personality model. In: Proceedings of the 7th International Joint Conference on Au-
tonomous Agents and Multiagent Systems, vol. 3, pp. 1217–1220. International Founda-
tion for Autonomous Agents and Multiagent Systems (2008)
29. Pasquali, L.: Os tipos humanos: A teoria da personalidade. Differences 7, 359–378 (2000)
Improving User Privacy and the Accuracy
of User Identification in Behavioral Biometrics

André Pimenta(B) , Davide Carneiro, José Neves, and Paulo Novais

Algoritmi Centre, Universidade Do Minho, Braga, Portugal


{apimenta,dcarneiro,jneves,pjon}@di.uminho.pt

Abstract. Humans exhibit their personality and their behavior through
their daily actions. Moreover, these actions also show how behaviors dif-
fer between different scenarios or contexts. However, Human behavior
is a complex issue as it results from the interaction of various internal
and external factors such as personality, culture, education, social roles
and social context, life experiences, among many others. This implies
that a specific user may show different behaviors for a similar circum-
stance if one or more of these factors change. In past work we have
addressed the development of behavior-based user identification based
on keystroke and mouse dynamics. However, user states such as stress or
fatigue significantly change interaction patterns, risking the accuracy of
the identification. In this paper we address the effects of these variables
on keystroke and mouse dynamics. We also show how, despite these
effects, user identification can be successfully carried out, especially if
task-specific information is considered.

Keywords: Mental fatigue · Machine learning · Computer security ·
Behavioral biometrics · Behavioral analysis

1 Introduction
In the last years there has been a significant increase in jobs that are mentally
stressful or fatiguing, at the expense of otherwise traditional physically demanding
jobs [1]. Workers are nowadays faced not only with more mentally demanding
jobs but also with demanding work conditions (e.g. positions of high responsibil-
ity, competition, risk of unemployment, working by shifts, working extra hours).
This results in the recent emergence of stress and mental fatigue as some of the
most serious epidemics of the twenty-first century [2,3]. In terms of workplace
indicators, this has an impact on human error, productivity or quality of work
and of the workplace. In terms of social or personal indicators, this has an impact
on quality of life, health or personal development. Moreover, there is an increase
in the loss of focus that leads people to be unaware of risks, thus lowering the
security threshold.
Recent studies show the negative impact of working extra hours on produc-
tivity [4,5]: people work more but produce less. Stressful milieus just add to the

problem. The question is thus how to create the optimal conditions to meet
productivity requirements while respecting people’s well-being and health. Since
each worker is different, what procedures need to be implemented to measure the
level of stress or burnout of each individual worker? And their level of produc-
tivity? The mere observation of these indicators using traditional invasive means
may change the worker’s behavior, leading to biased results that do not reflect
his actual state. Directly asking, through questionnaires or similar instruments,
can also lead to biased results as workers are often unwilling to share feelings
concerning their workplace with their coworkers.
Recent approaches for assessing and managing fatigue have been developed
that look at one’s interaction patterns with technological devices to assess
one’s state (e.g. we type at a lower pace when fatigued). Moreover, the same
approaches can be used to identify users (e.g. each individual types in a dif-
ferent manner). This field is known as behavioral biometrics. In this paper we
present a framework for collecting data from users in a transparent way, which allows
performing tasks commonly associated with behavioral biometrics. Moreover, this
framework respects user privacy. Finally, we show how including the user’s state
and information about the interaction context may improve the accuracy of
user identification. The main objective of this work is to define a reliable and
non-intrusive user identification system for access control.

1.1 Human-Computer Interaction


Currently there is a very large community of computer users. In fact, most of the
jobs today require some type of computer usage [6]. Moreover, with services like
home banking, tech support services and social networks, people start to interact
more with computers than with other persons, even to take care of important
aspects of their lives.
There is thus a new form of communication in which computers serve as inter-
mediaries. Therefore, the perception of a conversation between human beings is
somewhat lost. By using a computer, the people involved cannot perceive one
of the most important aspects in communication which is body language. Other
important aspects include speech, intonation or facial expressions, just to men-
tion a few. To overcome this loss of information computer systems must adopt
new processes to better perceive the human being [7].
In fact, Humans tend to show their personality or their state through their
actions, even in an unconscious way. Facial expressions and body language, for
example, have been known as a gateway for feelings that result in intentions.
The resultant actions can be traced to a certain behavior. Therefore, it is safe
to assume that a human behavior can be outlined even if the person does not
want to explicitly share that information.
Human behavior can also be deemed complex as it is driven by internal and
external factors, such as personality, culture, education, social roles, life experi-
ences, among others. Accurately evaluating a behavior requires constant observa-
tion of all the elements that are able to provide useful information. Nonetheless,
with our evolved social skills, we are often able to conceal certain emotions or
shield them with others. Thus, multi-modal approaches should be consid-
ered for increased accuracy (e.g. relying solely on visual emotion recognition
may be less accurate than including additional aspects such as tone of voice
or speech rhythm). A potentially interesting approach is to consider involuntary
actions, which cannot be consciously controlled by the individual. This often includes
movement and posture, hand gestures, touches and interaction with objects in the
environment, among others. Therefore, it can be stated that the observation
and evaluation of behaviors must consider not only the displayed emotions and
actions, but also the nature of the interaction with the environment.
One particularly interesting source of these unconscious behaviors is our
interaction with computers and other technological devices. In fact, we don’t
think about controlling the rhythm at which we type on a keyboard or the way
we move the mouse when we become fatigued or stressed, although we might want
to hide our state from our colleagues or superiors. But the truth is that our inter-
action does change, as we have established in previous work [8,9]. Under different
states we use the mouse and the keyboard differently. Moreover, we also interact
differently with the smartphone. The case of the smartphone is still more inter-
esting as it provides a range of sensors that are not available on other platforms
and can provide very valuable information about interaction patterns, including
a touch screen (that provides information about touches, their intensities, their
area or their duration), gyroscopes, accelerometers, among others.
Human-Computer Interaction thus becomes a very promising field when it
comes to reliable sources of information for characterizing one’s state following
behavioral approaches, the main advantage being that the individual generally
does not or cannot consistently change such precise and fine-grained behaviors.
Thus, while one can, to some extent, fake facial expressions and transmit a
chosen emotion or state, doing so successfully through these behaviors proves
much harder.

2 Security Systems

As stated before, with the increase of Human-Computer Interaction and the
development of social networking, people rapidly increased the rate at which
they share information, even when it is sensitive personal information. Some of
the current concerns are thus privacy, security and data protection.
Deemed as the most profitable crime of modern times, information theft is
increasing at an alarming rate, the corporate sector being the most affected. This
type of crime frequently includes the theft of personal data and often does great
damage to a person's life, as well as to companies. One way to improve security
is to build robust authentication systems which prevent unauthorized access to
machines. These systems may range from password-based authentication, in the
simplest cases, to biometry systems in the most complex ones.
One of the possible ways to increase security in the context of user identi-
fication is to consider the user's behavior when interacting with a technological
device. As each individual has a particular way to walk, talk, laugh or do any-
thing else, each one of us also has their own interaction patterns with technolog-
ical devices. Moreover, most of the applications we interact with have a specific
flow of operation or require a particular type of interaction, restricting or condi-
tioning the user’s possible behaviors to a smaller set. Maintaining a behavioral
profile of authorized users may allow identifying uncommon behaviors of the
current user that may indicate a possible unauthorized user. This is even more
likely to work when behavioral information for particular applications is used.
Such systems are known as Behavioural Biometrics: they rely on the users’
behavioral profiles to establish the behavior of authorized users. Whenever,
while analyzing the behavior of the current user, a moderate behavioral deviation
from the known profiles is detected, the system may take action such as logging off,
notifying the administrator or using an alternative method of authentication.
Such systems can also include behaviors other than the ones originating from
keyboard and mouse interaction patterns [10]. In fact, any action performed on the
technological device can be used for threat detection. For example, if the system
console is started and the authorized user of the device never used the console
before, a potential invasion may be taking place. Similar actions can be taken on
other applications or even on specific commands (e.g. it is unlikely that a user with
a non-expert profile suddenly starts using advanced commands on the console).
To implement behavioral biometrics, distinct procedures can be adopted,
such as:

– Biometric Sketch: this method uses the user's drawings as templates for com-
parison [11,12]. The system collects patterns from the user's drawing and
compares it to others in a database. Singularity is assured by the number
of possible combinations. The downside is that the drawings must be very
precise, which in most cases is quite difficult, even more so when using a
standard mouse to draw.
– GUI Interaction: this technique uses the interaction of the user with visual
interfaces of the applications and compares it to the model present in the
database. For every application that the user interacts with, a model must
be present. Thus, both the model and the application must be saved in the
database. This method requires that every action per application is saved,
resulting in a large amount of information to be maintained. Moreover, each
new application or update must be trained and modeled. Therefore, this
method is very strict and complex to implement and maintain.
– Keystroke Dynamics: this method uses the keyboard as input and is based
on the user’s typing patterns. It captures the keys pressed, measuring time
and pressing patterns, extracting several features about the typing behavior.
This is a well established method, as it relies solely on the user’s interaction,
allowing to create simple and usable models.
– Mouse Dynamics: this method consists in capturing the mouse movement
and translating it into a model. All the interactions are considered, such
as movement and clicking. This method is similar to the GUI Interaction
method but simpler; although it suffers the same context problem. The main
purpose of this method is to accompany the Keystroke Dynamics, providing
additional data to the main model, thus increasing accuracy and decreasing
false positives.
– Tapping: this method focuses on the pulse wave resulting from a touch sensor.
The pulse duration and tapping interval are the properties considered for
the analysis. The way a user acts with a smartphone or computer can be
monitored, and a model can be extracted. Conceptually, it is a solid method,
as tapping follows the same concept of the keystroke dynamics, but in reality
the usage context must also be considered, as tapping changes according to
the application being used.

In this work, Keystroke Dynamics and Mouse Dynamics are chosen as inputs.
Their broader features and availability are the traits that suit the aim of the
intended system. They are application independent and operating system inde-
pendent, and are nowadays the most common input methods when interacting
with computers.

3 The Framework
The framework developed in the context of this work is a unified system, com-
posed of two main modules: fatigue monitoring and security. The process of
monitoring is implemented using an application that captures the keyboard and
mouse inputs transparently. The features used are the same in both systems and
are defined in more detail in [13]. The features extracted from the keyboard are:

– Key down time: total time that a key is pressed


– Time between keys: time between a key being released and the following key
being pressed
– Writing speed: the rhythm at which keys are pressed
– Errors per key: quantification of the use of the backspace and delete keys

And the features extracted from the mouse are:


– Distance of the mouse to the straight line: the sum of the distances between
the pointer and the line defined by each two consecutive clicks
– Mouse acceleration: the acceleration of the mouse
– Mouse velocity: the velocity of the mouse
– Average distance of the mouse to the straight line: the average distance
at which the mouse pointer is from the line that is defined by each two
consecutive clicks
– Total excess of distance: the distance that the mouse travels in excess
between each two consecutive clicks
– Average excess of distance: the distance that the mouse travels in excess
divided by the shortest distance between each two consecutive clicks
– Time between clicks: the time between each two consecutive clicks
– Distance during clicks: distance traveled by the mouse while performing a
click
– Double click duration: the time between two clicks in a double click event
– Absolute sum of angles: the quantification of how much the mouse turns,
regardless of the direction of the turn, between each two consecutive clicks
– Signed sum of angles: the quantification of how much the mouse turns, con-
sidering the direction of the turn, between each two consecutive clicks
– Distance between clicks: the distance traveled between each two consecutive
clicks
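As an illustration of how features of this kind can be derived from raw pointer coordinates, the sketch below computes the distance of the mouse to the straight line and the excess of distance between two consecutive clicks; it is a minimal reconstruction based only on the feature descriptions above, not the authors' code.

import math

# Assumed sketch: compute two of the mouse features for the pointer path
# recorded between two consecutive clicks.
def point_to_line_distance(p, a, b):
    # Perpendicular distance from point p to the line defined by clicks a and b.
    (px, py), (ax, ay), (bx, by) = p, a, b
    num = abs((by - ay) * px - (bx - ax) * py + bx * ay - by * ax)
    den = math.hypot(bx - ax, by - ay)
    return num / den if den else math.hypot(px - ax, py - ay)

def distance_to_straight_line(path, click_a, click_b):
    # Sum of the distances between each pointer position and the line
    # defined by the two consecutive clicks.
    return sum(point_to_line_distance(p, click_a, click_b) for p in path)

def excess_of_distance(path, click_a, click_b):
    # Distance travelled in excess of the straight line between the clicks.
    travelled = sum(math.hypot(x2 - x1, y2 - y1)
                    for (x1, y1), (x2, y2) in zip(path, path[1:]))
    straight = math.hypot(click_b[0] - click_a[0], click_b[1] - click_a[1])
    return travelled - straight

path = [(0, 0), (3, 4), (6, 1), (10, 0)]   # pointer samples between two clicks
print(distance_to_straight_line(path, (0, 0), (10, 0)))  # 5.0
print(excess_of_distance(path, (0, 0), (10, 0)))         # about 3.37
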
The data gathered may be processed differently to extract the information
related to each scope of the framework (fatigue monitoring and stress). An inte-
grated system is beneficial due to the joint nature of the data and to the
fact that only one application is present locally, thus having a low footprint on
computer resources. Furthermore, these are two areas that are intrinsically con-
nected, and one can affect the other. Their joint analysis is fundamental to the
achievement of the proposed objectives.

3.1 Providing Security and Safety in the Monitoring System


The use of Keystroke Dynamics and Mouse Dynamics for detecting behavior
is extremely useful, especially since it allows the creation of non-intrusive and
non-invasive systems.
However the use of behavioral biometrics, in particular the use of keystroke
dynamics, can pose some risks to the data security and privacy of the user,
especially when the data obtained from the mouse and keyboard are processed
by remote or 3rd party Web Services. For this reason, and to ensure the users’
privacy and data security, data must be encrypted. The most sensitive data
is indeed the information about which key was pressed. However, this specific
information is mostly irrelevant for behavior analysis, i.e., we are not interested
in which keys the user presses but in how the user presses them.
We therefore propose the collection of events created by the mouse and the
keyboard in the following way:

– MOV, timestamp, posX, posY - an event describing the movement of the
mouse, in a given time, to coordinates (posX, posY) on the screen;
– MOUSE DOWN, timestamp, [Left|Right], posX, posY - this event
describes the first half of a click (when the mouse button is pressed down),
in a given time. It also describes which of the buttons was pressed (left or
right) and the position of the mouse in that instant;
– MOUSE UP, timestamp, [Left|Right], posX, posY - an event similar to
the previous one but describing the second part of the click, when the mouse
button is released;
– MOUSE WHEEL, timestamp, dif - this event describes a mouse wheel scroll
of amount dif, in a given time;
– KEY DOWN, timestamp, encrypted key - identifies a given key from the
keyboard being pressed down, at a given time;
– KEY UP, timestamp, encrypted key - describes the release of a given key
from the keyboard, in a given time;
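A minimal sketch of how such an event stream could be produced while hiding the actual keys is given below; the per-session random salt and hash used here are only one assumed realization of the random key encryption described next, not necessarily the scheme used by the framework.

import hashlib
import os
import time

# Assumed sketch: the key code is replaced by a token derived from a random,
# session-specific salt, so repeated presses of the same key map to the same
# token within a session while the actual character cannot be recovered by a
# remote service.
SESSION_SALT = os.urandom(16)

def encrypt_key(key_code):
    return hashlib.sha256(SESSION_SALT + key_code.encode()).hexdigest()[:12]

def key_event(event_type, key_code):
    # event_type is "KEY DOWN" or "KEY UP".
    return (event_type, time.time(), encrypt_key(key_code))

def mouse_move(pos_x, pos_y):
    return ("MOV", time.time(), pos_x, pos_y)

log = [key_event("KEY DOWN", "A"), key_event("KEY UP", "A"), mouse_move(120, 340)]
print(log)
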
In this approach, the encrypted key replaces the information about the spe-
cific key pressed. It is therefore still possible, while hiding what the user wrote,
to extract the previously mentioned features, thus guaranteeing the user’s pri-
vacy. The encryption of the pressed keys is carried out through the generation of
random key encryptions at different times. This is done as depicted in Algorithm
1, which exemplifies the developed approach for the case of the key down time
feature. An example of the result of the algorithm is depicted in Table 1, where
a record with and without encryption is depicted for different keys.

Data: Keyboard Inputs
Result: List of KeyDownTime records
while Keyboard Inputs have records do
Get timestamp of a KEY DOWN ;
Get code of a KEY DOWN ;
while Keyboard Inputs have records do
Get timestamp of a KEY UP ;
Get code of a KEY UP ;
if KEY DOWN code and KEY UP code are the same then
Save the difference between KEY UP timestamp and KEY DOWN
timestamp;
end
end
end
Algorithm 1. Key down time algorithm
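A possible Python rendering of Algorithm 1 is sketched below; instead of the nested scans of the pseudocode, it pairs each KEY DOWN with the next KEY UP carrying the same encrypted key code and stores the difference of their timestamps. The tuple layout follows the log format described above and is otherwise an assumption.

def key_down_times(events):
    # events is an iterable of tuples (event_type, timestamp, encrypted_key),
    # e.g. ("KD", 63521596046072, "1COc0qNOOk=").
    pending = {}      # encrypted_key -> timestamp of the unmatched KEY DOWN
    durations = []
    for event_type, timestamp, key in events:
        if event_type == "KD":
            pending[key] = timestamp
        elif event_type == "KU" and key in pending:
            durations.append(timestamp - pending.pop(key))
    return durations

events = [
    ("KD", 63521596046072, "1COc0qNOOk="),
    ("KU", 63521596046165, "1COc0qNOOk="),
    ("KD", 63521596057943, "sMA0Wu0n3k="),
    ("KU", 63521596058037, "sMA0Wu0n3k="),
]
print(key_down_times(events))  # [93, 94]
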

Table 1. Example of a keyboard log with and without encryption.

No Encryption Random Encryption Key 1


KD,63521596046072,A KD,63521596046072,1COc0qNOOk=
KU,63521596046165,A KU,63521596046165,1COc0qNOOk=
KD,63521596057943,v KD,63521596057943,sMA0Wu0n3k=
KU,63521596058037,v KU,63521596058037,sMA0Wu0n3k=
KU,63521596058084,a KU,63521596058084,hb0s0lHEF8+sA==
KU,63521596058037,v KU,63521596058037,sMA0Wu0n3k=

Hiding user input is just one part of the solution for the issue of user security.
The other is to prevent intrusions. In this scope, behavioral biometrics security
systems can run in two different modes [14]: identification mode and verification
mode. In this system we use the identification mode instead of the verification
mode to ensure a constant user identification in the monitoring system.
The identification mode is the process of trying to discover the identity of
a person by examining a biometric pattern calculated from biometric data of
the person. In this mode the user is identified based on information previously
collected from keystroke dynamics profiles of all users. For each user, a biometric
profile is built in a training phase. When in the running phase, the usage pattern
being created in real-time is compared to every known model, producing either
a score or a distance that describes the similarity between the pattern and the
model. The system assigns the pattern to the user with the most similar biomet-
ric model. Thus, the user is identified without the need for extra information.
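A simplified sketch of the identification mode is shown below: the live usage pattern (a feature vector) is compared with each user's stored biometric model and assigned to the most similar one. The Euclidean distance and the mean-vector models are assumptions chosen for illustration.

import math

# Illustrative sketch of identification mode: each enrolled user has a
# biometric model (here, the mean value of each interaction feature, built in
# the training phase); the live pattern is assigned to the closest model.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(pattern, models):
    # Return (user, distance) for the model most similar to the live pattern.
    return min(((user, euclidean(pattern, model)) for user, model in models.items()),
               key=lambda item: item[1])

models = {
    "user_a": [95.0, 210.0, 3.1],   # e.g. key down time, time between keys, ...
    "user_b": [120.0, 180.0, 4.5],
}
print(identify([98.0, 205.0, 3.3], models))  # ('user_a', ...)
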

4 Case Study

The system was analyzed and tested in four different ways. As a first step we
used the records of 40 users registered in the monitoring system to train different
models. The created models were then validated with 150 random system
usage records taken from the system.
In a second step models were created using the type of task to be performed in
addition to the biometric information, and in the third step, models were trained
using the user’s fatigue state. We finally created models that, in addition to using
biometric data, also used the type of task and the user’s mental state at the time
of registration. Both the type of task as the user’s mental state are provided by
the monitoring system.
The participants, forty in total (36 men, 4 women), were registered in
the monitoring system. Their ages ranged between 18 and 45. The following
requirements were established to select, among all the volunteers, the ones that
participated: (1) familiarity and proficiency with the use of the computer; (2)
use of the computer on a daily basis and throughout the day; (3) owning at least
one personal computer.

4.1 Results and Discussion

After training different models (Naive Bayes, KNN, SVM and Random Forest)
with data from different users on the system, different degrees of accuracy have
been obtained in user identification, as depicted in Figure 1. It is also possible
to observe that the type of task and the level of fatigue have influence on the
process of identifying the user, since these factors effectively influence interaction
patterns. Taking this information into consideration allows the creation of more
accurate models.
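The comparison of classifiers could be reproduced along the lines of the scikit-learn sketch below, in which the type of task and the fatigue level are optionally appended to the interaction features; the synthetic data, shapes and column meanings are assumptions standing in for the monitoring records.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Illustrative sketch (synthetic data, not the original experiment): X holds
# the keyboard/mouse features, task and fatigue are contextual columns that
# can optionally be appended, and y identifies the user of each record.
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 15))              # 15 interaction features
task = rng.integers(0, 5, size=(n, 1))    # 5 application types
fatigue = rng.integers(1, 8, size=(n, 1)) # USAFSAM levels 1..7
y = rng.integers(0, 40, size=n)           # 40 registered users

classifiers = {
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(),
}

for with_context in (False, True):
    data = np.hstack([X, task, fatigue]) if with_context else X
    for name, clf in classifiers.items():
        accuracy = cross_val_score(clf, data, y, cv=5).mean()
        print(with_context, name, round(accuracy, 3))
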
The type of task being carried out during the monitoring of the interaction
patterns is particularly important, mainly due to the very nature of the task, as
well as the set of tools available to perform the task. Figure 2 shows the values of
features Key Down Time and Average Excess of Distance for five different types
of applications: Chat, Leisure, Office, Reading and Programming. The way each
different application conditions the interaction behavior is explicit. Such infor-
mation must, therefore, absolutely be considered while developing behavioral
biometrics systems based on input behavior. Table 2 further supports this claim
showing that data collected in different types of applications has statistically
Fig. 1. Accuracy of the different algorithms and inputs considered for user authenti-
cation.

Table 2. p-values of the Kruskal-Wallis test when comparing the data organized
according to the types of application and to the level of fatigue. In the vast majority
of the cases, the differences between the different groups are statistically significant.

Features Applications Fatigue


Writing Velocity 4.4e-15 0.43
Time Between Keys <2.2e-16 <2.2e-16
Key Down Time <2.2e-16 3.8e-14
Distance of the Mouse to the SL 0.01 <2.2e-16
Mouse acceleration <2.2e-16 1.3e-09
Average Distance of the Mouse to the SL 0.03 <2.2e-16
Mouse velocity <2.2e-16 <2.2e-16
Average Excess of Distance 0.04 1.6e-08
Time Between Clicks <2.2e-16 <2.2e-16
Distance During Clicks <2.2e-16 0.05
Total Excess of Distance 0.22 <2.2e-16
Double Click Duration 1.0e-4 <2.2e-16
Absolute Sum of Angles 8.1e-3 2.5e-13
Signed Sum of Angles 0.91 <2.2e-16
Distance between clicks 0.06 <2.2e-16

significant differences for most of the features. The same happens for different
levels of fatigue.
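The statistical comparison summarized in Table 2 can be reproduced with SciPy's Kruskal-Wallis test, as sketched below; the grouping of one feature's values by application type is illustrative.

from scipy.stats import kruskal

# Illustrative sketch: values of one feature (e.g. Key Down Time) grouped by
# the type of application in which they were collected; the test checks
# whether the groups come from the same distribution.
chat        = [88, 92, 95, 90, 87]
office      = [110, 120, 115, 118, 112]
programming = [101, 99, 104, 97, 103]

statistic, p_value = kruskal(chat, office, programming)
print(statistic, p_value)
# A p-value below the usual 0.05 threshold indicates statistically
# significant differences between application types for this feature.
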
Another extremely important aspect in user identification is the influence
of mental states on interaction patterns. Previous studies by our research team
[8,15] show that individuals under different states of stress, fatigue, high/low
mental workload or even mood evidence significant behavior changes that impact
interaction patterns with devices. They do, consequently, influence behavioral
Fig. 2. Differences in the distributions of the data when comparing interaction patterns
with different applications, for two interaction features.

Fig. 3. Effects of different levels of fatigue on the interaction patterns, depicted for two
different features.

biometric features. Figure 3 depicts this influence for two interaction features.
Numbers in the y-axis represent the level of fatigue as self-reported by the indi-
vidual using the seven-point USAFSAM Mental Fatigue Scale questionnaire [16].
Each value represents the following state:
1. Fully alert. Wide awake. Extremely peppy.
2. Very lively. Responsive, but not at peak.
3. Okay. Somewhat fresh.
4. A little tired. Less than fresh.
5. Moderately tired. Let down.
6. Extremely tired. Very difficult to concentrate.
7. Completely exhausted. Unable to function effectively. Ready to drop.
It is therefore possible to see how increased levels of fatigue result in generally
less efficient interactions of the participants with the computer. For example,
a higher value of Mouse Acceleration depicts a more efficient interaction
behavior when the user is moving the mouse. The same happens with Key
Down Time, where a shorter time corresponds to more efficiency in the use of
the keyboard.
This conclusion justifies the need for the inclusion of mental states on behav-
ioral biometrics approaches and explains the increased accuracy of the presented
approach concerning user identification when all modalities are used jointly:
interaction patterns, user state and type of application, as depicted previously
in Figure 1.
5 Conclusions and Future Work


This paper describes a non-invasive and non-intrusive approach to the field of
behavioral biometrics. Specifically, we described a system that, transparently,
acquires interaction features from keystroke dynamics and mouse dynamics, with
the purpose of user identification. We analyzed the significant effect that mental
states have on interaction patterns, particularly fatigue, to conclude that these
aspects must be considered when developing identification systems based on
interaction patterns. Likewise, we also show how user interaction significantly
changes according to the application being used.
These notions support the claim put forward in this paper that accurate
user identification approaches based on behavioral biometrics must absolutely
include these aspects when building user profiles. That much is evidenced by our
results. In all four classification algorithms used for user identification, the cases
in which all these types of information are used together outperform the others.
Moreover, the proposed approach is also concerned with user privacy. Specif-
ically, it is based not on what the user inputs but on how the user inputs. To this
end it conceals each key under an encryption key that prevents remote services
from knowing what the user typed. The data gathered from users thus respects
their privacy while allowing to extract the necessary features for the system to
carry out its task, i.e., to detect unauthorized accesses.
In future work we will focus on including additional user features, namely
in what concerns user state. Specifically, we will include information regarding
the user’s level of stress, which we have already determined in previous work
to significantly influence interaction patterns with technological devices. In the
long-term we will include more potentially interesting features, namely the user’s
mood. It is our conviction that this kind of information, when available, can
significantly increase the accuracy of user identification in behavioral biometrics
approaches.

Acknowledgments. This work is part-funded by ERDF - European Regional Devel-


opment Fund through the COMPETE Programme (operational programme for com-
petitiveness) and by National Funds through the FCT (Portuguese Foundation for Sci-
ence and Technology) within project FCOMP-01-0124-FEDER-028980 (PTDC/EEI-
SII/1386/2012) and project PEst-OE/EEI/UI0752/2014.

References
1. Tanabe, S., Nishihara, N.: Productivity and fatigue. Indoor Air 14(s7), 126–133
(2004)
2. Miller, J.C.: Cognitive Performance Research at Brooks Air Force Base, Texas,
1960–2009. Smashwords, March 2013
26 A. Pimenta et al.

3. Wainwright, D., Calnan, M.: Work stress: the making of a modern epidemic.
McGraw-Hill International (2002)
4. Folkard, S., Tucker, P.: Shift work, safety and productivity. Occupational Medicine
53(2), 95–101 (2003)
5. Rosekind, M.R.: Underestimating the societal costs of impaired alertness: safety,
health and productivity risks. Sleep Medicine 6, S21–S25 (2005)
6. Beauvisage, T.: Computer usage in daily life. In: Proceedings of the SIGCHI con-
ference on Human Factors in Computing Systems, pp. 575–584. ACM (2009)
7. Pantic, M., Rothkrantz, L.J.: Toward an affect-sensitive multimodal human-
computer interaction. Proceedings of the IEEE 91(9), 1370–1390 (2003)
8. Pimenta, A., Carneiro, D., Novais, P., Neves, J.: Monitoring mental fatigue through
the analysis of keyboard and mouse interaction patterns. In: Pan, J.-S., Polycarpou,
M.M., Woźniak, M., de Carvalho, A.C.P.L.F., Quintián, H., Corchado, E. (eds.)
HAIS 2013. LNCS, vol. 8073, pp. 222–231. Springer, Heidelberg (2013)
9. Carneiro, D., Castillo, J.C., Novais, P., Fernández-Caballero, A., Neves, J.: Multi-
modal behavioral analysis for non-invasive stress detection. Expert Systems with
Applications 39(18), 13376–13389 (2012)
10. Lee, P.M., Chen, L.Y., Tsui, W.H., Hsiao, T.C.: Will user authentication using
keystroke dynamics biometrics be interfered by emotions?-nctu-15 affective key-
board typing dataset for hypothesis testing
11. Al-Zubi, S., Brömme, A., Tönnies, K.D.: Using an active shape structural model
for biometric sketch recognition. In: Michaelis, B., Krell, G. (eds.) DAGM 2003.
LNCS, vol. 2781, pp. 187–195. Springer, Heidelberg (2003)
12. Brömme, A., Al-Zubi, S.: Multifactor biometric sketch authentication. In: BIOSIG,
pp. 81–90 (2003)
13. Pimenta, A., Carneiro, D., Novais, P., Neves, J.: Analysis of human performance
as a measure of mental fatigue. In: Polycarpou, M., de Carvalho, A.C.P.L.F.,
Pan, J.-S., Woźniak, M., Quintian, H., Corchado, E. (eds.) HAIS 2014. LNCS,
vol. 8480, pp. 389–401. Springer, Heidelberg (2014)
14. Shanmugapriya, D., Padmavathi, G.: A survey of biometric keystroke dynamics:
Approaches, security and challenges (2009). arXiv preprint arXiv:0910.0817
15. Rodrigues, M., Gonçalves, S., Carneiro, D., Novais, P., Fdez-Riverola, F.:
Keystrokes and clicks: measuring stress on E-learning students. In: Casillas, J.,
Martı́nez-López, F.J., Vicari, R., De la Prieta, F. (eds.) Management Intelligent
Systems. AISC, vol. 220, pp. 119–126. Springer, Heidelberg (2013)
16. Samn, S.W., Perelli, L.P.: Estimating aircrew fatigue: a technique with application
to airlift operations. Technical report, DTIC Document (1982)
Including Emotion in Learning Process

Ana Raquel Faria1(), Ana Almeida1, Constantino Martins1,


Ramiro Gonçalves2, and Lino Figueiredo1
1
GECAD - Knowledge Engineering and Decision Support Research Center Institute
of Engineering, Polytechnic of Porto (ISEP/IPP), Porto, Portugal
{arf,amn,acm,lbf}@isepp.ipp.pt
2
Universidade de Trás-Os-Montes E Alto Douro, Vila Real, Portugal
[email protected]

Abstract. The purpose of this paper is to propose a new architecture that includes
the student's learning preferences, personality traits and emotions to adapt the
user interface and learning path to the student's needs and requirements. This
aims to reduce the difficulty and emotional strain that students encounter while
interacting with learning platforms.

Keywords: Learning styles · Student modeling · Adaptive systems · Affective
computing

1 Introduction

Human-to-human communication depends on the interpretation of a mix of audio-
visual and sensorial signals. To simulate the same behaviour in a human-machine
interface the computer has to be able to detect affective state and behaviour alterations
and modify its interaction accordingly. The field of affective computing develops
systems and mechanisms that are able to recognize, interpret and simulate human
emotions [1], thus closing the gap between human and machine. The Affective Computing
concept was introduced by Rosalind Picard in 1995 as a tool to improve human-
machine interfaces by including affective connotations.
Emotion plays an important role in the decision process and knowledge acquisition
of an individual. Therefore, it directly influences the perception, the learning process,
the way people communicate and the way rational decisions are made. Hence the impor-
tance of understanding affect and its effect on cognition and on the learning process.
To understand how the emotions influence the learning process several models were
developed. Models like Russell’s Circumplex model [2] are used to describe user’s
emotion space and Kort’s learning spiral model [3] are used to explore the affective
evolution during learning process.
In a traditional learning context the teacher serves as a facilitator between the student
and his learning material. Students, as individuals, differ in their social, intellectual,
physical, emotional, and ethnic characteristics. They also differ in their learning rates, objec-
tives and motivation, making their behaviour rather unpredictable. The teacher has to
perceive the student state of mind and adjust the teaching process to the student’s needs
and behaviour. In a learning platform this feedback process does not take place in real
time and sometimes it is not what the student requires to overcome the problem at hand.
Over time this can become a major problem and cause difficulties in the student's learning
process. A possible solution to this problem could be the addition of mechanisms, to the
learning platforms, that enable computers to detect and intervene when the student re-
quires help or motivation to complete a task. The major difficulties of this work are
the detection of these situations and how to intervene. The method of detection cannot be
too intrusive, because that would affect the student's behaviour in a negative way and
would harm his learning process. Another important issue is the selection of
the variables to monitor. This can include the capture of emotions, behaviour or learning
results, among others. Finally, it must be determined what the computer's intervention
will be when a help situation is detected, in order to reverse it.

2 EmotionTest Prototype

In order to show that emotion can influence the learning process, a proto-
type (EmotionTest) was developed: a learning platform that takes into account the
emotional aspect, the learning style and the personality traits, adapting the course
(content and context) to the student's needs. The architecture proposed for this proto-
type is composed of 4 major models: the Application Model, Emotive Pedagogical
Model, Student Model, and Emotional Model [4], as shown in the figure below.

Fig. 1. Architecture

The student model consists of the user information and characteristics. This in-
cludes personal information (name, email, telephone, etc.), demographic data (gender,
race, age, etc.), knowledge, deficiencies, learning styles, emotion profile, personality
traits, etc. This information is used by the student model to better adapt the prototype to
the student [4].
The emotion model gathers all the information from the facial emotion recognition soft-
ware and the feedback of the students. Facial expression recognition allows video analy-
sis of images in order to recognise an emotion. This type of emotion recognition was
chosen because it was the least intrusive with respect to the student's activities. The emotion rec-
ognition is achieved by making use of an API entitled ReKognition [5]. This API
allows detection of the face, eyes, nose and mouth, and whether the eyes and mouth are open
or closed. In addition, it specifies the gender of the individual and an estimate of age and
emotion. At each moment a group of three emotions is captured. For each emotion a
number is given that indicates the confidence level of the emotion captured.
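To make the capture step concrete, the following minimal sketch shows how a client might query a face-analysis web API for each video frame and keep the three most confident emotions. It is not the documented ReKognition interface: the endpoint URL, request fields and response keys are hypothetical placeholders.

```python
# Illustrative sketch only: the endpoint URL, request fields and response layout
# below are hypothetical placeholders, not the documented ReKognition interface.
import requests

API_URL = "https://example.invalid/face-analysis"  # hypothetical endpoint

def capture_emotions(frame_bytes, api_key):
    """Send one video frame and return the three most confident emotions."""
    response = requests.post(
        API_URL,
        files={"image": frame_bytes},
        data={"api_key": api_key, "jobs": "face_emotion_gender_age"},
        timeout=10,
    )
    response.raise_for_status()
    detection = response.json()["face_detection"][0]   # first detected face
    scores = detection["emotion"]                      # assumed label -> confidence map
    top3 = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
    return top3  # e.g. [("happy", 0.81), ("calm", 0.12), ("confused", 0.05)]
```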
The application model is composed of a series of modules containing different subjects.
Each subject consists of a number of steps that the student has to pass in order to complete
his learning programme. Usually each subject is composed of a Placement Test, used to
assess and update the student's level of knowledge, followed by the subject content in
which the subject is explained, and then by the subject exercises and a final test. The
first step is the subject's Placement Test (PT), which can be optional. This is designed to
give students and teachers a quick way of assessing the approximate level of the student's
knowledge. The result of the PT is a percentage, PTs, that is added to the student's know-
ledge (Ks) on a particular subject and places the student at one of the five levels of
knowledge (Ksp = ∑ exercisei, summed over the placement test exercises). If the PT is not
performed, Ksp will be equal to zero and the student will start without any level of
knowledge. The Subject Content (SC) contains the subject explanation. The subject
explanation depends on the stereotype. Each explanation has a practice exercise. These
exercises allow the students to obtain points (Ksc) needed to perform the final test of the
subject, with TotalKsc = ∑ Ksci. The student needs to reach 80% on the TotalKsc to
undertake the subject test. The Subject Test (ST) is the assessment of the learned subject.
It gives a final value Kst that represents the student's knowledge of the subject,
Kst = ∑ exercisei. Only if Kst is higher than 50% can it be concluded that the student has
successfully completed the subject. In this case the values of Ksp and Kst are compared
to see if there was an effective improvement in the student's knowledge. This is
represented in the following diagram [6].

[Flow diagram: the Placement Test updates Ksp; each Subject Content block has an
associated exercise contributing Ksc1 + Ksc2 + ... = TotalKsc; when TotalKsc > 80%
the Subject Test is taken; if Kst > 50%, Ksp and Kst are compared to determine whether
there was a learning improvement, otherwise the student returns to the subject content.]

Fig. 2. Representation diagram
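The progression rules above can be condensed into a few lines of code. The sketch below is illustrative: the function and variable names are ours, the thresholds (80% exercise points, 50% test grade) follow the text, and the interpretation of "improvement" as Kst exceeding Ksp is an assumption.

```python
# Minimal sketch of the progression rules described above. Names are illustrative;
# thresholds follow the text (80% of exercise points, 50% on the subject test).

def total_ksc(exercise_scores):
    """TotalKsc: accumulated exercise points for the subject (in %)."""
    return sum(exercise_scores)

def can_take_subject_test(exercise_scores):
    return total_ksc(exercise_scores) >= 80.0

def subject_outcome(ksp, kst):
    """Return (passed, improved) after the Subject Test.

    ksp -- knowledge level estimated from the Placement Test (%)
    kst -- knowledge level measured by the Subject Test (%)
    """
    passed = kst > 50.0
    improved = passed and kst > ksp   # assumption: improvement means Kst > Ksp
    return passed, improved

# Example: a student with Ksp = 40% who earns 85 exercise points and scores 70% on the test
if can_take_subject_test([30.0, 25.0, 30.0]):
    print(subject_outcome(ksp=40.0, kst=70.0))   # (True, True)
```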

The last model is the emotive pedagogical model, which is composed of three sub-
models: the rules of emotion adaptability, the emotional interaction mechanisms and
the graph of concepts in case of failure.
The Rules of Emotion Adaptability manage the way the subject content is pre-
sented. The subject content is presented according to the student's learning preference
and personality. This way information and exercises are presented in a manner more
agreeable to the student, helping him to comprehend the subject at hand.
The subject content and subject exercises are thus presented according to the learning
style and personality of the student. The emotional interaction mechanisms consist in
triggering an emotional interaction when an emotion is captured that needs to be coun-
teracted in order to facilitate the learning process. The emotions to be counteracted
are: anger, sadness, confusion and disgust. The interaction can depend on the per-
sonality and on the learning style of the student. Finally, the graph of concepts in case
of failure indicates the steps to be taken when a student fails to pass a subject.
To be approved in a subject all the tasks must be completed, and only with a subject
completed is it possible to pass to the next one. Inside a subject the student has to
complete the placement test, the subject content plus exercises with a grade equal to or
higher than 80%, and the subject test with a grade higher than 50% to complete the
subject. In case of failure the student has to go back to the subject content and repeat
all the steps [6].
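As a rough illustration of the emotional interaction mechanism, the sketch below triggers an intervention whenever the most confident captured emotion is one that, according to the text, should be counteracted (anger, sadness, confusion, disgust). The confidence threshold and the message table indexed by learning preference are illustrative assumptions, not values taken from the paper.

```python
# Sketch of the emotional interaction trigger. The 0.5 confidence threshold and the
# message table are illustrative assumptions; only the set of emotions to counteract
# (anger, sadness, confusion, disgust) comes from the text.

EMOTIONS_TO_COUNTERACT = {"anger", "sadness", "confusion", "disgust"}

MESSAGES = {  # hypothetical interventions, indexed by learning preference
    "visual":  "Here is a short diagram that summarises this step.",
    "aural":   "Try listening to the recorded explanation of this step.",
    "default": "Take a short break, then retry the exercise - you are close!",
}

def intervention(top_emotions, learning_style, threshold=0.5):
    """top_emotions: list of (label, confidence) pairs, most confident first."""
    label, confidence = top_emotions[0]
    if label in EMOTIONS_TO_COUNTERACT and confidence >= threshold:
        return MESSAGES.get(learning_style, MESSAGES["default"])
    return None  # no intervention needed

print(intervention([("confusion", 0.74), ("sadness", 0.15), ("calm", 0.11)], "visual"))
```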

3 Data Analysis

To test the performance of the developed prototype some experiments were carried
out with students from two ISEP Engineering courses: Informatics Engineering and
Systems Engineering. The total number of students involved in these tests was 115,
with ages between 17 and 42 years old. This group of students was composed of 20%
female (n=23) and 80% male (n=92) participants, mainly from the districts
of Oporto, Aveiro and Braga.
To assess the validity of the prototype the students were divided into two groups, v1 and
v2. Group v1 tested the prototype with emotional interaction and group v2 without
any emotional interaction.
Group v1 had to complete a diagnostic test (on paper) to help grade the student's
initial knowledge level, followed by the evaluation of the prototype with the emotion-
al interaction and learning style. This included logging into the prototype, at
which moment the initial data begins to be collected for the student model. By
accessing the school's Lightweight Directory Access Protocol (LDAP) it was possible
to gather the generic information of the students (such as name, email and others). After
login the students were required to answer two questionnaires (TIPI, VARK) built
into the prototype. This allows the prototype to know the student's personality traits
and learning preferences. Afterwards the student could access the learning materials
and exercises. From the moment the student logs in, his emotional state is monitored
and saved, and every time an emotion that triggers an intervention is detected, the
intervention appears on the screen. After this evaluation the students had to complete a
final test (on paper) to help grade the student's final knowledge level [7].
Group v2 had to complete a diagnostic test (on paper) to help grade the student's ini-
tial knowledge level, followed by the prototype evaluation without any emotional
interaction. This evaluation is in all respects similar to that of group v1, but with one big
difference. Even though the emotional state is monitored, when an emotion that triggers
an intervention is detected, the intervention does not appear on the screen. After this test the
students had to complete a final test (on paper) to help grade the student's final knowledge level [7].
Analyzing the data of the evaluation tests, group v3 has a mean of 45,7% (SD = 40,3)
for the diagnostic test and a mean of 85,7% (SD = 12,2) for the final test, while group
v4 has a mean of 37,1% (SD = 29,2) for the diagnostic test and a mean of 61,4%
(SD = 33,7) for the final test. The data gathered did not have a normal distribution, so the
two groups were compared using a non-parametric Mann-Whitney test. The diagnostic
test has a Mann-Whitney U = 83,0 for a sample size of 14 students. For this anal-
ysis a P value of 0,479 was found, which indicates no statistically significant
difference; this is understandable because it was assumed that all students had more
or less the same level of knowledge. The final test has a Mann-Whitney U = 54,0
for the same sample size as the diagnostic test. For this analysis a P value of 0,029
was found, so in this case the differences observed are statistically significant. In addi-
tion, a series of tests was made to compare the mean values of the students by group
and learning preference, by group and personality, and by group and emotional
state. The objective of running these tests was to find out if learning preference, per-
sonality and emotional state had any influence on the outcome of the final test. In
relation to the first two tests, by learning preference and by personality, no statistical-
ly significant differences were found in the data. Therefore it cannot be concluded
that learning preference and personality in each group had any influence on the final
test outcome; to verify this a larger sample size is needed. For the ques-
tion of whether the emotional state had any influence on the final test, the differences
observed were statistically significant [7].
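The group comparison above can be reproduced with a standard statistics library. The sketch below, using SciPy, shows the kind of non-parametric test applied; the score arrays are made-up placeholders, not the data collected in the study.

```python
# Illustrative Mann-Whitney U comparison of two independent groups of final-test
# scores. The arrays below are made-up placeholders, not the data from the study.
from scipy.stats import mannwhitneyu

group_with_emotion = [85, 90, 78, 92, 88, 95, 80]       # placeholder scores (%)
group_without_emotion = [60, 72, 55, 80, 65, 58, 70]    # placeholder scores (%)

u_stat, p_value = mannwhitneyu(group_with_emotion, group_without_emotion,
                               alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.3f}")
# A p-value below 0.05 would indicate a statistically significant difference
# between the two groups' final-test results.
```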

4 Conclusion
In conclusion, this work attempted to answer several questions. The central
question that guided this work was: "Does a learning platform that takes into account
the student's emotions, learning preferences and personality improve the student's
learning results?"
The data gathered from the performed tests showed that there is a statistical differ-
ence between students' learning results when using the two learning platforms: the learn-
ing platform that takes into account the student's emotional state and the
platform that does not take that into consideration. This gives an indication that by
introducing the emotional component, the students' learning results can possibly be
improved. Another question was: "Does Affective Computing technology help im-
prove a student's learning process?"
In answering the central question positively, this question is partly answered. As
the results showed, the students' learning results can be improved by adding an emotional
component to a learning platform; also, the use of Affective Computing technology to
capture emotion can enhance this improvement. The use of Affective Computing allows
the capture of the student's emotion using techniques that do not inhibit the stu-
dent's actions. Also, one or more techniques can be used simultaneously to help
verify the accuracy of the emotional capture. The last question was: "What are the
stimuli that can be used to induce or change the student's state of mind in order to im-
prove the learning process?"
First, the results indicate that the platform with an emotional component elicited an
overall set of more positive emotions than the platform without this component.
This shows that the stimuli produced by the platform with an emotional component were
able to keep the students in a positive emotional state and motivated to do the tasks at
hand, which did not happen with the platform without this component.
Second, the results demonstrated that the platform with an emotional component
not only elicited the set of more positive emotions among the students, but also obtained
an improvement in the students' learning results.

Acknowledgments. This work is supported by FEDER Funds through the “Programa Opera-
cional Factores de Competitividade - COMPETE” program and by National Funds through
FCT “Fundação para a Ciência e a Tecnologia” under the project: FCOMP-01-0124-FEDER-
PEst-OE/EEI/UI0760/2014.

References
1. Picard, R.W., Papert, S., Bender, W., Blumberg, B., Breazeal, C., Cavallo, D., Machover,
T., Resnick, M., Roy, D., Strohecker, C.: Affective learning - a manifesto. BT Technol. J.
22(4), 253–268 (2004)
2. Russell, J.A.: A circumplex model of affect. J. Pers. Soc. Psychol. 39(6), 1161–1178 (1980)
3. Kort, B., Reilly, R., Picard, R.W.: An affective model of interplay between emotions and
learning: re-engineering educational pedagogy-building a learning companion. In: Proc. -
IEEE Int. Conf. Adv. Learn. Technol. ICALT 2001, pp. 43–46 (2001)
4. Faria, A., Almeida, A., Martins, C., Lobo, C., Gonçalves, R.: Emotional Interaction Model
For Learning. In: INTED 2015 Proc., pp. 7114–7120 (2015)
5. orbe.us | ReKognition - Welcome to Rekognition.com (2015) (Online).
http://rekognition.com/index.php/demo/face. (accessed: 26–Jul–2014)
6. Faria, A.R., Almeida, A., Martins, C., Gonçalves, R.: Emotional adaptive platform for learn-
ing. In: Mascio, T.D., Gennari, R., Vittorini, P., de la Prieta, F. (eds.) Methodologies and In-
telligent Systems for Technology Enhanced Learning. AISC, vol. 374, pp. 9–16. Springer,
Heidelberg (2015)
7. Faria, R., Almeida, A., Martins, C., Gonçalves, R.: Learning Platform. In: 10th Iberian
Conference on Information Systems and Technologies – CISTI 2015 (2015)
Ambient Intelligence:
Experiments on Sustainability Awareness

Fábio Silva(✉) and Cesar Analide

Algoritmi Centre, University of Minho, Braga, Portugal
{fabiosilva,analide}@uminho.pt

Abstract. Computer systems are designed to help solve problems presented to
our society. New terms such as computational sustainability and internet of
things present new fields where traditional information systems are being ap-
plied and implemented in the environment to maximize data output and our
ability to understand how to improve them. The advancement of richer and in-
terconnected devices has created opportunities to gather new data sources from
the environment and use them together with other pre-existent information in new
reasoning processes. This work describes a sensorial platform designed to help
raise awareness of sustainability and energy efficiency by explor-
ing the concepts of ambient intelligence and data fusion to create monitoring
and assessment systems. The presented platform embodies the effort to make users
aware of the impact of their actions on their sustainability objectives.

Keywords: Ambient intelligence · Pervasive systems · Sustainability · Energy
efficiency

1 Introduction

The advent of computer science and its evolution led to the availability of computa-
tional resources that can better assess and execute more complex reasoning and moni-
toring of sustainability attributes. This led to the creation of the field of computational
sustainability (Gomes, 2011). Coupled with sustainability is energy efficiency which
is directly affected by human behaviour and social aspects such as human comfort.
Fundamentally, efficiency deals with the best strategy to achieve the objectives that are
set; however, when the concept of sustainability is added, several efficient plans
might be deemed unsustainable because they cannot be maintained in the future.
While efficiency is focused on optimization, sustainability is mostly concerned with
restrictions put in place to ensure that the devised solution does not impair the future.
Not only does context harden the problem, but so does the possibility of missing
information, which might occur due to some unforeseen event that jeopardizes an
efficient solution. To tackle such events, computational systems are able to maintain
sensor networks over physical environments to acquire contextual information, so they
can validate the conditions for efficient planning, but also to acquire information and,
as a last resort, act upon the physical environment.

2 Related Work

The term computational sustainability is used by researchers such as Carla Gomes


(Gomes, 2011) to define the research field where sustainability problems are ad-
dressed by computer science programs and models in order to balance the three
dimensions of sustainability: economic, ecologic and social dimensions.
It is accepted that the world ecosystem is a complex sustainability problem,
affected by human and non-human actions. Despite the use of statistical and mathe-
matical models for the study of sustainability and computational models to address
problems of environmental and societal sustainability, the term computational sus-
tainability appeared around 2008. Nevertheless, the pairing between computer science
and the study of sustainability is as old as the awareness of sustainability and as long
as computing was available. It is a fact that, as computational power capacity
increased over time, so did the complexity and length of the models used to study
sustainability. The advent and general availability of modern techniques from artifi-
cial intelligence and machine learning allowed better approaches to the study of each
dimension of sustainability and their overall impact for sustainability.
The types of sensors used in the environment may be divided into categories to better
explain their purpose. In terms of sensing, they can be divided into sensors that sense
the environment and sensors that sense users and their activities. Generally, an ambi-
ent environment might be divided into sensors and actuators. Sensors monitor the
environment and gather data useful for cognitive and reasoning processes. On the other
hand, actuators act upon the environment, performing actions through devices such as
thermostats to control the temperature, lighting switches or other appliances.
Different methodologies and procedures exist to keep track of human activities and
to make predictions based on previous and current information gathered in these envi-
ronments. Common approaches with machine learning techniques involve the use of
neural networks, classification techniques, fuzzy logic, sequence discovery, instance
based learning and reinforcement learning, as in (Costa, Novais & Simões, 2014).
An approach to this problem using fuzzy logic algorithms is proposed by Hagras et
al. (Hagras, Doctor, Callaghan, & Lopez, 2007). A sequence discovery approach is at the
heart of the learning algorithm in (Aztiria, Augusto, Basagoiti, Izaguirre, & Cook, 2012),
which demonstrates a system that can learn user behavioural patterns and take proac-
tive measures accordingly.
The work presented considers the use of these types of sensors to assess and reason
about sustainability and indicator design. This information will then be used to reason
about user behaviour and their accountability.

3 Platform Engine

The focus of this project is, more than developing new procedures or algorithms to
solve problems, putting these innovations in the hands of the user, with a clear pur-
pose: these innovative tools should be designed to assist people in the context of
energetic sustainability.
3.1 Network Design


The PHESS platform supports heterogeneous devices by implementing middleware
upon groups of devices to control data and information acquisition. Local central
servers are viewed as decentralized by the platforms which access them to obtain data
and implement their plans through the local network actuators.

Fig. 1. Generic PHESS platform configuration.

Figure 1 details a generic composition of two different environment scenarios with
users and their connection to the PHESS platform. A residential central node is respon-
sible for the middleware connecting local sensor networks to the PHESS platform,
using dedicated protocols for data acquisition and for storing information.
The data gathered is summarized locally according to time and user presence mod-
els and synchronized with the central PHESS platform. The local node is also respon-
sible for creating different user and environment profiles. Notifications are generated
by the central PHESS platform to the project webpage or mobile application.

3.2 Data Fusion


The process of data fusion is handled by local central nodes, where data is submitted
to a fusion process according to the number of overlapping and complementing
sensors. In this regard, different strategies can be followed according to the con-
text and nature of the fusion process.
The first one is a weighted average of values from sensors of the same type in the
same context, used to get an overview of an attribute with multiple sensors and reduce
measurement errors. The weights are defined manually by the local administrator. More
sophisticated fusion is employed with complementary sensors which, according to
some logic defined in the system, measure an attribute by joining efforts, such as user
presence detected with both an RFID reader and the wireless connection of personal
devices such as smartphones. The last resource is the use of heterogeneous data to
create attributes with some level of knowledge from the start. An example is the
assessment of thermal comfort using a default indicator expressed as a mathematical
formula, such as the PMV index (Fanger, 1970). Another application is the definition
of sustainability indicators according to custom mathematical formulae in the platform,
which process some attributes in the system to make their calculation.
The configuration of the data fusion steps and the selection of sensors and streams of
data are made in the initial setup of the system by the local administrator. According to
each area of interest, and with specialized knowledge obtained from experts, it is possible
to monitor relevant information to build sustainability indicators.
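As a simple illustration of the first two fusion strategies, the sketch below combines overlapping readings of the same attribute with administrator-defined weights, and treats presence as detected when either of two complementary sources reports it. The sensor names and weights are illustrative assumptions, not part of the PHESS configuration described here.

```python
# Sketch of two of the fusion strategies described above. Sensor names and weights
# are illustrative assumptions configured by the local administrator.

def weighted_average(readings, weights):
    """Fuse overlapping sensors measuring the same attribute (e.g. temperature)."""
    total_weight = sum(weights[s] for s in readings)
    return sum(weights[s] * value for s, value in readings.items()) / total_weight

def presence(rfid_seen, smartphone_on_wifi):
    """Fuse complementary presence sources: either one is enough to flag presence."""
    return rfid_seen or smartphone_on_wifi

temperature = weighted_average(
    readings={"sensor_livingroom_1": 21.4, "sensor_livingroom_2": 22.0},
    weights={"sensor_livingroom_1": 0.6, "sensor_livingroom_2": 0.4},
)
print(temperature, presence(rfid_seen=False, smartphone_on_wifi=True))
```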

3.3 Sustainability Indicators Generation


Indicators evaluate sustainability in terms of three main groups, namely economic,
environmental and social. However, due to their impact, some indicators can be de-
signed to influence more than one dimension of sustainability. For this reason these
indicators were chosen to provide a general analysis of sustainability principles in-
stead of being tied to a single sustainability dimension. In order to be directly
comparable, indicators are defined on the same scale, based on the notion of positive
and negative impact. The value of each indicator ranges from -1 to 1 and can be
interpreted as unsustainable for values below zero and sustainable from there upwards.
Indicator definition is another configurable space inside the PHESS platform, where
monitoring indicators are defined using values from sensors and sensor fusion, and
customized with mathematical formulas. All these indicators are calculated either
locally, i.e., on a room basis, or for the entire setting, which sums the assessment of all
the different rooms. In this way, even if the environment is considered sustainable, the
user may still assess changes in premises with low sustainability standards.
Environments are generally hosts to many different users, who influence them
with their behaviours, actions and habits. Tracking user activities is something that
can be used to infer and establish cause and effect relationships. The PHESS system
uses different dynamics to produce accountability reports on user actions based on
environment and personal monitoring coupled with user presence detection. Unoccupied
areas are considered the responsibility of all people present in the environment, so that
the coverage of the entire environment is assured by its occupants. In cases where the
local context and local sensor values are indistinguishable based on location, user
accountability takes into consideration only user presence in the environment. The
richer the environment is in sensor data acquisition, the richer the results and analysis are.

4 Case Study and Results

As a case study, results from five days in an environment are presented: a home
environment with a limited set of sensors and a smartphone as the user detection
mechanism. User notifications are made by actuator modules, which push notifications
to users in order to alert them based on notification schemes and personal rules. Sen-
sors include electrical consumption, temperature, humidity, luminosity and presence
sensing through smartphones, plus an indicator based on the thermal sensation index
PMV used in thermal comfort studies (Rana, Kusy, Jurdak, Wall, & Hu, 2013).
The indicators are designed in the platform in order to assess energy efficiency, and
as such the case scenario uses electricity for this analysis. Therefore, a list of sam-
ple indicators was defined using the data fusion available through PHESS modules.
The following sample indicators were defined:

• Unoccupied consumption – measures the deviation of consumption when no user is
present in the environment from a user-inputted objective;
• Activity based consumption – measures the deviation of consumption during a
period of 1 hour from the objective value set by the user;
• Total consumption – measures the deviation of total consumption during the period
of a day from a default value defined for consumption;
• Comfort Temperature – based on the PMV comfort indicator obtained through the
data fusion process, which is calculated by the PHESS platform;
• Comfort Humidity – based on comfort values that define the normal range of
humidity values in indoor environments.

Fig. 2. Indicator values from the PHESS platform

As seen in figure 2, with the graphical representation of the indicators it is possible
to analyse the behaviour of an environment based on user-inputted indicators. These
indicators are represented on the scale -1 to 1, as stated in section 3.3. While comfort
values are being respected, the consumption-based indicators show that the values set
for the system are not yet being followed. As a result, the indicators concerning
consumption analysis are performing below the accepted margin.
Accountability is based on the notion of user impact on the system. Results indi-
cate a higher consumption when the environment is not occupied, which demonstrates
that the environment configuration has more impact than user actions. The person who
spends the longest in the environment each day has the most impact on the part of the
total consumption that is attributed to user actions.
This analysis allows the system to identify the people with most impact on the system
based on the attributes and indicators defined. There is a need to adapt each indica-
tor set to the objectives and the areas to improve, but the generic platform allows for
this configuration based on the layers of the PHESS platform.
5 Conclusions

Computational sustainability, although a new and interesting topic of research for the
academic community, still presents a number of difficult challenges. The platform
presented is a combination of modules which take inspiration from ambient intelli-
gence and information systems to provide analysis and assessment of environments
and their users, and to identify and provide real time analysis of concepts based on sus-
tainability and efficiency. Results indicate that modest configurations can yield mean-
ingful results that may be used to take actions on the environment and user behaviour.
The processes of data fusion and indicator design are responsible for thoroughly
analysing key situations of environment and user behaviour, so that their expression
appears meaningful and produces not only user reports but also user suggestions.
With the mainstream use of interconnected appliances the possibilities for auto-
matic actuation are increasing, mainly with the new internet of things standards being
proposed by major companies. The PHESS project aims to increase its support by
adding new features to its middleware layer and allowing actuation in conjunction with
sensorization. Moreover, it is expected to support multiple environments for each user,
increasing the accountability of user behaviours regardless of location.

Acknowledgements. This work is part-funded by ERDF - European Regional Development
Fund through the COMPETE Programme (operational programme for competitiveness) and by
National Funds through the FCT (Portuguese Foundation for Science and Technology) within
project FCOMP-01-0124-FEDER-028980 (PTDC/EEI-SII/1386/2012) and project PEst-
OE/EEI/UI0752/2014. Additionally, it is also supported by a doctoral grant, with the reference
SFRH/BD/78713/2011, issued by FCT.

References
1. Aztiria, A., Augusto, J.C., Basagoiti, R., Izaguirre, A., Cook, D.J.: Discovering frequent
user-environment interactions in intelligent environments. Personal and Ubiquitous
Computing 16(1), 91–103 (2012)
2. Fanger, P.O.: Thermal comfort: Analysis and applications in environmental engineering.
Danish Technical Press (1970)
3. Gomes, C.P.: Computational sustainability. In: Gama, J., Bradley, E., Hollmén, J. (eds.)
IDA 2011. LNCS, vol. 7014, p. 8. Springer, Heidelberg (2011).
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.158.2293&rep=rep1&type=pdf
4. Hagras, H., Doctor, F., Callaghan, V., Lopez, A.: An Incremental Adaptive Life Long
Learning Approach for Type-2 Fuzzy Embedded Agents in Ambient Intelligent Environ-
ments. IEEE Transactions on Fuzzy Systems 15(1), 41–55 (2007)
5. Rana, R., Kusy, B., Jurdak, R., Wall, J., Hu, W.: Feasibility analysis of using humidex as an
indoor thermal comfort predictor. Energy and Buildings 64, 17–25 (2013).
doi:10.1016/j.enbuild.2013.04.019
6. Costa, A., Novais, P., Simões, R.: A Caregiver Support Platform within the Scope of an
AAL Ecosystem. Sensors 14(3), 5654–5676 (2014). MDPI AG, ISSN: 1424-8220
Artificial Intelligence in Medicine
Reasoning with Uncertainty
in Biomedical Models

Andrea Franco, Marco Correia, and Jorge Cruz(✉)

NOVA Laboratory for Computer Science and Informatics,
DI/FCT/UNL, Caparica, Portugal
[email protected]

Abstract. The use of mathematical models in biomedical research
largely developed in the second half of the 20th century. However, their
translation to clinically useful tools has proved challenging. Reasoning
with deep biomedical models is computationally demanding as parame-
ters are typically subject to nonlinear relations, dynamic behavior, and
uncertainty. This paper proposes a new approach for assessing the reli-
ability of the conclusions drawn from these models given the underly-
ing uncertainty. It relies on probabilistic constraint programming for a
sound propagation of uncertainty from model parameters to results. The
advantages of the approach are illustrated on an important problem in
the obesity research field, namely the estimation of free-living energy
intake in humans. Based on a well known energy intake model, our app-
roach is able to correctly characterize the provided estimates given the
uncertainty inherent to the model parameters.

Keywords: Biomedical models · Constraint programming · Energy
intake

1 Introduction
Mathematical models are extensively used in many biomedical domains for sup-
porting rational decisions. A mathematical model describes a system by a set
of variables and constraints that establish relations between them. Uncertainty
and nonlinearity play a major role in modeling most real-world continuous sys-
tems. A competitive framework for decision support in continuous domains must
provide an expressive mathematical model to represent the system behavior and
be able to perform sound reasoning that accounts for the uncertainty and the
effect of nonlinearity.
Given the uncertainty, there are two opposite attitudes for reasoning with
scenarios consistent with the mathematical model. Stochastic approaches [1]
reason on approximations of the most likely scenarios. They associate a proba-
bilistic model to the problem thus characterizing the likelihood of the different
scenarios. In contrast, constraint programming approaches [2] reason on safe
enclosures of all consistent scenarios. Rather than associate approximate values

to real variables, intervals are used to include all their possible values. Model-
based reasoning and what-if scenarios are adequately supported through safe
constraint propagation techniques, which only eliminate combinations of values
that definitely do not satisfy model constraints.
In this work we use a probabilistic constraint approach that combines a
stochastic representation of uncertainty on the parameter values with a reliable
constraint framework robust to nonlinearity. Similarly to stochastic approaches
it associates an explicit probabilistic model to the problem, and similarly to
constraint approaches it assumes reliable bounds for the model parameters. The
approach computes conditional probability distributions of the model parame-
ters, given the uncertainty and the constraints.
The potential of our approach to support clinical practice is illustrated in
a real world problem from the obesity research field. The impact of obesity on
health, at both individual and public levels, is widely documented [3–5]. Despite
this fact, and the availability of nutritional recommendations and guidelines to
the general audience, the prevalence of overweight and obesity in adults and
children increased dramatically in the last 30 years [6]. According to the World
Health Organization, the main cause of the "obesity pandemic" is the energy
imbalance caused by an increased calorie intake associated with a lower energy
expenditure as a result of a sedentary lifestyle.
Many biomedical models use the energy balance approach to simulate indi-
vidual body weight dynamics, e.g. [7,8]. Change of body weight over time is
modeled as the rate of energy stored (or lost), which is a function of the energy
intake (from food) and the energy expended. However, the exact amount of calo-
ries ingested, or energy intake, is difficult to ascertain as it is usually obtained
through methods that underestimate its real value, such as self-reported diet
records [9].
The inability to rigorously assess the energy intake is considered by [10]
the “fundamental flaw in obesity research”. This fact hinders the success and
adherence to individual weight control interventions [11]. Therefore the correct
evaluation of such interventions will be highly dependent on the precision of
energy intake estimates and the assessment of the uncertainty inherent to those
estimates. In this paper we show how the probabilistic constraint framework can
be used in clinical practice to correctly characterize such uncertainty given the
uncertainty of the underlying biomedical model.
The next section overviews the energy intake problem and introduces a biomedi-
cal model used in clinical practice. Section 3 addresses constraint programming
and its extensions to differential equations and probabilistic reasoning. Section
4 shows how the problem is cast into the probabilistic constraint framework.
Section 5 discusses the experimental results and the last section summarizes the
main conclusions.

2 Energy Intake Problem


The mathematical models that predict weight change in humans are usually
based on the energy balance equation:
R = I − E    (1)
where R is the energy stored or lost (kcal/d), I is the energy intake (kcal/d) and
E is the energy expended (kcal/d).
Several models have been applied to provide estimates of individual energy
intake [12,13]. Our paper focus on the work of [12] which developed a compu-
tational model to determine individual energy intake during weight loss. This
model, herein designated EI model, calculates the energy intake based on the
following differential equation:
cf · dF/dt + cl · dFF/dt = I − (DIT + PA + RMR + SPA)    (2)
The left hand side of equation (2) represents the change in the body's energy
stores, R in equation (1), and is modeled through the weighted sum of the
changes in Fat mass (F) - the body's long term energy storage mechanism - and
Fat Free mass (FF) - a proxy for the protein content used for energy purposes.
Differently from other models, which express the relationship between F and
FF using a logarithmic equation (3) [14] or linear approximations [15], the EI model
uses a fourth-order polynomial for estimating FF as a function of F, the age of
the subject a, and his or her height h, eq. (4):

FF^log(F) = d0 + d1 log F    (3)

FF^poly(F, a, h) = c0 + c1 F + c2 F^2 + c3 F^3 + c4 F^4 (c5 + c6 a)(c7 + c8 h)    (4)

The rate of energy expended, E in equation (1), is the total amount of energy
spent in several physiological processes: Diet Induced Thermogenesis (DIT ) -
energy required to digest and absorb food; Physical Activity (P A) - energy
spent in volitional activities; Resting Metabolic Rate (RM R) - minimal amount
of energy used to sustain life and; Spontaneous Physical Activity (SP A) - energy
spent in spontaneous activities.
The EI model uses data from the 24-week CALERIE phase I study [16], in
particular body weight for one female subject of the caloric restriction group.
During the experiment, participants had their weight monitored every two weeks.
Those weight measures are used to estimate the real energy intake for that
particular individual.
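The two body-composition models above are simple to express in code. In the sketch below the coefficients d0, d1 and c0..c8 are left as parameters, since their fitted values are not reproduced here, and the grouping of the age and height factors in eq. (4) follows the equation as printed, which may differ from the original model.

```python
# Sketch of the two Fat Free mass models of eqs. (3) and (4). The coefficient values
# are fitted from population data and are not given here, so they are passed in as
# parameters; the term grouping in ff_poly follows eq. (4) as printed above.
import math

def ff_log(F, d):
    """Eq. (3): FF as a logarithmic function of fat mass F; d = (d0, d1)."""
    d0, d1 = d
    return d0 + d1 * math.log(F)

def ff_poly(F, a, h, c):
    """Eq. (4): fourth-order polynomial in F with age a and height h corrections;
    c = (c0, ..., c8)."""
    return (c[0] + c[1] * F + c[2] * F**2 + c[3] * F**3
            + c[4] * F**4 * (c[5] + c[6] * a) * (c[7] + c[8] * h))
```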

3 Constraint Programming
A constraint satisfaction problem is a classical artificial intelligence paradigm
characterized by a set of variables and a set of constraints that specify relations
among subsets of these variables. Solutions are assignments of values to all vari-
ables that satisfy all the constraints. Constraint programming [2] is a form of
declarative programming which must provide a set of constraint reasoning algo-
rithms that take advantage of constraints to reduce the search space, avoiding
regions inconsistent with the constraints. These algorithms are supported by


specialized techniques that explore the specificity of the constraint model such
as the domain of its variables and the structure of its constraints.
Continuous constraint programming [17,18] has been widely used to model
safe reasoning in applications where uncertainty on the values of the variables is
modeled by intervals including all their possibilities. A Continuous Constraint
Satisfaction Problem (CCSP) is a tripleX, D, C where X is a tuple of n real
variables x1 , · · · , xn , D is a Cartesian product of intervals D(x1 ) × · · · × D(xn )
(a box), where each D(xi ) is the domain of variable xi and C is a set of numerical
constraints (equations or inequalities) on subsets of the variables in X. A solution
of the CCSP is a value assignment to all variables satisfying all the constraints
in C. The feasible space F is the set of all CCSP solutions within D.
Continuous constraint reasoning relies on branch-and-prune algorithms [19]
to obtain sets of boxes that cover exact solutions for the constraints (the feasible
space F ). These algorithms begin with an initial crude cover of the feasible space
(the initial search space, D) which is recursively refined by interleaving pruning
and branching steps until a stopping criterion is satisfied. The branching step
splits a box from the covering into sub-boxes (usually two). The pruning step
either eliminates a box from the covering or reduces it into a smaller (or equal)
box maintaining all the exact solutions. Pruning is achieved through an algo-
rithm that combines constraint propagation and consistency techniques based
on interval analysis methods [20].
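A toy version of such a branch-and-prune loop is sketched below for a single constraint x² + y² = 1: boxes whose interval evaluation cannot contain zero are discarded, and the remaining boxes are bisected until they are small enough. Real continuous solvers additionally narrow each box with consistency techniques, which is omitted here.

```python
# Toy branch-and-prune cover of the circle x^2 + y^2 = 1 inside [-2, 2] x [-2, 2].
# Pruning here only discards boxes whose interval evaluation excludes the constraint;
# real continuous solvers also narrow boxes with hull/box consistency techniques.

def sq_interval(lo, hi):
    """Interval extension of x -> x^2."""
    candidates = [lo * lo, hi * hi]
    low = 0.0 if lo <= 0.0 <= hi else min(candidates)
    return low, max(candidates)

def may_satisfy(box):
    """Can x^2 + y^2 - 1 = 0 hold somewhere in the box?"""
    (xl, xh), (yl, yh) = box
    xl2, xh2 = sq_interval(xl, xh)
    yl2, yh2 = sq_interval(yl, yh)
    return xl2 + yl2 - 1.0 <= 0.0 <= xh2 + yh2 - 1.0

def branch_and_prune(box, eps=0.05):
    if not may_satisfy(box):          # prune: no solution possible in this box
        return []
    (xl, xh), (yl, yh) = box
    if max(xh - xl, yh - yl) <= eps:  # small enough: keep as part of the cover
        return [box]
    if xh - xl >= yh - yl:            # branch: bisect the widest dimension
        xm = (xl + xh) / 2.0
        return (branch_and_prune(((xl, xm), (yl, yh)), eps)
                + branch_and_prune(((xm, xh), (yl, yh)), eps))
    ym = (yl + yh) / 2.0
    return (branch_and_prune(((xl, xh), (yl, ym)), eps)
            + branch_and_prune(((xl, xh), (ym, yh)), eps))

cover = branch_and_prune(((-2.0, 2.0), (-2.0, 2.0)))
print(len(cover), "boxes cover the constraint x^2 + y^2 = 1")
```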
In the biomedical context, constraint technology seems to have the poten-
tial to bridge the gap between theory and practice. The declarative nature
of constraints makes them an adequate tool for the explicit representation of
any kind of domain knowledge, including “deep” biophysical modeling. The
constraint propagation techniques provide sound methods, with respect to the
underlying model, that can be used to support practical tasks (e.g. diagno-
sis/prognosis may be supported through propagation on data about the patient
symptoms/diseases). In particular, the continuous constraint framework seems
to be the most adequate for representing the nonlinear relations on continuous
variables, often present in biophysical models. Additionally, the uncertainty of
biophysical phenomena may be explicitly represented as intervals of possible
values and handled through constraint propagation.
However, the direct application of classical constraint programming to
biomedical models suffers from two major pitfalls. System dynamics which is
often modeled through differential equations cannot be explicitly represented
by these approaches and integrated within the constraint model. Moreover, the
interval representation of uncertainty may be too conservative and inadequate
to distinguish between consistent scenarios based on their likelihood which may
be crucial to the development of effective tools. This work is based on exten-
sions to constraint programming for handling both problems and provide sound
propagation of uncertainty from model parameters to results.
3.1 Differential Equations


The behavior of many systems is naturally modeled by a system of first order
Ordinary Differential Equations (ODEs), often parametric. ODEs are equations
that involve derivatives with respect to a single independent variable, t, usually
representing time. A parametric ODE system, with parameters p, represented
in vector notation as:

y′ = f(p, y, t)    (5)
is a restriction on the sequence of values that y can take over t. A solution, for
a time interval T , is a function that satisfies equation (5) for all values of t ∈ T .
Since (5) does not fully determine a single solution (but rather a family of
solutions), initial conditions are usually provided with a complete specification
of y at some time point t. An Initial Value Problem (IVP) is characterized by
an ODE system together with the initial condition y(t0 ) = y0 . A solution of the
IVP with respect to an interval of time T is the unique function that is a solution
of (5) and satisfies the initial condition.
Parametric ODEs are expressive mathematical means to model system
dynamics. Notwithstanding its expressive power, reasoning with such models
may be quite difficult, given their complexity. Analytical solutions are available
only for the simplest models. Alternative numerical simulations require precise
numerical values for the parameters involved, often impossible to gather given
the uncertainty on available data. This may be an important drawback since
small differences on input values may cause important differences on the output
produced.
Interval methods for solving differential equations with initial conditions [20]
do verify the existence of unique solutions and produce guaranteed error bounds
for the solution trajectory along an interval of time T . They use interval arith-
metic to compute safe enclosures for the trajectory, explicitly keeping the error
term within safe bounds.
Several extensions to constraint programming [21] were proposed for handling
differential equations based on interval methods for solving IVPs. An approach
that integrates other conditions of interest was proposed in [22] and successfully
applied to support safe decisions based on deep biomedical models [23].
In this paper we use an approach similar to [21] that allows the integration of
IVPs with the standard numerical constraints. The idea is to consider an IVP as
a function Φ where the first argument are the parameters p, the second argument
is the initial condition that must be verified at time point t0 (third argument)
and the last argument is a time point t ∈ T . A relation between the values at
two time points t0 and t1 along the trajectory is represented by the equation:

y(t1 ) = Φ (p, y(t0 ), t0 , t1 ) (6)


Using variables x0 and x1 to represent y(t0 ) and y(t1 ), equation (6) is inte-
grated into the CCSP as a constraint x1 = Φ (p, x0 , t0 , t1 ) with specialized con-
straint propagators to safely prune both variable domains based on a validated
solver for IVPs [24].
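To give a concrete (non-validated) picture of the constraint in eq. (6), the sketch below wraps a standard numerical IVP solver as a function Φ(p, y0, t0, t1). Unlike the validated interval solver used in the paper, this version returns a point estimate with no guaranteed error bounds, and the example ODE is an arbitrary placeholder, not the EI model.

```python
# Sketch of the IVP viewed as a function Phi(p, y0, t0, t1), as in constraint (6).
# A standard (non-validated) solver is used here; the paper relies on a validated
# interval IVP solver that also produces guaranteed error bounds.
from scipy.integrate import solve_ivp

def phi(f, p, y0, t0, t1):
    """Value y(t1) of the solution of y' = f(p, y, t) with y(t0) = y0."""
    sol = solve_ivp(lambda t, y: f(p, y, t), (t0, t1), y0, rtol=1e-8, atol=1e-10)
    return sol.y[:, -1]

# Example with a simple linear ODE y' = -p*y (a placeholder, not the EI model):
y1 = phi(lambda p, y, t: [-p * y[0]], p=0.3, y0=[10.0], t0=0.0, t1=14.0)
print(y1)
```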
3.2 Probabilistic Constraint Programming


In classical CCSPs, uncertainty is modeled by intervals that represent the
domains of the variables. Constraint reasoning reduces uncertainty providing
a safe method for computing a set of boxes enclosing the feasible space. Nev-
ertheless this paradigm cannot distinguish between different scenarios and all
combination of values within such enclosure are considered equally plausible. In
this work we use probabilistic constraint programming [25] that extends the con-
tinuous constraint framework with probabilistic reasoning, allowing to further
characterize uncertainty with probability distributions over the domains of the
variables.
In the continuous case, the usual method for specifying a probabilistic model
assumes, either explicitly or implicitly, a full joint probability density function
(p.d.f.) over the considered random variables, which assigns a probability mea-
sure to each point of the sample space Ω. The probability of an event H, given
a p.d.f. f , is its multidimensional integral on the region defined by the event:
P(H) = ∫_H f(x) dx    (7)

The idea of probabilistic constraint programming is to associate a probabilis-


tic space to the classical CCSP by defining an appropriate density function. A
probabilistic constraint space (PC) is a pair ⟨⟨X, D, C⟩, f⟩, where ⟨X, D, C⟩ is
a CCSP and f is a p.d.f. defined in Ω ⊇ D such that ∫_Ω f(x) dx = 1.
A constraint (or set of constraints) can be viewed as an event H whose prob-
ability can be computed by integrating the density function f over its feasible
space as in equation (7). In general these multidimensional integrals cannot be
easily computed, since they may have no closed-form solution and the event may
establish a complex nonlinear integration boundary. The probabilistic constraint
framework relies on continuous constraint reasoning to get a tight box cover of
the region of integration H and compute the overall integral by summing up the
contributions of each box in the cover. Generic quadrature methods are used to
evaluate the integral at each box.
In this work Monte Carlo methods [26] are used to estimate the value of
the definite multidimensional integrals at each box. As long as the function is
reasonably well behaved, the integral can be estimated by randomly selecting
N points in the multidimensional space and averaging the function values at
these points. Consider N random sample points x1 , . . . , xN uniformly distributed
inside a box B. The contribution of this box to the overall integral on the region
of integration H is approximated by:
∫_{B∩H} f(x) dx ≈ vol(B) · (1/N) Σ_{i=1}^{N} 1_H(x_i) f(x_i)    (8)

where 1_H is the indicator function¹ of H. This method displays 1/√N convergence,
i.e., by quadrupling the number of sampled points the error is halved, regardless
of the number of dimensions.
¹ 1_H(x_i) returns 1 if x_i ∈ H and 0 otherwise.
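The estimate in eq. (8) is straightforward to implement once a box cover is available. The sketch below computes the contribution of a single box to the integral; in the full method this is summed over all boxes returned by the constraint solver. The example density and event are illustrative placeholders.

```python
# Monte Carlo estimate of the contribution of one box B to the integral in eq. (8).
# 'inside_H' is the indicator of the event (constraint set) and 'pdf' the density f.
import random

def box_contribution(box, pdf, inside_H, n_samples=10000):
    volume = 1.0
    for lo, hi in box:
        volume *= (hi - lo)
    total = 0.0
    for _ in range(n_samples):
        x = [random.uniform(lo, hi) for lo, hi in box]
        if inside_H(x):
            total += pdf(x)
    return volume * total / n_samples

# Example: mass of a uniform density on [0,1]^2 inside the quarter unit disc
est = box_contribution([(0.0, 1.0), (0.0, 1.0)],
                       pdf=lambda x: 1.0,
                       inside_H=lambda x: x[0]**2 + x[1]**2 <= 1.0)
print(est)   # should approach pi/4 ≈ 0.785
```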
The advantages from this close collaboration between constraint pruning and
random sampling were previously illustrated in ocean color remote sensing stud-
ies [27] where this approach achieved quite accurate results even with small
sampling rates. The success of this technique relies on the reduction of the sam-
pling space where a pure non-naive Monte Carlo (adaptive) method is not only
hard to tune but also impractical in small error settings.

4 Probabilistic Constraints for Solving the EI Problem


Let t be the number of days since the beginning of treatment of a given subject,
F (t) the Fat Mass at time t, w (t) the weight observed at time t, and I the
subject’s energy intake, which is assumed to be a constant parameter between
consecutive observations [12]. The energy balance equation and total body mass
are related through the model:

F′(t) = g(I, F(t), t)    (9)

w(t) = FF(a, h, F(t)) + F(t)    (10)

where g is obtained by solving equation (2) with respect to F′(t).
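Making this step explicit (a short derivation not spelled out in the text, under the assumption that FF varies in time only through F), the chain rule gives dFF/dt = (dFF/dF) · dF/dt, so equation (2) can be rearranged as

F′(t) = g(I, F(t), t) = [I − (DIT + PA + RMR + SPA)] / (cf + cl · dFF/dF),

where dFF/dF is the derivative of the chosen body-composition model (eq. (3) or (4)) evaluated at F(t).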

4.1 CCSP Model


Let i ∈ {0, . . . , n} denote the i'th observation since the beginning of treatment,
occurring at time ti, and let Fi and wi be respectively the fat mass and the weight
of the patient at time ti (with t0 = 0). The EI model may be formalized as a
CCSP ⟨X, ℝ^(2n+1), C⟩ with a set of variables X = {F0} ∪ ⋃_{i=1..n} {Fi, Ii} representing
the fat mass Fi at each observation and the energy intake Ii between consecutive
observations (at ti−1 and ti), and a set of constraints C = {b0} ∪ ⋃_{i=1..n} {ai, bi}
enforcing eqs. (9, 10):

ai ≡ [Fi = Φ(Ii, Fi−1, ti−1, ti)]

bi ≡ [wi = FF(a, h, Fi) + Fi]

Recall that solving the above CCSP means finding the values for F0 and the
variables Fi , Ii (1 ≤ i ≤ n) that satisfy the above set of constraints.

4.2 Probabilistic CCSP Model


Uncertainty inherent to FF estimation may be integrated into the above CCSP
model by considering that the true value of FF is the model-given FF^M plus
an associated error term εi ∼ N(μ = 0, σ_ε),

FF(a, h, Fi) = FF^M(a, h, Fi) + εi

and we may rewrite the set of bi constraints of the CCSP model as follows,

bi ≡ [wi = FF^M(a, h, Fi) + εi + Fi]
Additionally, to keep the errors within reasonable bounds, bounding con-
straints are considered for each observation: −3σ_ε ≤ εi ≤ 3σ_ε, thus ignoring
assignments whose contribution to the total error is less than 0.1%.
Note that a solution to the new (probabilistic) CCSP, i.e. an assignment
of values to F0 and the variables Fi, Ii (1 ≤ i ≤ n), determines the possible
combinations of values for the errors ε0, . . . , εn.
If we assume that the FF model errors over the n + 1 distinct observations
are independent, then each solution has an associated probability density value
given by the joint p.d.f. f,

f(ε0, . . . , εn) = ∏_{i=0}^{n} fi(εi)    (11)

where fi is the normal distribution associated with the error εi.


Instead of considering independence between model errors from consecutive
observations, a more realistic alternative explicitly represents the deviation
between the error εi and the previous error εi−1 as a normally distributed random
variable δi ∼ N(μ = 0, σδ), resulting in the following joint p.d.f. f,

f(ε0, . . . , εn, δ1, . . . , δn) = ∏_{i=0}^{n} fi(εi) ∏_{i=1}^{n} hi(δi)    (12)

where fi and hi are the normal distributions associated with the errors εi and
δi respectively. The deviations are introduced in the model by considering con-
straints δi = εi − εi−1 (1 ≤ i ≤ n), determining their values from the εi values.
A naive approximate algorithm for solving both alternative CCSP models
could be simply to perform Monte Carlo sampling in the space defined by
D(Fj) × D(I1) × . . . × D(In), with j ∈ {1, . . . , n}. Note that, given the con-
straints in the model, each sampled point determines the values of all variables
Fi and εi (and δi). From the values assigned to εi (and δi), eq. (11) (or (12)) can be
used to compute an estimate of its probability, as shown in (8).
With this approach, accurate results are hard to obtain for an increasing number
of observations due to the huge size of the sampling space, O(|D|^(n+1)). Instead, we
developed an improved technique that is able to drastically reduce both the
exponent n and the base |D| of this expression, as described in the following
section.
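The weighting step of the naive scheme just described can be sketched as follows for the uncorrelated-error model of eq. (11). All names and numeric inputs are placeholders: 'ode_rhs' stands for g in eq. (9), 'ff' for the chosen body-composition model, the forward-Euler propagation is a crude stand-in for the validated Φ, and sigma_eps is not a value from the study.

```python
# Sketch of the naive Monte Carlo weighting for the uncorrelated-error model (eq. 11).
# 'ode_rhs' stands for g in eq. (9), 'ff' for the body-composition model, and the
# Euler propagation is a crude stand-in for the validated IVP function Phi.
import math

def normal_pdf(x, sigma):
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def propagate(ode_rhs, intake, F, t0, t1, steps=200):
    """Crude forward-Euler stand-in for the validated IVP function Phi."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        F += h * ode_rhs(intake, F, t)
        t += h
    return F

def sample_weight(ode_rhs, ff, F0, intakes, t, w, sigma_eps):
    """Weight of one sampled point (F0, I1..In): product of error densities (eq. 11)."""
    weight, F = 1.0, F0
    for i in range(len(t)):
        if i > 0:                                  # constraint a_i: F_{i-1} -> F_i
            F = propagate(ode_rhs, intakes[i - 1], F, t[i - 1], t[i])
        eps = w[i] - ff(F) - F                     # constraint b_i
        if abs(eps) > 3.0 * sigma_eps:             # bounding constraint on eps_i
            return 0.0
        weight *= normal_pdf(eps, sigma_eps)
    return weight
```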

4.3 Method

The main idea is to avoid considering all variables simultaneously but instead to
reason only with a small subset that changes incrementally over time. For each
observation i, we can compute the probability distributions of the variables of
interest given the past knowledge already accumulated.
We start by computing the probability distribution of F0 given the initial
weight w0 subject to the constraint b0 and the bounding constraints for ε0. This
distribution, denoted P′(F0), is discretized on a grid over D(F0) computed
through probabilistic constraint reasoning integrated with Monte Carlo sampling
as described in section 3.2. Specifically, given a point Ḟ0 sampled from D(F0),
the value ε̇0 is determined by the constraint b0, and its p.d.f. value is f(Ḟ0) = f0(ε̇0).
The joint probability of F1, I1 is computed by considering the constraints
associated with observation 1, the observed weight w1, and P′(F0). The method
is similar: the grid P′(F1, I1), discretized over D(F1) × D(I1), is computed using
probabilistic constraint reasoning; and Monte Carlo sampling is performed over
this space region.
Given a point (Ḟ1, İ1), sampled from D(F1) × D(I1), the values Ḟ0 and ε̇1 are
determined by constraints a1 and b1, and, according to equation (11), its p.d.f.
value should be f(ε̇0, ε̇1) = f0(ε̇0) f1(ε̇1). However, we replace the computation
of f0(ε̇0) with the value of the discretized probability P′(Ḟ0) computed in the
previous step, providing an approximation that converges to the correct value
when the number of grid subdivisions goes to infinity: f(Ḟ1, İ1) ≈ P′(Ḟ0) f1(ε̇1).
If the alternative equation (12) is used, δ̇1 is also computed from the constraints
and the respective p.d.f. approximation is: f(Ḟ1, İ1) ≈ P′(Ḟ0) f1(ε̇1) h1(δ̇1).
Finally, the computed P′(F1, I1) is marginalized to obtain P′(F1), and the
process is iterated for the remaining observations.

5 Experimental Results
This section demonstrates how the previously described method may be used
to improve the applicability of the EI model by complementing its predictions
with measures of confidence. The algorithm was implemented in C++ and used
for obtaining the probability distribution approximations P′(Fi, Ii) at each
observation i ∈ {1, . . . , 12} of a 45 year-old woman over the course of the 24-
week trial (CALERIE Study phase I). The runtime was about 2 minutes per
observation on an Intel Core i7 @ 2.4 GHz.
Fat Free mass is estimated using two distinct models: FF^poly (eq. 4) and
FF^log (eq. 3). Both of these models were initially fit to a set of 7278 North
American women, resulting in the corresponding standard deviations of the error,
σpoly = 3.35 and σlog = 5.04. This data set was collected during NHANES
surveys (1994 to 2004) and is available online at the Centers for Disease Control
and Prevention website [28].
We also considered different assumptions regarding the independence of the error:
the uncorrelated error model (11), and a correlated error model (12) with a small
σδ = 0.5. Note that, due to current data access restrictions, this latter value is
purely illustrative.
The following techniques can be used for assessing the propagation of uncer-
tainty.

5.1 Joint Probability Distributions


The direct visual inspection of the joint probability distributions of Fi and Ii
conveys important information about the relation between these parameters.
Fig. 1. Probabilities of fat mass (F ) and intake (I) on the first clinical observation
(t = 14). Top and bottom rows shows results for different F F models. Left and right
columns correspond to different assumptions regarding independence of model errors.

In figure 1 we plot the obtained results regarding the first observation i = 1 for
each combination of FF model and error correlation. The following is apparent
from these plots: a) Uncertainty on F is positively correlated with uncertainty
on I; b) The assumption of independence between model errors on consecutive
weeks drastically affects the predicted marginal distribution of I (compare hor-
izontally); c) The improved accuracy of the FF^poly model (note that σpoly < σlog)
is reflected in slightly sharper F estimates, but does not seem to impact the estima-
tion of I (compare vertically).

5.2 Marginal Probability Distributions with Confidence Intervals

To perceive the effect of the uncertainty on the estimated variables over time, it is useful to marginalize the computed joint probability distributions. Figure 2 shows the estimated Fi and Ii over time, for each of the error correlation assumptions. Since the results for the FFpoly model are very similar to those obtained for FFlog, for reasons of space we focus only on the former.

Each box in these plots depicts the most probable value (marked in the center of the box), the union hull of the 50% most probable values (the rectangle), and the union hull of the 82% most probable values (the whiskers). Additionally, each plot overlays the estimates obtained from the algorithm published by the author of the EI model [7].
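For illustration, such box summaries can be derived from a discretized marginal with a few lines of code; the sketch below (an assumption about representation, not the authors' implementation) returns the most probable value and the enclosing hull of the cells that jointly hold a given probability mass.

import numpy as np

def most_probable_hull(P, edges, mass=0.5):
    # P: discretized marginal over len(P) cells; edges has len(P)+1 grid boundaries
    order = np.argsort(P)[::-1]                           # cells by decreasing probability
    n_keep = int(np.searchsorted(np.cumsum(P[order]), mass)) + 1
    keep = order[:min(n_keep, len(P))]                    # smallest cell set covering `mass`
    mode = 0.5 * (edges[order[0]] + edges[order[0] + 1])  # center of the most probable cell
    return mode, edges[keep.min()], edges[keep.max() + 1]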

The presented results show that the previous conclusions for the case i = 1 extend to all remaining observations. Additionally, an interesting phenomenon occurs in the case of correlated error: the uncertainty in the estimation of F decreases slightly over time. This is most probably a consequence of having, at each new observation, an increasingly constrained problem whose solution space is correspondingly smaller. At least in the case of F, more information seems to lead to significantly better estimates. This cannot occur if the errors are independent, as is indeed confirmed in the plots.

Finally, our results show that in some cases the most probable values obtained by [7] are crude approximations to their own proposed model.

Fig. 2. Most probable intervals for the values of I (top row) and F (bottom row) over time using the FFpoly model. Left and right columns correspond to different assumptions regarding independence of model errors. The continuous line plots the estimates obtained in [12].

5.3 Best-Fit
Although the presented algorithm is primarily intended for characterizing uncertainty in model predictions, it is nevertheless a sound method for obtaining the predictions themselves. Indeed, as the magnitude of the error in the model parameters decreases (σ in our example), the obtained joint probability distributions converge to the correct solution of the model.

6 Conclusions
The standard practice for characterizing confidence in the predictions of a complex model is to perform controlled experiments. In the biomedical field, this often translates into closely monitoring distinct groups of subjects over long periods of time and assessing the fitness of the model statistically. While the empirical approach has its own advantages, namely that it does not require a complete understanding of the implications of the individual assumptions and approximations made in the model, it has some important shortcomings. Depending on the medical field, controlled experiments are not always practical, may not convey enough statistical significance, or may entail high costs.
In contrast to the empirical, black-box approach, this paper proposes to characterize the uncertainty of the model estimates by propagating the errors stemming from each of its parts. The described technique extends constraint programming to integrate probabilistic reasoning and constraints modeling dynamic behaviour, offering a mathematically sound and efficient alternative.

The application field of the presented approach is quite broad: it targets models which are themselves composed of other, possibly equally complex (sub)models, for which there is a known characterization of the error. The selected case study is a good example: the EI model is a fairly complex model including dynamic behaviour and nonlinear relations, and it integrates various (sub)models with associated uncertainty. The experimental section illustrated how different choices for one of these (sub)models, the FF model, impact the error of the complete EI model.
Probabilistic constraint programming offers modeling and reasoning capabil-
ities that go beyond the traditional alternatives. This approach has the potential
to bridge the gap between theory and practice by supporting reliable conclusions
from complex biomedical models taking into account the underlying uncertainty.

References
1. Halpern, J.Y.: Reasoning about Uncertainty. MIT Press (2003)
2. Rossi, F., Beek, P.V., Walsh, T. (eds.): Handbook of Constraint Programming.
Foundations of Artificial Intelligence. Elsevier Science (2006)
3. Swinburn, B.A., et al.: The global obesity pandemic: shaped by global drivers and
local environments. The Lancet 378(9793), 804–814 (2011)
4. Leahy, S., Nolan, A., O’Connell, J., Kenny, R.A.: Obesity in an ageing society
implications for health, physical function and health service utilisation. Technical
report, The Irish Longitudinal Study on Ageing (2014)
5. Lehnert, T., Sonntag, D., Konnopka, A., Heller, S.R., König, H.: Economic costs of
overweight and obesity. Best Pract Res Clin Endoc. Metab. 27(2), 105–115 (2013)
6. Ng, M., et al.: Global, regional, and national prevalence of overweight and obesity in
children and adults during 1980–2013: a systematic analysis for the global burden
of disease study 2013. The Lancet 384(9945), 766–781 (2014)

7. Thomas, D., Martin, C., Heymsfield, S., Redman, L., Schoeller, D., Levine, J.:
A simple model predicting individual weight change in humans. J. Biol. Dyn. 5(6),
579–599 (2011)
8. Christiansen, E., Garby, L., Sørensen, T.I.: Quantitative analysis of the energy
requirements for development of obesity. J. Theor. Biol. 234(1), 99–106 (2005)
9. Hill, R., Davies, P.: The validity of self-reported energy intake as determined using
the doubly labelled water technique. Brit. J. Nut. 85, 415–430 (2001)
10. Winkler, J.T.: The fundamental flaw in obesity research. Obesity Reviews 6,
199–202 (2005)
11. Champagne, C.M., et al.: Validity of the remote food photography method for
estimating energy and nutrient intake in near real-time. Obesity 20(4), 891–899
(2012)
12. Thomas, D.M., Schoeller, D.A., Redman, L.M., Martin, C.K., Levine, J.A.,
Heymsfield, S.: A computational model to determine energy intake during weight
loss. Am. J. Clin. Nutr. 92(6), 1326–1331 (2010)
13. Hall, K.D., Chow, C.C.: Estimating changes in free-living energy intake and its
confidence interval. Am. J. Clin. Nutr. 94, 66–74 (2011)
14. Forbes, G.B.: Lean body mass-body fat interrelationships in humans. Nutr. Rev.
45, 225–231 (1987)
15. Thomas, D., Ciesla, A., Levine, J., Stevens, J., Martin, C.: A mathematical model
of weight change with adaptation. Math. Biosci. Eng. 6(4), 873–887 (2009)
16. Redman, L.M., et al.: Effect of calorie restriction with or without exercise on body
composition and fat distribution. J. Clin. Endocrinol. Metab. 92(3), 865–872 (2007)
17. Lhomme, O.: Consistency techniques for numeric CSPs. In: Proc. of the 13th
IJCAI, pp. 232–238 (1993)
18. Benhamou, F., McAllester, D., van Hentenryck, P.: CLP(intervals) revisited. In:
ISLP, pp. 124–138. MIT Press (1994)
19. Hentenryck, P.V., Mcallester, D., Kapur, D.: Solving polynomial systems using a
branch and prune approach. SIAM J. Num. Analysis 34, 797–827 (1997)
20. Moore, R.: Interval Analysis. Prentice-Hall, Englewood Cliffs (1966)
21. Goldsztejn, A., Mullier, O., Eveillard, D., Hosobe, H.: Including ordinary differ-
ential equations based constraints in the standard CP framework. In: Cohen, D.
(ed.) CP 2010. LNCS, vol. 6308, pp. 221–235. Springer, Heidelberg (2010)
22. Cruz, J.: Constraint Reasoning for Differential Models. Frontiers in Artificial Intel-
ligence and Applications, vol.126. IOS Press (2005)
23. Cruz, J., Barahona, P.: Constraint reasoning in deep biomedical models. Artificial
Intelligence in Medicine 34(1), 77–88 (2005)
24. Nedialkov, N.: VNODE-LP: a validated solver for initial value problems in ordinary differential equations. Technical report, McMaster Univ., Hamilton, Canada (2006)
25. Carvalho, E.: Probabilistic Constraint Reasoning. PhD thesis, FCT/UNL (2012)
26. Hammersley, J., Handscomb, D.: Monte Carlo Methods. Methuen London (1964)
27. Carvalho, E., Cruz, J., Barahona, P.: Probabilistic constraints for nonlinear inverse
problems. Constraints 18(3), 344–376 (2013)
28. National health and nutrition examination survey. http://www.cdc.gov/nchs/
nhanes.htm
Smart Environments and Context-Awareness
for Lifestyle Management in a Healthy Active
Ageing Framework*

Davide Bacciu1, Stefano Chessa1,2(), Claudio Gallicchio1, Alessio Micheli1,


Erina Ferro2, Luigi Fortunati2, Filippo Palumbo2, Oberdan Parodi2, Federico Vozzi2,
Sten Hanke3, Johannes Kropf3, and Karl Kreiner4
1
Department of Computer Science, University of Pisa, Largo Pontecorvo 3, Pisa, Italy
2
CNR-ISTI, PISA CNR Research Area, via Moruzzi 1, 56126 Pisa, Italy
[email protected]
3
Health and Environment Department, AIT Austrian Institute of Technology GmbH,
Vienna, Austria
4
Safety and Security Department, AIT Austrian Institute of Technology GmbH,
Vienna, Austria

Abstract. Health trends of the elderly in Europe motivate the need for technological solutions aimed at preventing the main causes of morbidity and premature mortality. In this framework, the DOREMI project addresses three important causes of morbidity and mortality in the elderly by devising ICT-based home care services for aging people to counteract cognitive decline, sedentariness and unhealthy dietary habits. In this paper, we present the general architecture of DOREMI, focusing on its aspects of human activity recognition and reasoning.

Keywords: Human activity recognition · E-health · Reasoning · Smart environment

1 Introduction
According to the University College Dublin Institute of Food and Health, the most notable health promotion and disease prevention programs target the three main causes of morbidity and premature mortality: malnutrition, sedentariness, and cognitive decline, conditions that particularly affect the quality of life of elderly people and drive disease progression. These three features represent the target areas of the DOREMI project. The project vision aims at developing a systemic solution for the elderly, able to prolong functional and cognitive capacity by stimulating, and unobtrusively monitoring, daily activities according to well-defined "Active Ageing" lifestyle protocols. The project joins the concept of prevention centered on the elderly, characterized by a unified vision of being elderly today, namely a promotion of health through a constructive interaction among mind, body, and social engagement.

This work has been funded in the framework of the FP7 project “Decrease of cOgnitive de-
cline, malnutRition and sedEntariness by elderly empowerment in lifestyle Management and
social Inclusion” (DOREMI), contract N.611650.
© Springer International Publishing Switzerland 2015
F. Pereira et al. (Eds.) EPIA 2015, LNAI 9273, pp. 54–66, 2015.
DOI: 10.1007/978-3-319-23485-4_6

To fulfill these goals, food intake measurements, exergames associated with social interaction stimulation, and cognitive training programs (cognitive games) will be proposed to an elderly population enrolled in a pilot study. The DOREMI project goes beyond the current state of the art by developing, testing, and exploiting, with a short-term business model impact, a set of IT-based (Information Technology) services able to:

• Stimulate elderly people in modifying dietary needs and physical activity accord-
ing to the changes in age through creative, personalized, and engaging solutions;
• Monitor parameters of the elderly people to support the specialist in the daily veri-
fication of the compliance of the elderly with the prescribed lifestyle protocol, in
accordance with his/her response to physical and cognitive activities.
• Advise the specialist with different types and/or intensities of daily activity for
improving the elderly health, based on the assigned protocol progress assessment.
• Empower aging people by offering them knowledge about food and physical activ-
ity effectiveness, to let them become the main actors of their health.

To reach these objectives, the project builds on interdisciplinary knowledge encompassing health and artificial intelligence, the latter covering aspects such as sensing, machine learning, human-machine interfaces, and games. This paper focuses on the machine learning contribution of the project, which applies to the analysis of the sensor data with the purpose of identifying users' conditions (in terms of balance, calorie expenditure, etc.) and activities, detecting changes in the users' habits, and reasoning over such data. The ultimate goal of this data analysis is to support the user who is following the lifestyle protocol prescribed by the specialist, by giving him feedback through an appropriate interface, and by providing the specialist with information about the user's lifestyle. In particular, the paper gives a snapshot of the status of the project (which has just concluded its first year of activity) in the design of the activity recognition and reasoning components.

2 Background and State of the Art on Machine Learning

Exploratory data analysis (EDA) analyzes data sets to find their main features [1], beyond what can be found by formal modeling or hypothesis testing. When dealing with accelerometer data, features are classified in three categories: time domain, frequency domain, and spatial domain [2]. In the time domain, we use the standard deviation within a frame, which is indicative of the variability of the acceleration data and of the intensity of the movement during the activity. In the frequency domain, frequency-domain entropy helps distinguish activities with similar energy intensity by comparing their periodicities. This feature is computed as the information entropy of the normalized Power Spectral Density (PSD) function of the input signal, excluding the DC component (the mean value of the waveform). The periodicity feature evaluates the periodicity of the signal, which helps to distinguish cyclic and non-cyclic activities. In the spatial domain, orientation variation is defined as the variation of the gravitational components on the three axes of the accelerometer sensor. This feature effectively shows how severe the posture change can be during an activity.
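As a concrete illustration, the sketch below computes these four features for a single window of tri-axial accelerometer samples; the window length, sampling rate and the use of SciPy's periodogram are assumptions made for the example, not the DOREMI implementation.

import numpy as np
from scipy.signal import periodogram

def window_features(acc, grav, fs=50.0):
    # acc, grav: arrays of shape (n_samples, 3) for one analysis frame
    mag = np.linalg.norm(acc, axis=1)
    std = mag.std()                                    # time domain: movement intensity
    freqs, psd = periodogram(mag - mag.mean(), fs=fs)  # PSD of the signal
    p = psd[1:] / psd[1:].sum()                        # normalize, DC component excluded
    entropy = -np.sum(p * np.log2(p + 1e-12))          # frequency-domain entropy
    periodicity = psd[1:].max() / psd[1:].sum()        # crude periodicity indicator
    orientation_variation = grav.std(axis=0).sum()     # spatial domain: gravity variation
    return std, entropy, periodicity, orientation_variation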
Other EDA tasks of the project concern unsupervised user habit detection aimed at finding behavioral anomalies by analyzing heterogeneous and multivariate time series of sensor data over long periods. In the project, these tasks are unsupervised in order to avoid obtrusive data collection campaigns at the user site. For this reason, we focus on motif search over the sensor data collected in the trials, exploiting results obtained in the field of time series motif discovery [3,4]. Time series motifs are approximately repeated patterns found within the data. The approach chosen is based on stigmergy. Several works have used this technique to infer motifs in time series from different fields, from DNA and biological sequences [5,6] to intrusion detection systems [7].
Human activity recognition refers to the process of inferring human activities from raw sensor data [8], classifying or evaluating specific sections of the continuous sensor data stream into specific human activities, events or health parameter values. Recently, the need for adaptive processing of temporal data from potentially large amounts of sensor data has led to an increasing use of machine learning models in activity recognition systems (see [9] for a recent survey), especially due to their robustness and flexibility. Depending on the nature of the treated data, on the specific scenario considered and on the admissible trade-off among efficiency, flexibility and performance, different supervised machine learning methods have been applied in this area.
Among others, Neural Networks for sequences, including Recurrent Neural Networks (RNNs) [10], are considered a class of learning models suitable for approaching tasks characterized by a sequential/temporal nature, and able to deal with noisy and heterogeneous input data streams. Within the class of RNNs, the Reservoir Computing (RC) paradigm [11] in general, and the Echo State Network (ESN) model [12,13] in particular, represent an interesting and efficient approach to building adaptive nonlinear dynamical systems. The class of ESNs provides predictive models for efficiently learning in sequential/temporal domains from heterogeneous sources of noisy data, supported by theoretical studies [13,14] and by hundreds of relevant successful experimental studies reported in the literature [15]. Interestingly, ESNs have recently proved to be particularly suitable for processing noisy information streams originated by sensor networks, resulting in successful real-world applications in supervised computational tasks related to AAL (Ambient Assisted Living) and human activity recognition. This is also testified by some recent results [16,17,18,19,20], which may be considered a first preliminary experimental assessment of the feasibility of ESNs for the estimation of some relevant target human parameters, although obtained on different and broader AAL benchmarks.
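The following is a minimal NumPy sketch of an ESN with a ridge-regression readout, shown only to make the model concrete; the hyper-parameters and names are illustrative and this is not the model used in the project.

import numpy as np

class ESN:
    # Fixed random reservoir + linear readout trained by ridge regression.
    def __init__(self, n_in, n_res=100, rho=0.9, ridge=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
        W = rng.uniform(-1.0, 1.0, (n_res, n_res))
        self.W = W * (rho / np.max(np.abs(np.linalg.eigvals(W))))  # rescale to spectral radius rho
        self.ridge = ridge

    def _states(self, U):
        x = np.zeros(self.W.shape[0])
        states = []
        for u in U:                                   # drive the reservoir with the input sequence
            x = np.tanh(self.W_in @ u + self.W @ x)
            states.append(x.copy())
        return np.array(states)

    def fit(self, U, Y):
        X = self._states(U)
        A = X.T @ X + self.ridge * np.eye(X.shape[1])
        self.W_out = np.linalg.solve(A, X.T @ Y)      # ridge-regression readout
        return self

    def predict(self, U):
        return self._states(U) @ self.W_out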
At the reasoning level, our interest is in hybrid approaches founded on static rules and probabilistic methods. Multiple-stage decisions refer to decision tasks that consist of a series of interdependent stages leading towards a final resolution. The decision-maker must decide at each stage what action to take next in order to optimize performance (usually utility). Examples of this sort are working towards a degree, troubleshooting, medical treatment, budgeting, etc. Decision trees are a useful means for representing and analyzing multiple-stage decision tasks; they support decisions learned from data, and their terminal nodes represent possible consequences [21].

Other popular approaches, which have been used to implement medical expert systems, are Bayesian Networks [22] and Neural Networks [23], but they require large amounts of empirical data to train the algorithms and are not suited to being manually adjusted. In our problem, on the other hand, the decision process must be transparent and mainly requires static rules based on medical guidelines provided by the professionals. Thus, decision trees are the most suitable solution, since they provide a structured and easy-to-understand graphical representation. Efficient and powerful algorithms also exist for automated learning of the trees [24,25,26]. A decision tree is a flowchart-like structure in which an internal node represents a test on an attribute, each branch represents a test outcome, and each leaf node represents a class label (the decision taken after computing all attributes). A path from root to leaf represents a classification rule. Decision trees give a simple representation for classifying examples. In general, as for all machine learning algorithms, the accuracy of the algorithms increases with the number of sample data. In applications in which the number of samples is not large, a high number of decisions could lead to problems. In these cases, a possible solution is the use of a hybrid Decision Tree/Genetic Algorithm approach, as suggested in [27].
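As a toy illustration of such guideline-style static rules (the thresholds and attribute names below are hypothetical, not taken from the actual medical protocol), a hierarchical decision tree can be written directly as nested tests:

def physical_activity_advice(daily_steps, kcal_burned, excessive_stress):
    # Root node: was excessive physical stress detected today?
    if excessive_stress:
        return "reduce exercise intensity"
    # Internal node: daily step count against a (hypothetical) guideline threshold
    if daily_steps < 3000:
        if kcal_burned < 150:
            return "suggest increasing daily walking"   # leaf: protocol not met
        return "keep current plan"                      # leaf: partially met
    return "protocol compliant"                         # leaf: target reached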

3 Problem Definition and Requirements

Our main objective is to provide a solution for prolonging the functional and cogni-
tive capacity of the elderly by proposing an “Active Ageing” lifestyle protocol. Medi-
cal specialists monitor the progress of their patients daily through a dashboard and
modify the protocol for each user according to their capabilities. A set of mobile ap-
plications (social games, exer-games, cognitive games and a diet application) feed back the protocol proposed by the specialist and the progress of the games to the end user. The
monitoring of each user is achieved by means of a network of sensors, either wearable
or environmental, and applications running on personal mobile devices. The human
activity recognition (HAR) measures characteristics of the elderly lifestyle in the
physical and social domains through non-invasive monitoring solutions based on the
sensor data. Custom mobile applications cover the areas of diet and cognitive moni-
toring. In the rest of this section, we present the main requirements of the HAR.
By leveraging environmental sensors, such as PIRs (Passive InfraRed) and a loca-
lization system, the HAR module profiles user habits in terms of daily ratio of room
occupancy and indoor/outdoor living. The system is also able to detect changes in the
user habits that occur in the long-term. By relying on accelerometer and heartbeat
data from a wearable bracelet, the HAR module provides time-slotted estimates, in
terms of calories, of the energy expenditure associated with the physical activities of
the user. Energy consumption can result from everyday activities and from the physical exercises proposed by the protocol. The system also computes the daily outdoor distance covered and the daily number of steps, and detects periods of excessive physical stress, using the accelerometer and heartbeat data from the bracelet. Finally, a smart carpet is used to measure the user's weight and balance skills, leveraging a machine learning classification model based on the BERG balance assessment test.

The HAR assesses the social interactions of the user both indoor and outdoor. In par-
ticular, in the indoor case, HAR estimates a quantitative measure of the social interac-
tions based on the occurrence and duration of the daily social gatherings at the user
house. Regarding the outdoor socialization, the system estimates the duration of the
encounters with other users by detecting the proximity of the users’ devices.
The Reasoner uses the data produced by the HAR, the diet, and the games applications to provide an indicator of the user's protocol compliance and protocol progress in three areas: social life, physical activity and related diet, and cognitive status. These indicators, along with the measured daily metrics and aggregate data, support medical specialists in providing periodic changes to the protocol (i.e., the set of physical activities and game challenges, and the diet). The Reasoner is also able to suggest changes to the user protocol by means of specialist-defined rules.

The HAR module and the Reasoner are, therefore, core system modules, bridging the gap between sensor data, medical specialists, and the end user.

4 An Applicative Scenario

We consider a woman in her 70s, still independent and living alone in her apartment
(for the sake of simplicity, we give her the name Loredana). She is a bit overweight and has started forgetting things. Recently, the specialist told her that she is at risk for cardiovascular disease, due to her overweight condition, and that she has a mild cognitive impairment. For this reason, Loredana uses our system as a technological support to monitor her life habits, to keep herself healthy and to prevent chronic diseases. In a typical day, Loredana measures her weight and balance by means of a
smart carpet, which collects data for the evaluation of her BERG scale equilibrium
and her weight. The data concerning the balance is used to suggest a personalized
physical activity (PA) plan, while the data about the weight give indications about the
effectiveness of the intervention in terms of a personalized diet regimen and PA plan.
During the day, Loredana wears a special bracelet, which measures her heart rate and (by means of an accelerometer) how much she walks and how many movements she makes during physical exercises. These data are used by the system developed in the project to assess her caloric expenditure and to monitor the execution of the prescribed physical exercises. The bracelet is also used to localize her both indoors (also collecting information about the time spent in each room) and outdoors (collecting information about the distance covered). Furthermore, the bracelet detects the proximity of Loredana to other users wearing the same bracelet, while machine-learning classification models based on environmental sensors deployed at home (PIRs and door switches) detect the presence of other people in her apartment to give an indication of the number of visits received. These data are used as an indicator of her social life.
Loredana also uses a tablet to interact with the system, on which she plays cognitive games and inserts data concerning her meals; these data are converted by the system (under the supervision of her specialist) into daily kilocalorie intake and food composition. She is also guided through the daily physical exercises and games that are
selected by the system (under the supervision of her specialist) based on the evolution
of her conditions (in terms of balance, weight, physical exercises, etc.). All the data collected during the day are processed at night to produce a summary of Loredana's lifestyle, with the purpose of giving feedback to Loredana in terms of proposed physical activity, and of presenting Loredana's condition to the specialist on a daily basis.

5 Activity Recognition and Reasoning

5.1 High Level Architecture and Data Flows


The high-level system architecture is presented in Fig. 1, highlighting the data flow that originates at a pilot site (in terms of sensor data), passes through the data processing stages (pre-processing, activity recognition and reasoner subsystems), and then goes back to the user (in terms of feedback) and to the specialist (in terms of information about the user's performance).

In particular, Fig. 1 shows that the data processing system contains five main subsystems running on the server (the grey rectangles in the figure), three databases (RAW, HOMER [28] and KIOLA [29]), plus a middleware that uploads the sensors' data into the RAW database and whose description is out of the scope of this paper. At night, a synchronization mechanism (shown as a clock in Fig. 1) sequentially activates these five subsystems. In turn, these subsystems pre-process data (pre-processing subsystem in Fig. 1), configure the predictive activity recognition tasks (task configurator subsystem), process the daily pre-processed data through the predictive human activity recognition subsystem (HAR subsystem), perform the exploratory data analysis (EDA subsystem), and refine and aggregate the results of these stages (Reasoner subsystem).
Along with the sensors’ data, the RAW DB also stores the intermediate data pro-
duced by the pre-processing subsystem. The HOMER DB stores configuration infor-
mation (e.g. regarding sensors deployment) that is used by the task configurator to
retrieve the tuning parameters for the different pilot sites. The refined data produced
by the HAR and EDA subsystems is then stored in the KIOLA DB, where the Rea-
soner reads them. The Reasoner outputs feedbacks for the user in terms of suggestions
about her lifestyle, and data about her compliance to the suggested lifestyle protocol
for the specialist (or caregiver) through the dashboard.
The data processing stages deal with three flows of data, related to the user diet,
social relationships, and sedentariness, respectively. The dietary data flow relies on
data produced by the smart carpet (pressure data and total weight), the bracelet (heart rate and accelerometer data), and data about the food composition provided
by the user himself through an interface on his tablet. In particular, the data produced
by the bracelet pass through the HAR subsystem that estimates the user physical ac-
tivity. The data flow about the user’s social relationships relies on a number of envi-
ronmental sensors, which detect the contacts of the user with other people. To this
purpose, the HAR and EDA subsystems detect the user encounters and the proximity
of the user with other people by fusing information produced by presence sensors,
user’s localization and door switches. This data flow also relies on the user’s mood
Fig. 1. High-level architecture and deployment of a typical installation, with data flowing from
a pilot site to the remote Activity Recognition and Reasoning system.

information, which the user himself reports daily through an app on his tablet. Finally,
the sedentariness data flow exploits data produced by the bracelet (heart rate, user’s
localization, movements, and step count), the smart carpet, and data about the use of
the application that guides the user through the daily physical activity. The HAR sub-
system processes the data of the smart carpet to assess the user balance according to
the BERG scale. The HAR and EDA subsystems also process data from the bracelet
to assess the intensity of the physical effort.
The Reasoner fuses all these data flows at a higher level than that of HAR and
EDA. In a first step, it exploits rules extracted from clinical guidelines to compute
specific parameters for each of the three data flows. For example, it uses the physical
activity estimation in the sedentariness data flow to assess the compliance of the user
to the prescribed lifestyle protocol (as the medical expert defines it). In the second
step, the Reasoner performs a cross-domain reasoning on top of the first step, allow-
ing a deeper insight into the well-being of the patient. Note that the empirical rules needed to define the second-level reasoning protocol are not yet available, as they are the output of medical studies on the data collected from the on-site experimentation that will be concluded in the next year of project activity. Hence, at this stage of the project, the second-level reasoning is not yet implemented.
Note that the Reasoner subsystem operates on a different time scale with respect to the HAR and EDA subsystems. In fact, the aim of HAR and EDA is the recognition of short-term activities of the user. These can be recognized from a sequence of input sensor information (possibly pre-processed) in a limited (short-term) time window. All short-term predictions generated across the day are then forwarded to the Reasoner for information integration across medium/long time scales. Medium-term reasoning operates over 24-hour periods (for example, to assess the calorie intake/consumption balance in a day). Long-term reasoning, on the other hand, shows general trends by aggregating information over the entire duration of the experimentation in the pilot sites (for example, to offer statistical data about the user, which the medical experts can use to assess the overall user improvement during the experimentation).
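As a small illustration of the medium-term (24-hour) integration, the sketch below aggregates hypothetical short-term predictions into a daily energy balance; the field names are assumptions, not the project's actual data format.

from collections import defaultdict

def daily_energy_balance(predictions):
    # predictions: iterable of dicts such as
    # {"date": "2015-03-01", "kcal_intake": 450.0, "kcal_burned": 120.0}
    balance = defaultdict(float)
    for p in predictions:
        balance[p["date"]] += p.get("kcal_intake", 0.0) - p.get("kcal_burned", 0.0)
    return dict(balance)   # positive values mean a caloric surplus for that day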

5.2 Human Activity Recognition and Exploratory Data Analysis Subsystems


The goal of the activity recognition subsystems (HAR and EDA) is to evaluate the
user parameters (referred to as predictions) concerning his short-term activities per-
formed during the day. It exploits the activity recognition configuration, which is the
result of an off-line configuration phase aimed at finding the final setting for the pre-
processing and for the activity recognition subsystems (both HAR and EDA) that are
deployed to implement the activity recognition tasks.
The EDA subsystem analyses pre-processed data in order to profile the user's habits, to detect behavioral deviations in routine indoor activities, and to provide aggregated values useful to the Reasoner in the sedentariness area, such as user habits, daily outdoor distance covered, daily steps and information about outdoor meetings with other users. The actual nature of the data streams processed by the EDA depends on the particular sensors originating them. For example, in the case of the BERG score prediction concerning the user's balance, the data produced by the smart carpet have a high and variable frequency (~100 Hz). These data are normalized and broken into fixed-frequency time series segments of 200 ms, from which the preprocessing stage extracts statistical features consisting of mean, standard deviation, skewness and kurtosis. The resulting feature time series, with a lower frequency of 5 Hz, is the input of the EDA subsystem. A similar pre-processing stage is applied to data coming from the other sensors.
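For illustration, the following sketch reproduces this pre-processing step on a single pressure channel; the exact resampling strategy used in the project may differ, and the names are ours.

import numpy as np
from scipy.stats import skew, kurtosis

def carpet_features(signal, timestamps, window_s=0.2):
    # signal, timestamps: 1-D arrays sampled at a high, possibly variable rate (~100 Hz)
    signal = np.asarray(signal, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    features = []
    t = timestamps[0]
    while t + window_s <= timestamps[-1]:
        seg = signal[(timestamps >= t) & (timestamps < t + window_s)]
        if seg.size:
            features.append([seg.mean(), seg.std(), skew(seg), kurtosis(seg)])
        t += window_s
    return np.array(features)   # one feature row per 200 ms window (i.e. 5 Hz)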
The unsupervised models used for the EDA do not rely on long-term, invasive and costly ground-truth collection and annotation campaigns, which may not be acceptable to the users. Rather, the EDA is designed to detect symptoms of chronic diseases (which are most relevant for the project purposes), characterized by a gradual, long-term deviation from the user's typical behavior, or by critical trends in the user's vital parameters. For example, the EDA features a module for the detection of abnormal deviations in the user's habits (based on motif discovery and stigmergy algorithms) that relies on the locations of the user (at room level) during his daily living activities.
Concerning the HAR subsystem, its internal architecture is shown in Fig. 2. It is
composed of two main subsystems, which are the task configurator and the activity
recognition subsystem. The former handles the retrieval of configuration information
and pre-processed data for the tasks addressed in the system and forwards it to the
latter subsystem, which is responsible for performing the actual activity recognition
tasks. The core of the activity recognition system is given by the HAR scheduler
(which activates the activity recognition tasks when all sensor information is consoli-
dated and pre-processed in the RAW DB), and by the pool of activity recognition
Fig. 2. Detailed architecture of the supervised HAR system

components (based on predictive learning models), one for each specific task. These
components implement the trained predictive learning model obtained from a prelim-
inary validation phase, and they produce their predictions by computing the outputs of
the supervised learning model in response to the input data.

5.3 Reasoning Subsystem and Dashboard


Fig. 3 shows the Reasoner, the high-level database and the dashboard with the other
system components. The high-level database receives the data from three sources: the
activity recognition subsystem (data about physical activity, calories consumption,
balance, user sociality etc.), the diet application on the tablet (nutritional data inserted
by the users themselves), and the application for serious games on the tablet (statistics
about the performance of the user in cognitive games).
The Reasoner compares all these data with the clinical protocol the person should follow, based on the presets from the medical experts. To this purpose, it adopts a rule-based approach with hierarchical decision trees, where the rules are created according to the actual medical guidelines. Based on this, a general overview, as well as some calculated data relations, is presented on the specialist's dashboard. The Reasoner settings can be modified by the medical experts to change the protocol according to a certain user behavior (for example, the specialist can change the food composition or reduce the overall caloric intake), or the Reasoner itself may adapt the protocol according to pre-defined rules when some known conditions occur. For example, an improvement in physical activity, assessed by the heart rate response to exercise, may result in a progressive increase in the intensity of the proposed exercises. The Reasoner gives feedback to the user by means of applications on the user's tablet (namely, the nutritional and physical activity advisors and the cognitive games).

Fig. 3. Architecture of the Reasoner and its relationship with the other system components

Fig. 4. Personal dashboard showing activity data of an end-user

The reasoning module and the dashboard are integrated in the KIOLA modular platform, suitable for clinical data and therapy management. It is built on top of the open-source web framework Django1 and uses PostgreSQL 9.4 as primary data storage. KIOLA has two groups of components: core components (that provide data models for receiving and storing external sensor data, rule-based reasoning on observations, and messaging services to communicate the results of the reasoning to external systems), and frontend components (a dashboard for the specialists, an administrative interface, and a search engine for all data stored in KIOLA). In particular, the dashboard provides specialists with the possibility to review and adjust clinical protocols online, and it is designed for both mobile devices and computers. The dashboard can provide either an overview of all end-users to which the specialist has access, or a detailed view of a specific end-user. Charts are used to visualize all observations in the areas of social, physical games, and dietary data (see Fig. 4). A task module on the dashboard is also used to notify specialists when the reasoning system suggests an adaptation of the clinical protocol. Here, the specialist can approve or disapprove the recommendation, and he can tune the parameters of the protocol himself.

1 http://www.djangoproject.com
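As a purely hypothetical sketch (field names and schema are assumptions, not the actual KIOLA data model), a Django model for storing the daily observations received from the HAR subsystem could look as follows:

from django.db import models

class DailyObservation(models.Model):
    # One aggregated HAR/EDA output per user, day and metric (hypothetical schema).
    user_id = models.CharField(max_length=64)
    date = models.DateField()
    metric = models.CharField(max_length=64)   # e.g. "daily_steps", "kcal_intake"
    value = models.FloatField()

    class Meta:
        unique_together = ("user_id", "date", "metric")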

6 Conclusions

The DOREMI project addresses three important causes of morbidity and mortality in the elderly (malnutrition, sedentariness and cognitive decline) by designing a solution aimed at promoting an active ageing lifestyle protocol. It envisages providing ICT-based home care services for aging people to counteract cognitive decline, sedentariness and unhealthy dietary habits. The proposed approach builds on activity recognition and reasoning subsystems, which are the scope of this paper. At the current stage of the project such components are being deployed, and they will be validated in the course of the year by means of an extensive data collection campaign aimed at obtaining annotated datasets. These datasets are currently being collected from a group of elderly volunteers in Pisa, in view of the experimentation in the pilot sites planned for the beginning of 2016.

References
1. Tukey, J.W.: Exploratory data analysis, pp. 2–3 (1977)
2. Long, X., Yin, B., Aarts, R.M.: Single-accelerometer-based daily physical activity classifi-
cation. In: Engineering in Medicine and Biology Society, EMBC 2009. Annual Interna-
tional Conference of the IEEE. IEEE (2009)
3. Fernández-Llatas, C., et al.: Process Mining for Individualized Behavior Modeling Using
Wireless Tracking in Nursing Homes. Sensors 13(11), 15434–15451 (2013)
4. van der Aalst, W.M.P., et al.: Workflow mining: A survey of issues and approaches. Data & Knowledge Engineering 47(2), 237–267 (2003)
5. Yang, C.-H., Liu, Y.-T., Chuang, L.-Y.: DNA motif discovery based on ant colony optimi-
zation and expectation maximization. In: Proceedings of the International Multi Confe-
rence of Engineers and Computer Scientists, vol. 1 (2011)
6. Bouamama, S., Boukerram, A., Al-Badarneh, A.F.: Motif finding using ant colony optimi-
zation. In: Dorigo, M., Birattari, M., Di Caro, G.A., Doursat, R., Engelbrecht, A.P.,
Floreano, D., Gambardella, L.M., Groß, R., Şahin, E., Sayama, H., Stützle, T. (eds.) ANTS
2010. LNCS, vol. 6234, pp. 464–471. Springer, Heidelberg (2010)

7. Cui, X., et al.: Visual mining intrusion behaviors by using swarm technology. In: 2011
44th Hawaii International Conference on System Sciences (HICSS). IEEE (2011)
8. Bao, L., Intille, S.S.: Activity Recognition from User-Annotated Acceleration Data. In:
Ferscha, A., Mattern, F. (eds.) PERVASIVE 2004. LNCS, vol. 3001, pp. 1–17. Springer,
Heidelberg (2004)
9. Lara, O.D., Labrador, M.A.: A survey on human activity recognition using wearable sen-
sors. Communications Surveys & Tutorials, IEEE 15(3), 1192–1209 (2013)
10. Kolen, J., Kremer, S. (eds.): A Field Guide to Dynamical Recurrent Networks. IEEE Press
(2001)
11. Lukoševicius, M., Jaeger, H.: Reservoir computing approaches to recurrent neural network
training. Computer Science Review 3(3), 127–149 (2009)
12. Jaeger, H., Haas, H.: Harnessing nonlinearity: Predicting chaotic systems and saving
energy in wireless communication. Science 304(5667), 78–80 (2004)
13. Gallicchio, C., Micheli, A.: Architectural and markovian factors of echo state networks.
Neural Networks 24(5), 440–456 (2011)
14. Tino, P., Hammer, B., Boden, M.: Markovian bias of neural based architectures with
feedback connections. In: Hammer, B., Hitzler, P. (eds.) Perspectives of neural-symbolic
integration. SCI, vol. 77, pp. 95–133. Springer-Verlag, Heidelberg (2007)
15. Lukoševičius, M., Jaeger, H., Schrauwen, B.: Reservoir Computing Trends. KI -
Künstliche Intelligenz 26(4), 365–371 (2012)
16. Bacciu, D., Barsocchi, P., Chessa, S., Gallicchio, C., Micheli, A.: An experimental charac-
terization of reservoir computing in ambient assisted living applications. Neural Compu-
ting and Applications 24(6), 1451–1464 (2014)
17. Chessa, S., et al.: Robot localization by echo state networks using RSS. In: Recent Ad-
vances of Neural Network Models and Applications. Smart Innovation, Systems and
Technologies, vol. 26, pp. 147–154. Springer (2014)
18. Palumbo, F., Barsocchi, P., Gallicchio, C., Chessa, S., Micheli, A.: Multisensor data fusion
for activity recognition based on reservoir computing. In: Botía, J.A., Álvarez-García, J.A.,
Fujinami, K., Barsocchi, P., Riedel, T. (eds.) EvAAL 2013. CCIS, vol. 386, pp. 24–35.
Springer, Heidelberg (2013)
19. Bacciu, D., Gallicchio, C., Micheli, A., Di Rocco, M., Saffiotti, A.: Learning context-
aware mobile robot navigation in home environments. In: 5th IEEE Int. Conf. on Informa-
tion, Intelligence, Systems and Applications (IISA) (2014)
20. Amato, G., Broxvall, M., Chessa, S., Dragone, M., Gennaro, C., López, R., Maguire, L.,
Mcginnity, T., Micheli, A., Renteria, A., O’Hare, G., Pecora, F.: Robotic UBIquitous
COgnitive network. In: Novais, P., Hallenborg, K., Tapia, D.I., Rodrìguez, J.M. (eds.)
Ambient Intelligence - Software and Applications. AISC, vol. 153, pp. 191–195. Springer,
Heidelberg (2012)
21. Lavrac, N., et al.: Intelligent data analysis in medicine. IJCAI 97, 1–13 (1997)
22. Chae, Y.M.: Expert Systems in Medicine. In: Liebowitz, J. (ed.) The Handbook of applied
expert systems, pp. 32.1–32.20. CRC Press (1998)
23. Gurgen, F.: Neuronal-Network-based decision making in diagnostic applications. IEEE
EMB Magazine 18(4), 89–93 (1999)
24. Anderson, J.R., Machine learning: An artificial intelligence approach. In: Michalski, R.S.,
Carbonell, J.G., Mitchell, T.M. (eds.) vol. 2. Morgan Kaufmann (1986)
25. Hastie, T., et al.: The elements of statistical learning, vol. 2(1). Springer (2009)
26. Murphy, K.P.: Machine learning: a probabilistic perspective. MIT Press (2012)

27. Carvalho, D.R., Freitas, A.A.: A hybrid decision tree/genetic algorithm method for data mining. Information Sciences 163(1), 13–35 (2004)
28. Fuxreiter, T., et al.: A modular platform for event recognition in smart homes. In: 12th IEEE Int. Conf. on e-Health Networking Applications and Services (Healthcom), pp. 1–6 (2010)
29. Kreiner, K., et al.: Play up! A smart knowledge-based system using games for preventing
falls in elderly people. Health Informatics meets eHealth (eHealth 2013). In: Proceedings
of the eHealth 2013, OCG, Vienna, pp. 243–248 (2013). ISBN: 978-3-85403-293-9
Gradient: A User-Centric Lightweight
Smartphone Based Standalone Fall
Detection System

Ajay Bhatia1(B) , Suman Kumar2 , and Vijay Kumar Mago3


1
Punjab Technical University, Jalandhar, India
[email protected]
2
Troy University, Troy, AL 36082, USA
[email protected]
3
Department of Computer Science, Lakehead University,
Thunder Bay, ON, Canada
[email protected]

Abstract. A real-time pervasive fall detection system is a very important tool to assist health care professionals in the event of falls of monitored elderly people, the demographic among which falls are the epidemic cause of injuries and deaths. In this work, Gradient, a user-centric and device-friendly standalone smartphone-based fall detection solution, is proposed. Our solution is standalone and user-centric as it is portable, cost efficient, user friendly, privacy preserving, and requires only technologies that exist in cellphones. In addition, Gradient is lightweight, which makes it device friendly, since cellphones are constrained by energy and memory limitations. Our work is based on accelerometer sensor data and on data derived from the gravity sensor, a recently available inbuilt sensor in smartphones. Through experimentation, we demonstrate that Gradient exhibits superior accuracy compared with other fall detection solutions.

Keywords: Fall detection · Accelerometer · Gravity · Sensors · Android app

1 Introduction

Advances in health care services and technologies, together with the decrease in fertility rates (especially in developed countries), are bringing major demographic changes in aging. Currently, around 10% of the world population is aged over 65, and it is estimated that this demographic will increase tenfold in the next 50 years [1]. Falls are a major contributor to the growing rates of mortality and morbidity in the aging population, and complications induced by falls also contribute to higher health care costs [2]. According to the U.S. Census Bureau, 13% of the population is over 65 years old; 40% of older adults living at home fall at least once a year, and 1 in 40 is hospitalized. Those who are hospitalized have a 50% chance of being alive a year later. Falls are a major health threat not only to independently living elders [3] but also to community-dwelling ones, where the fall rate is estimated at 30% to 60% annually [4]. Elders who suffer from visual impairment, urinary incontinence, and functional limitations are at increased risk of recurrent falls [5]. Therefore, pervasive fall detection systems that meet the needs of the aging population are a necessity, especially since the increase in the aging population and the estimated decrease in health care professionals call for technology-assisted intelligent health care solutions [6].
Considerable efforts have been made to design fall detection solutions for the elderly population; see [7] for a comprehensive list of solutions proposed by researchers. As more of society relies on smartphones because of their ubiquitous internet connectivity and computing, a smartphone-based, fall-detection-assisted health monitoring and emergency response system is highly desirable for the elderly population. In fact, recent trends suggest that smartphones are reducing the need for wearing watches, the most common wearable gadget in previous centuries [8], making smartphones the most commonly carried device. Naturally, effective smartphone-based fall detection solutions that do not require any infrastructure support external to the smartphone better fit the needs of the elderly population.
In this work, we aim to design a smartphone-based standalone fall detection system that is portable, cost efficient, user friendly, privacy preserving [9], and requires only existing cellphone technology. In addition, such a solution must exhibit low memory and computational overhead, which is only fitting since cellphones are constrained by limited energy and memory. Clearly, this motivates design approaches centering around the sensing technology already available in cellphones. Among the available sensors, the accelerometer is considered accurate [10] and is therefore a natural choice for the design of such a system. In fact, solutions based on a combination of accelerometer and orientation sensors have been proposed in past research; for example, iFall [11] is one of the most notable solutions that meet the design requirements mentioned above. However, through experimentation, we show that smartphone fall detection solutions that involve accelerometer data supplemented with orientation sensor data are not very accurate. Therefore, to meet the above requirements, we propose a novel fall detection mechanism that utilizes gravity sensor data and accelerometer data. Experimental comparison shows that the proposed approach is superior to its peers. In our work, the Android operating system platform is used for experimentation and implementation purposes.
This paper is organized as follows: in Section 2, we discuss recent and landmark research work in this area. Section 3 describes an Android-based application for data acquisition and our proposed fall detection method. In Section 4, an alpha test is performed on Gradient using simulated data. Finally, a summary and possible future work are discussed in Section 5.

2 Related Work
Various studies have been carried out on fall detection using wearable sensors and mobile phones. We divide fall detection systems into two categories:

2.1 User-Centric and Device-Friendly

The first accelerometer-data-driven fall detection system was proposed in [12]. Their system detects a fall when there is a change in body orientation from upright to lying that occurs immediately after a large negative acceleration. This system design later became a reference point for many fall detection algorithms using accelerometers.

A popular Android phone application, iFall [11], was developed for fall monitoring and response. Data acquired from the accelerometer by the application are evaluated with several threshold-based algorithms to determine whether a fall has occurred. Basic body metrics, such as height and weight, along with the level of activity, are used for estimating the threshold values. An alert notification system (SMS and emergency call) is triggered in moderate to critical situations and emergencies.
A threshold-based fall detection algorithm is proposed in [13]. The algorithm works by detecting dynamic postural situations, followed by unintentional falls to lying postures. The thresholds obtained from the collected data are compared with the linear acceleration and the angular velocity sensor data to detect the fall. After the fall is detected, a notification is sent to raise an alert about the fall. The authors used a gyroscope for data acquisition; however, gyroscope data are not effective at detecting falls accurately, which makes the proposed system a poor choice for fall monitoring. In [14], accelerometer and gyroscope sensors are used together to detect falls in the elderly population. Gravity and angular velocity are further extracted from the data, so that not only the fall but also the posture of the body during the fall is detected. The authors evaluated the system with respect to false positive and false negative fall detections, and false fall positions were also used in the study to assess the impact and efficiency of the algorithm. However, the approach used to detect the fall in their study is too simple to detect quick falls using accelerometer data, which is why the system performs poorly in differentiating jumping into bed from falling against a wall into a seated posture.
In [15], a smartphone-based fall detection system using a threshold-based algorithm to distinguish between activities of daily living and falls in real time is proposed. By comparative analysis of acceleration threshold levels, in order to obtain the best sensitivity and specificity, acceleration thresholds were determined for an early pre-impact alarm (4.5–5 m/s²) and for post-fall detection (21–28 m/s²) under experimental conditions. The experimental thresholds obtained are helpful for further study in this area of research, but the accelerometer alone is not sufficient for detecting falls effectively.

2.2 External Infrastructure Based

The paper [16] describes a fall detection sensor that monitors the subject safely and accurately by implementing it in a large sensor network called SensorNet. Initial approaches, such as a conjoined angle change and magnitude detection algorithm, were unsuccessful. Another drawback of this approach is its inefficiency at detecting complex falls in which the user does not end up oriented horizontally to the ground. The fall-detection board designed was able to detect 90% of all falls with a 5% false positive rate.
The Ivy project [17] used a low-cost and low-power wearable accelerometer on a wireless sensor network to detect falls. Thresholds on the peak values of the accelerometer were estimated, along with orientation angle data, to detect the fall. The authors found that the intensity and acceleration of a fall differ markedly from those of other activities. The main drawback of this proposed system is that it only works well in an indoor environment because of its dependence on a fixed network to relay events. In the study [18], an accelerometer sensor is used to detect the wearer's posture, activity and falls in a wireless sensor network. The activity is determined by the alternating-current component and the posture by the direct-current component of the accelerometer signal. The fall detection rate of the proposed system is 93.2%. The paper falls short in explaining the actual algorithm; moreover, the complexity and cost involved in designing the system make it less suitable for fall detection in real-life scenarios.
In [19], the authors proposed a fall detection system which uses two sets of sensors, one with an accelerometer and a gyroscope and the other with only an accelerometer. They used the sensor data to calculate angular data, and their system outperforms earlier known systems, as it is able to detect a fall with an average lead time of 700 milliseconds before the impact with the ground occurs. The major drawback of the system is that the subject has to wear torso and thigh sensors, which tends to be cumbersome for subjects. Also, the system is not capable of recording data in real-life situations.

3 Methods

In this section, we first discuss the principle behind previous research that utilizes orientation-sensor-based design approaches; we then describe the gravity-sensor-based approach that forms the basis of our proposal. We further show that the gravity-sensor-based design is more accurate and is a natural choice for user-centric and device-friendly fall detection solutions.

3.1 Design Principle: Orientation Sensor

A gyroscope can be used to either measure or maintain the orientation of a device. In comparison with an accelerometer, a gyroscope measures the orientation rather than the linear acceleration of the device. Analyzing accelerometer and orientation data to detect falls is the core of several fall detection systems. The orientation sensor, which in most cases is a combination of a gyroscope and a magnetometer, supplements this aim by providing accurate orientation data. We further explain the idea behind this approach.

Tri-axial orientation is represented in Figure 1(a), where θ, φ, and ψ are the azimuth angle (0° is north), the pitch angle (0° is flat on its back, measured from positive Y, east) and the roll angle (0° is flat on its back, measured from positive Z, downward), respectively.

Fig. 1. Orientation of cellphone relative to the real-world axis system: (a) orientation relative to earth; (b) cellphone body axis relative to the earth axis

Consider the example in Figure 1(b), where the axis system is rotated around the z axis by θ. The rotated axes are shown as X′ and Y′, which are axis coordinates relative to the phone. The contribution of this rotation to the X component of the fixed axes (X and Y) is given as aX = rxy cos(θ + k), where ax′ and ay′ are the X′ and Y′ components of the acceleration vector and k is the angle the acceleration vector makes with the X′ axis. Combining the effects of azimuth, roll and pitch, we get the following matrix:
M = \begin{bmatrix} \cos\psi\cos\phi & -\sin\psi\cos\theta & \sin\psi\sin\theta \\ \sin\psi\cos\phi & \cos\psi\cos\theta & -\cos\psi\sin\theta \\ -\sin\psi & \cos\phi\sin\theta & \cos\phi\cos\theta \end{bmatrix} + \begin{bmatrix} 0 & \cos\psi\sin\phi\sin\theta & \cos\psi\sin\phi\cos\theta \\ 0 & \sin\phi\sin\theta & \sin\psi\sin\phi\cos\theta \\ 0 & 0 & \cos\phi\cos\theta \end{bmatrix}   (1)

The relation between the body axes ([x′, y′, z′]) and the earth axes ([x, y, z]) is given as

[a_x \; a_y \; a_z] = [a_{x'} \; a_{y'} \; a_{z'}] \times M   (2)

Here a_{x'}, a_{y'} and a_{z'} are the accelerometer sensor readings in the tri-axial coordinate system relative to the cell phone, θ, φ and ψ are the orientation sensor readings from the mobile phone, and M is the matrix defined in Equation 1.
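To make the orientation-based transformation concrete, the following minimal Python sketch (our illustration, not code from the systems cited here) builds the matrix M of Equation 1 from the orientation angles and applies Equation 2 to phone-frame accelerometer readings. The function names, the use of NumPy and the radian convention for the angles are our assumptions.

    import numpy as np

    def rotation_matrix(theta, phi, psi):
        # Matrix M of Equation 1, with theta = azimuth, phi = pitch, psi = roll (radians)
        c, s = np.cos, np.sin
        m1 = np.array([
            [c(psi) * c(phi), -s(psi) * c(theta),  s(psi) * s(theta)],
            [s(psi) * c(phi),  c(psi) * c(theta), -c(psi) * s(theta)],
            [-s(psi),          c(phi) * s(theta),  c(phi) * c(theta)],
        ])
        m2 = np.array([
            [0.0, c(psi) * s(phi) * s(theta), c(psi) * s(phi) * c(theta)],
            [0.0, s(phi) * s(theta),          s(psi) * s(phi) * c(theta)],
            [0.0, 0.0,                        c(phi) * c(theta)],
        ])
        return m1 + m2

    def to_earth_frame(a_phone, theta, phi, psi):
        # Equation 2: row vector of phone-frame readings times M
        return np.asarray(a_phone) @ rotation_matrix(theta, phi, psi)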

3.2 Design Principle: Gravity Sensor

The gravity sensor returns only the influence of gravity. The gravity vector in the phone coordinate system is given as G = (g_x, g_y, g_z), and the acceleration in the same axis system is given as A = (a_x, a_y, a_z).

Fig. 2. Cellphone Acceleration Along (Apar ) and Perpendicular (Aperp ) to Gravity

From Figure 2, since both vectors are expressed in the same (phone) coordinate system, the angle between the two can easily be found by a simple vector inner product, and thus the acceleration component of the cellphone parallel to the gravity vector is given as

\vec{A}_{par} = a_{z''} = (\vec{A} \cdot \vec{G}) \, \frac{\vec{G}}{\|\vec{G}\|^{2}}   (3)

Likewise, the perpendicular counterpart is given as

\vec{A}_{perp} = \vec{A} - (\vec{A} \cdot \vec{G}) \, \frac{\vec{G}}{\|\vec{G}\|^{2}}   (4)

The absolute value of the downward acceleration is given by

|\vec{A}_{par}| = |a_{z''}| = \frac{a_x g_x + a_y g_y + a_z g_z}{\sqrt{g_x^{2} + g_y^{2} + g_z^{2}}}   (5)

and the absolute acceleration along the X′–Y′ plane of the cellphone is given as

|\vec{A}_{perp}| = \sqrt{a_x^{2} + a_y^{2} + a_z^{2} - |\vec{A}_{par}|^{2}}   (6)
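Equations 5 and 6 amount to a scalar projection of the accelerometer vector onto the gravity vector. The short Python sketch below (our illustration, assuming NumPy; not code from the Gradient application itself) computes both components for a single sample.

    import numpy as np

    def split_acceleration(a, g):
        # Signed downward component (Eq. 5 without the absolute value) and the
        # magnitude of the component in the X'-Y' plane (Eq. 6)
        a, g = np.asarray(a, dtype=float), np.asarray(g, dtype=float)
        a_par = np.dot(a, g) / np.linalg.norm(g)
        a_perp = np.sqrt(max(np.dot(a, a) - a_par ** 2, 0.0))
        return a_par, a_perp

    # Phone lying still with gravity along z: a_par is about 9.81, a_perp is small
    print(split_acceleration([0.1, 0.2, 9.81], [0.0, 0.0, 9.81]))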

3.3 Orientation Sensor vs. Gravity Sensor


A comparison of the vertical downward acceleration component of the linear acceleration data computed using orientation sensor values (Equations 1 and 2) with the same component computed using gravity sensor values is presented in Figure 3. We observe unstable and erroneous values when using the orientation sensor, whereas the values computed from gravity sensor data are stable and accurate. The gyroscope and magnetometer take time to converge to stable values, which causes the problem visible in our experiment and explains why many supposedly promising solutions failed the accuracy test by a huge margin. With this observation, we are motivated to use the gravity sensor in our work. To the best of our knowledge, our work goes beyond existing research by including the gravity sensor, which is more promising in distinguishing a real fall from a false one. The system design using the accelerometer and gravity sensor follows next.

Fig. 3. Downward acceleration: Orientation sensor vs. Gravity sensor

4 Gradient: System Design

4.1 Data Acquisition

The Android application was developed by considering the importance of each sensor inside the phone. Since the application must collect sensor data even when the screen is off, it is designed to work as a service, rather than an activity, that collects data from the sensors available on the device. The application is designed to run on any Android platform with at least the Gingerbread operating system [20]. The application uses both the accelerometer and the gravity sensor for data acquisition. Both are acceleration-based sensors and measure the acceleration of the device along all three axes: the acceleration sensor measures the acceleration applied to the device, including the force of gravity, while the gravity sensor provides a three-dimensional vector indicating the direction and magnitude of gravity. In this configuration, the application uses permissions to write to external storage in order to save the data as CSV (comma-separated values) files. The data are saved in different CSV files according to the type of sensor; the x-axis, y-axis and z-axis values and a timestamp are recorded in the respective files. As the data are acquired, the saved files are used to process and detect falls. Subsection 4.2 describes our processing methodology.

4.2 Algorithm

Gradient is based on the observation that when a user falls there is a sudden change in acceleration in the vertical downward direction; therefore, the downward acceleration component contains enough information to detect fall events. We ran several laboratory experiments to verify this hypothesis. The sensor data are collected on a mobile device using the Gradient application. Then the time derivative of the vertical downward component is compared with a preset threshold value to identify fall events. It is to be noted that the name Gradient is derived from Gravity-differentiation, the driving idea behind the proposed solution. The detailed algorithm is presented below:

Algorithm 1. Fall Detection Algorithm (Using Vectorization)

  Data:   A_d ← (a_x, a_y, a_z)    // Accelerometer sensor readings
          G_d ← (g_x, g_y, g_z)    // Gravity sensor readings
  Result: FallDetected

  W_s ← 10                                                            // Set window size for moving averages
  A_v ← mavg(A_d, W_s)                                                // Calculate moving averages on accelerometer data
  G_v ← mavg(G_d, W_s)                                                // Calculate moving averages on gravity data
  |A_par| ← (a_x g_x + a_y g_y + a_z g_z) / sqrt(g_x² + g_y² + g_z²)  // Downward acceleration
  |A_perp| ← sqrt(a_x² + a_y² + a_z² − |A_par|²)                      // Acceleration along X–Y plane
  |A_parNew| ← d|A_par| / dt                                          // Gradient of the downward acceleration
  |A_perpNew| ← d|A_perp| / dt                                        // Gradient along the X–Y plane
  Th ← avg(|A_parNew|)                                                // Estimate threshold value as average of A_parNew
  if |A_parNew|_i < θ, ∀i then
      FallDetected ← true
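As a concrete illustration of the pipeline above, the following Python sketch (our own, assuming NumPy; the differentiation time base is a placeholder, and flagging a fall on any threshold crossing is a simplification of the ∀i condition stated in Algorithm 1) smooths the two sensor streams, projects acceleration onto gravity, differentiates the downward component and compares the result with the threshold of -6.15 reported in Section 5.

    import numpy as np

    def detect_fall(acc, grav, window=10, threshold=-6.15, dt=1.0):
        # acc, grav: arrays of shape (n, 3) with accelerometer and gravity readings;
        # dt: spacing used for differentiation (per-sample by default; an assumption)
        acc, grav = np.asarray(acc, float), np.asarray(grav, float)
        kernel = np.ones(window) / window
        # Moving averages, column by column (mavg in Algorithm 1)
        acc_s = np.column_stack([np.convolve(acc[:, i], kernel, mode="same") for i in range(3)])
        grav_s = np.column_stack([np.convolve(grav[:, i], kernel, mode="same") for i in range(3)])
        # Downward component of acceleration (Equation 5), sample by sample
        a_par = np.einsum("ij,ij->i", acc_s, grav_s) / np.linalg.norm(grav_s, axis=1)
        # Time differentiation of the downward component (the "Gradient")
        a_par_grad = np.gradient(a_par, dt)
        return bool(np.any(a_par_grad < threshold))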

5 Result and Discussion

In this section, the performance of the proposed solution is discussed along with the experiment scenarios. We did not test our system on existing real-fall datasets1 because gravity sensor values are not available in them. However, we tried our best to collect fall data in a controlled environment that matches real-life, natural fall events.

5.1 Experiment Set-Up

The Gradient app was distributed among internal members of the research team (4 graduate students and 2 faculty members) to evaluate user acceptance of the app, gather feedback on its use, and assess its accuracy for data acquisition. Data were collected on a Samsung Galaxy S4, which has a quad-core 1.6 GHz processor and 2 GB of internal RAM and runs Android OS v4.2.2 (Jelly Bean). Members of the team carried the smartphone with the Gradient application during their daily routine of work and activities, across different time spans and numbers of days. Detailed logs of the times when they fell were maintained to check the sustainability of the app post-experimentation. Feedback was taken from the members to improve the app in every aspect of data acquisition. Two of the experiments performed by the team members are represented in Figures 4a and 4b, respectively. In the first experiment (Figure 4a), there is a fall at the end of the 6th minute of data acquisition; a second fall is clearly visible at the end of the 7th minute. Similarly, in Figure 4b falls can be seen in the middle of the 6th and 7th minutes, respectively.

1 http://www.bmi.teicrete.gr/index.php/research/mobifall

Fig. 4. Fall detection experiments: (a) Experiment 1; (b) Experiment 2

The experiments were conducted by graduate students of the leading author. The students were informed about the procedure and implications of this study, and the institute of the leading author approved this research work. A total of 20 experiments2 were performed in a closed and safe environment, and the total number of actual falls performed by the subjects was 27. We found that the best threshold value (θ) for fall detection with the Gradient approach is -6.15; values lower than θ are treated as fall events. The true positive and false negative falls detected by the proposed system are 24 and 3, respectively, while 3 false positive falls were counted. On close observation of the false negative cases, we find that these are cases where the subject fell from a low height, e.g., from a sofa. We acknowledge that distinguishing between putting the phone in a pocket after receiving a call and falling from a low-lying sofa is challenging; these soft falls need further investigation. Also, there is no way to compute the true negative cases, so we do not report the specificity of the system.
We also compared our method with a well-known Android fall detection application, iFall [11]. iFall detected all 27 fall events but also raised the count of false positives to 13, reducing its positive predictive value to 67.5% (see Table 1). We observe that the application is very sensitive to jerks and movements such as brisk walking or running. Statistically, our system is more reliable and robust in coping with real-time activities.
2 All the experiments were performed in a controlled and safe environment.

Table 1. Comparison between iFall application and Gradient

                      Fall positive              Fall negative
  Outcome positive    True positive (TP) = 20    False positive (FP) = 3   Positive predictive value = TP / (TP + FP) = 20 / (20 + 3) = 86.96%
  Outcome negative    False negative (FN) = 2    True negative (TN) = 2    Negative predictive value = TN / (FN + TN) = 2 / (2 + 2) = 50.00%
                      Sensitivity = TP / (TP + FN) = 20 / (20 + 2) = 90.91%
                      Specificity = TN / (FP + TN) = 2 / (3 + 2) = 40.00%
                      Total = 27

The positive predictive value (86.96%) and sensitivity (90.91%) show the effectiveness of the proposed system in detecting falls.

5.2 Gradient in Action

We plotted the time series data in Figure 4. The first subplot is drawn from the magnitude of the accelerometer sensor data, which is calculated as

|A| = \sqrt{a_x^{2} + a_y^{2} + a_z^{2}}

It is to be noted that several studies [11,13,15] utilized the acceleration magnitude derived by the above equation for fall detection. However, a fall event mainly corresponds to a vertical fall; therefore, including horizontal components would result in false positives in many activity scenarios where there is a sudden change in the horizontal velocity of a person (see the next section for further explanation), and clearly those events are not fall events. For the sake of completeness, we have included the absolute value of acceleration in the first subplot of Figure 4. The second subplot is derived from the value of a_{z''} from Equation 5. The fall event points are circled in the figure, and the falls are correctly detected by using the time-differential value of a_{z''}. Although our experimental scenario involves a controlled environment, we observe successful detection of all fall events.

5.3 Performance Comparison

In this section, we compare our work with one of the most notable systems, iFall [11]. The experiment was performed by running iFall and Gradient concurrently, along with a stopwatch to measure the exact times of the real falls. The comparison between iFall and our proposed design is presented in Figure 5: the upper numbered labels 1 through 9 represent falls detected by iFall and the lower numbered labels 1 through 4 represent falls detected by Gradient. We observe that, although iFall successfully detects fall events, it also outputs several false positives when no fall occurs. If the device is in a running motion or a jerky motion such as a shake, iFall records such events as falls. Clearly, Gradient shows better accuracy than iFall.

Fig. 5. Comparison between iFall (red squares) and Gradient (blue squares)

6 Conclusion
Falls are a major health risk among elderly people around the world. Fall detection using computational approaches has remained a challenging task, which has prompted researchers to propose various computational methods to detect the occurrence of falls; however, solutions that are user-centric and device-friendly remain elusive. In this paper, we proposed a novel approach to fall detection using the accelerometer and gravity sensors that are now integral components of smartphones. We designed an Android application to collect experimental data and applied our algorithm to test the accuracy of the system. Our initial results are very promising, and the proposed method has the potential to reduce the false positives that are a common problem with other popular user-centric and device-friendly systems. Furthermore, we believe that this system can help caretakers, health professionals, and medical practitioners better manage health hazards due to falls in elderly people. In the future, we plan to conduct a user study with a healthcare center and to test our system on real fall datasets.

References
1. Haub, C.: World population aging: clocks illustrate growth in population under
age 5 and over age 65. Population Reference Bureau, June 18, 2013 (2011)
2. Fulks, J., Fallon, F., King, W., Shields, G., Beaumont, N., Ward-Lonergan, J.:
Accidents and falls in later life. Generations Review 12(3), 2–3 (2002)
3. Duthie Jr, E.: Falls. The Medical clinics of North America 73(6), 1321–1336 (1989)
4. Graafmans, W., Ooms, M., Hofstee, H., Bezemer, P., Bouter, L., Lips, P.: Falls in
the elderly: a prospective study of risk factors and risk profiles. American Journal
of Epidemiology 143(11), 1129–1136 (1996)
5. Tromp, A., Pluijm, S., Smit, J., Deeg, D., Bouter, L., Lips, P.: Fall-risk screening
test: a prospective study on predictors for falls in community-dwelling elderly.
Journal of Clinical Epidemiology 54(8), 837–844 (2001)
6. Kleinberger, T., Becker, M., Ras, E., Holzinger, A., Müller, P.: Ambient intelligence
in assisted living: enable elderly people to handle future interfaces. In: Stephanidis,
C. (ed.) UAHCI 2007 (Part II). LNCS, vol. 4555, pp. 103–112. Springer, Heidelberg
(2007)
7. Igual, R., Medrano, C., Plaza, I.: Challenges, issues and trends in fall detection
systems. BioMedical Engineering OnLine 12(1), 1–24 (2013)
8. Phones replacing wrist watches. http://today.yougov.com/news/2011/05/05/brother-do-you-have-time/ (online; accessed April 22, 2014)
9. Ziefle, M., Rocker, C., Holzinger, A.: Medical technology in smart homes: exploring
the user’s perspective on privacy, intimacy and trust. In: 2011 IEEE 35th Annual
Computer Software and Applications Conference Workshops (COMPSACW),
pp. 410–415. IEEE (2011)
10. Lindemann, U., Hock, A., Stuber, M., Keck, W., Becker, C.: Evaluation of a fall
detector based on accelerometers: A pilot study. Medical and Biological Engineer-
ing and Computing 43(5), 548–551 (2005)
11. Sposaro, F., Tyson, G.: iFall: an Android application for fall monitoring and response. In: Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2009, pp. 6119–6122. IEEE (2009)
12. Williams, G., Doughty, K., Cameron, K., Bradley, D.: A smart fall and activ-
ity monitor for telecare applications. In: Proceedings of the 20th Annual Interna-
tional Conference of the IEEE Engineering in Medicine and Biology Society, 1998,
vol. 3, pp. 1151–1154. IEEE (1998)
13. Wibisono, W., Arifin, D.N., Pratomo, B.A., Ahmad, T., Ijtihadie, R.M.: Falls
detection and notification system using tri-axial accelerometer and gyroscope sen-
sors of a smartphone. In: 2013 Conference on Technologies and Applications of
Artificial Intelligence (TAAI), pp. 382–385. IEEE (2013)
14. Li, Q., Stankovic, J.A., Hanson, M.A., Barth, A.T., Lach, J., Zhou, G.: Accurate,
fast fall detection using gyroscopes and accelerometer-derived posture informa-
tion. In: Sixth International Workshop on Wearable and Implantable Body Sensor
Networks, BSN 2009, pp. 138–143. IEEE (2009)
15. Mao, L., Liang, D., Ning, Y., Ma, Y., Gao, X., Zhao, G.: Pre-impact and impact
detection of falls using built-in tri-accelerometer of smartphone. In: Zhang, Y.,
Yao, G., He, J., Wang, L., Smalheiser, N.R., Yin, X. (eds.) HIS 2014. LNCS,
vol. 8423, pp. 167–174. Springer, Heidelberg (2014)
16. Brown, G.: An accelerometer based fall detector: development, experimentation,
and analysis. University of California, Berkeley (2005)
17. Chen, J., Kwong, K., Chang, D., Luk, J., Bajcsy, R.: Wearable sensors for reli-
able fall detection. In: 27th Annual International Conference of the Engineering in
Medicine and Biology Society, IEEE-EMBS 2005, pp. 3551–3554. IEEE (2006)
18. Lee, Y., Kim, J., Son, M., Lee, J.H.: Implementation of accelerometer sensor mod-
ule and fall detection monitoring system based on wireless sensor network. In: 29th
Annual International Conference of the IEEE Engineering in Medicine and Biology
Society, EMBS 2007, pp. 2315–2318. IEEE (2007)
19. Nyan, M., Tay, F.E., Murugasu, E.: A wearable system for pre-impact fall detection.
Journal of Biomechanics 41(16), 3475–3481 (2008)
20. Google Inc.: Android Gingerbread OS (2013). http://developer.android.com/about/versions/android-2.3-highlights.html (online; accessed April 04, 2014)
Towards Diet Management with Automatic
Reasoning and Persuasive
Natural Language Generation

Luca Anselma(B) and Alessandro Mazzei

Dipartimento di Informatica, Università di Torino, Turin, Italy


{anselma,mazzei}@di.unito.it

Abstract. We devise a scenario where the interaction between man and food is mediated by an intelligent system that, on the basis of various factors, encourages or discourages the user to eat a specific dish. The main factors that the system needs to account for are (1) the diet that the
user intends to follow, (2) the food that s/he has eaten in the last days,
and (3) the nutritional values of the dishes and their specific recipes.
Automatic reasoning and Natural Language Generation (NLG) play a
fundamental role in this project: the compatibility of a food with a diet
is formalized as a Simple Temporal Problem (STP), while the NLG tries
to motivate the user. In this paper we describe these two facilities and
their interface.

Keywords: Diet management · Automatic reasoning · Natural language generation

1 Introduction
The daily diet is one of the most important factors influencing disease, in particular obesity. As highlighted by the World Health Organization, this is primarily due to recent changes in lifestyle [26]. The necessity of encouraging the world's population toward a healthy diet has been promoted by the FAO [20]. In addition, many states have specialized these guidelines by adopting strategies related to their food history (for instance, for the USA, http://www.choosemyplate.gov). In Italy, the Italian Society for Human Nutrition has recently produced a prototypical study with recommendations for the use of specialized operators [1].
This scenario suggests the possibility of integrating the directives on nutrition into the daily diet of people by using multimedia tools on mobile devices. The smartphone can be considered a super-sense that creates new modalities of interaction with food. In recent years there has been a growing interest in using multimedia applications on mobile devices as persuasive technologies [13].
Often a user is not able to carefully follow a diet for a number of reasons.
When a deviation occurs, it is useful to support the user in devising the conse-
quences of such deviation and to dynamically adapt the rest of the diet in the
upcoming meals so that the global Dietary Reference Values (henceforth DRVs)


Fig. 1. The architecture of the diet management system.

could nevertheless be reached. In particular in this paper we describe a system


which is useful for (i) evaluating the compatibility of a dish with a diet allowing
small and occasional episodes of diet disobedience, (ii) determining what are the
consequences of eating a specific dish on the rest of the diet, (iii) showing such
consequences to the user thus empowering her/him and, moreover, (iv) motivat-
ing the user in following the diet by persuading her/him to minimize the acts
of disobedience. Using automatic reasoning to evaluate the compatibility of a
dish with a diet could enhance a smartphone application with a sort of virtual
dietitian. Artificial intelligence should make the system tolerant to diet disobedi-
ence, but also persuasive to minimize these acts of disobedience. Thus, a critical
issue directly related to automatic reasoning is the final presentation to the user
of the results. Several studies have addressed the problem of generating natural
language sentences that explain the results of automatic reasoning [4,17].
In our hypothetical scenario the interaction between man and food is medi-
ated by an intelligent system that, on the basis of various factors, encourages or
discourages the user to eat a specific dish. The main factors that the system needs
to account for are (1) the diet that the user has to follow, (2) the food that s/he has
been eating in the last days or that s/he intends to eat in the next days, and (3)
the nutritional values of the ingredients of the dish and its specific recipe. In Fig. 1
we report the architecture of our system. It is composed of five modules/services: a smartphone application (APP), a central module that manages the information flow (DietManager), an information extraction module (NLU/IE), a reasoning module (Reasoner) and a natural language generation module (NLGenerator). In
this paper we focus on the description of the Reasoner and NLGenerator modules;
some details on the other modules and on the system can be found on the webpage
of the project (http://di.unito.it/madiman).
We think that this system could be commercially attractive at least in two
contexts. The first context is the medical one, where users (e.g. patients affected
by essential obesity) are strongly motivated to strictly follow a diet and need
tools that help them. The second context is the one involving, e.g., healthy

fast food or restaurant chains, where the effort of deploying the system can be
rewarded by an increase in customer retention.
This paper is organized as follows: in Section 2 we describe the automatic
reasoning facilities, in Section 3 we describe the design of the persuasive NLG
based on different theories of persuasion and, finally, in Section 4 we draw some
conclusions.

2 Automatic Reasoning for Diet Management


Since our approach to automatic reasoning for diet management is based on the
STP framework, first we introduce STP, then we describe how we exploit STP
to reason on a diet and how we interpret the results from STP.

2.1 Preliminaries: STP


We base our treatment of nutrition constraints on the framework of “Simple
Temporal Problem” (STP) [8]. An STP constraint consists in a bound on differ-
ences of the form c ≤ x − y ≤ d, where x and y are temporal points and c and
d are numbers (their domain can be either discrete or real). An STP constraint
can be interpreted in the following way: the temporal distance between the time
points x and y is between c – the lower bound of the distance – and d – the
upper bound of the distance. It is also possible to impose strict inequalities (i.e.,
<) and −∞ and +∞ can be used to denote the fact that there is no lower or
upper bound, respectively. An STP is a conjunction of STP constraints.
An interesting feature of STP is that the problem of determining the consis-
tency of an STP is tractable and that the algorithm employed, i.e., an all-pairs
shortest paths algorithm such as Floyd-Warshall’s one, also obtains the mini-
mal network, that is the minimum and maximum distance between each pair of
points. STP can be represented with a graph whose nodes correspond to the tem-
poral points of the STP and whose arcs are labeled with the temporal distance
between the points.
Property. Floyd-Warshall’s algorithm is correct and complete on STP, i.e.
it performs all and only the correct inferences while propagating the STP con-
straints [8], and obtains a minimal network. Its time complexity is cubic in the number of time points.
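As an illustration of how such propagation can be implemented, the following Python sketch (our own; the function name and the encoding as a distance graph are assumptions consistent with the standard STP literature, not code from the system described here) runs Floyd-Warshall over an STP and returns the minimal network or reports inconsistency.

    import itertools, math

    def stp_minimal_network(n, constraints):
        # n: number of points; constraints: list of (i, j, low, high) meaning
        # low <= x_j - x_i <= high. Returns the all-pairs distance matrix d,
        # where d[i][j] is the tightest upper bound on x_j - x_i (and -d[j][i]
        # the lower bound), or None if the STP is inconsistent.
        d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
        for i, j, low, high in constraints:
            d[i][j] = min(d[i][j], high)    # encodes x_j - x_i <= high
            d[j][i] = min(d[j][i], -low)    # encodes x_i - x_j <= -low
        for k, i, j in itertools.product(range(n), repeat=3):
            d[i][j] = min(d[i][j], d[i][k] + d[k][j])
        if any(d[i][i] < 0 for i in range(n)):
            return None                     # negative cycle: inconsistent STP
        return d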

2.2 Towards Automatically Reasoning on a Diet


Reasoning on DRVs. In a diet it is necessary to consider parameters such
as the total energy requirements and the specific required amount of nutrients
and macronutrients such as proteins, carbohydrates and lipids. In particular in
the literature it is possible to find systems of DRVs that are recommended to be
followed for significant amounts of time. In the running example, without loss of
generality we refer to the Italian values [1]. Such values have to be customized
for the specific patients according to their characteristics. In particular, from

Fig. 2. Example of DRVs for a week represented as STP (for space constraints the
constraints for the meals are not represented).

weight, gender and age, using the Schofield equation [24], it is possible to estimate
the basal metabolic rate; for example a 40-year-old male who is 1.80 m tall
and weighs 71.3 kg has an estimated basal metabolic rate of 1690 kcal/day.
Such value is then adjusted [1] by taking into account the energy expenditure
related to the physical activity of the individual; for example a sedentary lifestyle
corresponds to a physical activity level of 1.45, thus, in the example, since the
physical activity level is a multiplicative factor, the person has a total energy
requirement of 2450 kcal/day. Moreover, it is recommended [1] that such energy
is provided by the appropriate amount of the different macronutrients, e.g., 260
kcal/day of proteins, 735 kcal/day of lipids and 1455 kcal/day of carbohydrates.
In this section we focus on the total energy requirement; the macronutrients can
be dealt with separately in the same way.
We represent the DRVs as STPs; more precisely, we use an STP con-
straint to represent – instead of temporal distance between temporal points
– the admissible DRVs. Thus, e.g., a recommendation to eat a lunch of min-
imum 500 kcal and maximum 600 kcal is represented by the STP constraint
500 ≤ lunchE − lunchS ≤ 600, where lunchE and lunchS represent the end and
the start of the lunch, respectively.
Furthermore, we exploit the STP framework to allow a user to make small
deviations with regard to the “ideal” diet and to know in advance what are
the consequences of such deviations on the rest of the diet. Thus, we impose
less strict constraints over the shortest periods (i.e., days or meals) and stricter
constraints over the longest periods (i.e., months, weeks). For example the rec-
ommended energy requirement of 2450 kcal/day, considered over a week, results
in a constraint such as 2450 · 7 ≤ weekE − weekS ≤ 2450 · 7 and for the
single days we allow the user to set, e.g., a deviation of 10%, thus result-
ing in the constraints 2450 − 10% ≤ SundayE − SundayS ≤ 2450 + 10%,
. . . , 2450 − 10% ≤ SaturdayE − SaturdayS ≤ 2450 + 10% (see Fig. 2). For
single meals we can further relax the constraints: for example the user can
decide to split the energy assumption for the day among the meals (e.g., 20%
for breakfast and 40% for lunch and dinner) and to further relax the con-
straints (e.g., of 30%), thus resulting in a constraint, e.g., 2450 · 20% − 30% ≤
Sunday breakf astE − Sunday breakf astS ≤ 2450 · 20% + 30%.

Representing and Reasoning on the Diet and the Food. Along these
lines, it is possible to represent the dietary recommendations for a specific user.
However, we wish to support such a user in taking advantage of the information
regarding the actual meals s/he consumes. In this way, the user can learn what

Fig. 3. Example of DRVs represented as STP.

are the consequences on his/her diet of eating a specific dish and s/he could
use such information in order to make informed decisions about the current
or future meals. Therefore it is necessary to “integrate” the information about
the eaten dishes with the dietary recommendations. We devise a system where
the user inputs the data about the food s/he is eating using a mobile app where
the input is possibly supported by reading a QR code and s/he can also specify
the amount of food s/he has eaten. Thus, we allow some imprecision due to
possible differences in the portions (in fact, the actual amount of food in a
portion is not always the same and, furthermore, a user may not eat a whole
portion) or in the composition of the dish [6]. We support such feature by using
STP constraints also for representing the nutritional values of the eaten food.
The dietary recommendations can be considered constraints on classes, which
can be instantiated several times when the user assumes his/her meals. Thus, the
problem of checking whether a meal satisfies the constraints of the dietary rec-
ommendations corresponds to checking whether the constraints of the instances
satisfy the constraints of the classes. This problem has been dealt with in [25] and
[2]. In these works the authors have considered the problem of “inheriting” the
temporal constraints from classes of events to instances of events in the context of
the STP framework, also taking into account problems deriving from correlation
between events and from observability. Our setting is simpler: correlation is known and observability is complete (even if possibly imprecise). Thus, we generate a new, provisional STP to which we add the new STP
constraints deriving from the meals that the user has consumed: the added con-
straints possibly restrict the values allowed by the constraints in the STP. Then
we propagate the constraints in such a new STP and we determine whether the
new constraints are consistent and we obtain the new minimal network with the
implied relations. For example, let us suppose that the user on Sunday, Monday
and Tuesday had an actual intake of 2690 kcal for each day. This corresponds
to adding to the STP the new constraints 2690 ≤ SundayE − SundayS ≤ 2690,
. . . , 2690 ≤ T uesdayE − T uesdayS ≤ 2690. Then, propagating the constraints
of the new STP (see Fig. 3), we discover that (i) the STP is consistent and thus
the intake is compatible with the diet and (ii) on each remaining day of the week
the user has to assume a minimum of 2205 kcal and a maximum of 2465 kcal.
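The worked example above can be reproduced with the stp_minimal_network sketch given in Section 2.1 (again an illustration, under our own encoding assumptions, with energies in kcal): point 0 is the start of the week and point i the end of day i, so each daily intake is a "distance" between consecutive points.

    # Weekly total fixed at 2450*7 kcal, daily intake within +-10% of 2450 kcal,
    # and 2690 kcal observed on each of the first three days.
    day, week, dev = 2450.0, 2450.0 * 7, 0.10
    constraints = [(0, 7, week, week)]
    constraints += [(i, i + 1, day * (1 - dev), day * (1 + dev)) for i in range(7)]
    constraints += [(i, i + 1, 2690.0, 2690.0) for i in range(3)]

    d = stp_minimal_network(8, constraints)
    if d is None:
        print("inconsistent with the diet")
    else:
        for i in range(3, 7):
            print(f"day {i + 1}: between {-d[i + 1][i]:.0f} and {d[i][i + 1]:.0f} kcal")

Running this prints bounds of 2205 and 2465 kcal for each of the four remaining days, matching the propagation result described in the text.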

2.3 Interpretation of STP

Although the information deriving from the STP is complete (and correct), in
order to show the user meaningful feedback and to make it possible to
interface the automatic reasoning module with the NLG module, it is useful

to interpret the results of the STP. In particular, we wish to provide the user with user-friendly information that is not limited to a harsh “consistent/inconsistent” answer regarding the adequacy of a dish with regard to her/his diet. Therefore
we consider the case where the user proposes to our system a dish, we obtain
its nutritional values, we translate them, along with the user’s diet and past
meals, into STP and, by propagating the constraints, we obtain the minimal
network. By taking into account a single macronutrient (carbohydrates, lipids
or proteins), the resulting STP allows us to classify the macronutrient in the
proposed dish in one of the following five cases: permanently inconsistent (I.1),
occasionally inconsistent (I.2), consistent and not balanced (C.1), consistent and
well-balanced (C.2) and consistent and perfectly balanced (C.3).
In the cases I.1 and I.2 the value of the macronutrient is inconsistent. In case
I.1 the value for the nutrient is inconsistent with the DRVs as represented in
the user’s diet. The dish cannot be accepted even independently of the other
food s/he may possibly eat. This case is detected by considering whether the
macronutrient violates a constraint on classes. In case I.2 the dish per se does
not violate the DRVs, but – considering the past meals s/he has eaten – it would
preclude consistency with the diet. Thus, it is inconsistent now, but it could
become possible to choose it in the future, e.g., next week or month. This case
is detected by determining whether the macronutrient, although it satisfies the constraints on the classes, is inconsistent with the propagated inherited STP.
In the cases C.1, C.2 and C.3 the value of the macronutrient is consistent
with the diet, also taking into account the other dishes that the user has already
eaten. It is possible to detect that the dish is consistent by exploiting the minimal
network of the STP: if the value of the macronutrient is included between the
lower and upper bounds of the relative constraint, then we are guaranteed that
the STP is consistent and that the dish is consistent with the diet. This can
be proven by using the property that in a minimal network every tuple in a
constraint can be extended to a solution [19]. A consistent but not balanced
choice of a dish will have consequences on the rest of the user’s diet because the
user will have to “recover” from it. Thus we distinguish three cases depending on
the level of the adequacy of the value of the macronutrient to the diet. In order
to discriminate between the cases C.1, C.2 and C.3, we consider how the value
of the macronutrient stacks upon the allowed range represented in the related
STP constraint. We assume that the mean value is the “ideal” value according
to the DRVs and we consider two parametric user-adjustable thresholds relative
to the mean: according to the deviation with respect to the mean we classify the
macronutrient as not balanced (C.1), well balanced (C.2) or perfectly balanced
(C.3) (see Fig. 4). In particular, we distinguish between lack or excess of a specific
macronutrient for a dish: if a macronutrient is lacking (in excess) with regard to
the ideal value, we tag the dish with the keyword IPO (IPER). This information
will be exploited in the generation of the messages.
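A minimal sketch of this grading step is given below (our illustration; the threshold defaults and the function name are placeholders, since the paper leaves the two thresholds user-adjustable). Cases I.1 and I.2 are assumed to have been detected earlier, by checking the class constraints and the propagated STP respectively.

    def classify_consistent_value(value, low, high, well=0.25, perfect=0.10):
        # value: macronutrient amount of the proposed dish, already known to lie
        # inside the propagated STP bounds [low, high]; 'perfect' and 'well' are
        # the two user-adjustable thresholds, expressed here as fractions of the
        # half-range (illustrative defaults, not taken from the paper).
        mean = (low + high) / 2.0
        half = (high - low) / 2.0
        deviation = abs(value - mean) / half if half > 0 else 0.0
        direction = "IPO" if value < mean else "IPER"   # poor vs. rich in the nutrient
        if deviation <= perfect:
            return "C.3", None        # perfectly balanced
        if deviation <= well:
            return "C.2", direction   # well balanced
        return "C.1", direction       # consistent but not balanced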

Fig. 4. Classification of a consistent value of a macronutrient given the minimum and maximum value of an STP constraint in a minimal network.

3 Persuasive NLG for Diet


A number of works have considered the problem of NLG for presenting the results of automated reasoning to a user, especially in the case of expert systems for reasoning, e.g., [4,17]. In order to convert the five possible kinds of output of the STP reasoner (see Section 2.3) into messages, we adopted a simple template-based generator that produces five kinds of messages designed for persuasion.
We first describe the generator (Section 3.1) and later we describe the theories
that motivated the design of our messages (Section 3.2).

3.1 A Simple Template-Based Generation Architecture


The standard architecture for NLG models generation as a pipeline composed of three distinct modules/processes: document planning, micro-planning and surface realization [22]. Each of these modules addresses distinct issues. (1) In document planning, one decides what to say, that is, which information content will be communicated. (2) In micro-planning, the focus is on the design of a number of features related to the information content as well as to the specific language, such as the choice of words. (3) In surface realization, sentences are finally generated on the basis of the decisions taken by the previous modules and by considering the constraints related to language-specific word order and inflection.
For our system, the information content to be communicated, i.e. the document plan, is produced by the reasoner. Moreover, with the aim of easily implementing the prescriptions of the persuasion theories in the messages, we adopted the simplest architecture for NLG: we treat sentence planning and surface realization in one single module by adopting a template-based approach. We use five templates to communicate the five cases of output of the reasoner: in Table 1 we report the cases obtained by interpreting the output of the reasoner (column C), the direction of the deviation (column D), the Italian templates and their rough English translation.
Indeed, the final message is obtained by modifying the templates on the basis
of the specific values for the motivation of inconsistency that can be extracted by

Table 1. The persuasive message templates: the underline denotes the variable parts of the template. Column C contains the classification produced by the STP reasoner, while column D contains the direction of the deviation: IPO (IPER) stands for the information that the dish is poor (rich) in the value of the macronutrient.

  C    D    Message template                                Translation
  I.1  IPO  Questo piatto non va affatto bene, contiene     This dish is not good at all, it's too
            davvero pochissime proteine!                    poor in proteins!
  I.2  IPO  Ora non puoi mangiare questo piatto perché è    You cannot have this dish now because it
            poco proteico. Ma se domenica mangi un bel      doesn't provide enough proteins, but if
            piatto di fagioli allora lunedì potrai          you eat a nice dish of beans on Sunday,
            mangiarlo.                                      you can have it on Monday.
  C.1  IPO  Va bene mangiare le patatine ma nei prossimi    It's OK to eat chips but in the next days
            giorni dovrai mangiare più proteine.            you'll have to eat more proteins.
  C.2  IPO  Questo piatto va bene, è solo un po' scarso     This dish is OK, but it's a bit poor in
            di proteine. Nei prossimi giorni anche          proteins. In the next days you'll need
            fagioli però! :)                                beans too! :)
  C.3  -    Ottima scelta! Questo piatto è perfetto per     Great choice! This dish is perfect for
            la tua dieta :)                                 your diet :)

interpreting the output of the reasoner (cf. Section 2.3) and of possible suggestions that can guide the choices of the user in the following days. The suggestions are obtained from a simple table that couples the excess (deficiency) of a macronutrient with a dish that could compensate for this excess (deficiency). In particular, for the reasoner's outputs I.1, I.2, C.1 and C.2, we need to distinguish the case of a dish poor in a macronutrient (IPO in Table 1) from the case of a dish rich in a macronutrient (IPER). If the dish is classified as IPO (IPER), we insert into the message a suggestion to consume in the next days a dish that contains a large (small) quantity of that specific macronutrient.
For the sake of simplicity we do not describe the algorithm used in the generation module to combine the three distinct outputs of the reasoner on the three macronutrients (i.e. proteins, lipids and carbohydrates). In short, the messages corresponding to each macronutrient need to be aggregated into a single message, and a number of constraints related to coordination and relative clauses need to be accounted for [22]. In the next section we describe the three theories of persuasion that influenced and motivated the design of the messages.
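To give a flavour of the template-based realization step, the Python sketch below is entirely illustrative: the slot names, the suggestion table and the simplified Italian strings are our assumptions, not the authors' actual templates. It selects a template from the reasoner's (case, direction) pair and fills in the nutrient and the compensating dish; only the IPO variants are sketched, mirroring Table 1, and IPER variants would suggest dishes poor in the nutrient instead.

    TEMPLATES = {
        # (case, direction) -> simplified message skeleton with illustrative slots
        ("I.1", "IPO"): "Questo piatto non va affatto bene, contiene davvero poche {nutrient}!",
        ("I.2", "IPO"): "Ora non puoi mangiare questo piatto perché è povero di {nutrient}. "
                        "Ma se prima mangi {suggestion}, allora potrai mangiarlo.",
        ("C.1", "IPO"): "Va bene, ma nei prossimi giorni dovrai mangiare più {nutrient}.",
        ("C.2", "IPO"): "Questo piatto va bene, è solo un po' scarso di {nutrient}. "
                        "Nei prossimi giorni anche {suggestion} però! :)",
        ("C.3", None):  "Ottima scelta! Questo piatto è perfetto per la tua dieta :)",
    }

    # Toy suggestion table coupling a deficiency with a dish that compensates for it
    SUGGESTIONS = {("proteine", "IPO"): "un bel piatto di fagioli"}

    def realise(case, direction, nutrient="proteine"):
        # Pick the template for the reasoner's output and fill its slots
        template = TEMPLATES[(case, direction)]
        return template.format(nutrient=nutrient,
                               suggestion=SUGGESTIONS.get((nutrient, direction), ""))

    print(realise("C.2", "IPO"))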

3.2 Designing Persuasive Messages in the Diet Domain

A number of theories on the design of persuasive textual and multimedia messages have been proposed in recent years [7,10–12,14,16,21,23]. Most of these theories fall into two broad categories. The first category includes theories that approach persuasion from a practical and empirical point of view, using strategies and methods typical of psychology and interaction design. The second category includes theories that approach persuasion from a theoretical point of view, using strategies and methods typical of strong artificial intelligence and cognitive science. We discuss the three theories that most influenced the design of the messages in relation to our project.

CAPTology (Computers As Persuasive Technologies) is the study of computers as persuasive technologies, i.e. “[. . . ] the design, research, and analysis of interactive computing products (computers, mobile phones, websites, wireless technologies, mobile applications, video games, etc.) created for the purpose of changing people's attitudes or behaviors” [10]. The starting point of Fogg's theory is that the computer is perceived by users in three coexisting forms, Tool-
Media-SocialActor, and each one of these three forms can exercise some forms of
persuasion. As a tool, the computer can enhance the capabilities of a user: our
system calculates the nutritional contents of the food, and so it enhances the
ability to correctly judge the compatibility of a dish with a diet. As a medium, the computer “provides experience”: in our system, the user's memory is enhanced by the reasoner, which indirectly reminds her/him what s/he ate in the last days. As a social actor, the computer creates an empathic relationship with the user, reminding her/him of the “social rules”: in our system the messages guide the user towards the choice of a balanced meal, convincing her/him to follow the diet that s/he her/himself decided. Fogg recently defined a number of rules for designing effective persuasive systems [11], and some of these rules have shaped our
messages. For example, the rule Learn what is Preventing the target behavior proposes to classify an “incorrect” behavior along three major lines: (1) lack of motivation, (2) lack of ability, (3) lack of a well-timed trigger to perform the behavior. In our system all three components play a role. Indeed, a user follows a bad diet because (i) s/he is not motivated enough, (ii) s/he does not know that the dish conflicts with her/his diet, and (iii) s/he does not have the right stimulus at the time of choosing a dish. The reasoning and the generated messages act on the last two components: the reasoner
enhances the user’s abilities allowing her/him to have the relevant information
at the right time, the generation system creates a stimulus (the message) when
it is really necessary, kairos in Fogg's terminology, i.e. when the user has to
decide what to eat.
Another approach to computational persuasion is strongly related to the
concept of tailoring, i.e. the adaptation of the output of the computation to a
specific user. A pioneering work for tailoring in the field of NLG is described in
[21]: the authors have designed an NLG system, called STOP, to build a letter
that induces a specific reader to quit smoking. The key component of STOP
is the identification of a user type from the answers given to a questionnaire. In this way, one can build a specific user profile, and by using this profile the system generates a tailored letter on the basis of a template. Unfortunately, this simple approach to persuasion did not yield the desired results: the experimental protocol showed, through the use of a control group, that the enhancement given by customization was negligible. At this stage, we do not provide in our system the ability to create custom messages for a specific user but, as evidenced by similar experiences, customization of the feedback could improve the performance of the system. A system for tailoring that we partially
adopt in our messages is described in [16], where a series of messages are sent via
SMS to reduce the consumption of snacks. In this case, the messages adopt six

patterns/templates for persuasion derived from the general theory of persuasion of Cialdini [7]. The six patterns are: (1) Reciprocity: people feel obligated
to return a favor, (2) Scarcity: people will value scarce products, (3) Authority:
people value the opinion of experts, (4) Consistency: people do as they said they
would, (5) Consensus: people do as other people do, (6) Liking: we say yes to
people we like. Compared to this classification, all the messages of our generator
belong to the patterns of authority and consistency.
One approach to persuasion strictly related to strong artificial intelligence
and cognitive science is based on the concept of the computer as an intelligent
agent [12,14,23]. The system behaves as a real autonomous entity and it is often
modeled as a BDI (Beliefs, Desires, Intentions) agent, whose main purpose is to
persuade the user to behave in a specific way. This approach has been adopted
essentially for research purposes rather than for commercial applications. In con-
trast to the design of our NL generator, where there is a single module based on templates, such an agent-based approach allows great modularity in the design of a persuasive system. We describe some issues of these systems in order to
understand the deficiencies of our simple approach. Hovy defines a number of
heuristic rules that constrain the “argument” defined in the process of sentence
planning. For example: Adverbial stress words can only be used to enhance or
mitigate expressions that carry some affect already [14]. In a similar way, De
Rosis and Grasso define a number of heuristic rules on the argument structure,
to lexically enhance or mitigate a message [23]. Certain adverbs, such as little bit (poco), very (molto) and really (davvero), are used to enhance some specific argument structures. Indeed, we adopt this strategy by using this kind of adverb
in the messages I.1, I.2, C.1 and C.2. Guerini et al. define a detailed taxonomy
of persuasion strategies that a system can adopt and relate the strategies to the
theory of argumentation [12]. Moreover, they define an architecture for persua-
sion that follows the standard modularization of NLG systems. This allows for
a very rich persuasive action, which begins from the planning of a rhetorical
structure in the content planning. Compared to the taxonomy of the proposed
strategies, we can see that our messages belong to one single category, called
action inducement/goal balance/positive consequence. This strategy induces an
action (to choose a dish), by using the user’s goal (a balanced diet) and by using
the benefits deriving from this goal.
Finally, note that in the messages C.2 and C.3 we used emoticons. Indeed,
some studies showed that the use of emoticons in written texts can increase the
communicative strength of a message. For example, Derks et al. show that the use of emoticons sets a friendly tone for the message and can increase its positive value [9].

4 Conclusions and Related Works


There are a number of academic studies related to our project, among them [3,15], and there is also a large number of smartphone applications related to nutrition, e.g. DailyBurn, Lose It!, MyNetDiary, A low GI Diet, WeightWatchers. However, our dietary system presents two elements of novelty: (1) the
use of automatic reasoning as a tool for verifying the compatibility of a specific recipe with a specific diet and for determining the consequences of the choice of
a specific dish and (2) the use of NLG techniques to produce the answer.
Some authors have applied Operational Research techniques to tackle the
problem of planning a diet (see the survey in [18] or the more recent paper [5]).
These techniques are based on the simplex method for solving linear program-
ming problems. However these approaches are meant to plan an entire diet and
they do not support the user in choosing a dish and in investigating the conse-
quences of her/his choice. In [6] the authors have tackled the problem of assessing
the compatibility of a single meal to a norm and of suggesting to the user some
actions to balance the meal (e.g. removing/adding food); they employed fuzzy
arithmetic to represent imprecision/uncertainty in quantity and composition of
food and heuristic search for determining the actions to be suggested. They did
not consider the problem of globally balancing the meals.
In the near future, we intend to improve the NLG module for tailoring. In
particular, we want 1) to build a corpus of sentences that a professional dieti-
cian would use to persuade users towards correct dish choices, 2) to separate
microplanning from realization, 3) to classify the users in types in order to per-
sonalize the messages on the basis, for instance, of the age. Finally, we plan to
experiment the system in two settings. First we intend to design a simulation
that includes 1) a database of real recipes, 2) a user model that allows to test
the persuasion efficacy and 3) a baseline built rigidly sticking to DRVs. Second,
we intend to test the system with a focus group in a clinical setting, in partic-
ular with patients affected by essential obesity. In this setting we imagine that
the system could be used also by human dieticians for the supervision of their
patients.

References
1. LARN - Livelli di Assunzione di Riferimento di Nutrienti ed energia per la popo-
lazione italiana - IV Revisione. SICS Editore, Milan (2014)
2. Anselma, L., Terenziani, P., Montani, S., Bottrighi, A.: Towards a comprehensive
treatment of repetitions, periodicity and temporal constraints in clinical guidelines.
Artificial Intelligence in Medicine 38(2), 171–195 (2006)
3. Balintfy, J.L.: Menu planning by computer. Commun. ACM 7(4), 255–259 (1964)
4. Barzilay, R., McCullough, D., Rambow, O., DeCristofaro, J., Korelsky, T., Lavoie, B.: A new approach to expert system explanations. In: 9th International Workshop on Natural Language Generation, pp. 78–87 (1998)
5. Bas, E.: A robust optimization approach to diet problem with overall glycemic
load as objective function. Applied Mathematical Modelling 38(19–20), 4926–4940
(2014)
6. Buisson, J.C.: Nutri-educ, a nutrition software application for balancing meals,
using fuzzy arithmetic and heuristic search algorithms. Artif. Intell. Med. 42(3),
213–227 (2008)
7. Cialdini, R.B.: Influence: science and practice. Pearson Education, Boston (2009)
8. Dechter, R., Meiri, I., Pearl, J.: Temporal constraint networks. Artif. Intell. 49(1–
3), 61–95 (1991)
9. Derks, D., Bos, A.E.R., von Grumbkow, J.: Emoticons in computer-mediated com-
munication: Social motives and social context. Cyberpsy., Behavior, and Soc. Net-
working 11(1), 99–101 (2008)
10. Fogg, B.: Persuasive Technology: Using computers to change what we think and
do. Morgan Kaufmann Publishers, Elsevier, San Francisco (2002)
11. Fogg, B.: The new rules of persuasion (2009). http://captology.stanford.edu/
resources/article-new-rules-of-persuasion.html
12. Guerini, M., Stock, O., Zancanaro, M.: A taxonomy of strategies for multimodal
persuasive message generation. Applied Artificial Intelligence 21(2), 99–136 (2007)
13. Holzinger, A., Dorner, S., Födinger, M., Valdez, A.C., Ziefle, M.: Chances of
increasing youth health awareness through mobile wellness applications. In: Leit-
ner, G., Hitz, M., Holzinger, A. (eds.) USAB 2010. LNCS, vol. 6389, pp. 71–81.
Springer, Heidelberg (2010)
14. Hovy, E.H.: Generating Natural Language Under Pragmatic Constraints. Lawrence
Erlbaum, Hillsdale (1988)
15. Iizuka, K., Okawada, T., Matsuyama, K., Kurihashi, S., Iizuka, Y.: Food menu
selection support system: considering constraint conditions for safe dietary life. In:
Proceedings of the ACM Multimedia 2012 Workshop on Multimedia for Cooking
and Eating Activities, CEA 2012, pp. 53–58. ACM, New York (2012)
16. Kaptein, M., de Ruyter, B.E.R., Markopoulos, P., Aarts, E.H.L.: Adaptive persua-
sive systems: A study of tailored persuasive text messages to reduce snacking. TiiS
2(2), 10 (2012)
17. Lacave, C., Diez, F.J.: A review of explanation methods for heuristic expert sys-
tems. Knowl. Eng. Rev. 19(2), 133–146 (2004)
18. Lancaster, L.M.: The history of the application of mathematical programming to
menu planning. European Journal of Operational Research 57(3), 339–347 (1992)
19. Montanari, U.: Networks of constraints: Fundamental properties and applications
to picture processing. Information Sciences 7, 95–132 (1974)
20. Nishida, C., Uauy, R., Kumanyika, S., Shetty, P.: The joint WHO/FAO expert
consultation on diet, nutrition and the prevention of chronic diseases: process,
product and policy implications. Public Health Nutrition 7, 245–250 (2004)
21. Reiter, E., Robertson, R., Osman, L.: Lessons from a Failure: Generating Tailored
Smoking Cessation Letters. Artificial Intelligence 144, 41–58 (2003)
22. Reiter, E., Dale, R.: Building Natural Language Generation Systems. Cambridge
University Press, New York (2000)
23. de Rosis, F., Grasso, F.: Affective natural language generation. In: Paiva, A. (ed.)
Affective Interactions. LNCS, vol. 1814, pp. 204–218. Springer, Heidelberg (2000)
24. Schofield, W.N.: Predicting basal metabolic rate, new standards and review of
previous work. Human Nutrition: Clinical Nutrition 39C, 5–41 (1985)
25. Terenziani, P., Anselma, L.: A knowledge server for reasoning about temporal con-
straints between classes and instances of events. International Journal of Intelligent
Systems 19(10), 919–947 (2004)
26. World Health Organization: Global strategy on diet, physical activity and health
(WHA57.17). In: 57th World Health Assembly (2004)
Predicting Within-24h Visualisation of Hospital
Clinical Reports Using Bayesian Networks

Pedro Pereira Rodrigues1,2,3(B), Cristiano Inácio Lemes4, Cláudia Camila Dias1,2, and Ricardo Cruz-Correia1,2
1 CINTESIS - Centre for Health Technology and Services Research,
Rua Dr. Plácido Costa, s/n, 4200-450 Porto, Portugal
{pprodrigues,camila,rcorreia}@med.up.pt
2 CIDES-FMUP, Faculty of Medicine of the University of Porto,
Alameda Prof. Hernâni Monteiro, 4200-319 Porto, Portugal
3 LIAAD - INESC TEC, Artificial Intelligence and Decision Support Laboratory,
Rua Dr. Roberto Frias, 4200-465 Porto, Portugal
4 ICMC, Institute of Mathematical and Computer Sciences, University of São Paulo,
Avenida Trabalhador São-carlense, 400, São Carlos 13566-590, Brazil
[email protected]

Abstract. Clinical record integration and visualisation is one of the most important abilities of modern health information systems (HIS).
Its use in clinical encounters plays a relevant role in the efficacy and efficiency of health care. One solution is to consider a virtual patient record (VPR), created by integrating all clinical records, which must collect documents from distributed departmental HIS. However, the amount of data currently being produced, stored and used in these settings is stressing information technology infrastructure: the integrated VPR of a central hospital may gather millions of clinical documents, so accessing data becomes an issue. Our vision is that storing clinical reports either on primary (fast) or secondary (slower) storage devices according to their likelihood of visualisation can help manage the workload of these systems. The aim of this work was to develop a model that predicts the
probability of visualisation, within 24h after production, of each clinical
report in the VPR, so that reports less likely to be visualised in the fol-
lowing 24 hours can be stored in secondary devices. We studied log data
from an existing virtual patient record (n=4975 reports) with informa-
tion on report creation and report first-time visualisation dates, along
with contextual information. Bayesian network classifiers were built and
compared with logistic regression, revealing high discriminating power
(AUC around 90%) and accuracy in predicting whether a report is going
to be accessed in the 24 hours after creation.

Keywords: Bayesian networks · Health services · Virtual patient records

1 Introduction
Evidence-based medicine relies on three information sources: patient records,
published evidence and the patient itself [25]. Even though great improvements


and developments have been made over the years, on-demand access to clinical
information is still inadequate in many settings, leading to less efficiency as a
result of a duplication of effort, excess costs and adverse events [10]. Further-
more, a lot of distinct technological solutions coexist to integrate patient data,
using different standards and data architectures which may lead to difficulties
in further interoperability [7]. Nonetheless, a lot of patient information is now
accessible to health-care professionals at the point of care. But, in some cases, the
amount of information is becoming too large to be readily handled by humans or
to be efficiently managed by traditional storage algorithms. As more and more
patient information is stored, it is very important to efficiently select the information
that is most likely to be useful [8].
The identification of clinically relevant information should enable an improve-
ment both in user interface design and in data management. However, it is dif-
ficult to identify what information is important in daily clinical care, and what
is used only occasionally. The main problem addressed here is how to estimate
the relevance of health care information in order to anticipate its usefulness at
a specific point of care. In particular, we want to estimate the probability of
a piece of information being accessed during a certain time interval (e.g. first
24 hours after creation), taking into account the type of data and the context
where it was generated and to use this probability to prioritise the information
(e.g. assigning clinical reports for secondary storage archiving or primary storage
access).
Next section presents background knowledge on electronic access to clinical
data (2.1), assessment of clinical data relevance (2.2) and machine learning in
health care research (2.3), setting the aim of this work (2.4). Then, section 3
presents our methodology for data processing, model learning, and prediction of
within-24h visualisation of clinical data, whose results are presented in section 4.
Finally, section 5 concludes with a discussion and future directions.

2 Background

The practice of medicine has been described as being dominated by how well
information is collected, processed, retrieved, and communicated [2].

2.1 Electronic Access to Clinical Data

Currently in most hospitals there are great quantities of stored digital data
regarding patients, in administrative, clinical, lab or imaging systems. Although
it is widely accepted that full access to integrated electronic health records
(EHR) and instant access to up-to-date medical knowledge significantly reduces
faulty decision making resulting from lack of information [9], there is still very
little evidence that life-long EHRs improve patient care [4]. Furthermore, their
use is often disregarded. For example, studies have indicated that data generated
before an emergency visit are accessed often, but by no means in a majority of

times (5% to 20% of the encounters), even when the user was notified of the
availability of such data [12].
One usual solution for data integration in hospitals is to consider a virtual
patient record (VPR), created by integrating all clinical records, which must
collect documents from distributed departmental HIS [3]. Integrated VPR of
central hospitals may gather millions of clinical documents, so accessing data
becomes an issue. A paradigmatic example of this burden to HIS is the amount of
digital data produced in the medical imaging departments, which has increased
rapidly in recent years due mainly to a greater use of additional diagnostic
procedures, and an increase in the quality of the examinations. The management
of information in these systems is usually implemented using Hierarchical Storage
Management (HSM) solutions. This type of solution enables the implementation
of various layers which use different technologies with different speeds of access,
corresponding to different associated costs. However, the solutions which are
currently implemented use simple rules for information management, based on
variables such as the time elapsed since the last access or the date of creation of
information, not taking into account the likely relevance of information in the
clinical environment [6].
In a quest to prioritise the data that should be readily available in HIS, several
pilot studies have been carried out to analyse for how long clinical documents are
useful for health professionals in a hospital environment, bearing in mind doc-
ument content and the context of the information request. Globally, the results
show that some clinical reports are still used one year after creation, regardless
of the context in which they were created, although significant differences existed
in reports created during distinct encounter types [8]. Other results show that
half of all visualisations might be of reports more than 2 years old [20], although
this visualisation distribution also varies across clinical department and time of
production [21]. Thus, usage of patients' past information (data from previous
hospital encounters) varies significantly according to the setting of health care
and content, and is therefore not easy to prioritise.

2.2 Assessment of Clinical Data Relevance


As previously noted, and especially in critical and acute care settings, the age
of data is one of the factors often used to assess data relevance, making new
information more relevant to the current search. However, studies have shown
that some clinical reports are still used after one year regardless of the context
in which they were created, although significant differences exist in reports cre-
ated in distinct encounter types and document content, which contradicts the
definition of old data used in previous studies. Hence the need to define better
rules for recommending documents in encounters.
Classifying the relevance of information based only on the time elapsed since
the date of acquisition is clearly inefficient. It is expected that the need to consult
an examination at a given time will be dependent on several factors beyond the
date of the examination, such as type of examination and the patient’s pathology.
Thus, a system that uses more factors to identify the relevance of information at

a given time would be more efficient in managing the information that is stored
in fast memory and slow memory. A recent study from the same group addressed
other possibly relevant factors besides document age, including type of encounter
(i.e. emergency room, inpatient care, or outpatient consult), department where
the report was generated (e.g. gynaecology or internal medicine) and even type
of report in each department, but the possibility of modelling visualisations with
survival analysis proved to be extremely difficult [21].
Nonetheless, if we could, for instance, simply discriminate the documents
that will be needed in the next 24 hours from the remaining ones, we could efficiently
decide which ones to store in a faster-accessible memory device. Furthermore,
we could then rank documents according to their probability of visualisation in
order to adjust the graphical user interface of the VPR, to improve the system’s
usability. By applying regression methods or other modelling techniques it is
possible to identify which factors are associated with the usage or relevance of
patient data items. These factors and associations can then be used to estimate
data relevance in a specific future time interval.

2.3 Machine Learning in Healthcare Research

The definition of clinical decision support systems (most of the time based on
expert systems) is currently a major topic, since they may help with diagnosis, treat-
ment selection, prognosis of mortality rate, prognosis of quality of life, etc.
They can even be used for administrative tasks like the one addressed by this
work. However, the complicated nature of real-world biomedical data has made
it necessary to look beyond traditional biostatistics [14] without losing the
necessary formality. For example, naive Bayesian approaches are closely related
to logistic regression [22]. Hence, such systems could be implemented applying
methods of machine learning [16], since new computational techniques are bet-
ter at detecting patterns hidden in biomedical data, and can better represent
and manipulate uncertainties [22]. In fact, the application of data mining and
machine learning techniques to medical knowledge discovery tasks is now a grow-
ing research area. These techniques vary widely and are based on data-driven
conceptualisations, model-based definitions or on a combination of data-based
knowledge with human-expert knowledge [14].
Bayesian approaches are extremely important in these problems, as they
provide a quantitative perspective and have been successfully applied in health
care domains [15]. One of their strengths is that Bayesian statistical methods
allow taking into account prior knowledge when analysing data, turning the data
analysis into a process of updating that prior knowledge with biomedical and
health-care evidence [14]. However, only after the 1990s do we find evidence of
a large interest in these methods, namely in Bayesian networks, which offer a
general and versatile approach to capturing and reasoning with uncertainty in
medicine and health care [15]. They describe the distribution of probabilities of
one set of variables, making possible a two-fold analysis: a qualitative model and
a quantitative model, presenting two types of information for each variable.

On a general basis, a Bayesian network represents a joint distribution of one


set of variables, specifying the assumption of independence between them, with
the inter-dependence between variables being represented by a directed acyclic
graph. Each variable is represented by a node in the graph, and is dependent of
the set of variables represented by its ascendant nodes; a node X is a ascendant
of another node Y if exists a direct arc from X to Y [16]. To give more representa-
tional power to the relations represented by the arcs of the graph, it is necessary
to associate values to it. The matrix of conditional probability is given for each
variable, describing the distribution of probabilities of each variable given its
ascendant variables.
After the qualitative and quantitative models are constructed, the next step,
and one of the most important, is to calculate the new probabilities when
new evidence is introduced into the network. This process is called inference and
works as follows. Each variable has a finite number of categories (two or more).
A node is observed when there is knowledge about the state of that variable.
The observed variables, combined with the conditional probability tables, determine
the updated probabilities of the unobserved variables: from the joint distribution
we can calculate the marginal probability of each unobserved variable by summing,
over the categories of the remaining variables, the probabilities of the configurations
in which the variable is in the desired state [15].
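To make the inference step concrete, the following small Python sketch (not taken from the paper; the network, states and probabilities are invented for illustration) computes the posterior of a single unobserved class variable given one observed variable, by enumerating the joint distribution and normalising.

```python
# Minimal illustration of exact inference in a tiny Bayesian network
# (hypothetical states and probabilities, not taken from the paper's data).

# Prior over the unobserved class variable C = "report visualised within 24h?"
prior_c = {"yes": 0.23, "no": 0.77}

# Conditional probability table P(E | C) for one observed variable
# E = encounter type, restricted to two states for brevity.
p_e_given_c = {
    "yes": {"inpatient": 0.68, "outpatient": 0.32},
    "no":  {"inpatient": 0.37, "outpatient": 0.63},
}

def posterior(evidence_state):
    """Return P(C | E = evidence_state) by enumerating the joint and normalising."""
    joint = {c: prior_c[c] * p_e_given_c[c][evidence_state] for c in prior_c}
    z = sum(joint.values())                      # marginal probability of the evidence
    return {c: p / z for c, p in joint.items()}

print(posterior("inpatient"))  # posterior belief in within-24h visualisation given inpatient care
```

In a network with more variables, the same enumeration is carried out over all unobserved variables, which is what dedicated inference engines automate.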

2.4 Aim
The aim of this work is the development of a decision support model for dis-
criminating between reports that are going to be useful in the next 24 hours and
reports which can be otherwise stored in slower storage devices, since they will
not be accessed in the next 24 hours, thus improving performance of the entire
virtual patient record system.

3 Data and Methods


Between May 2003 and May 2004, a virtual patient record (VPR) was designed
and implemented at Hospital S. João, a university hospital with over 1350 beds.
An agent-based platform, Multi-Agent System for Integration of Data (MAID),
ensures the communication among various hospital information systems (see [24]
for a description of the system). Clinical documents are retrieved from clinical
department information systems (DIS) and stored into a central repository in a
browser friendly format. This is done by regularly scanning 14 DIS using different
types of agents [17]:
– For each department, a List Agent regularly retrieves report lists from the
DIS, with report file references and meta-data, and stores them in the VPR
repository.
– The Balancer Agent of that department retrieves the report file references
and distributes them to the departmental File Agents.
– File Agents retrieve the actual report files.

As the amount of information available to the agents increases throughout time,


there is also an increase in the difficulty of managing that information by humans.
Not rarely, a request for a report arrives (after the List Agent has published the
existence of that report) before the File Agent was able to retrieve the document.
In these cases, an Express Agent is called to retrieve the file, which stresses the
otherwise balanced workload of the entire system.
To enable a quantitative analysis (e.g. the likelihood of document access),
all actions by users of the VPR are recorded in the log file. Although originally
created and kept for audit purposes, these logs can provide very inter-
esting insights into the information needs of health-care professionals in some
particular situations, even though, most of the time, their quality falls short [5].

3.1 Studied Variables and Outcomes


Data was collected from the virtual patient record (VPR) with information
on report creation and report first-time visualisation dates, along with contextual
information. This study focuses on a sample of 5000 reports (2.7% of the entire
data for the studied year) and corresponding visualisations, stored in the VPR
in 2010. The data used in this study was collected using Oracle SQL Developer
from the VPR patient database, containing patient’s identification and references
to the clinical records. We developed models with seven explanatory variables,
including patient data (age and sex), context data (department and type of
encounter) and creation time data (hour, day-of-week, daily period), defined as
follows. The main outcome of this study was within-24h visualisation of reports.
AgeCat (cat) discretised in decades;
Sex (binary);
Department (cat);
EncType (cat) one of outpatient consult, inpatient care, emergency or other;
Hour (cat) truncated from creation time;
DoW (cat) one of Sun, Mon, Tue, Wed, Thu, Fri or Sat;
Period (cat) one of morning (Hour=7-12), afternoon (13-18), night (19-24) or
dawn (1-6);
Visual24h (binary) target outcome, whether the report has been visualised in
the first 24 hours after creation or not.
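For illustration, the creation-time and age variables listed above could be derived from raw fields along the following lines; this is a sketch with assumed field names, not the preprocessing actually used in the study.

```python
from datetime import datetime

def derive_time_features(created_at: datetime) -> dict:
    """Derive Hour, DoW and Period from a report creation timestamp."""
    hour = created_at.hour if created_at.hour != 0 else 24   # use 1-24 so 'night' covers 19-24
    if 7 <= hour <= 12:
        period = "morning"
    elif 13 <= hour <= 18:
        period = "afternoon"
    elif 19 <= hour <= 24:
        period = "night"
    else:                       # 1-6
        period = "dawn"
    dow = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"][created_at.weekday()]
    return {"Hour": hour, "DoW": dow, "Period": period}

def age_category(age_in_years: int) -> str:
    """Discretise age in decades, e.g. 55 -> '[50,60['."""
    low = (age_in_years // 10) * 10
    return ">=100" if low >= 100 else f"[{low},{low + 10}["

print(derive_time_features(datetime(2010, 3, 15, 20, 30)), age_category(55))
```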

3.2 Model Building and Evaluation


In order to correctly fit the models, only complete cases were considered in
the analysis. Logistic regression was applied to all studied variables to predict
visualisation. Additionally, two Bayesian network classifiers were built - Naive
Bayes (NB) and Tree-Augmented Naive Bayes (TAN) - which differ in the num-
ber of conditional dependencies (besides the outcome) allowed among variables
(NB: zero dependencies; TAN: one dependency), in order to choose the struc-
ture which could better represent the problem. Receiver Operating Character-
istic (ROC) curve analysis was performed to determine in-sample area under

the curve (AUC). Furthermore, to assess the general structure and accuracy of
learned models, stratified 10-fold cross-validation was repeated 10 times, estimat-
ing accuracy, sensitivity, specificity, precision (positive and negative predictive
values) and the area under the ROC curve, for all compared models.
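A rough sketch of such a repeated stratified cross-validation is shown below in Python with scikit-learn, using logistic regression as a stand-in classifier; it assumes a numeric, already-encoded feature matrix X and a binary outcome vector y, and is not the authors' original R pipeline (described in the next subsection).

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, confusion_matrix

def repeated_cv_metrics(X, y, n_splits=10, n_repeats=10, seed=0):
    """Average accuracy, sensitivity, specificity, PPV, NPV and AUC over repeated stratified CV."""
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    scores = []
    for train_idx, test_idx in cv.split(X, y):
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        prob = model.predict_proba(X[test_idx])[:, 1]
        pred = (prob >= 0.5).astype(int)            # 50% decision boundary, as in the paper
        tn, fp, fn, tp = confusion_matrix(y[test_idx], pred).ravel()
        scores.append({
            "accuracy": (tp + tn) / (tp + tn + fp + fn),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp),
            "npv": tn / (tn + fn),
            "auc": roc_auc_score(y[test_idx], prob),
        })
    return {k: float(np.mean([s[k] for s in scores])) for k in scores[0]}
```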

3.3 Software
Logistic regression was done with R package stats [18], Bayesian network struc-
ture was learned with R package bnlearn [23], Bayesian network parameters were
fitted with R package gRain [11], ROC curves were computed with R package
pROC [19], and odds ratios (OR) were computed with R package epitools [1].

4 Results
A total of 4975 reports were included in the analysis. The main characteristics of
the reports are shown in Table 1, which were generated from patients with a mean
(std dev) age of 55.5 (20.5). Less than 23% of the reports were visualised in the 24
hours following their creation; these were nonetheless more often from female patients
(almost 55%), with a 24h-visualisation OR=1.51 (95%CI [1.32,1.72]) for female-
patient reports. Also significant was the context of report creation, with more
reports being created in inpatient care (44.4%) and outpatient consults (41.4%),
although compared with the latter context, 24-hour visualisations are more likely
for reports generated in inpatient care (OR=8.60 [7.04,10.59]) or in the emergency
room (OR=14.50 [11.22,18.83]). Regarding creation time, morning (OR=1.22
[1.05,1.41]), night (OR=1.82 [1.46,2.28]) and dawn (OR=2.88 [2.03,4.07]) have all
higher 24-hour visualisation likelihood than the afternoon period.

4.1 Qualitative Analysis of the Bayesian Network Model

Figure 1 presents the qualitative model for the Tree-Augmented Naive Bayes net-
work, where interesting connections can be extracted from the resulting model.
First, patient’s data features are associated. Then, creation time data and con-
text data are also strongly related. However, the most interesting feature is prob-
ably the department that created the report, since this was chosen by the algo-
rithm as ancestor of patient’s age, time of report creation and type of encounter.

4.2 In-Sample Quantitative Analysis

For a quantitative analysis, Figure 2 presents the in-sample ROC curves for
logistic regression (left), Naive Bayes (centre) and TAN (right). As expected,
increasing model complexity enhances the in-sample AUC (LR 88.6%, NB 86.9%
and TAN 90.7%) but, globally, all models presented good discriminating power
towards the outcome.

Table 1. Basic characteristics of included reports: patient’s data (sex and age), report
creation context (department, encounter) and time (day of week, daily period) data.

Visualised in 24 hours
No Yes Total

Outcome, n (%) 3846 (77.3) 1129 (22.7) 4975 (100)

Female, n (%) 1716 (44.6) 619 (54.8) 2335 (46.9)


Age, μ(σ) 54.6 (19.8) 58.5 (22.4) 55.5 (20.5)
AgeCat, n (%)
[0,10[ 97 (2.5) 59 (5.2) 156 (3.1)
[10,20[ 58 (1.5) 23 (2.0) 81 (1.6)
[20,30[ 215 (5.6) 40 (3.5) 255 (5.1)
[30,40[ 583 (15.2) 115 (10.2) 698 (14.0)
[40,50[ 597 (15.5) 122 (10.8) 719 (14.5)
[50,60[ 601 (15.6) 150 (13.3) 751 (15.1)
[60,70[ 710 (18.5) 199 (17.6) 909 (18.2)
[70,80[ 554 (14.4) 207 (18.3) 761 (15.3)
[80,90[ 372 (9.67) 181 (16.0) 553 (11.1)
[90,100[ 55 (1.4) 31 (2.8) 86 (1.7)
≥100 4 (0.1) 2 (0.2) 6 (0.1)

Encounter Type, n (%)


Outpatient consult 1940 (50.4) 120 (10.6) 2060 (41.4)
Inpatient care 1442 (37.5) 768 (68.0) 2210 (44.4)
Emergency room 217 (5.6) 19 (1.7) 236 (4.7)
Other 247 (6.4) 222 (19.7) 469 (9.4)
Department, n (%)
1 76 (2.0) 11 (1.0) 87 (1.8)
2 1626 (42.3) 55 (4.9) 1681 (33.8)
3 646 (16.8) 469 (41.5) 1115 (22.4)
5 1057 (27.5) 529 (46.9) 1586 (31.9)
6 154 (4.0) 23 (2.0) 177 (3.6)
7 89 (2.3) 22 (2.0) 111 (2.2)
9 11 (0.3) 7 (0.6) 18 (0.4)
10 10 (0.3) 1 (0.1) 11 (0.2)
12 139 (3.6) 11 (1.0) 150 (3.0)
13 23 (0.6) 0 (0) 23 (0.3)
16 5 (0.1) 0 (0) 5 (0.1)
21 10 (0.3) 1 (0.1) 11 (0.2)

Day-of-Week, n (%)
Mon 728 (18.9) 303 (26.8) 1031 (20.7)
Tue 671 (17.5) 291 (25.8) 962 (19.3)
Wed 743 (19.3) 208 (18.4) 951 (19.1)
Thu 804 (20.9) 35 (3.1) 839 (16.9)
Fri 673 (17.5) 92 (8.2) 765 (15.4)
Sat 122 (3.2) 99 (8.7) 221 (4.4)
Sun 105 (2.7) 101 (9.0) 206 (4.1)
Daily Period, n (%)
Morning 1768 (46.0) 521 (46.2) 2289 (46.0)
Afternoon 1661 (43.2) 402 (35.6) 2063 (41.5)
Night 331 (8.6) 146 (13.0) 477 (9.6)
Dawn 86 (2.2) 60 (5.3) 146 (2.9)

[Figure 1 graphic: a directed graph over the nodes Visual24h, Department, AgeCat, Hour, EncType, Sex, DoW and Period.]

Fig. 1. Tree-Augmented Naive Bayes for predicting within 24h visualisation of clinical
reports in the virtual patient record.

[Figure 2 graphic: three ROC curve panels (sensitivity vs. specificity) for Logistic Regression (AUC: 0.886), Naive Bayes (AUC: 0.869) and Tree-Augmented Naive Bayes (AUC: 0.907).]

Fig. 2. In-sample ROC curves for logistic regression (left), naive Bayes (centre) and
Tree-Augmented Naive Bayes (right).

Table 2. Validity assessment averaged from 10 times stratified 10-fold cross-validation


for logistic regression (LR), naive Bayes (NB) and Tree-Augmented Naive Bayes (TAN).

Measure, % [95%CI] LR NB TAN


Accuracy 82.30 [82.08,82.52] 82.43 [82.14,82.72] 82.80 [82.51,83.09]
Sensitivity 41.33 [40.50,42.16] 60.68 [59.81,61.55] 64.12 [63.36,64.89]
Specificity 94.33 [94.07,94.58] 88.81 [88.50,89.12] 88.28 [87.96,88.61]
Precision (PPV) 68.40 [67.45,69.35] 61.53 [60.80,62.25] 61.75 [61.04,62.47]
Precision (NPV) 84.57 [84.39,84.76] 88.51 [88.29,88.74] 89.35 [89.15,89.56]
AUC 87.58 [87.27,87.89] 86.37 [86.04,86.70] 85.50 [85.13,85.88]

4.3 Bayesian Network Generalisable Cross-Validation

In order to assess the ability of the models to generalise beyond the deriva-
tion cohort, cross-validation was performed. Table 2 presents the result of the
10-times-repeated stratified 10-fold cross-validation. Although the more compli-
cated model loses in terms of AUC (85% vs 87%), it brings advantages to the
precise problem of identifying reports that should be stored in secondary mem-
ory as they are less likely to be visualised in the next 24 hours, since it reveals a
negative precision of 89% vs 88% (NB) and 84% (LR). Along with this result, it
is much better at identifying reports that are going to be needed, as sensitivity
rises from 41% (LR) to 64%. Future work should consider different threshold
values for the decision boundary (here, 50%) in order to better suit the model
to the sensitivity-specificity goals of the problem at hand.
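As a hedged illustration of that last remark, one could sweep the decision threshold on held-out predicted probabilities and keep the largest threshold (the one that archives the most reports) whose negative predictive value still meets a chosen target; the inputs y and prob below are assumed to come from a validation fold, not from the paper's data.

```python
import numpy as np

def pick_threshold(y, prob, target_npv=0.90):
    """Return the highest threshold (archiving the most reports) whose NPV meets the target.

    y    : array of true labels (1 = visualised within 24h), e.g. from a validation fold
    prob : array of predicted probabilities of within-24h visualisation
    """
    best = None
    for t in np.linspace(0.05, 0.95, 19):
        archived = prob < t                     # reports that would go to secondary storage
        if archived.sum() == 0:
            continue
        npv = (y[archived] == 0).mean()         # fraction of archived reports truly not visualised
        if npv >= target_npv:
            best = float(t)                     # thresholds increase, so keep the largest feasible one
    return best
```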

5 Concluding Remarks and Future Work


The main contribution of this work is the preliminary study for the development
of a decision support model for discriminating between reports that are going
to be useful in the next 24 hours and reports which can be safely stored in
secondary memory, since they will not be accessed in the next 24 hours.
An initial sample of clinical reports was used to derive Bayesian network
models which were then compared with a logistic regression model in terms of
in-sample discriminating power and generalisable validity with cross-validation.
The studied data was in accordance with previous works in terms of the rele-
vance that some factors may have on the likelihood of visualisation of clinical
reports, e.g. department and type of encounter that produced the report [21].
Additionally, patient data and time of report creation were also found relevant
for the global model of predicting within 24-hour visualisations.
Given that the main objective of this project is to enable a clear decision on
whether a report can safely be stored in secondary memory or not, focus should
be given to negative precision, since it represents the probability that a report
marked by the system to be stored away is, in fact, irrelevant for the present
day. The Bayesian network models achieved negative precision of around 89%,
while keeping specificity high (also around 88%).

Future work will be concentrated in a) exploring other variables that might


influence the likelihood of visualisation of clinical reports (e.g. actual data from
the report, patient’s diagnosis, etc.); b) exploiting the maximum amount of data
from the log file of the virtual patient record (e.g. 2010 comprises more than
184K reports); and c) inspecting the usefulness of temporal Bayesian network
models [13] for the precise problem of relevance estimation.
Overall, this study presents Bayesian network models as useful techniques
to integrate in a virtual patient record that needs to prioritise the accessible
documents, both in terms of user-interface optimisation and data management
procedures.

Acknowledgments. The authors acknowledge the help of José Hilário Almeida dur-
ing the data gathering process.

References
1. Aragon, T.J.: epitools: Epidemiology Tools (2012)
2. Barnett, O.: Computers in medicine. JAMA: the Journal of the American Medical
Association 263(19), 2631 (1990)
3. Bloice, M.D., Simonic, K.M., Holzinger, A.: On the usage of health records for
the design of virtual patients: a systematic review. BMC Medical Informatics and
Decision Making 13(1), 103 (2013)
4. Clamp, S., Keen, J.: Electronic health records: Is the evidence base any use? Med-
ical Informatics and the Internet in Medicine 32(1), 5–10 (2007)
5. Cruz-Correia, R., Boldt, I., Lapão, L., Santos-Pereira, C., Rodrigues, P.P., Ferreira,
A.M., Freitas, A.: Analysis of the quality of hospital information systems audit
trails. BMC Medical Informatics and Decision Making 13(1), 84 (2013)
6. Cruz-Correia, R., Rodrigues, P.P., Freitas, A., Almeida, F., Chen, R., Costa-Pereira,
A.: Data quality and integration issues in electronic health records. In: Hristidis, V.
(ed.) Information Discovery on Electronic Health Records, chap. 4. Data Mining and
Knowledge Discovery Series, pp. 55–95. CRC Press (2009)
7. Cruz-Correia, R.J., Vieira-Marques, P.M., Ferreira, A.M., Almeida, F.C., Wyatt,
J.C., Costa-Pereira, A.M.: Reviewing the integration of patient data: how systems
are evolving in practice to meet patient needs. BMC Medical Informatics and Deci-
sion Making 7, 14 (2007)
8. Cruz-Correia, R.J., Wyatt, J.C., Dinis-Ribeiro, M., Costa-Pereira, A.: Determi-
nants of frequency and longevity of hospital encounters’ data use. BMC Medical
Informatics and Decision Making 10, 15 (2010)
9. Dick, R., Steen, E.: The Computer-based Patient Record: An Essential Technology
for HealthCare. National Academy Press (1997)
10. Feied, C.F., Handler, J.A., Smith, M.S., Gillam, M., Kanhouwa, M., Rothenhaus,
T., Conover, K., Shannon, T.: Clinical information systems: instant ubiquitous clin-
ical data for error reduction and improved clinical outcomes. Academic emergency
medicine 11(11), 1162–1169 (2004)
11. Højsgaard, S.: Graphical independence networks with the gRain package for R.
Journal of Statistical Software 46(10) (2012)
12. Hripcsak, G., Sengupta, S., Wilcox, A., Green, R.: Emergency department access
to a longitudinal medical record. Journal of the American Medical Informatics
Association 14(2), 235–238 (2007)

13. Lappenschaar, M., Hommersom, A., Lucas, P.J.F., Lagro, J., Visscher, S., Korevaar,
J.C., Schellevis, F.G.: Multilevel temporal Bayesian networks can model longitudinal
change in multimorbidity. Journal of Clinical Epidemiology 66, 1405–1416 (2013)
14. Lucas, P.: Bayesian analysis, pattern analysis, and data mining in health care.
Current Opinion in Critical Care 10(5), 399–403 (2004)
15. Lucas, P.J.F., van der Gaag, L.C., Abu-Hanna, A.: Bayesian networks in
biomedicine and health-care. Artificial Intelligence in Medicine 30(3), 201–214
(2004)
16. Mitchell, T.M.: Machine Learning. McGraw-Hill (1997)
17. Patriarca-Almeida, J.H., Santos, B., Cruz-Correia, R.: Using a clinical document
importance estimator to optimize an agent-based clinical report retrieval system.
In: Proceedings of the 26th IEEE International Symposium on Computer-Based
Medical Systems, pp. 469–472 (2013)
18. R Core Team: R: A Language and Environment for Statistical Computing (2013)
19. Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J.C., Müller, M.:
pROC: an open-source package for R and S+ to analyze and compare ROC curves.
BMC Bioinformatics 12, 77 (2011)
20. Rodrigues, P.P., Dias, C.C., Cruz-Correia, R.: Improving clinical record visualiza-
tion recommendations with bayesian stream learning. In: Learning from Medical
Data Streams, vol. 765, p. paper4. CEUR-WS.org (2011)
21. Rodrigues, P.P., Dias, C.C., Rocha, D., Boldt, I., Teixeira-Pinto, A., Cruz-Correia,
R.: Predicting visualization of hospital clinical reports using survival analysis
of access logs from a virtual patient record. In: Proceedings of the 26th IEEE
International Symposium on Computer-Based Medical Systems, Porto, Portugal,
pp. 461–464 (2013)
22. Schurink, C.A.M., Lucas, P.J.F., Hoepelman, I.M., Bonten, M.J.M.: Computer-
assisted decision support for the diagnosis and treatment of infectious diseases in
intensive care units. The Lancet infectious diseases 5(5), 305–312 (2005)
23. Scutari, M.: Learning Bayesian Networks with the bnlearn R Package. Journal of
Statistical Software 35, 22 (2010)
24. Vieira-Marques, P.M., Cruz-Correia, R.J., Robles, S., Cucurull, J., Navarro, G.,
Marti, R.: Secure integration of distributed medical data using mobile agents.
Intelligent Systems 21(6), 47–54 (2006)
25. Wyatt, J.C., Wright, P.: Design should help use of patients’ data. Lancet
352(9137), 1375–1378 (1998)
On the Efficient Allocation of Diagnostic
Activities in Modern Imaging Departments

Roberto Gatta1(B), Mauro Vallati2, Nicola Mazzini1,2,3, Diane Kitchin2,
Andrea Bonisoli3, Alfonso E. Gerevini3, and Vincenzo Valentini1

1 Radiotherapy Department, Università Cattolica Del Sacro Cuore, Milan, Italy
[email protected]
2 School of Computing and Engineering, University of Huddersfield, Huddersfield, UK
3 Dipartimento d’Ingegneria dell’Informazione,
Università degli Studi di Brescia, Brescia, Italy

Abstract. In a modern Diagnostic Imaging Department, managing the


schedule of exams is a complex task. Surprisingly, it is still done mostly
manually, without a clear, explicit and formally defined objective or tar-
get function to achieve.
In this work we propose an efficient approach for optimising the
exploitation of available resources. In particular, we provide an objec-
tive function, that considers the aspects that have to be optimised, and
introduce a two-step approach for scheduling diagnostic activities. Our
experimental analysis shows that the proposed technique can easily scale
on large and complex Imaging Departments, and generated allocation
plans have been positively evaluated by human experts.

1 Introduction
In a modern Diagnostic Imaging Department the allocation and re-allocation of
exams is a complex task, that is time-consuming and is still done manually. On
the one hand, it is fundamental to keep the waiting lists as short as possible, in
order to meet the established waiting time; on the other hand, it is of critical
importance to minimise expenses for the Department. Moreover, patient schedul-
ing has to be balanced, in order to plan the best possible allocation according to
the staff organisation/skills on different modalities, i.e. computed tomography
(CT), Radiography (RX), magnetic resonance (MR) and ultrasound (US) equip-
ment. In order to plan the best possible allocation a lot of available resources
must be taken into account: staff (radiologists, nurses, etc.), equipment (US,
CT, MR, etc.), examinations performed (tagged by imaging modalities, reim-
bursement rate, clustered by regions and/or pathologies) and staff characteristics
(part-time, full-time, etc.).
The literature on medical appointment scheduling is extensive, but
approaches –either automated or in the form of formal guidelines– to deal with
diagnostic activities in radiology Departments are rare. Nevertheless, the impor-
tance of scheduling activities in hospital services is well-known [8]. Even though
a few techniques have been proposed for dealing with part of the allocation prob-
lem (see, e.g., [1,4–6]), a complete approach able to manage all the aspects of the


allocation problem of a Radiological Department is still missing. Improvements


in that area can lead to a significant reduction of costs and human time.
In this work we propose a formalisation of the activities allocation problem.
The formalisation is composed of a set of definitions, constraints and a tar-
get function. The model is then exploited by an efficient scheduling approach,
based on enforced hill climbing, that aims at optimising the use of available
resources. This work extends a previous preliminary study by Mazzini et
al. [7]. According to our experimental analysis, and to the feedback we received
from medical experts, the proposed approach is able to efficiently produce good-
quality schedules according to the metric it is required to optimise, which considers
both temporal and economic indexes.

2 The Diagnostic Activities Allocation Problem

In this section we define the relevant entities involved in the allocation problem,
and describe the function used for evaluating the quality of allocation plans.
Entities
We define a set of entities that can easily fit into most of the Radiology Infor-
mation Systems (RIS) currently used in diagnostic Imaging Departments [2].
Specifically, the proposed elements can directly fit with Paris, provided by ATS-
Teinos, and PRORAM from METRIKA. With some minor changes they can
also fit with Estensa, by Esaote. We are confident they can also be easily adopted
in other situations. The most important entities are the following:
Exam represents the diagnostic examination that can be performed, e.g., “CT
brain”.
Exam Group (or cluster) is a group of exams. In many cases it is useful, due
to some team specialisation, to group exams by the area of the body (e.g.,
“head and neck” or “abdominal”) or to group them in order to reflect which
Department the patient comes from (e.g., “CT from GPs”). The grouping is
done according to the habits of the Imaging Department, team and work-flow:
hybrid models can also be implemented.
Modality this entity represents a medical device, like an ultrasound, a CT scan,
etc. In our model, a modality corresponds to an actual room. This is reasonable
since the machinery used for exams is usually not moved between rooms. Such
modelling leads to having an independent agenda per modality. In principle, it
is possible to have the machines required for different sorts of exams in the same
room. This case, which is extremely rare since it leads to underused resources,
is not modelled.
Personnel represents the human resources (staff members) available. Each mem-
ber of staff has at least one role. Each Exam Group has a set of roles assigned; this
indicates the specific needs of that group of exams in terms of human resources.

For instance, some exam groups require several nurses to be present in the room,
while other exams require technicians to be available.
Time Slot we adopted an atomic time slot of 5 minutes in a weekly calendar.
The granularity of 5 minutes has been chosen since it is not extremely long, thus
limiting wasted time; also, it is not too short, so that short delays do not
affect the overall daily scheduling.
Temporal horizon captures the requirements in terms of queue governance. This
can represent constraints like “the queue for ’Brain MRI’ must be lower than 3
months for, at least, the next 12 months”. It should be noted that this is the
usual way in which queue governance requirements are expressed.
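As a minimal sketch of how these entities could be represented in code (Python dataclasses with invented field names; the prototype described later in the paper is written in C++ and its data model may differ), consider:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Exam:
    name: str                      # e.g. "CT brain"
    reimbursement: float           # revenue associated with one examination
    duration_slots: int            # number of 5-minute atomic time slots required

@dataclass
class ExamGroup:
    name: str                      # e.g. "head and neck CT"
    exams: List[Exam]
    required_roles: List[str]      # e.g. ["radiologist", "nurse"]
    weekly_demand: int             # expected requests per week
    max_queue: int                 # governance upper bound on the waiting queue

@dataclass
class Modality:
    name: str                      # a medical device in an actual room, one agenda per modality
    compatible_groups: List[str]   # names of exam groups that can be performed here

@dataclass
class StaffMember:
    name: str
    roles: List[str]               # e.g. ["radiographer"]
    weekly_hours: int              # distinguishes part-time from full-time staff
```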
Objective Function and Constraints
The optimisation of the scheduling of exams has to deal with two main com-
ponents. On the one hand, it is important to maximise income for the Depart-
ment. This should result in the prioritisation of exams that are both frequently
requested and expensive. On the other hand, the Department is also providing
an important service to the community. Therefore, keeping all the queue lists
as short as possible is fundamental. In public hospitals there are strict upper
bounds for queues.
To consider both of the aforementioned aspects, we designed an objective
function (to minimize) that combines the two perspectives. The adopted function
is depicted in Equation 1. In particular, n indicates the number of exams. ri is
the cost of the i-th exam. qi is the amount of exams of group i that should be
performed. Δj represents the difference between waiting queue and desired queue
length for the j-th exam group. Wj is the importance of the queue for the j-th
exam. Finally, α and β indicate, respectively, the importance that is given to the
economic side and to the respect of the limits on the queues’ length. Intuitively,
the function synthesises the point of view of the hospital administration (first
addend) –focused on the economic side– and of the doctor (second addend)
–focused on the quality of the service.
f = \alpha \sum_{i=1}^{n} (r_i \cdot q_i) + \beta \sum_{j=1}^{n} (\Delta_j \cdot W_j)    (1)

In order to guarantee the feasibility of identified allocation plans, a number


of constraints have to be satisfied. In particular, for each exam group the queue
length must not exceed the provided upper bound, so that the governance rules
on waiting time are respected. There is also a set of constraints that regulates
the correct allocation of exams to suitable modalities and time slots, and the correct
assignment of medical personnel to exam rooms, respecting the specific limits and
maximum consecutive working hours that hospitals impose for all the
staff. Our implemented prototype takes all such constraints into account.
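A direct, illustrative transcription of Equation 1 and of the queue-length constraint into Python could look as follows; the list-based data layout is an assumption, not the prototype's actual representation.

```python
def objective(alpha, beta, reimbursement, pending, queue_delta, queue_weight):
    """Equation 1: alpha * sum(r_i * q_i) + beta * sum(Delta_j * W_j), to be minimised.

    reimbursement[i] is r_i, pending[i] is q_i (exams of group i still to be performed),
    queue_delta[j] is Delta_j (waiting queue minus desired queue length) and
    queue_weight[j] is W_j (importance of the j-th queue).
    """
    economic = sum(r * q for r, q in zip(reimbursement, pending))
    service = sum(d * w for d, w in zip(queue_delta, queue_weight))
    return alpha * economic + beta * service

def respects_queue_bounds(queue_lengths, upper_bounds):
    """Hard constraint: every exam group's queue must stay within its governance bound."""
    return all(q <= b for q, b in zip(queue_lengths, upper_bounds))
```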

3 The Proposed Two-Step Algorithm


For the automatic allocation of diagnostic activities, we designed an approach
that is composed of two main steps: generation and improvement. In both steps,
groups of exams are considered. In a modern Radiology Department the number
of possible exams is usually high (300 – 400), but they can be easily organised
in groups, both for administrative issues and similarity of requirements. For
example: all the “RX bones” exams are very similar in terms of required roles,
funding, used time slots and required modality. Therefore, considering a group
instead of a single exam provides a good abstraction: it reduces the number of
variables to deal with, thus improving the computability, but does not lead to
loss of information. Also, RIS systems normally manage their agenda following
the same approach: instead of declaring all the exams which can be assigned to
a modality in a period of time, they allow the definition of clusters of exams
that can be linked to the modality. As a matter of fact, the medical personnel
are trained for performing a specific cluster of exams rather than a single type
of examination.
Generation
This step aims at quickly identifying an allocation of activities and personnel
that satisfies all the constraints, regardless of the overall quality. This task is
probably the most difficult, because it has to search the huge allocation space and
identify a solution that satisfies all the allocation constraints. In particular, the
generation step is based on the following tasks, which are executed by considering
one week at a time:

1. the available personnel is fully assigned to the rooms. Most of the staff are
assigned to morning slots, since this is the period of the day when most exams
take place. Some heuristics are followed for reducing the spread of exams of
the same group in different rooms, or in very different time slots.
2. A random number of time slots is assigned to each cluster of examinations,
according to the hard constraints related to human-resources.
3. After the allocation, free time slots or free human resources are analysed in
order to be exploited. For reducing the fragmentation, the preferred solution
is to extend the time slot of exams allocated before/after the free slot. Frag-
mentation leads to wasted time due to switching exam equipment between
modalities and personnel moving between rooms.

In parallel with generation, a waiting-list estimation is performed. This


allows assessment of whether the requirements on exam queue lengths will be
satisfied, since the proposed approach provides a week-by-week allocation plan.
Improvement
The first allocation plan is then improved through an enforced hill climbing. At
each iteration, a neighbour of the current plan is generated by de-allocating a
cluster of exams from a modality, and substituting it with a different cluster.
The substitution is a complex step; it is not guaranteed it can be applied for

all the pairs of clusters. This is due to, for example, different requirements in
terms of personnel, equipment or time. The choice of the group of exams to be
substituted is done by ordering clusters according to the requested resources and
number of requests per week. Clusters that require many resources and are rarely
performed are suitable to be substituted. The selected cluster is substituted with
another that can fit in the released time-slots.
If the new allocation plan has a better target function value than the current plan,
the new plan is kept; otherwise the algorithm restarts by considering a different
suitable cluster to substitute. The search stops when a specified number of re-
scheduling attempts, or the time limit, is reached. It should be noted that the
designed algorithm is able to provide several solutions of increasing quality.
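The improvement step can be summarised by the sketch below; evaluate, substitute_cluster and candidate_clusters are placeholder callables standing in for the objective function of Equation 1 and for the cluster de-allocation/substitution logic, and the whole listing is an illustration rather than the authors' C++ implementation.

```python
import time

def improve(initial_plan, evaluate, substitute_cluster, candidate_clusters,
            max_attempts=1000, time_limit_s=600):
    """Enforced hill climbing over allocation plans: keep any substitution that
    lowers the objective; stop after max_attempts or when the time limit is hit."""
    best_plan, best_score = initial_plan, evaluate(initial_plan)
    solutions = [(best_plan, best_score)]               # increasingly good plans
    start, attempts = time.time(), 0
    while attempts < max_attempts and time.time() - start < time_limit_s:
        for cluster in candidate_clusters(best_plan):   # resource-hungry, rarely requested clusters first
            attempts += 1
            neighbour = substitute_cluster(best_plan, cluster)
            if neighbour is None:                       # substitution not applicable to this cluster
                continue
            score = evaluate(neighbour)
            if score < best_score:                      # objective is minimised
                best_plan, best_score = neighbour, score
                solutions.append((best_plan, best_score))
                break                                   # restart the search from the improved plan
        else:
            break                                       # no improving neighbour found
    return solutions
```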

4 Experimental Analysis

We implemented a prototype of the proposed algorithm in C++; as input it


requires an XML description of available resources and exams to be performed.
We tested the proposed approach in various scenarios. We considered 3 different
possible Departments, several values of α and β, and different settings up to
at most 25 exam groups, 90 exams, 10 modalities and 24 staff members, dis-
tributed among radiologists, radiographers and nurses. This reflects a medium-
sized Department. We observed that a first solution is usually found in about 15
seconds. Incremental improvements, generally between 60 and 250 allocation plans
of increasing quality, are usually found in less than 10 minutes.
Given the fact that in real-world Departments no quantitative functions are
used for evaluating the quality of allocation plans, a direct comparison of the
allocation plans generated by our approach with existing schedules is not meaningful.
In order to have a first validation of the generated plans we showed some of
them to human experts; namely, to two radiologists, one radiographer techni-
cian and one IT specialist employed in a Diagnostic Imaging Department. They
considered the plans to be realistic and feasible. Considering the context and
the environment of a modern hospital, this is a reasonable way for validating the
generated schedules. Moreover, from experts feedback we synthesised a number
of useful insights, that are described in the following.
The generation step is computationally hard, but it is crucial in order to
obtain good final solutions. We observed that a low quality initial solution usually
leads to small increments in the subsequently improved allocations. On the other
hand, this generation step is not fundamental in real-world applications, since
Radiology Departments already have a feasible schedule in place. Moreover,
their current schedule usually includes thousands of reservations, and therefore
cannot be suddenly changed without dramatic effects.
Fragmentation should be included in the objective function. In the current
model it is not considered, since we believed that dealing with exam groups
instead of single exams would have been enough. On the contrary, we observed
that exam groups allocated to a modality change frequently; this also causes a
frequent change in the staff, which affects the quality of the allocation plan. In

large hospitals, where examination rooms are far from each other, frequent staff
movement results in a significant waste of time.
Data entry is time-consuming: changes in personnel, exams, instrumentation
or policies are quite frequent in a medium-to-large Radiology Department, and require
updating of the data and re-planning. The best way to efficiently support an
operator would be to export, from the existing RIS, data regarding staff, modal-
ities and dates, in order to save time. Currently, HL7 [3] would probably be the
best standard for such integration.
The proposed algorithm can provide useful information about the available
resources. In particular, it can be used for identifying the most limiting resource
(personnel, modalities, etc.) and evaluating the impact of new resources. Head of
Departments highlighted that it is currently a very complex problem to estimate
the impact of a new modality, or of increased personnel. By using the proposed
algorithm, the impact of new resources can be easily assessed by comparing the
quality of plans with and without them, in different scenarios.
New modalities or new staff require an initial “training” period. In the case
of new modalities, the staff will initially require more time for performing exams.
Newly introduced staff usually need to be trained. The current approach is not
able to capture such situations. A possible way of dealing with this is to consider
a “penalty” for some instrumentation or personnel; a time slot is “longer” –by
a given penalty factor– when they are assigned to it. Penalties can be reduced
over time.
The current approach does not have a year-long overview. Some exams are more
likely to be required in some periods, thus they should have different priorities
with regard to their waiting-list requirements. Also, it is common practice that
hospital personnel are reduced in summer. A good integration with medical and
administrative databases would be useful for further refinement of the allocation
abilities.

5 Conclusion
In many Diagnostic Imaging Departments the allocation of exams is currently
done manually. As a result, it is time consuming, it is hard to assess its overall
quality, and no information about limiting resources is identified.
In this paper, we addressed the aforementioned issues by introducing: (i) a
formal model of the diagnostic activities allocation problem, and (ii) an efficient
algorithm for the automated scheduling of diagnostic activities. The proposed
model is general, and can therefore fit with any existing Imaging Department.
Moreover, a quantitative function for assessing the quality of allocation plans is
provided. The two-step algorithm allows the generation and/or improvement
of allocation plans. An experimental analysis showed that the approach is able
to efficiently provide useful and valid schedules for examinations. Feedback
received from experts confirms its usefulness, also for evaluating the impact of
new instrumentation or staff members.
This work can be seen as a pilot study, which can potentially lead to the
exploitation of more complex and sophisticated Artificial Intelligence techniques

for handling the Radiological resource allocation problem. Further investigations


are needed both on the reasoning and on the knowledge representation sides.
Moreover, larger experimental analysis and stronger validation approaches are
envisaged. Future work includes also an extended experimental analysis with
data gathered from existing Departments, and the investigation of techniques
for reducing fragmentation.

References
1. Barbati, M., Bruno, G., Genovese, A.: Applications of agent-based models for opti-
mization problems: A literature review. Expert Systems with Applications 39(5),
6020–6028 (2012)
2. Boochever, S.S.: HIS/RIS/PACS integration: getting to the gold standard. Radiol
Manage 26(3), 16–24 (2004)
3. Dolin, R.H., Alschuler, L., Boyer, S., Beebe, C., Behlen, F.M., Biron, P.V., Shvo, A.S.:
Hl7 clinical document architecture, release 2. Journal of the American Medical Infor-
matics Association 13(1), 30–39 (2006)
4. Eagen, B., Caron, R., Abdul-Kader, W.: An agent-based modelling tool (abmt)
for scheduling diagnostic imaging machines. Technology and Health Care 18(6),
409–415 (2010)
5. Falsini, D., Perugia, A., Schiraldi, M.: An operations management approach for
radiology services. In: Sustainable Development: Industrial Practice, Education and
Research (2010)
6. Macal, C.M., North, M.J.: Agent-based modeling and simulation. In: Winter Simu-
lation Conference, pp. 86–98 (2009)
7. Mazzini, N., Bonisoli, A., Ciccolella, M., Gatta, R., Cozzaglio, C., Castellano, M.,
Gerevini, A., Maroldi, R.: An innovative software agent to support efficient planning
and optimization of diagnostic activities in radiology departments. International
Journal of Computer Assisted Radiology and Surgery 7(1), 320–321 (2012)
8. Welch, J.D.: N.B.T.: Appointment systems in hospital outpatient departments. The
Lancet 259(6718), 1105–1108 (1952)
Ontology-Based Information Gathering System
for Patients with Chronic Diseases: Lifestyle
Questionnaire Design

Lamine Benmimoune1,3(B), Amir Hajjam1, Parisa Ghodous2,
Emmanuel Andres4, Samy Talha4, and Mohamed Hajjam3

1 IRTES-SET, Université de Technologie Belfort-Montbéliard, 90000 Belfort, France
[email protected]
2 LIRIS, Université Claude Bernard Lyon 1, 69100 Villeurbanne, France
3 Newel, 68100 Mulhouse, France
4 Hôpital Civil de Strasbourg, 67000 Strasbourg, France

Abstract. The aim of this paper is to describe an original approach


which consists of designing an Information Gathering System (IGS).
This system gathers the most relevant information related to the patient.
Our IGS is based on using questionnaire ontology and adaptive engine
which collects relevant information by prompting the whole significant
questions in connection with the patient’ s medical background. The for-
merly collected answers are also taken into consideration in the questions
selection process. Our approach improves the classical approach by cus-
tomizing the interview to each patient. This ensures the selection of all
of the most relevant questions. The proposed IGS is integrated within
E-care monitoring platform for gathering lifestyle-related patient data.

Keywords: Information gathering system · Questionnaire · Health-care · Clinical decision support system · Ontology · Monitoring

1 Introduction
Computer-based questionnaires are a new form of data collection, which are
designed to offer advantages over pen-and-paper questionnaires
or oral interviewing [13]. They are less time-consuming and more efficient by
offering more structure and more details compared to the classical methods [2].
The Information Gathering Systems (IGSs) have had measurable benefits in
reducing omissions and errors arising as a result of medical interviews [14]. The
medical and health care domain is one of the most active domains in using IGSs
for gathering patient data [13].
Recently, various research works were conducted to design and to use IGSs
as part of clinical decision support systems (CDSS). Among them, Bouamrane
et al. [2], [3] proposed a generic model for context-sensitive self-adaptation of
IGS based on a questionnaire ontology. The proposed model is implemented as a
data collector module in [4] to collect patient medical history for preoperative


risk assessment. Sherimon et al. [5], [6], [15] proposed a questionnaire ontology
based on [2]. This ontology is used to gather patient medical history, which is
then integrated within a CDSS to predict the risk of hypertension. Farooq et
al. [7] proposed an ontology-based CDSS for chest pain risk assessment; based
on [2], the proposed CDSS integrates a data collector to collect patient medical
history. Alipour [13] proposed an approach to design an IGS based on the use of
an ontology-driven generic questionnaire and the Pellet inference engine for the
question selection process.
Although the IGSs presented in the literature permit gathering patient data
using ontologies, the created questionnaires are hard-coded for specific domains
and are defined under the domain ontologies. This makes them less flexible,
more difficult to maintain, and even hard to share and to reuse.
Unlike previous approaches, our approach offers more flexibility by separat-
ing the ontologies and by integrating a domain ontology to drive the creation
of the questionnaire. This gives meaning to the created questions and allows dif-
ferent questionnaire models to be configured without coding and regardless of the
domain content. Therefore, many CDSSs can easily integrate and use
the proposed IGS for their specific needs.
Furthermore, the proposed approach permits the collection of relevant information
by prompting all of the significant questions in connection with the patient
profile. The formerly collected answers are also taken into consideration in the
question selection process. This improves the classical approach by customizing
the interview to each patient.
The proposed IGS is integrated within the E-care home health monitoring plat-
form [1], [8] for gathering lifestyle-related patient data.

2 Information Gathering System within E-Care Platform

E-care is a home health monitoring platform for patients with chronic diseases
such as diabetes, heart failure, high blood pressure, etc. [1] [8]. The aim is early
detection of any anomalies or dangerous situations by collecting relevant data
from the patient such as physiological data (heart rate, blood pressure, pulse,
temperature, weight, etc.) and lifestyle data (tobacco-use, eating habits, physical
activity, sleep, stress, etc.).
To improve the accuracy of anomaly detection, the platform needs relevant
information that describes as precisely as possible the patient’s health status and
his lifestyle changes (tobacco-use, lack of physical activity, poor eating habits,
etc.). That is why the patient is invited daily to collect his physiological data
using medical sensors (Blood Pressure Monitor, Weighing Scale, Pulse Oximeter,
etc.) and to answer lifestyle questionnaires. These questionnaires are auto-
matically generated by the IGS, which permits gathering relevant information
about the patient's lifestyle.
All collected data (physiological data and lifestyle data) are stored in the
patient profile ontology, which models the health status of the patient, and are then
analysed by the inference engine for anomaly detection.

3 Information Gathering System Architecture


The proposed architecture consists of four main components: Questionnaire
Ontology, Survey History Ontology, Adaptive Engine and User Interfaces.

Fig. 1. Information Gathering System architecture

Questionnaire Ontology (QO): models the concepts representing the com-


mon components of a questionnaire. The QO is created based on Bouamrane et
al.’s research work [2]. It is designed to be generic, structured and flexible, so as to accept
most of the questionnaire models. The main classes are: Questionnaire, SubQues-
tionnaire, Question and PotentialAnswer. The Questionnaire class is composed
of Sub-questionnaires, which represent a group of thematically related question
classes. The question classes could be inter-related by structural properties such
as hasParent, hasChild, hasSibling, etc. Each question is characterised by a type
and related to one or more potential answers using Adaptive properties such as
ifAnswerToThisQuestionEqualsTo, thenGoToQuestion, etc.
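Purely as an illustration (the class and property names follow the description above, while the identifiers and values are invented), a single adaptive question could be seen by the engine roughly as follows.

```python
# Hypothetical in-memory view of one adaptive question from the questionnaire ontology;
# the concrete identifiers and answer values are illustrative only.
smoking_status = {
    "class": "Question",
    "id": "Q_smoking_status",
    "text": "Do you currently smoke?",
    "type": "single_choice",
    "potentialAnswers": ["yes", "no"],
    # Adaptive properties: answering "yes" leads to a child question.
    "ifAnswerToThisQuestionEqualsTo": "yes",
    "thenGoToQuestion": "Q_cigarettes_per_day",
    "hasChild": ["Q_cigarettes_per_day"],
}
```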

Survey History Ontology (SHO): stores all the patient surveys. It includes
all the asked questions and the answers given by the patient. It is used in the
question selection process.

The Adaptive Engine (AE): interprets the properties asserted in the ques-
tionnaire ontology and prompts the corresponding questions in connection with
the patient profile and the formerly collected answers. The AE initially loads all
questions except the children questions. It prompts the first question and checks
if the question is appropriate to the patient profile (e.g. AE doesn’t ask questions
about the smoking habits, if the patient is a non-smoker). If it is, the AE asks
the question and gets the answer from the UI. If it is not, the AE just prompts
the next question.

If however the current question happens to be adaptive (i.e. it has at least


a child question), the given answer is then checked against the answers that are
expected to lead to child questions. If a match is found, the AE loads the
child questions. If no match is found, the next question is prompted. The
interaction loop is repeated until there are no more questions to be asked.
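This interaction loop can be sketched as follows; is_relevant_for and ask are hypothetical placeholders for the profile check and the patient UI call, and the dictionary layout matches the illustrative question shown earlier.

```python
def run_survey(all_questions, patient_profile, ask, is_relevant_for):
    """Adaptive questioning loop: skip questions that do not fit the patient profile,
    and enqueue child questions when an answer matches an adaptive rule."""
    by_id = {q["id"]: q for q in all_questions}
    queue = [q for q in all_questions if not q.get("hasParent")]   # all non-child questions first
    answers = {}
    while queue:
        question = queue.pop(0)
        if not is_relevant_for(question, patient_profile):         # e.g. skip smoking items for non-smokers
            continue
        answer = ask(question)                                      # delegated to the patient UI
        answers[question["id"]] = answer
        if answer == question.get("ifAnswerToThisQuestionEqualsTo"):
            children = [by_id[c] for c in question.get("hasChild", [])]
            queue = children + queue                                # ask the child questions next
    return answers
```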

The User Interfaces (UI): consist of two parts, namely the Expert UI and the
Patient UI.

• Expert UI: permits the domain experts (clinicians) to configure the IGS
by defining questionnaires and to consult the surveys history.
• Patient UI: permits to start/stop the survey. It is designed in such a way
that the patient can respond to the questionnaire from anywhere using his
mobile device (tablet or smart phone).

4 Domain Ontology Driven Questionnaire

The domain ontology aims to drive the creation of questionnaires by offering a
common and controlled vocabulary. To achieve this goal, we have developed a
domain ontology for lifestyle concepts based on recommendations provided by
the Haute Autorité de Santé (HAS)1. The ontology is structured as a hierarchy of
concepts and relations between concepts. It is composed of three main entities:

• LifeStyleEntities: hierarchical concepts that model lifestyle entities such
as eating habits, physical activity, smoking habits, etc.
• DimensionsEntities: hierarchical concepts that model temporal dimensions
and physical dimensions (quantity). Each dimension includes a hierarchy
of concepts (e.g. TimesOfDay, TimeFrequency and TimeUnit are
grouped under the timeDimension concept).
• CataloguesEntities: includes concepts used to give more semantics to the
LifeStyleEntities concepts. Each catalogue entity includes a hierarchy of
concepts that model the types of LifeStyleEntities concepts (e.g. Cigarette,
E-cigarette and Drug are types of the Tobacco concept for the SmokingHabits
concept).

The concepts are related to one another through the following properties.

• DimensionProperties: used to relate the LifeStyleEntities to the DimensionsEntities;
they include a set of properties such as hasQuantity, hasFrequency,
hasTimesOfDay, etc.
• CatalogueProperties: used to relate the LifeStyleEntities to the CataloguesEntities;
they include a set of properties such as hasExercise, hasTobacco,
hasFood, etc.

Fig. 2. Related questions for the “smoking habits” concept

The example illustrated by Fig. 2 shows how the domain concepts can be
related to one another and how they are used to design lifestyle questionnaires.
A smoking habit is characterized by a type of tobacco (e.g. cigarette, electronic
cigarette, drugs, etc.), a time frequency (daily, monthly, weekly, etc.), a smoking
quantity, etc. Several questions can therefore be created based on the SmokingHabits
concept: each smoking-related question is related to the SmokingHabits concept
through the domain properties, while the potential answers are related either to the
DimensionsEntities or to the CataloguesEntities (see Fig. 2).
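Purely as an illustration of these relations (the paper does not show the ontology
serialisation; the concept and property names are taken from the text, while the structure
below is a simplified assumption):

```python
# Hypothetical, simplified rendering of the SmokingHabits fragment: a
# LifeStyleEntities concept linked to CataloguesEntities and DimensionsEntities
# concepts through catalogue and dimension properties.
smoking_habits = {
    "concept": "SmokingHabits",                            # LifeStyleEntities
    "hasTobacco": ["Cigarette", "E-cigarette", "Drug"],    # CataloguesEntities
    "hasFrequency": ["Daily", "Weekly", "Monthly"],        # DimensionsEntities (TimeFrequency)
    "hasQuantity": "Quantity",                             # DimensionsEntities (physical dimension)
}

# A questionnaire question is anchored to the domain concept, and its potential
# answers point to the related catalogue or dimension concepts.
question = {
    "text": "How often do you smoke?",
    "about": "SmokingHabits",
    "potentialAnswers": smoking_habits["hasFrequency"],
}
```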

5 Conclusion and Future Work

In this paper, we presented a novel approach to the design of an Ontology-Based
Information Gathering System. This system permits gathering the most
relevant information by providing personalised questionnaires related to the
patient profile. Our IGS consists of a questionnaire ontology which is driven
by a domain ontology. We have seen how the domain ontology is used to control
the vocabulary and to give meaning to the asked questions. Furthermore, with the
use of a domain ontology the gathering of data can be improved, and the design of
questionnaires can be made easier and faster compared to the hard coding of
questionnaires. On the other hand, we have highlighted the interest of using
the proposed IGS within the E-care health monitoring platform, since it permits
gathering relevant information about patients' lifestyle.
In the near future, we will experiment with the proposed IGS in real life with
chronic patients.
1 http://www.has-sante.fr

References
1. Benyahia, A.A., Hajjam, A., Hilaire, V., Hajjam, M.: E-care ontological archi-
tecture for telemonitoring and alerts detection. In: 5th IEEE International
Symposium on Monitoring & Surveillance Research (ISMSR): Healthcare-Safety-
Security (2012)
2. Bouamrane, M.-M., Rector, A.L., Hurrell, M.: Ontology-driven adaptive medical
information collection system. In: An, A., Matwin, S., Raś, Z.W., Ślezak,  D.
(eds.) Foundations of Intelligent Systems. LNCS (LNAI), vol. 4994, pp. 574–584.
Springer, Heidelberg (2008)
3. Bouamrane, M.M., Rector, A., Hurrell, M.: Gathering precise patient medical his-
tory with an ontology-driven adaptive questionnaire. In: 21st IEEE International
Symposium on Computer-Based Medical Systems, CBMS 2008, June 17–19, 2008
4. Bouamrane, M.-M., Rector, A., Hurrell, M.: Using ontologies for an intelligent
patient modelling, adaptation and management system. In: Meersman, R., Tari, Z.
(eds.) OTM 2008, Part II. LNCS, vol. 5332, pp. 1458–1470. Springer, Heidelberg
(2008)
5. Sherimon, P.C., Vinu, P.V., Krishnan, R., Takroni, Y.: Ontology Based System
Architecture to Predict the Risk of Hypertension in Related Diseases. IJIPM:
International Journal of Information Processing and Management 4(4), 44–50
(2013)
6. Sherimon, P.C., Vinu, P.V., Krishnan, R., Takroni, Y., AlKaabi, Y., AlFars, Y.:
Adaptive questionnaire ontology in gathering patient medical history in diabetes
domain. In: Herawan, T., Deris, M.M., Abawajy, J. (eds.) DaEng-2013. LNEE,
vol. 285, pp. 453–460. Springer, Singapore (2014)
7. Farooq, K., Hussain, A., Leslie, S., Eckl, C., Slack, W.: Ontology driven cardiovas-
cular decision support system. In: 2011 5th International Conference on Pervasive
Computing Technologies for Healthcare (PervasiveHealth), May 23–26, 2011
8. Benyahia, A.A., Hajjam, A., Hilaire, V., Hajjam, M., Andres, E.: E-care telemon-
itoring system: extend the platform. In: 2013 Fourth International Conference on
Information, Intelligence, Systems and Applications (IISA), July 10–12, 2013
9. Saripalle, R.K.: Current status of ontologies in Biomedical and clinical Informatics.
University of Connecticut. http://www.engr.uconn.edu/steve/Cse300/saripalle.pdf
(retrieved January 16, 2014)
10. Gruber, T.R.: Toward principles for the design of ontologies used for knowl-
edge sharing? International Journal Human-Computer Studies 43(5–6), 907–928
(1995)
11. Guarino, N.: Formal Ontology and information systems. Formal ontology in infor-
mation systems. In: Proceedings of FOIS 1998, Trento, Italy, June 6–8, 1998
12. Noy, N.F., McGuinness, D.L.: Ontology Development 101: A Guide to Creating
Your First Ontology. Stanford University (2005)
13. Alipour-Aghdam, M.: Ontology-Driven Generic Questionnaire Design. Thesis for
the degree of Master of Science in Computer Science. Presented to The University
of Guelph, August 2014
14. Bachman, J.W.: The patient-computer interview: a neglected tool that can aid
the clinician. Mayo Clinic Proceedings 78, 67–78 (2003)
15. Sherimon, P.C., Vinu, P.V., Krishnan, R., Saad, Y.: Ontology driven analysis
and prediction of patient risk in diabetes. Canadian Journal of Pure and Applied
Sciences 8(3), 3043–3050 (2014). SENRA Academic Publishers, British Columbia
Predicting Preterm Birth in Maternity Care
by Means of Data Mining

Sónia Pereira, Filipe Portela(), Manuel F. Santos, José Machado,


and António Abelha

Algoritmi Centre, University of Minho, Braga, Portugal


[email protected], {cfp,mfs}@dsi.uminho.pt,
{jmac,abelha}@di.uminho.pt

Abstract. Worldwide, around 9% of children are born with less than 37
weeks of gestation, putting the premature child at risk, as it is not prepared to
develop a number of basic functions that begin soon after birth. In order to
ensure that those risk pregnancies are properly monitored by the obstetricians
in time to avoid those problems, Data Mining (DM) models were induced
in this study to predict preterm births in a real environment using data from
3376 patients (women) admitted to the maternal and perinatal care unit of
Centro Hospitalar of Oporto. A sensitive metric to predict preterm deliveries was
developed, assisting physicians in the decision-making process regarding the
patients' observation. It was possible to obtain promising results, achieving
sensitivity and specificity values of 96% and 98%, respectively.

Keywords: Data mining · Preterm birth · Real data · Obstetrics care · Maternity
care

1 Introduction

Preterm birth represents a major challenge for maternal and perinatal care and is a
leading cause of neonatal morbidity. The medical, educational, psychological and social
costs associated with preterm birth indicate the urgent need to develop preventive
strategies and diagnostic measures to improve access to effective obstetric and
neonatal care [1]. This may be achieved by exploring the information provided by
the information systems and technologies increasingly used in healthcare services.
In Centro Hospitalar of Oporto (CHP), a Support Nursing Practice System focused
on nursing practices (SAPE) is implemented, producing clinical information. In addition,
patient data plus the admission form are recorded through the EHR (Electronic
Health Record) available in the Archive and Diffusion of Medical Information (AIDA)
platform. Both SAPE and the EHR are also used by the CHP maternal and perinatal care
unit, Centro Materno Infantil do Norte (CMIN). CMIN is prepared to provide medical
care and services for women and children. Therefore, using the obstetrics and prenatal
information recorded in SAPE and the EHR, it is possible to extract new knowledge in the
context of preterm birth. This knowledge is achieved by means of Data Mining (DM)
techniques, enabling predictive models based on evidence. This study accomplished
DM models with sensitivity and specificity values of approximately 96% and 98%,
which will support the development of preventive strategies and diagnostic measures
to handle preterm birth.
Besides the introduction, this article includes a presentation of the concepts
and related work in Section 2, followed by the data mining process, described in Sec-
tion 3. Furthermore, the results are discussed and a set of considerations are made in
Section 4. Section 5 presents the conclusions and directions of future work.

2 Background and Related Work

2.1 Preterm Birth


Preterm birth refers to a delivery prior to 37 completed weeks (259 days) of gestation.
Symptoms of preterm labour include uterine contractions occurring more often than
every ten minutes, or the leaking of fluids. Preterm birth is the leading cause of long-term
disability in children, since many organs, including the brain, lungs and liver, are
still developing in the final weeks of pregnancy [2]. The incidence of preterm birth has not
decreased in the last 30 years, due to the failure to identify the high-risk group during
routine prenatal care [3].
deliveries, focusing on physiologic measures, ultrasonography, obstetrics history and
socioeconomic status [4]. For instance, in 2011 a model was developed for predicting
spontaneous delivery before 34 weeks based on maternal factors, placental perfusion
and function at 11-13 weeks’ gestation, through screening maternal characteristics
and regression analysis. They detected 38.2% of the preterm deliveries in women with
previous pregnancies beyond 16 weeks and 18.4% in those without [3]. Most of the
efforts to predict preterm birth face limited provision of population based data, since
registration of births is incomplete and information is lacking on gestational age [6].

2.2 Interoperability Systems and Data Mining in Healthcare


As mentioned in the previous section, this study is based on real data acquired
from CMIN. The knowledge extraction depends substantially on the interoperability
between SAPE and EHR systems assured through AIDA. This multi-agent platform
enables the standardization of clinical systems and overcomes the medical
and administrative complexity of the different sources of information from the hospit-
al [5].
In healthcare systems, there is a wealth of data available, although there is a lack of
effective analysis tools to extract useful information. Thus, data mining has found
numerous applications in the scientific and clinical domains [8]. Successful mining
applications have been implemented in healthcare. In obstetrics and maternal care,
some of these studies were employed to predict the risk associated with pregnancy in women
performing voluntary interruption of pregnancy (VIP) [9] and to manage VIP by predicting
the most suitable drug administration [7].

3 Study Description

This study was conducted by following the Knowledge Discovery in Databases
(KDD) process, which allows the extraction of implicit and potentially useful information
through algorithms, taking into account the increasing magnitude of the data [10].
The DM methodology employed was the Cross Industry Standard Process for Data
Mining (CRISP-DM), a non-rigid sequence of six phases, carried out in this section,
which allows the implementation of DM models to be used in real environments [11].
To induce the DM models, four different algorithms were implemented: Decision
Trees (DT), Generalized Linear Models (GLM), Support Vector Machine (SVM) and
Naïve Bayes (NB). This study used data collected from 3376 patients (women) admit-
ted in the maternal and perinatal care unit (CMIN) of CHP comprising a period be-
tween 2012-07-01 and 2015-01-31, in a total of 1120 days.

3.1 Business Understanding


The Business aim of this project is to identify the risk group of preterm delivery, to
ensure the proper monitoring and to avoid its associated problems. The DM goal is to
develop accurate models able to support the decision-making process by predicting
whether or not a woman will be subjected to a preterm delivery, based on data from
clinic cases.

3.2 Data Understanding


The initial dataset extracted from SAPE and EHR admission records was analysed and
processed in order to be used in the DM process. A set of 13 variables were selected:
age (corresponds to the age of the pregnant patient), programmed (indicates whether or
not a delivery is programmed), gestation (singular or multiple pregnancies), PG1 and
PG2 (first echography measures), motive (reason of intervention - normal delivery or
unexpected events), patients’ weight and height, BMI (body mass index), blood type,
cardiotocography (CTG) (biophysics exam that evaluates the fetal wellbeing), strepto-
coccus (presence of the bacterium streptococcus in the pregnant system) and finally,
marital status of the pregnant patient. The target variable Group Risk denotes the pre-
term birth risk and it is presented in Table 1.

Table 1. Representation of the target variable Group Risk.


Description                          Value   Target Distribution   Percentage
>= 37 weeks of gestation (Term)      0       3137                  92.92%
< 37 weeks of gestation (Preterm)    1       239                   7.08%

Table 2 shows statistical measures related to the numerical variables age, gestation,
PG1, PG2 and BMI, while Table 3 presents the percentage of occurrences for some of
the variables used.

Table 2. Statistical measures of the age, PG1, PG2, weight, height and BMI variables.
Minimum Maximum Average Standard Deviation
Age 14 46 29.88 5.81
PG1 5 40 12.81 2.96
PG2 0 8 3.09 1.96
BMI 14.33 54.36 29.40 4.57

Table 3. Percentage of occurrences of some variables.


Variable Class Cases
Programmed True 12.53%
Gestation Singular 89.90%
Motive Normal 81.33%
Streptococcus Positive 13.27%
Cardiotocography Suspect 2.19%

3.3 Data Preparation


After understanding the data collected, the variables were prepared to be used by the
DM models. The data pre-processing phase started with the identification of null and
noisy values. These values were eliminated from the dataset. To ensure data normalization,
all the values, such as weight and height, were converted to International
System units, using the point to separate decimal values.
As shown in Table 1, there is a disparity in the distribution of values of the target
variable Risk Group (low percentage of preterm birth cases). In order to balance the
target, the oversampling technique was implemented by replicating the preterm birth
cases until it reached approximately 50% of the dataset, obtaining 6244 entries.
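A minimal pandas sketch of such an oversampling step, assuming a dataframe with a
GroupRisk target column (the exact replication procedure and tooling used by the authors
are not detailed in the paper):

```python
import pandas as pd

def oversample_minority(df: pd.DataFrame, target: str = "GroupRisk") -> pd.DataFrame:
    """Replicate minority-class rows (preterm cases) until the classes are roughly balanced."""
    counts = df[target].value_counts()
    minority = counts.idxmin()
    n_extra = counts.max() - counts.min()
    # sample with replacement from the minority class and append the copies
    extra = df[df[target] == minority].sample(n_extra, replace=True, random_state=0)
    return pd.concat([df, extra], ignore_index=True)
```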

3.4 Modelling
A set of Data Mining models (DMM) were induced using the four DM techniques
(DMT) mentioned in Section 3: GLM, SVM, DT and NB. The developed models
used two sampling methods: Holdout sampling (30% of data for testing) and Cross
Validation (all data for testing). Additionally, two different approaches were implemented:
one using the raw dataset (3376 entries) and another with oversampling.
Different combinations of variables were used, resulting in 5 different scenarios:
S1: {Age (A), Gestation (G), Programmed (P), PG1, PG2, Motive (M), Height (H), Weight (W), BMI,
Blood Type (B), Marital Status (MS), CTG, Streptococcus (S)}
S2: {A, H, W, BMI, B, MS, CTG, S}
S3: {G, P, PG1, PG2, M, CTG, S}
S4: {A, G, PG1, PG2, M, H, W, BMI, B, CTG, S}
S5: {A, G, P, M, H, W, BMI, B}

Therefore, a total of 80 Data Mining models (DMM) were induced:


DMM = {5 Scenarios, 4 Techniques, 2 Sampling Methods, 2 Approaches}
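The 80 configurations are simply the Cartesian product of the four factors; for instance:

```python
from itertools import product

scenarios = ["S1", "S2", "S3", "S4", "S5"]
techniques = ["DT", "GLM", "SVM", "NB"]
sampling_methods = ["Holdout 30%", "Cross Validation"]
approaches = ["raw dataset", "oversampling"]

configurations = list(product(scenarios, techniques, sampling_methods, approaches))
assert len(configurations) == 80      # 5 x 4 x 2 x 2 induced DM models
```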

All the models were induced using the Oracle Data Miner with its default configu-
rations. For instance, GLM was induced with automatic preparation, with a confi-
dence level of 0.95 and a reference value of 1.

3.5 Evaluation
The study used the confusion matrix (CMX) to assess the induced DM models. Using
the CMX, the study estimated some statistical metrics: sensitivity, specificity and
accuracy. Table 4 presents the best results achieved by each technique, sampling me-
thod and approach. The best accuracy (93.00%) was accomplished with scenario 3 by
both DT and NB techniques using oversampling and 30% of data for testing. The best
sensitivity (95.71%) was achieved by scenario 4 with oversampling using SVM tech-
nique and all the data for testing. Regarding specificity, scenario 2 reached 97.52%
using SVM with oversampling and all the data for testing.
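The three metrics follow from the confusion-matrix counts in the usual way; a small
illustrative sketch (not the authors' code):

```python
def metrics_from_confusion(tp: int, fn: int, fp: int, tn: int):
    """Sensitivity, specificity and accuracy from confusion-matrix counts,
    taking preterm (Group Risk = 1) as the positive class."""
    sensitivity = tp / (tp + fn)                   # true positive rate
    specificity = tn / (tn + fp)                   # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy
```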

Table 4. Sensitivity, specificity and accuracy values for the best scenarios for each DMT,
approach and sampling method. Below, the best metric values highlighted for each DMT.
DMT Oversampling Sampling Scenario Sensitivity Specificity Accuracy
DT No 30% 3 0.8889 0.9303 0.9300
No All 1 0.2896 0.9723 0.8599
GLM No All 4 0.2896 0.9723 0.8599
Yes All 4 0.8674 0.7126 0.7687
NB No 30% 3 0.8889 0.9303 0.9300
No All 1 0.4868 0.9646 0.9271
SVM No All 2 0.1023 0.9752 0.4570
Yes All 4 0.9571 0.6647 0.7410

In order to choose the best models, a threshold was established, considering sensitivity,
specificity and accuracy values greater than 85%. Table 5 shows the models
that fulfil the threshold.

Table 5. Best model achieving the established threshold.


Scenario Model Oversampling Sampling Sensitivity Specificity Accuracy
3 NB,DT No 30% 0.8889 0.9303 0.9300

4 Discussion

It should be noted that the best sensitivity (95.71%) and specificity (97.52%) are
reached by models that did not achieve the defined threshold, showing low values in
the remaining statistical measures used to evaluate the models. It can be concluded that
scenario 3 meets the defined threshold, presenting good results in terms of specificity
and sensitivity, as seen in Table 5. Thus, it appears that the most relevant factors that
affect the term of birth are the pregnancy variables, Gestation and the physical conditions
of the pregnant woman. From a clinical perspective, the achieved results will enable the
prediction of preterm birth, with low uncertainty, allowing those responsible better
monitoring and resource management. In a real time environment, physicians can rely on
the model to send a warning informing that a specific patient has a risk pregnancy and
is in danger of preterm delivery. Consequently, the physician can be observant and
alert to these cases and can put the patients on special watch, saving resources and
time for the healthcare institution.

5 Conclusions and Future Work

At the end of this work it is possible to assess the viability of using these variables
and classification DM models to predict Preterm Birth. The study was conducted
using real data. Promising results were achieved by inducing DT and NB, with over-
sampling and 30% of the data for testing, in scenario 3, achieving approximately 89%
of sensitivity and 93% of specificity, suited to predicting preterm births. The developed
model supports the decision-making process in maternity care by identifying the pregnant
patients in danger of preterm delivery, alerting to their monitoring and close
observation, preventing possible complications and, ultimately, avoiding preterm
birth.
In the future new variables will be incorporated in the predictive models and other
types of data mining techniques will be applied. For instance, inducing Clustering
techniques would create clusters with the most influential variables to preterm birth.

Acknowledgments. This work has been supported by FCT - Fundação para a Ciência e Tecno-
logia within the Project Scope UID/CEC/00319/2013.

References
1. Berghella, V. (ed.): Preterm birth: prevention and management. John Wiley & Sons (2010)
2. Spong, C.Y.: Defining “term” pregnancy: recommendations from the Defining “Term”
Pregnancy Workgroup. Jama 309(23), 2445–2446 (2013)
3. Beta, J., Akolekar, R., Ventura, W., Syngelaki, A., Nicolaides, K.H.: Prediction of sponta-
neous preterm delivery from maternal factors, obstetric history and placental perfusion and
function at 11–13 weeks. Prenatal diagnosis 31(1), 75–83 (2011)
4. Andersen, H.F., Nugent, C.E., Wanty, S.D., Hayashi, R.H.: Prediction of risk for preterm deli-
very by ultrasonographic measurement of cervical length. AJOG 163(3), 859–867 (1990)
5. Abelha, A., Analide, C., Machado, J., Neves, J., Santos, M., Novais, P.: Ambient intelli-
gence and simulation in health care virtual scenarios. In: Camarinha-Matos, L.M., Afsar-
manesh, H., Novais, P., Analide, C. (eds.) Establishing the Foundation of Collaborative
Networks. IFIP — The International Federation for Information Processing, vol. 243, pp.
461–468. Springer, US (2007)
6. McGuire, W., Fowlie, P.W. (eds.): ABC of preterm birth, vol. 95. John Wiley & Sons
(2009)
7. Brandão, A., Pereira, E., Portela, F., Santos, M.F., Abelha, A., Machado, J.: Managing volun-
tary interruption of pregnancy using data mining. Procedia Technology 16, 1297–1306
(2014)
8. Kaur, H., Wasan, S.K.: Empirical study on applications of data mining techniques in
healthcare. Journal of Computer Science 2(2), 194 (2006)
9. Brandão, A., Pereira, E., Portela, F., Santos, M.F., Abelha, A., Machado, J.: Predicting the risk
associated to pregnancy using data mining. In: ICAART 2015 Portugal. SciTePress (2015)
10. Maimon, O., Rokach, L.: Introduction to knowledge discovery in databases. Data Mining
and Knowledge Discovery Handbook, pp. 1–17. Springer, US (2005)
11. Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., Wirth, R.:
CRISP-DM 1.0 Step-by-step data mining guide (2000)
Clustering Barotrauma Patients in ICU–A Data Mining
Based Approach Using Ventilator Variables

Sérgio Oliveira, Filipe Portela(), Manuel F. Santos, José Machado,


António Abelha, Álvaro Silva, and Fernando Rua

Algoritmi Centre, University of Minho, Braga, Portugal


[email protected], {cfp,mfs}@dsi.uminho.pt,
{jmac,abelha}@di.uminho.pt, [email protected],
[email protected]

Abstract. Predicting barotrauma occurrence in intensive care patients is a difficult
task. Data Mining modelling can contribute significantly to the identification
of patients who will suffer barotrauma. This can be achieved by grouping
patient data, considering a set of variables collected from ventilators directly related
with barotrauma, and identifying similarities among them. For clustering,
the k-means and k-medoids (Partitioning Around Medoids) algorithms were considered.
The best model induced presented a Davies-Bouldin Index of 0.64. This model
identifies the ventilator-monitored variables that have the greatest similarity with
each other and with the occurrence of barotrauma.

Keywords: Barotrauma · Plateau pressure · Intensive medicine · Data mining ·


Clustering · Similarity · Correlation

1 Introduction

The Data Mining (DM) process provides not only the methodology but also the technology
to transform the data collected into useful knowledge for the decision-making process
[1]. In critical areas of medicine, some studies reveal that one of the respiratory diseases
with the highest incidence in patients is barotrauma [2]. Health professionals
have identified high levels of Plateau pressure as contributing significantly to
the occurrence of barotrauma [3]. This study is part of the major project INTCare. In
this work a clustering process was addressed in order to characterize patients with
barotrauma and analyze the similarity among ventilator variables. The best models
achieved a Davies-Bouldin Index of 0.64. The work was tested using data provided by
the Intensive Care Unit (ICU) of the Centro Hospitalar do Porto (CHP).
This paper consists of four sections. The first section corresponds to the introduction
of the problem and related work. Aspects directly related to this study and supporting
technologies for knowledge discovery from databases are then addressed in
the second section. The third section formalizes the problem and presents the results
in terms of DM models, following the Cross Industry Standard Process for Data Mining
(CRISP-DM) methodology. In the fourth section some relevant conclusions are
drawn.


2 Background

2.1 Plateau Pressure, Acute Respiratory Distress Syndrome and Barotrauma


Barotrauma occurs when a patient has complications during mechanical
ventilation. Studies of patients with Acute Respiratory Distress Syndrome (ARDS) have
shown that the incidence of pneumothorax and barotrauma varies between 0% and
76% [4]. The occurrence of barotrauma is one of the most dreaded complications
when a patient is mechanically ventilated. This occurrence is associated with increased
morbidity and mortality. Several researchers argue that the Positive end-expiratory
pressure (PEEP) is related to the occurrence of barotrauma; however, other researchers
do not support this relationship, stating that no relationship between PEEP and
barotrauma has been identified [2]. The Plateau Pressure (PPR) values shall be
continuously monitored, providing important information for patient diagnosis. It is
important to maintain the value of PPR <= 30 cmH2O in order to protect the patient's
lungs. An increase in PPR is associated with an increasing elasticity of the respiratory
system and a decreased compliance of the respiratory system [3].

2.2 Related Work


Predicting barotrauma occurrence is important for the patient's wellbeing, so it is fundamental
to explore the prediction accuracy of the variables and their correlation with
barotrauma – PPR values >= 30 cmH2O. In a first stage of this project, the probability
of barotrauma occurring was predicted considering only the data provided by the
ventilator. This study [5] showed that it is possible to predict the PPR class < 30 cmH2O
and >= 30 cmH2O with an accuracy between 95.52% and 98.71%. The best model
was achieved using Support Vector Machines and all the variables considered in the
study. However, another good model was obtained (95.52% of accuracy) using only
three variables. This model showed a strong correlation among Dynamic Compliance
(CDYN), Mean Airway Pressure and Peak Pressure.

2.3 INTCare
This work was carried out under the research project INTCare. INTCare is an Intelligent
Decision Support System (IDSS) [6] for Intensive Care which is constantly being developed
and tested. This intelligent system was deployed in the Intensive Care Unit (ICU)
of Centro Hospitalar do Porto (CHP). INTCare allows continuous patient condition
monitoring and the prediction of clinical events using DM. One of the most recent goals
addressed is the identification of patients who may have barotrauma.

2.4 Data Mining


DM corresponds to the process of using artificial intelligence techniques,
statistical calculations and mathematical metrics to extract information and
useful knowledge. The discovered knowledge may take various forms: business
rules, similarities, patterns or correlations [7].
This work is mainly focused on the development and analysis of clusters. This is a
grouping process based on observing similarity or interconnection density. This
process aims to discover data groups according to the distributions of the attributes
that make up the dataset [8]. To develop and assess the application of clustering
algorithms to the barotrauma dataset, the statistical system R was chosen.

3 Knowledge Discovering Process

3.1 Business Understanding


The main goal is to use ventilation data in order to identify groups of objects that
belong to the same class, i.e. to group similar objects in the same set and dissimilar
objects in different sets. The data used to conduct this study were collected in the ICU of
CHP. The clusters were based only on data monitored by ventilators; the values
used were numeric and of a discrete quantitative type.

3.2 Data Understanding


The initial data sample contained several records without patient identification (PID).
This happens because sometimes the patients are admitted for a few hours in the ICU
but are not assigned to an Electronic Health Record (EHR). These records were dis-
carded for this study. The sample used was collected from the ventilators and com-
prises a period between 01.09.2014 and 10.12.2014 and a total of 33023 records. Each
record contains fourteen fields: CDYN – (F_1); CSTAT – (F_2); FIO2 – (F_3); Flow –
(F_4); RR – (F_5); PEEP – (F_6); PMVA – (F_7);
Plateau pressure – (F_8); Peak pressure – (F_9); RDYN – (F_10);
(F_11); Volume EXP – (F_12); Volume INS – (F_13); Volume Minute – (F_14).
The coefficient of variation shows that the distributions are heterogeneous for all
the attributes since the results obtained are higher than 20%. This measure corres-
ponds to the dispersion ratio between the standard deviation and the average.
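For reference, the coefficient of variation can be computed as follows (illustrative sketch only):

```python
import numpy as np

def coefficient_of_variation(values) -> float:
    """CV as a percentage; distributions above 20% are treated here as heterogeneous."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()
```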

3.3 Data Preparation


Data transformations were necessary to perform data segmentation using clustering
techniques based on partitioning methods. Because these techniques do not handle
null values and qualitative data, two operations were performed:

• Firstly, records having at least one null value were eliminated;


• Then, the records containing qualitative values were eliminated.

3.4 Modelling
The k-means and k-medoids algorithms were used to create the clusters. The choice is
justified by the partitioning principle shared by both methods and by the difference in
their sensitivity to outliers.

The k-means algorithm is sensitive to outliers, because objects that are far from the
majority can significantly influence the mean value of the cluster. This effect
is particularly exacerbated by the use of the squared error function [9].
K-medoids, on the other hand, instead of using the mean value of the objects in a cluster
as a reference point, takes actual objects to represent the clusters, one representative
object for each cluster. The partitioning is then performed based on the principle of
minimising the sum of the dissimilarities between each object p and its representative
object (intra-cluster distance), where each dissimilarity is always >= 0. The k-medoids
algorithm is similar to k-means, except that the centroids must belong to the set of
clustered data [10]. Several configurations were attempted for each of the algorithms.
In the k-means algorithm the value of k (number of clusters) varied between 2 and 10.
In order to obtain the appropriate number of clusters, the sum of squared errors (SSE)
was used. Each dataset was executed 10 times.
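The clustering itself was run in R; purely as an illustration of the procedure described
(k varied from 2 to 10, 10 executions per dataset, SSE used to select k and the
Davies-Bouldin Index used for evaluation), an equivalent sketch could look like this:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

def sweep_kmeans(X: np.ndarray, k_values=range(2, 11), n_runs=10):
    """For each k, keep the best of n_runs executions (lowest SSE/inertia)."""
    results = {}
    for k in k_values:
        best = min(
            (KMeans(n_clusters=k, n_init=1, random_state=run).fit(X) for run in range(n_runs)),
            key=lambda model: model.inertia_,        # inertia_ is the within-cluster SSE
        )
        results[k] = {
            "sse": best.inertia_,
            "davies_bouldin": davies_bouldin_score(X, best.labels_),
        }
    return results
```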
Each model M_n belongs to an approach A and is composed of a set of fields F, a type
of variable TV and an algorithm AG, where the fields are drawn from:

F = {F_1, F_2, F_3, F_4, F_5, F_6, F_7, F_8, F_9, F_10, F_11, F_12, F_13, F_14}

Since this study is related to barotrauma and Plateau Pressure, all the models included
the variable F_8 (Plateau pressure). Some of the clusters induced are composed of the
group of variables defined in the first approach.

3.5 Evaluation
This is the last phase of the study. It focuses mainly on the analysis of the results obtained
through the application of the clustering algorithms (k-means and PAM).
The evaluation of the induced models was made using the Davies-Bouldin Index.
The models which presented the most satisfactory results were those obtained by means
of the k-means algorithm. In general, some models presented good results; however,
the models did not achieve optimal results (index near 0). Table 1 presents the best
models and the corresponding results.

Table 1. Models for clustering

Model   Fields   Algorithm   Number of Clusters   Davies-Bouldin Index
M1                           2                    0.82
M2                           5                    0.86
M3                           2                    0.64
M4                           6                    1.17

Model M3 was shown to be the most capable of producing clusters with better distances.
The Davies-Bouldin Index tends to +∞; however, model M3 has an index of 0.64.
This is not the optimal value, but it is the most satisfactory one because it is the closest to 0.
Figure 1 presents the M3 results.

Fig. 1. Clusters of model M3 (scatter plot of the records over the two discriminant coordinates, dc 1 and dc 2, with each point labelled by its cluster, 1 or 2)

Table 2 presents the minimum, maximum, average, standard deviation and coefficient
of variation of each variable used to build the clusters in M3.

Table 2. Distributions for each cluster

Clusters                 Fields   Min    Max    Average   StDev   Coef. of Variation
Cluster 1 (30017 rows)   CDYN     0      73     31.45     16.55   52.63%
                                  0      38     11.94     2.71    22.73%
                                  0      100    14.18     7.26    51.2%
                                  0      99     22.04     6.57    29.82%
Cluster 2 (3006 rows)    CDYN     73     200    114.72    32.62   28.43%
                                  3.9    22     11.00     1.68    15.3%
                                  1.4    46     13.22     5.27    39.88%
                                  0.2    84     17.88     6.24    34.87%

4 Conclusion

This study identified a set of variables that have a great similarity. These variables are
related with Plateau Pressure, the variable with the greatest influence on the occurrence of
barotrauma. The best result was achieved with model M3, obtaining a Davies-Bouldin
Index of 0.64, a value near the optimum value (0).
It should be noted that most of the variables used presented some dispersion; however,
in one of the clusters the highest dispersion value is quite acceptable: 19.42. This
result was obtained with the implementation of the k-means algorithm. CDYN is
one of the variables that most influences the clustering, demonstrating a strong
relationship with the PPR and barotrauma. From the results shown in Table 2 it can
be noted that the field with the best correspondence is CDYN, presenting only a few
intersecting values (minimum and maximum). This means that Cluster 1 has CDYN values
ranging between [0; 73] and Cluster 2 has CDYN values between [73; 200]. The remaining
fields used have only a few intersections. Finally, this study demonstrated the
feasibility of creating clusters using only data monitored by ventilators and of analyzing
similar populations. These results motivate further studies in order to induce more
adjusted models, reliable for classification and clustering at the same time.

Acknowledgements. This work has been supported by FCT - Fundação para a Ciência e
Tecnologia within the Project Scope UID/CEC/00319/2013 and the contract PTDC/EEI-
SII/1302/2012 (INTCare II).

References
1. Koh, H., Tan, G.: Data mining applications in healthcare. J. Healthc. Inf. Manag. 19(2),
64–72 (2005)
2. Anzueto, A., Frutos-Vivar, F., Esteban, A., Alía, I., Brochard, L., Stewart, T., Benito, S.,
Tobin, M.J., Elizalde, J., Palizas, F., David, C.M., Pimentel, J., González, M., Soto, L.,
D’Empaire, G., Pelosi, P.: Incidence, risk factors and outcome of barotrauma in mechani-
cally ventilated patients. Intensive Care Med. 30(4), 612–619 (2004)
3. Al-Rawas, N., Banner, M.J., Euliano, N.R., Tams, C.G., Brown, J., Martin, A.D., Gabriel-
li, A.: Expiratory time constant for determinations of plateau pressure, respiratory system
compliance, and total resistance. Crit Care 17(1), R23 (2013)
4. Boussarsar, M., Thierry, G., Jaber, S., Roudot-Thoraval, F., Lemaire, F., Brochard, L.: Re-
lationship between ventilatory settings and barotrauma in the acute respiratory distress
syndrome. Intensive Care Med. 28(4), 406–413 (2002)
5. Oliveira, S., Portela, F., Santos, M.F., Machado, J., Abelha, A., Silva, A., Rua, F.: Predict-
ing plateau pressure in intensive medicine for ventilated patients. In: Rocha, A., Correia,
A.M., Costanzo, S., Reis, L.P. (eds.) New Contributions in Information Systems and
Technologies, Advances in Intelligent Systems and Computing 354. AISC, vol. 354, pp.
179–188. Springer, Heidelberg (2015)
6. Portela, F., Santos, M.F., Machado, J., Abelha, A., Silva, A., Rua, F.: Pervasive and intel-
ligent decision support in intensive medicine – the complete picture. In: Bursa, M., Khuri,
S., Renda, M. (eds.) ITBAM 2014. LNCS, vol. 8649, pp. 87–102. Springer, Heidelberg
(2014)
7. Turban, E., Sharda, R., Delen, D.: Decision Support and Business Intelligence Systems. 9a
Edição. Prentice Hall (2011)
8. Anderson, R.K.: Visual Data Mining: The VisMiner Approach, Chichester, West Sussex,
U.K., 1st edn. Wiley, Hoboken (2012)
9. Han, J., Kamber, M., Pei, J.: Data Mining Concepts and Techniques. 3a Edição. Morgan
Kaufmann (2012)
10. Xindong, W., Vipin, K.: The Top Ten Algorithms in Data Mining. CRC Press–Taylor &
Francis Group (2009)
Clinical Decision Support for Active and Healthy Ageing:
An Intelligent Monitoring Approach
of Daily Living Activities

Antonis S. Billis1, Nikos Katzouris2,3, Alexander Artikis2,


and Panagiotis D. Bamidis1()
1
Medical Physics Laboratory, Medical School, Faculty of Health Sciences,
Aristotle University of Thessaloniki, Thessaloniki, Greece
{ampillis,bamidis}@med.auth.gr
2
Institute of Informatics and Telecommunications, National Center for Scientific Research
‘Demokritos’, Aghia Paraskevi, Greece
{nkatz,a.artikis}@iit.demokritos.gr
3
Department of Informatics and Telecommunications,
National Kapodistrian University of Athens, Athens, Greece

Abstract. Decision support concepts such as context awareness and trend anal-
ysis are employed in a sensor-enabled environment for monitoring Activities of
Daily Living and mobility patterns. Probabilistic Event Calculus is employed
for the former; statistical process control techniques are applied for the latter
case. The system is tested with real senior users within a lab as well as their
home settings. Accumulated results show that the implementation of the two
separate components, i.e. Sensor Data Fusion and Decision Support System,
works adequately well. Future work suggests ways to combine both compo-
nents so that more accurate inference results are achieved.

Keywords: Decision support · Unobtrusiveness · Sensors · Context awareness ·


Trend analysis · Statistical process control · Event calculus

1 Introduction
Europe's ageing population is drastically increasing in numbers [1], bringing with it
serious health concerns such as dementias or mental health disorders such as depression
[2]. Hence, the immediate need for early and accurate diagnoses becomes apparent.
Ambient-Assisted Living (AAL) technologies can provide support to this end [3].
However, most of these research efforts fail to either become easily acceptable by
end-users or be useful at a practical level; the obtrusive nature of the utilized technologies,
invading the daily life of older adults, is probably to blame [4]. To
this end, the approach followed in this paper, which is also aligned with the major
objective of the USEFIL project [5], is to apply remote monitoring techniques within
an unobtrusive sensor-enabled intelligent monitoring system. The first part of the
intelligent monitoring system is an event-based sensor data fusion (SDF) module,
while the second part consists of two major components: i) the trend analysis component
and ii) a higher level formal representation model based on Fuzzy Cognitive
Maps (FCMs) [5]. The aim of this paper is to present a feasibility study of the SDF and
Trend Analysis components in real life settings and to evaluate their capability for
intelligent health monitoring.

2 Materials and Methods

2.1 The USEFIL Platform

The USEFIL intelligent monitoring system (cf. Fig. 1) comprises three different
layers of processing. Low cost sensors provide unobtrusive low level information, e.g.
activity, mood and physiological signs. The event fusion module is the intermediary
layer, which combines multimodal low level events and translates them into contextual
information. A server-side Decision Support System consumes time stamped contextual
information and projects it in the long run, producing alerts upon recognition of
data abnormalities or health deteriorating trends. This information is channeled to
seniors or their carers via user-friendly interfaces.

Fig. 1. Intelligent Monitoring components

2.2 Sensor Data Fusion

The role of the data fusion component in USEFIL is to interpret sensor data into a
semantic representation of the user's status. Its tasks range from contextualization of
sensor measurements to characterization of functional ability. We employ a Complex
Event Recognition methodology [7], which allows heterogeneous data sources to be
combined by means of event hierarchies. In our setting, input consists of a stream of
low-level events (LLEs), such as time-stamped sensor data, and output consists of
recognized complex, or high-level events (HLEs), that is, spatio-temporal combina-


tions of simpler events and domain knowledge. Our approach is based on the Event
Calculus [8], a first-order formalism for reasoning about events and their effects. To
address uncertainty, we ported the Event Calculus in the ProbLog language [10], as in
[9]. ProbLog is an extention of Prolog, where inference has a robust probabilistic
semantics.
Constructing patterns (rules) for the detection of an HLE, amounts to specifying its
dependencies with LLEs and other HLEs. As an example, we use the case study of
Barthel-scoring of Activities of Daily Living (ADL), adapted from [12]. ADL refers
to fundamental self-care activities, while the Barthel Index [11] is considered as the
“golden standard” for assessing functional ability in ADL. The Transfer ADL refers
to the ability of a person to sit down or get up from a bed or chair. The corresponding
scores in the Barthel Index are evaluations of the performance in this task, based on
the ability to perform and the amount of help needed.

Fig. 2. An event hierarchy for the Transfer ADL

Fig. 2 presents an event hierarchy for the transfer ADL, developed in USEFIL. The
leaves in the tree structure represent LLEs, obtained from sensor measurements, while
each node represents an HLE. According to this representation, to Barthel-score the
transfer ADL (root node), one should determine whether the user changed position,
while receiving help for this task, taking into account the ease and safety with which
the user performs. Each of these indicators (position change, help offered, ease-
safety) is represented by an HLE, defined in terms of LLEs and other HLEs in lower
levels of the hierarchy. The reader is referred to [12] for a detailed account of the
implementation of such a hierarchy in the probabilistic Event Calculus.
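As a rough, non-probabilistic illustration of how such a hierarchy combines its indicators
(the actual system expresses this as probabilistic Event Calculus rules in ProbLog; the
score mapping below is a hypothetical simplification, not the authors' rules):

```python
def barthel_transfer_score(position_changed: bool, help_offered: bool,
                           ease_and_safety: bool) -> int:
    """Hypothetical mapping from the three HLE indicators to a Barthel-style
    transfer score: 0 unable, 1 major help, 2 minor help / reduced safety,
    3 independent."""
    if not position_changed:
        return 0
    if help_offered:
        return 1
    return 3 if ease_and_safety else 2
```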

2.3 Decision Support Module


Health trends identification is based on statistical process control principles. Each
parameter may be modeled as a random process with a time-varying mean value and
standard deviation. In this work we have followed two steps, namely the baseline
extraction and the identification of acute events.
A personalized baseline profile is computed through time-series analysis and statis-
tical process control concepts. More specifically, each variable under investigation
(e.g. walking speed) is modeled as a time-series, random process with a time-varying
mean value and standard deviation. The computations involved are the following:

Time-series observations are divided into n overlapping windows. The mean (x̄_i) and the
standard deviation (r_i) of each time window are computed. Then, the mean value and the
standard deviation of the entire process are averaged over the individual runs:

    x̂ = (1/n) Σ_{i=1..n} x̄_i ,    r̂ = (1/n) Σ_{i=1..n} r_i                      (1)

The baseline profile also consists of a confidence interval for both the process mean
value and the standard deviation. These intervals are defined by the following (three-sigma)
control limits:

    lim_upper = x̂ + 3 r̂ / √n ,    lim_lower = x̂ - 3 r̂ / √n                      (2)

Ongoing monitoring of the process under consideration is facilitated through the cha-
racterization of a further follow-up period based on comparison against the control
limits of the baseline process. Single runs that are out of the control limits are consi-
dered as acute events.
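A compact sketch of this baseline extraction and follow-up check; the window size, the
overlap step and the three-sigma constant are assumptions made for illustration, since the
paper does not publish its implementation:

```python
import numpy as np

def baseline_profile(series, window=7, step=1):
    """Grand mean, average within-window std, and assumed three-sigma control limits."""
    windows = [series[i:i + window] for i in range(0, len(series) - window + 1, step)]
    means = np.array([np.mean(w) for w in windows])
    stds = np.array([np.std(w, ddof=1) for w in windows])
    x_hat, r_hat = means.mean(), stds.mean()
    half_width = 3.0 * r_hat / np.sqrt(window)       # assumed limit width
    return x_hat, r_hat, (x_hat - half_width, x_hat + half_width)

def acute_events(follow_up, limits, window=7, step=1):
    """Indices of follow-up windows whose mean falls outside the control limits."""
    low, high = limits
    flagged = []
    for i in range(0, len(follow_up) - window + 1, step):
        m = np.mean(follow_up[i:i + window])
        if m < low or m > high:
            flagged.append(i)
    return flagged
```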

3 Data Collection
In the process of system integration and pilot setup, an e-home like environment was
established serving as an Active & Healthy Aging (AHA) Living Lab (see Fig. 3). A
total of five (5) senior women aged 65+ (mean 74.6±3.85 years) were recruited. All
users provided voluntary participation forms to denote that they chose to participate in
this trial voluntarily after being informed of the requirements of their participation.
The ability of independent living was assessed by the Barthel index. The real testing
and use of the environment took place for several days. Seniors executed several ac-
tivities of everyday life in a free-form manner, meaning that they were left to perform
activities without strict execution orders.

Fig. 3. Performance of directed activities, interaction with the system

Apart from the lab environment, the system was also installed in the homes of lone-living
seniors for periods lasting from one to three months. Four (4) elderly
women aged 75.3±4.1 years provided their informed consent for their participation in
the home study. Recordings over several days in these senior apartments measured,
among others, gait patterns, emotional fluctuations and clinical parameters.

4 Results
4.1 Short-Term Monitoring – Scoring of ADLs
In order to evaluate the SDF module, the Transfer ADL was extracted for each senior.
Carers examined seniors and assessed all of them as totally independent, with a

Barthel score equal to “3”, which is the ground truth for all cases. Therefore, an over-
all confusion matrix for all five seniors is built in Table 1. As shown, SDF several
times scores seniors as needing help with the Transfer Activity (scores “1” and “2”),
although they are totally independent. This paradox can be attributed to the presence
of a facilitator during the monitoring sessions.

4.2 Long-Term Monitoring – Gait Trends


Data from an initial period of two to four weeks were used so as to calculate the base-
line for each elderly participant. The rest of the period was used as a "follow up pe-
riod", where the actual monitoring was examined. Walking speed as measured by the
Kinect sensor was used as the gait parameter to monitor in the long run. Fig. 4 illu-
strates the baseline process formation of a senior suffering from mobility problems
due to osteoporosis. During the monitoring period there are significant deviations
from the baseline process (42.4% of days were out of control). This means that the older
woman's walking speed decreases with respect to her baseline period. A decrease in walking
speed levels might correlate with early health risk signs, such as falls.

Table 1. Confusion matrix of ADL Transfer scoring for all 5 participants


                                         Predicted class
                          ADL TransferScore1   ADL TransferScore2   ADL TransferScore3
Actual class
ADL TransferScore3        11                   7095                 6358

Fig. 4. Walking speed control chart. Yellow line: lower control limit, Red line: upper control
limit, Blue continuous line: baseline period, Dots: follow up days.

5 Discussion
In this paper, mechanisms towards a truly intelligent and unobtrusive monitoring
system for active and healthy aging were demonstrated together with a sample of the
first series of results. Short-term context awareness was tested with the ADL scenario,
while long-term trend analysis was tested with the gait patterns scenario. In the first case,
there were many false positives, due to the challenging, unconstrained nature of the
experiment. Personalized thresholds would help the SDF algorithm to avoid scoring Barthel
equal to "1". Trend analysis' baseline extraction and process control limits would
possibly refine the latter inference results. On the other hand, long term analysis may
benefit from the SDF output, since outliers found to be out of control could possibly be
annotated as logical "noise" through context awareness. This way pathological values
may be interpreted as normal based on a-priori knowledge of the context.
This whole notion is remarkably appealing, as it could lead to potential applications
where the synergy between the short-term SDF component and the long-term
Trend Analysis component may prove pivotal. Further data collection from home
environments will be essential for successfully integrating the two components.

Acknowledgements. This research was partially funded by the European Union's Seventh
Framework Programme (FP7/2007-2013) under grant agreement no 288532. (www.usefil.eu).
The final part of this work was supported by the business exploitation scheme of LLM, namely
LLM Care, which is a self-funded initiative at the Aristotle University of Thessaloniki
(www.llmcare.gr). A.S. Billis also holds a scholarship from the Fanourakis Foundation
(http://www.fanourakisfoundation.org/).

References
1. Lutz, W., O’Neill, B.C., Scherbov, S.: Europe’s population at a turning point. Science
28(299), 1991–1992 (2003)
2. Murrell, S.A., Himmelfarb, S., Wright, K.: Prevalence of depression and its correlates in
older adults. Am. J. Epidimiology 117(2), 173–185 (1983)
3. Kleinberger, T., Becker, M., Ras, E., Holzinger, A., Müller, P.: Ambient intelligence in as-
sisted living: enable elderly people to handle future interfaces. In: Stephanidis, C. (ed.)
UAHCI 2007 (Part II). LNCS, vol. 4555, pp. 103–112. Springer, Heidelberg (2007)
4. Wild, K., Boise, L., Lundell, J., Foucek, A.: Unobtrusive in-home monitoring of cognitive
and physical health: Reactions and perceptions of Older Adults. Applied Gerontology
27(2), 181–200 (2008)
5. https://www.usefil.eu/. Retrieved from web at 29/05/2015
6. Billis, A.S., Papageorgiou, E.I., Frantzidis, C.A., Tsatali, M.S., Tsolaki, A.C., Bamidis,
P.D.: A Decision-Support Framework for Promoting Independent Living and Ageing Well.
IEEE J. Biomed. Heal. Informatics. 19, 199–209 (2015)
7. Etzion, O., Niblett, P.: Event Processing in Action. Manning Publications Co. (2010)
8. Kowalski, R., Sergot, M.: A logic-based calculus of events. In: Foundations of Knowledge
Base Management, pp. 23–55. Springer (1989)
9. Skarlatidis, A., Artikis, A., Filippou, J., Paliouras, G.: A probabilistic logic programming
event calculus. Journal of Theory and Practice of Logic Programming (TPLP) (2014)
10. Kimmig, A., Demoen, B., De Raedt, L., Santos Costa, V., Rocha, R.: On the implementa-
tion of the probabilistic logic programming language ProbLog. In: de la Banda, M.G., Pon-
telli, E. (eds.) Theory and Practice of Logic Programming, vol. 11, pp. 235–262 (2011)
11. Collin, C., Wade, D.T., Davies, S., Horne, V.: The barthel ADL index: a reliability study.
Disability & Rehabilitation 10(2), 61–63 (1988)
12. Katzouris, N., Artikis, A., Paliouras, G.: Event recognition for unobtrusive assisted living.
In: Likas, A., Blekas, K., Kalles, D. (eds.) SETN 2014. LNCS, vol. 8445, pp. 475–488.
Springer, Heidelberg (2014)
Discovering Interesting Trends in Real Medical
Data: A Study in Diabetic Retinopathy

Vassiliki Somaraki1,2(B) , Mauro Vallati1 , and Thomas Leo McCluskey1


1
School of Computing and Engineering, University of Huddersfield, Huddersfield, UK
{v.somaraki,m.vallati,t.l.mccluskey}@hud.ac.uk
2
St. Paul’s Eye Unit, Royal Liverpool University Hospital, Liverpool L7 8XP, UK

Abstract. In this work we present SOMA: a Trend Mining framework,


based on longitudinal data analysis, that is able to measure the interest-
ingness of the produced trends in large noisy medical databases. Medical
longitudinal data typically plots the progress of some medical condition,
thus implicitly contains a large number of trends. The approach has been
evaluated on a large collection of medical records, forming part of the
diabetic retinopathy screening programme at the Royal Liverpool Uni-
versity Hospital, UK.

1 Introduction
Knowledge discovery is the process of automatically analysing large volumes
of data, searching for patterns that can be considered as knowledge about
the data [2]. In large real-world datasets, it is possible to discover a large num-
ber of rules and relations, but it may be difficult for the end user to identify
the interesting ones. Trend mining deals with the process of discovering hidden,
but noteworthy, trends in a large collection of temporal patterns. The number of
trends that may occur –especially in large medical databases– is huge. Therefore,
a methodology to distinguish interesting trends is imperative.
In this paper we report on a framework (SOMA) that is capable of perform-
ing trend mining in large databases and evaluating the interestingness of the
produced trends. It uses a three-step approach that: (i) exploits logic rules for
cleaning noisy data; (ii) mines the data and recognises trends, and (iii) evalu-
ates their interestingness. This work extends a previous preliminary study
of Somaraki et al. [10]. The temporal patterns of interest, in the context of this
work, are frequent patterns that feature some prescribed change in their frequency
between two or more "time stamps". A time stamp is the sequential
patient consultation event number, i.e., the date on which some medical features
of the patient were checked and registered. We tested SOMA on the diabetic
retinopathy screening data collected by The Royal Liverpool University Hospi-
tal, UK, which is a major referral centre for patients with Diabetic Retinopathy
(DR). DR is a critical complication of diabetes, and it is one of the most common
causes of blindness in working-age people in the United Kingdom.1 It is a chronic
1 http://diabeticeye.screening.nhs.uk/diabetic-retinopathy

© Springer International Publishing Switzerland 2015
F. Pereira et al. (Eds.): EPIA 2015, LNAI 9273, pp. 134–140, 2015.
DOI: 10.1007/978-3-319-23485-4_15

disease affecting patients with Diabetes Mellitus, and causes significant damage
to the retina.
The contribution of this paper is twofold. First, we provide an automatic app-
roach for evaluating the interestingness of temporal trends. Second, we test the
ability of the SOMA framework in automatically extracting interesting tempo-
ral trends from a large and complex medical database. The extracted interesting
trends have been checked by clinicians, who confirmed their interestingness and
potential utility for the early diagnosis of diabetic retinopathy.

2 Background

Association Rule Mining (ARM) is a popular, and well researched, category


of data mining for discovering interesting relations between variables in large
databases. In ARM [1], an observation or transaction (e.g. the record of a
clinical consultation) is represented as a set of items, where an item is an
attribute-value pair. Given a set of transactions, we can informally define an
association rule (AR) as a rule of the form X =⇒ Y. The support of an AR is
defined as the percentage of transactions that contain both X and Y; it can also
be defined as the probability P(X ∩ Y). Finally, the confidence of an AR is
defined as the ratio between the number of transactions that contain X ∪ Y and
the number of transactions that contain X.
ARM procedures contain two stages: (i) frequent item set identification, and (ii)
AR generation. Piatetsky-Shapiro [8] defined ARM as a method for the descrip-
tion, analysis and presentation of ARs, discovered in databases using different
measures of interestingness. ARM is concerned with the discovery, in tabular
databases, of rules that satisfy defined threshold requirements. Of these require-
ments, the most fundamental one is concerned with the support (frequency) of
the item sets used to make up the ARs: a rule is applicable only if the relationship
it describes occurs sufficiently often in the data.
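A minimal sketch of these two quantities, computed over a toy set of transactions of attribute-value items; the items, the rule and the data are purely illustrative and not taken from the paper's database, and support is returned as a fraction rather than a percentage.

# Illustrative only: toy transactions of attribute-value items.
transactions = [
    {("diabetes_type", "2"), ("cataract", "yes"), ("dr", "yes")},
    {("diabetes_type", "2"), ("cataract", "no"),  ("dr", "no")},
    {("diabetes_type", "1"), ("cataract", "no"),  ("dr", "yes")},
    {("diabetes_type", "2"), ("cataract", "yes"), ("dr", "yes")},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(X and Y together) / support(X) for the rule X => Y."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

rule_x = {("diabetes_type", "2"), ("cataract", "yes")}
rule_y = {("dr", "yes")}
print(support(rule_x | rule_y, transactions))    # 0.5
print(confidence(rule_x, rule_y, transactions))  # 1.0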
The topic of mining interesting trends has recently grown due to the availabil-
ity of large and complex databases. Liu et al. [6] introduced the concept of general
impressions. General impressions are if-then clauses that describe the relation
between a condition variable and a class value, and reflect the user's knowledge
of the domain. Based on the user's knowledge, the rules can be classified as:
unexpected (previously unknown), confirming (confirms previous knowledge) or
actionable (can be fruitfully exploited).
Geng and Hamilton [3] used the term “interesting measures” to facilitate
a general approach to automatically identifying interesting patterns. They used
this term in three ways, or roles, to use their terminology. First, the measures
can be used to prune uninteresting patterns during the mining process. Second,
measures can be used to rank the patterns according to their interestingness,
and finally they are used during post-processing to select interesting patterns.

3 The SOMA Framework


The SOMA framework receives raw data from a range of repositories. Firstly,
data are pre-processed: data cleansing, creation of data timestamps, selection
of subsets for analysis and the application of logic rules take place. Then, by
analysing pre-processed information, frequent patterns are generated and trend
mining is performed in order to identify interesting trends.
Pre-processing
Medical data values are usually from continuous domains. The presence of con-
tinuous domains makes it difficult to apply the frequent item set techniques.
For this reason, in SOMA pre-processing discretisation is applied. Discretisation
allocates continuous values into a limited number of intervals, called bands [5,7].
Bands can be either defined by domain experts –in our analysis, physicians– or
automatically identified.
Real data, particularly from medical repositories, are usually variegated
(text, discrete and continuous values) and noisy. In order to deal with miss-
ing data, we adopted logic rules, based on expert knowledge. Logic rules are a
sequence of if-then-else cases that consider the values of various related fields
for identifying missing ones. In this context, clinicians regarded logic rules as
the most appropriate method for filling in the data.
Finally, pre-processing performs the task of creating time-stamped datasets.
Data from different sources are usually not collected with the same frequency,
so it is difficult to automatically define a clear association between data and
time stamps. In fact, this process is domain-specific, and must be addressed by
designing domain-specific solutions crafted by domain experts.
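To illustrate the two pre-processing steps just described, the sketch below discretises a continuous attribute into bands and applies a simple if-then-else logic rule to fill a missing field. The 5-year bands and the specific rule are hypothetical examples of the kind of knowledge supplied by the clinicians, not the actual bands or rules used by SOMA.

# Hypothetical 5-year bands for diabetes duration; None marks a missing value.
def discretise_duration(years):
    if years is None:
        return None
    low = 5 * (years // 5)
    return "%d-%d" % (low, low + 5)

# Hypothetical logic rule: if "age at diagnosis" is missing but
# "age at exam" and "diabetes duration" are present, derive it from them.
def fill_age_at_diagnosis(record):
    if record.get("age_at_diagnosis") is None:
        if record.get("age_at_exam") is not None and record.get("duration") is not None:
            record["age_at_diagnosis"] = record["age_at_exam"] - record["duration"]
    return record

record = {"age_at_exam": 63, "duration": 17, "age_at_diagnosis": None}
record = fill_age_at_diagnosis(record)
record["duration_band"] = discretise_duration(record["duration"])
print(record)  # age_at_diagnosis filled as 46, duration_band '15-20'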
Processing
The main step of the SOMA framework consists of two processes: (i) association
rule mining (ARM), and (ii) trend generation and categorisation.
The ARM process is repeated for every time stamp, and performs the fol-
lowing tasks: (i) identify rules; (ii) evaluate the interestingness of rules; and (iii)
filter rules according to their evaluation. Rules are created in the usual AR form:
X =⇒ Y. On frequent item sets, characteristics that measure interestingness
are computed. Finally, rules are filtered; only those which are both frequent
and interesting are kept. The corresponding thresholds are determined by users,
according to the amount of knowledge they want to extract.
In applications involving large medical datasets, efficient processing is a fun-
damental factor. Since the ARM process is repeated for every time step, the tech-
nique must be capable of efficiently identifying rules. Given that, we decided to
exploit matrix algorithm principles [11] to efficiently identify frequent rules with
acceptable confidence, which only requires one pass through the whole dataset.
The subsequent mining of trends is implemented by considering relations on
the vectors of support count, in order to show how the support count for each rule
changes over time. We considered the following well-known relations: increasing,

decreasing, constant, jumping and disappearing. For further information, the


interested reader is referred to [9].
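A minimal sketch of how a support-count vector could be mapped to one of these trend categories; the numeric tolerances and the exact tests are illustrative guesses, since the precise definitions used by SOMA are given in [9].

# Illustrative categorisation of a rule's support values over time stamps.
# The thresholds (min_support, tolerance) are hypothetical.
def categorise_trend(supports, min_support=0.15, tolerance=0.01):
    deltas = [b - a for a, b in zip(supports, supports[1:])]
    if supports[0] >= min_support and supports[-1] < min_support:
        return "disappearing"
    if supports[0] < min_support and supports[-1] >= min_support:
        return "jumping"
    if all(abs(d) <= tolerance for d in deltas):
        return "constant"
    if all(d >= 0 for d in deltas):
        return "increasing"
    if all(d <= 0 for d in deltas):
        return "decreasing"
    return "mixed"

print(categorise_trend([0.20, 0.24, 0.29, 0.35]))  # increasing
print(categorise_trend([0.30, 0.22, 0.16, 0.10]))  # disappearing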
Measuring Interestingness
Strong association rules, selected on the basis of support and confidence, are not always
interesting. The pitfall of confidence can be traced to the fact that its defini-
tion ignores the support of the right-hand (Y) part of the rule. To determine
which trend is interesting or not, we exploited measures introduced by Han
et al. [4]. They propose the lift correlation and a set of pattern evaluation
measures, in order to mathematically determine interestingness. Lift is calculated
as Lift(X, Y) = P(X ∪ Y)/(P(X) · P(Y)). A Lift value equal to 1 indicates that X
and Y are independent; a value greater than 1 indicates a positive correlation,
and a Lift value smaller than 1 indicates a negative correlation. The pattern
evaluation measures proposed in [4] are: all confidence, max confidence, Kulczynski
and cosine. Each of them has the following property: its value is only influenced
by the supports of X and Y, but not by the total number of transactions. Their
values range from 0 to 1, and the higher the value, the closer the relationship
between X and Y. We selected them since they analyse different aspects of the data.
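The sketch below is a plain transcription of these standard formulas (as given in Han et al. [4]) in terms of the relative supports of X, Y and their joint occurrence; it is not code taken from SOMA, and the numeric values are made up.

# sup_x, sup_y, sup_xy are relative supports: P(X), P(Y) and P(X and Y together).
def lift(sup_x, sup_y, sup_xy):
    return sup_xy / (sup_x * sup_y)

def all_confidence(sup_x, sup_y, sup_xy):
    return sup_xy / max(sup_x, sup_y)

def max_confidence(sup_x, sup_y, sup_xy):
    return max(sup_xy / sup_x, sup_xy / sup_y)

def kulczynski(sup_x, sup_y, sup_xy):
    return 0.5 * (sup_xy / sup_x + sup_xy / sup_y)

def cosine(sup_x, sup_y, sup_xy):
    return sup_xy / (sup_x * sup_y) ** 0.5

# Example with a positive correlation (lift > 1).
print(lift(0.4, 0.5, 0.3))            # 1.5
print(all_confidence(0.4, 0.5, 0.3))  # 0.6
print(kulczynski(0.4, 0.5, 0.3))      # 0.675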
The values of these measures are calculated by SOMA during the trend mining
process. For each trend, a matrix of dimension 5 × T, where T is the number of
time stamps, is built. There is one line for each measure; the first column stores
the values for the first time stamp, the second column the values for the second
time stamp, and so on. The entries are binary: if the value of a measure is greater
than or equal to its threshold, the corresponding entry is 1, and 0 otherwise. The
maximum score for each trend is 5 × T, i.e., the sum of the elements of the matrix
when a trend has values at or above the threshold for all measures at all time
stamps. The final score is transformed into a percentage. Thus, according to the threshold provided
by the users through a process of empirical validation, rules are deemed to be
interesting. Further details can be found in [9].
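A minimal sketch of this 5 × T binary scoring scheme: each measure is compared with its threshold at every time stamp, and the fraction of 1-entries gives the final percentage score. The thresholds shown are the ones reported in Section 4; the measure values themselves are illustrative.

# measures[m][t] holds the value of measure m at time stamp t (illustrative numbers).
measures = {
    "lift":     [1.8, 1.9, 2.1, 1.7, 1.6, 1.8],
    "all_conf": [0.80, 0.78, 0.76, 0.60, 0.55, 0.50],
    "max_conf": [0.90, 0.88, 0.85, 0.86, 0.84, 0.83],
    "kulc":     [0.85, 0.83, 0.81, 0.79, 0.76, 0.70],
    "cosine":   [0.84, 0.82, 0.80, 0.78, 0.70, 0.65],
}
thresholds = {"lift": 1.5, "all_conf": 0.75, "max_conf": 0.75, "kulc": 0.75, "cosine": 0.75}

# Binary 5 x T matrix: 1 where the measure reaches its threshold, 0 otherwise.
binary = {m: [1 if v >= thresholds[m] else 0 for v in vals] for m, vals in measures.items()}
score = sum(sum(row) for row in binary.values())
max_score = sum(len(row) for row in binary.values())   # 5 x T = 30 here
percentage = 100.0 * score / max_score
print(score, max_score, percentage)                    # 24 30 80.0
print("interesting" if percentage >= 80 else "not interesting")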

4 Experimental Analysis
In this work we considered the data of the Saint Paul’s Eye Clinic of the Royal
Liverpool University Hospital, UK. The data (anonymised in order to guarantee
patients’ privacy) was collected from a warehouse with 22,000 patients, 150,000
visits to the hospital, with attributes including demographic details, visual acuity
data, photographic grading results, data from biomicroscopy of the retina and
results from biochemistry investigations. Stored information had been collected
between 1991 and 2009. Data are noisy and longitudinal; they are repeatedly
sampled and collected over a period of time with respect to some set of subjects.
Typically, values for the same set of attributes are collected at each sample
point. The sample points are not necessarily evenly spaced. Similarly, the data
collection process does not necessarily start at the same time for each subject.

In our experimental analysis we considered all the 1420 patients who had
readings over 6 time stamps, 887 patients over 7 time stamps, and 546 patients
over 8 time stamps. The number of patients decreases when time windows are
larger. This is because not all the patients are followed by the Clinic
for the same amount of time. For instance, a significant number of patients from
the database had 2 visits only; clearly, this does not allow us to derive any
meaningful information about general trends. The percentage of missing values
is 9.67% for the test with 6 time stamps, 13.16% for the test with 7 time stamps
and 18.26% for the test with 8 time stamps.
The medical experts working with us required that the experiments focused
on 7 medical features that they believed to be important: age at exam, treat-
ment of the patient, diabetes type, diabetes duration, age at diagnosis, presence
of cataract and presence of DR. Features that represent time are continuous,
and have been discretised in bands. In particular, age at exam and at diagno-
sis have been discretised in bands of approximately 10 years; duration features
are discretised in 5-year bands. This was done following advice from medical
experts. These medical features have some known relationships with regard to
the diagnosis of diabetic retinopathy. Therefore, selecting the aforementioned
features, and following medical experts’ indications, allows us to validate the
SOMA framework in the following way: if the known interesting trends are iden-
tified by SOMA, it increases our confidence that the process is finding valid and
interesting knowledge. The interested reader can find more information about
the validation process in [9].
For each time stamp we used a support threshold of 15% and a confidence
threshold of 80%. The threshold for lift is 1.5, and for the other measures it is set
to 0.75. The overall score threshold is set to 80%, i.e., at least 24 out of the total
30 entries must be 1. Such thresholds have been identified by discussing with
medical experts, and by performing some preliminary analysis on small subsets
of the available data. It should be noted that different thresholds can significantly
affect the set of identified interesting rules. Lower thresholds lead to a larger
number of potentially less interesting rules, while higher thresholds result in a
very small, but highly interesting, set of rules. A major influence on setting parameter values was
to guarantee that the configuration would produce already known associations
and trends, following the aforementioned validation approach.

Results
SOMA was implemented and executed in a MATLAB environment. On the
considered data, six interesting medical clauses regarding DR were discovered:

1. If a diabetic patient has not developed cataract, they are not likely to develop


diabetic retinopathy.
2. The younger a patient is diagnosed with diabetes the more likely it is that
this patient will develop diabetic retinopathy.
3. If a patient has suffered from diabetes type 2 for more than 20 years it is
very likely that this patient will develop diabetic retinopathy.

Table 1. Scores, with regard to the considered metrics, of the six medical clauses at
different time stamps (1–6). All Conf stands for the all confidence criterion. Max Conf
indicates the max confidence criterion.

            Clause 1   Clause 2   Clause 3   Clause 4   Clause 5   Clause 6
            123456     123456     123456     123456     123456     123456
Lift        111111     111111     111111     111111     111111     111111
All Conf    111000     100101     100011     100001     100101     100001
Max Conf    111111     101111     111111     100111     100111     100111
Kulc        111110     100100     110111     110001     110101     110001
Cosine      111100     100100     100011     100111     100101     100001

4. If a patient suffers from diabetes type 1 it is very likely that this patient will
develop diabetic retinopathy.
5. If a patient suffers from diabetes type 2 and is on insulin treatment, this
patient is likely to suffer from diabetic retinopathy.
6. If a patient suffers from diabetes type 2 and the duration of diabetes is longer
than 20 years, this patient is likely to develop diabetic retinopathy.
Table 1 shows the values of the clauses, with regard to the considered criteria,
per time stamp. Given a row, it is possible to assess whether the corresponding crite-
rion changed its value over time. The maximum score that a clause can get by
considering 6 time stamps and 5 criteria is 30; therefore, given the threshold of
80%, at least 24 values should be 1. According to Table 1, only the first medical
clause achieves an overall score which is above the threshold. Therefore, clause 1
is deemed to be the most interesting. When using 7 or 8 time stamps, the overall
interestingness reduced respectively to 73% and 64%. This is possibly due to the
smaller number of considered patients, and to the different impact of missing
values on the different datasets. Interestingly, for every considered clause, the
lift value is well above the threshold; this means that, in rules of the form
X =⇒ Y, the X and Y of the medical clauses are positively correlated, and there
is an association between X and Y for all medical clauses. It should also be noted
that the confidence of the reverse rules is below the threshold: that explains
why Kulczynski, cosine and all confidence could not exceed the threshold, and
indicates that there is not a strong relevance in the reverse rule. On the con-
trary, SOMA revealed a very good confidence also for the inverse rule of clause
1. Ophthalmologists of the Saint Paul’s Eye Unit confirm that this result is very
interesting, and it highlights a cause-and-effect relationship between cataract of
diabetic patients and diabetic retinopathy.
It is well known that diabetes has many factors that affect its progress,
and not all of them will necessarily appear in databases. However, in general,
according to the clinicians of Saint Paul’s Eye Unit, the first two clauses appear
to provide new evidence of previously unknown relations, and are thus worth
investigating further. The other 4 clauses fit with accepted thinking, and so while
not being actionable knowledge, provide validation to the approach described in
this work.

5 Conclusion
In this paper we described SOMA, a framework that is able to identify inter-
esting trends in large medical databases. Our approach has been empirically
evaluated on the data of the Saint Paul’s Eye Clinic of the Royal Liverpool
University Hospital. SOMA is highly configurable. In order to set the available
parameters meaningfully, we involved medical experts in the process. In particular, we set
parameters in order to allow SOMA to find previously known interesting trends.
We used this as a heuristic to indicate that the previously unknown interest-
ing trends identified by SOMA within the same configuration may be valuable.
As clinicians confirmed, SOMA was able to identify suspected relations, and to
identify previously unknown causal relations, by evaluating the interestingness
of corresponding trends in the data.
Future work includes applying SOMA to other medical databases, and the
investigation of techniques for visualising trends and results.

Acknowledgments. The authors would like to thank Professor Simon Harding and
Professor Deborah Broadbent at St. Paul’s Eye Unit of Royal Liverpool University
Hospital for providing information and support.

References
1. Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In:
Proc. 20th Int. Conf. Very Large Data Bases, VLDB, vol. 1215, pp. 487–499 (1994)
2. Frawley, W.J., Piatetsky-Shapiro, G., Matheus, C.J.: Knowledge discovery in
databases: An overview. AI Magazine 13(3), 57 (1992)
3. Geng, L., Hamilton, H.J.: Interestingness measures for data mining: A survey. ACM
Computing Surveys (CSUR) 38(3), 9 (2006)
4. Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques. The Morgan
Kaufmann Series in Data Management Systems. Morgan Kaufmann (2006)
5. Kotsiantis, S., Kanellopoulos, D.: Discretization techniques: A recent survey.
GESTS International Transactions on Computer Science and Engineering 32(1),
47–58 (2006)
6. Liu, B., Hsu, W., Chen, S.: Using general impressions to analyze discovered clas-
sification rules. In: KDD, pp. 31–36 (1997)
7. Liu, H., Hussain, F., Tan, C.L., Dash, M.: Discretization: An enabling technique.
Data Mining and Knowledge Discovery 6(4), 393–423 (2002)
8. Piatetsky-Shapiro, G.: Discovery, analysis and presentation of strong rules. In:
Knowledge Discovery in Databases, pp. 229–238 (1991)
9. Somaraki, V.: A framework for trend mining with application to medical data.
Ph.D. Thesis, University of Huddersfield (2013)
10. Somaraki, V., Broadbent, D., Coenen, F., Harding, S.: Finding temporal patterns
in noisy longitudinal data: a study in diabetic retinopathy. In: Perner, P. (ed.)
ICDM 2010. LNCS, vol. 6171, pp. 418–431. Springer, Heidelberg (2010)
11. Yuan, Y.B., Huang, T.Z.: A matrix algorithm for mining association rules. In:
Huang, D.-S., Zhang, X.-P., Huang, G.-B. (eds.) ICIC 2005. LNCS, vol. 3644,
pp. 370–379. Springer, Heidelberg (2005)
Artificial Intelligence
in Transportation Systems
A Column Generation Based Heuristic
for a Bus Driver Rostering Problem

Vítor Barbosa1(), Ana Respício2, and Filipe Alvelos3


1
Escola Superior de Ciências Empresariais do Instituto Politécnico de Setúbal,
Setúbal, Portugal
[email protected]
2
Centro de Matemática, Aplicações Fundamentais e Investigação Operacional,
Faculdade de Ciências da Universidade de Lisboa, Lisbon, Portugal
[email protected]
3
Departamento de Produção e Sistemas da Universidade do Minho, Braga, Portugal
[email protected]

Abstract. The Bus Driver Rostering Problem (BDRP) aims at determining optimal
work-schedules for the drivers of a bus company, covering all work duties and
respecting the Labour Law and regulations, while minimizing company
costs. A new decomposition model for the BDRP was recently proposed and the
problem was addressed by a metaheuristic combining column generation and an
evolutionary algorithm. This paper proposes a new heuristic, which is inte-
grated in the column generation, allowing for the generation of complete or
partial rosters at each iteration, instead of generating single individual work-
schedules. The new heuristic uses the dual solution of the restricted master
problem to guide the order by which duties are assigned to drivers. The know-
ledge about the problem was used to propose a variation procedure which
changes the order by which a new driver is selected for the assignment of a new
duty. Sequential and random selection methods are proposed. The inclusion of
the rotation process results in the generation of rosters with better distribution
of work among drivers and also affects the column generation performance.
Computational tests assess the proposed heuristic's ability to generate good quality
rosters, and the impact of the distinct variation procedures is discussed.

Keywords: Rostering · Column generation · Heuristic

1 Introduction

Personnel scheduling or rostering [1] consists in defining a work-schedule for each of


the workers in a company during a given period. A roster is a plan including the sche-
dules for all workers. An individual work-schedule defines, for each day, if the work-
er is assigned to work or has a day-off and, in the first case, which daily duty/shift has
to be performed. The rostering problem arises because the company usually has
diverse duties to assign on each day, sometimes needing particular skills and, on
the other hand, the Labour Law and company rules (days-off, rest time, etc.)

© Springer International Publishing Switzerland 2015


F. Pereira et al. (Eds.) EPIA 2015, LNAI 9273, pp. 143–156, 2015.
DOI: 10.1007/978-3-319-23485-4_16

restrict the blind assignment of duties to workers. Rostering is addressed in many


types of business as recently surveyed in [2] and also some years ago in [3].
The Bus Driver Rostering Problem (BDRP), like most rostering problems, is an
NP-hard combinatorial optimization problem [4, 5], which makes it computationally
challenging to obtain optimal solutions. Many authors address rostering problems
with heuristic methods, which usually reach good solutions faster than exact
methods [4, 6, 7].
The BDRP occurs in the last phase of the transportation planning system, which
also includes timetabling, vehicle scheduling and crew scheduling, in order to know
the driver demand for each day [8]. It is concerned with the assignment of duties (sets
of consecutive trips and rest times defining a day of work, previously generated) to
drivers, respecting the labour/contractual rules and pursuing the bus company's
interest in optimizing driver use.
Considering the BDRP model proposed in [4], a new decomposition model was
proposed for the problem in [9], as well as a new metaheuristic based on the
SearchCol framework [10]. In the proposed metaheuristic, column generation and an
evolutionary algorithm are used to obtain valid solutions for the problem. The column
generation is used to build a pool of schedules for the drivers, resulting from the
subproblems' optimization (individual work-schedules), and also to get information
about the quality of those schedules (considering their contribution to the optimal
linear solution of the column generation).
This paper proposes the integration of a new heuristic in the column generation
exact method [11]. The combination of exact and heuristic methods is not new. Ac-
cording to the classification proposed in [12], our combination can be included in the
“integrative combinations”, since the heuristic is incorporated in the normal cycle of
the column generation, but it can also be classified as a sequential “collaborative
combination” since the column generation helps the heuristic, and the heuristic
returns new solutions to the column generation.
The new heuristic solves all the subproblems together, avoiding the multiple as-
signments of the duties to more than one driver, as happens when the subproblems are
solved independently. The main contribution of this new heuristic is that it is capable
to obtain integer and good quality solutions for the complete problem while perform-
ing column generation, without harming its performance. A secondary contribution is
that the search-space composed by the solutions obtained with this heuristic is richer
in complementary solutions that can be further explored with SearchCol algorithms
[10].
The way the heuristic builds rosters or schedules is simple. The novelty is in using
the dual solution of the restricted master problem to guide the heuristic. The dual
solution is used to set the order in which the duties are selected to be assigned, as
well as to define the order in which the drivers are picked to test the assignment of a
duty on their schedule. Some variations of the heuristic behaviour are tested, where
knowledge about the problem is used to obtain rosters with the overtime
distributed more evenly among the drivers.
Computational tests show the impact on the column generation performance and
on the integer solutions obtained by three configurations of the proposed heuristic.

In the next section the decomposition model for the BDRP is introduced. Section 3
introduces the column generation method, the improvements made by using a heuristic
to solve the subproblems, and the global heuristic used to solve all the subproblems
together. Section 4 presents the computational tests run on a set of BDRP instances,
using three configurations of the global heuristic. Section 5 provides some conclusions.

2 BDRP Model for Column Generation

The adopted model for the BDRP is an integer programming formulation adapted
from the one proposed in [4]. The complete adapted compact model and the decom-
position model were presented in [9]. The model is only concerned with the rostering
stage, assuming that the construction of duties was previously done by joining trips
and rest times to obtain complete daily duties ready to assign to drivers.
In the decomposition model, for each driver, the model considers a set of feasible
schedules, represented by the columns built with subproblems’ solutions. The set of
all the possible valid columns can be so large, making impossible its enumeration.
Therefore, only a restricted subset of valid columns are considered, leading to the
formulation of a restricted master problem (RMP) of the BDRP decomposition model.

RMP Formulation:

min ∑_{v∈V} ∑_{j∈J^v} c_j^v λ_j^v                                        (1)

Subject to:

∑_{v∈V} ∑_{j∈J^v} a_{ihj}^v λ_j^v ≥ 1,   ∀i ∈ W_h, h = 1, …, 28,          (2)

∑_{j∈J^v} λ_j^v = 1,   ∀v ∈ V,                                            (3)

λ_j^v ∈ {0,1},   ∀v ∈ V, j ∈ J^v.                                         (4)

Where:
λ_j^v – Binary variable associated with the schedule j of driver v, from the set of drivers V;
J^v – Set of valid schedules for driver v (generated by subproblem v);
c_j^v – Cost of the schedule j obtained from the subproblem of driver v;
a_{ihj}^v – Assumes value 1 if duty i of day h is assigned in the schedule j of driver v.

In this model, the valid subproblem solutions are represented as columns, with cost
c_j^v for the solution with index j of the subproblem v, and with the assignment of duty i on
day h if a_{ihj}^v = 1; W_h is the set of work duties available on day h.
The objective function (1) minimizes the total cost of the selected schedules. The
first set of constraints, the linking constraints (2), assures that all duties, in each day,
are assigned to some driver, and the last set of constraints, the convexity constraints (3),
assures that a work-schedule is selected for each driver/subproblem.

To give some context about the subproblem constraints for the next sections, we
describe below the constraints included in its formulation. For the complete model,
with the description of the variables and data, we refer the reader to [9]. The
constraints are the following:
─ A group of constraints assures that, for each day of the rostering problem, a duty is
assigned to the driver (the day-off is also represented as a duty);
─ A group of constraints avoids the assignment of incompatible duties in consecutive
days (if a driver works in a late duty on day h, the minimum rest time prevents the
assignment of an early duty on day h+1). A subset of these constraints considers
information about the last duty assigned in the previous roster when assigning the
first day;
─ A group of constraints avoids the assignment of sequences of work duties that do
not respect the maximum number of days without a day-off. A subset of these con-
straints also considers information from the last roster to force the assignment of
the first day-off considering the working days on the end of the previous rostering
period;
─ A group of constraints forces a minimum number of days-off in each week of the
rostering period and also a minimum number of days-off on a Sunday during the
rostering period;
─ Another group of constraints sets limits on the sum of the working time units each
driver can do in each week and in all the rostering period;
─ A constraint is used to apply a fixed cost whenever a driver is used (at least one
work duty is assigned in the driver's schedule).

3 Column Generation Heuristic

In this section an overview of the column generation method is presented to explain


how the decomposition model described in the previous section can be solved. Afterwards,
some improvements to the implementation of the algorithm are detailed. Those
improvements intend to reduce the computational time observed when using the standard
implementation. The last subsection presents a new heuristic which is combined with
the column generation to solve all the subproblems simultaneously, obtaining complete
or partial rosters.
Column Generation (CG) is a well-known exact solution method used to obtain so-
lutions to problems where the number of variables is huge compared to the number of
constraints. An overview of the origins and evolution of column generation can be
found in [13]. For a comprehensive description we refer the reader to [11].
Usually, CG is used to solve problems modelled by a decomposition model,
where an original model is decomposed into a master problem and some subproblems.
Dantzig-Wolfe decomposition [14] is commonly used to obtain the new model.
In the master problem of the decomposition model, new variables represent all the
possible solutions of all the subproblems. To avoid the enumeration of all those variables,
a restricted master problem is considered. The CG is used to obtain the subproblem
solutions that may contribute to the improvement of the global solution.

Generally, the CG consists in iteratively optimizing the restricted master problem
(RMP) to obtain an optimal linear solution (using a simplex algorithm from a solver
or a similar algorithm). The dual solution of the linking and convexity constraints of the
RMP is used to update the objective function of the subproblems. The subproblems
are solved with the new objective function values, and the subproblems' solutions with
negative reduced cost are added as new columns (variables) to the RMP, starting the
next iteration. When no new column with negative reduced cost is found, the optimal
solution is reached and the algorithm ends.
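The cycle just described can be summarised in a few lines. The sketch below is a generic, solver-agnostic outline of that loop in Python: the callbacks (solve_rmp, price_and_solve, add_column) are assumed wrappers around the actual RMP solver and subproblem solvers, so this is an illustration of the method, not the SearchCol++ implementation.

# Generic column generation loop (illustrative). The callbacks are injected:
#   solve_rmp()              -> dual solution of the current RMP
#   price_and_solve(sp, d)   -> (column, reduced_cost) for subproblem sp under duals d
#   add_column(column)       -> inserts an attractive column into the RMP
def column_generation(solve_rmp, subproblems, price_and_solve, add_column, tol=1e-6):
    while True:
        duals = solve_rmp()                      # duals of linking + convexity constraints
        attractive = 0
        for sp in subproblems:
            column, reduced_cost = price_and_solve(sp, duals)
            if reduced_cost < -tol:              # column can improve the RMP
                add_column(column)
                attractive += 1
        if attractive == 0:                      # no attractive column: linear optimum reached
            break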

3.1 Improving Column Generation


When using the general CG, a tailing-off effect is commonly observed: it consists in a
slow approximation to the optimal solution [11]. If a high number of iterations is
expected, one approach to reduce the global computational time is to reduce the
time of each iteration. One option is a deviation from the normal cycle, by changing the
number of subproblems solved in each iteration or by deciding whether all columns
or only the best ones are added to the RMP. In the framework presented in [10],
these configurations are allowed when running the CG algorithm.
Besides changing the normal path of the CG, a usual approach is to use efficient
combinatorial algorithms or heuristics to solve the subproblems, when available,
considerably reducing the optimization time. Multiple examples are found in the
literature where dynamic programming [15], constraint programming [16] and
heuristics [17] are used to obtain subproblem solutions.
Considering the computational time of the CG in the optimization of the decomposition
model for the BDRP presented in [9], and since multiple configurations of the CG
algorithm path are already available in the framework where the algorithm is being
implemented, we started using a heuristic to obtain valid solutions for the subproblems.
The heuristic used to solve the subproblems of the BDRP decomposition model is
based on the decoder algorithm proposed in [4]. The objective of the heuristic is to
build schedules with the highest contribution to improving the global solution.
The heuristic described in Figure 1 builds a schedule for a driver by trying to assign
the duties with the most negative costs (after the update of the objective function with
the dual solution of the RMP), following a greedy strategy.
A duty i cannot be included in the schedule if:
─ The day of duty i has already a duty assigned;
─ Duty i is incompatible with the assigned duty on (day of duty i)+1 or (day of duty
i)-1 considering the minimum rest time between duties;
─ Assignment of duty i makes a sequence of working days (without a day-off) longer
than the maximum allowed;
─ Assignment of duty i exceeds the maximum of working hours allowed by week or
for all the rostering period;
─ Assignment of duty i makes it impossible to have the minimum number of days-off
in each week of the rostering period or the minimum number of days-off on Sun-
days in all the rostering period.

Get dual solution from RMP optimization ( );


Update objective function of the subproblem;
Order updated costs (costs[]) in increasing order, keeping information from original
position of duty i (origDuty[i]);
Build empty schedule for the rostering period size;
Initialize driver data: working time (total and week);
FOR i=0 to size of costs[]
IF costs[i]>0 THEN
Next i;
Assign=TestAssignment(origDuty[i]);
IF Assign=true THEN
set driver as full in the day of origDuty[i];
Update schedule: add original cost of origDuty[i] to schedule;
Update driver data: add origDuty[i] time length to total working time and
corresponding week working time;
FOR d=0 to number of days of the rostering period;
IF no duty was assigned to driver on day d THEN
Assign a day-off to driver on day d;
IF number assigned duties >0 THEN
Update schedule: add fixed cost of driver use;
Return schedule;
Fig. 1. Driver Schedule Builder Heuristic Algorithm

The function TestAssignment used in the heuristic algorithm tests all the conditions
previously enumerated, which represent the constraints of the subproblems formula-
tion. If any of the conditions fails, the function returns false; only if all the conditions
are verified does it return true, allowing the assignment of the duty to the
schedule of the driver represented by the subproblem.
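A sketch of what a TestAssignment-style feasibility check might look like, covering a subset of the conditions listed earlier (free day, rest-time compatibility with neighbouring days, maximum consecutive working days, and the weekly/total working-time limits). The data layout and the numeric limits are hypothetical, and the days-off-per-week and Sunday conditions are omitted for brevity.

# Hypothetical data layout: schedule[d] is None on a free day, otherwise a dict
# with the duty's "start" and "end" times in hours of the day; limits are made up.
def test_assignment(schedule, day, duty, worked_total, worked_week,
                    min_rest=11, max_consecutive=6, max_week=45, max_total=180):
    if schedule[day] is not None:                     # the day already has a duty
        return False
    prev = schedule[day - 1] if day > 0 else None
    nxt = schedule[day + 1] if day + 1 < len(schedule) else None
    if prev is not None and (24 - prev["end"]) + duty["start"] < min_rest:
        return False                                  # not enough rest after the previous duty
    if nxt is not None and (24 - duty["end"]) + nxt["start"] < min_rest:
        return False                                  # not enough rest before the next duty
    streak = 1                                        # consecutive working days if this duty is assigned
    d = day - 1
    while d >= 0 and schedule[d] is not None:
        streak += 1
        d -= 1
    d = day + 1
    while d < len(schedule) and schedule[d] is not None:
        streak += 1
        d += 1
    if streak > max_consecutive:                      # too many days without a day-off
        return False
    length = duty["end"] - duty["start"]
    if worked_week + length > max_week or worked_total + length > max_total:
        return False                                  # weekly or total working-time limit exceeded
    return True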
Having a heuristic to obtain solutions to the subproblems, the column generation
algorithm is changed to use it. The heuristic does not replace the exact optimization
solver, because its solutions are only valid, not optimal. The resulting algorithm,
detailing the column generation using the heuristic, is presented in Figure 2.

DO
Optimize RMP;
Update subproblems objective function with current dual solution of the RMP;
FOR EACH subproblem
Solve using heuristic;
Add new columns into the RMP with subproblems attractive solutions;
IF no new columns added THEN
FOR EACH subproblem
Solve using exact optimization solver;
Add new columns into the RMP with subproblems attractive solutions;
WHILE new columns added >0
Fig. 2. Column Generation with Subproblem Heuristic Algorithm

In the new configuration of the column generation cycle, the heuristic is used until
no new columns are added from the obtained solutions. At that point, the exact
optimization solver is used to obtain the optimal solutions of the subproblems and
possibly add new attractive columns. In the next iteration the heuristic is tried again.
In the SearchCol++ framework, the algorithm presented in Figure 2 can have other
configurations. It is possible to solve only a single subproblem in each iteration,
optimize the RMP again and, in the following iteration, solve the next subproblem,
iterating over all the subproblems. This strategy results in fewer columns added to the
RMP when the subproblems are returning similar solutions, allowing a faster optimization
of the RMP, due to a reduced number of variables.

3.2 New Rosters Using Column Generation


Despite the improvements to the column generation presented in the previous section,
the objective pursued in [9] was to use subproblems' solutions to build good
quality rosters by searching for the best combination of schedules covering all the duties.
When the BDRP decomposition model was implemented in [9], the standard column
generation spent the whole time set as limit, and not even the optimization of the RMP
using the branch-and-bound method of a commercial solver, considering the new
columns as binary variables, was able to achieve low-cost rosters.
The use of the heuristic to solve the subproblems was introduced in [18] and an
improved version of an evolutionary algorithm was tested to search for valid rosters in
the space of solutions resulting from the column generation using the heuristic to
solve the subproblems in both configurations: solving all the subproblems in each
iteration or solving only one. The improved algorithm behaved better in the search
space obtained from the configuration where all the subproblems are solved in each
iteration. The search space from that configuration is larger, but similar solutions are
repeated for more than one driver/subproblem, making it easier to find the best
combination of good schedules.
The results obtained in [9, 18] suggest that it is hard to find a combination of sche-
dules that fit together covering all the duties and avoiding over-assignment (a duty
assigned to more than one driver) without some additional information in the column
generation.
To assure that the generated schedules complement each other, we now
present a new heuristic that, in the column generation cycle, assigns the duties
considering all the subproblems together, as a single one. The cycle does not change;
however, instead of generating individual schedules one by one, a new heuristic is called
to generate a feasible combination of schedules as well as the schedules per se.
primary purpose of solving the subproblems in an aggregated way was to assure the
existence of complete or partial rosters without the over-assignment of duties when-
ever the column generation was stopped. If the solutions included in the initial popu-
lation of the evolutionary algorithm are already valid rosters, the expected result of
the evolution is a better roster.

The heuristic presented in Figure 3 is able to build rosters by testing the assignment
of each of the available duties in the schedules of free drivers. Since in each iteration
of the column generation a new dual solution is used to update the costs of the duties
in the subproblems, the order in which the duties are assigned may vary from iteration
to iteration. The objective is that the dual solution of the RMP can guide the genera-
tion of distinct, and valid, rosters through the iterations.
When using the aggregated heuristic in the column generation algorithm in Figure 2,
the cycle solving the subproblems is replaced by a single call to the new heuristic,
which returns schedules for all subproblems/drivers. The exact solver continues to be
used when no new attractive columns are built from the heuristic solutions.

Get dual solution from RMP optimization (π);


Order duties (duties[]) in ascending order of the dual solution value of the linking
constraints, keeping information from original position of duty i (origDuty[i]);
Order drivers (drivers[]) in ascending order of the dual solution value of the convex-
ity constraints;
Build an empty schedule for the rostering period size to each of the available drivers
(subproblems);
Initialize drivers data: working time (total and week);
FOR i=0 to size of duties[]
FOR v=0 to number of drivers
Select schedule of driver[v]
Assign=TestAssignment(origDuty[i], schedule [driver[v]]);
IF Assign THEN
Set driver v as full in the day of origDuty[i];
Update schedule: add original cost of origDuty[i] to driver[v]’ schedule;
Update driver v data: add origDuty[i] time length to total working time
and corresponding week ;
EXIT FOR
FOR v=0 to number of drivers
FOR d=0 to number of days of the rostering period;
IF no duty was assigned to driver v on day d THEN
Assign a day-off to driver v on day d;
IF number assigned duties to driver v >0 THEN
Update driver v schedule: add fixed cost of driver use;
Return schedule[];
Fig. 3. Roster Builder Heuristic Algorithm

The BDRP model defines a cost for each unit of overtime, which may differ between
drivers. However, in our test instances, the drivers are split into a limited number of
categories, and all the drivers in the same category have the same overtime cost. This
means that we still want to assign the duties with more overtime to the drivers from the
category with the lowest overtime cost first, if possible. However, we want to distribute
them among all of those drivers, avoiding schedules with extra days-off caused by a
large concentration of duties with overtime.

Despite the ability of the Roster Builder Heuristic to generate valid and distinct rosters,
preliminary tests showed that the schedules of the first drivers were filled with the
duties with the most overtime. Even though we want to assign the duties with more
overtime to the drivers with lower salaries, which form the first group in the set of all
drivers, if the assignment always starts from the same driver, his/her schedule will be
filled with the duties with the largest overtime, resulting in an unbalanced work distribution.
Given the existence of different drivers’ categories, concerning the value paid by
overtime labor, drivers of the same category are grouped and the dual solution values
of the convexity constraints are used to order them inside each group.
Because the dual values of the convexity constraints alone may not lead to the
desired diversity in the order of the drivers inside each group, we added an additional

Get dual solution from RMP optimization (π);


Order duties (duties[]) in ascending order of the dual solution value of the linking
constraints, keeping information from original position of duty i (origDuty[i]);
Split drivers in groups with the same category of salary;
Order drivers inside each group according to the dual solution value of the corres-
ponding convexity constraint;
Build an empty schedule for each of the available drivers (subproblems);
Initialize drivers data: working time (total and week);
FOR i=0 to size of duties []
FOR g=0 to size of groups of drivers
Select starting driver position according to configuration r = (0 or 1 or random);
FOR j=0 to r
Rotate drivers inside group (remove from the begin and add to the end);
FOR v=0 to number of drivers in group g
Select schedule of driver v
Assign=TestAssignment(origDuty[i], schedule[v]);
IF Assign THEN
Set driver v as full in the day of origDuty[i];
Update schedule: add original cost of origDuty[i] to driver[v]'s schedule;
Update driver v data: add origDuty[i] time length to total working
time and corresponding week;
EXIT FOR
IF Assign THEN EXIT FOR
FOR v=0 to number of drivers
FOR d=0 to number of days of the rostering period;
IF no duty was assigned to driver v on day d THEN
Assign a day-off to driver v on day d;
IF number assigned duties to driver v >0 THEN
Update driver v schedule: add fixed cost of driver use;
Return schedule[];
Fig. 4. Roster Builder Heuristic with drivers’ rotation Algorithm

procedure to select the first driver inside each ordered group. We started considering
each group of drivers as a circular array. After that, two configurations were prepared
to define how a driver is selected when a new duty needs to be assigned.
By default, when a new duty is selected for assignment, the driver to select is the
one in the position 0 of the first group. We developed two configurations of the Ros-
ter Builder Heuristic with drivers’ rotation, namely the sequential and the random
configurations. In both, after the assignment of a duty, we rotate the drivers inside the
group: the first is removed and inserted at the end. In the sequential configuration, the
rotation is by a single position, and in the random configuration, the number of posi-
tions rotated is randomly selected between one and the number of drivers in the group
minus one, to avoid a complete rotation to the same position.
The inclusion of the rotation leads to a better distribution of the duties with over-
time among the group drivers. Figure 4 presents the algorithm of the roster builder
heuristic with drivers' rotation. The changes are the inclusion of the groups of drivers
and the selection of the configuration: 'normal' (no rotation); 'sequential' (rotate one
position, picking the drivers sequentially); and 'random' (stochastic selection, rotating
the drivers inside the group by a random number of positions).
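The sketch below illustrates only the rotation step on a group of drivers kept as a circular structure; the driver names are made up, and the full duty-assignment logic of Figure 4 is not reproduced.

import random
from collections import deque

# One circular group of drivers of the same salary category (illustrative names).
group = deque(["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9"])

def rotate_group(group, mode):
    """Rotate the group after a duty assignment, as in the sequential/random configurations."""
    if mode == "sequential":
        group.rotate(-1)                                  # first driver moves to the end
    elif mode == "random":
        group.rotate(-random.randint(1, len(group) - 1))  # avoid a full rotation back to the start
    # mode == "normal": no rotation, always start from the same driver

rotate_group(group, "sequential")
print(group[0])   # d2 becomes the first candidate for the next duty
rotate_group(group, "random")
print(group[0])   # some driver other than d2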
If a new roster built by the heuristic is better than the best found in previous iterations
of the metaheuristic, the best is updated accordingly. The schedules composing the roster
are saved in the pool of solutions whenever considered attractive by the column generation.
In the next section, the computational tests and the results obtained using this new
heuristic (column generation with the heuristic solving the subproblems in aggregate) are presented.

4 Computational Tests
The decomposition model for the adopted BDRP was implemented in the computa-
tional framework SearchCol++ [10]. The BDRP test instances are the ones designated
as P80 in [4]. All the instances have 36 drivers available, distributed over four salary
categories in groups of equal size (9 drivers). All the tests ran on a computer with an
Intel Pentium G640 CPU at 2.80 GHz, 8 GB of RAM, the Windows 7 Professional 64-bit
operating system and IBM ILOG CPLEX 12.5.1 (64-bit) installed. In all the test
configurations, only the column generation stage with the use of the new heuristic was
run. This allows retrieving the lower bound given by the (linear) optimal solution, the
time consumed to obtain that solution, and the integer solution found by the global
heuristic (which solves all the subproblems in aggregate).
In both heuristic configurations where the rotation of the drivers is used, we set
that the rotation is not applied in 20% of the iterations (nearly double the probability
of a driver being selected randomly inside each group). In practice, for these
iterations the assignment starts with the first driver of the ordered group, keeping the
order defined by the dual values.
Table 1 presents the results obtained from running the three configurations of the
Roster Builder Heuristic in each instance, namely, the computational time used by
the column generation to achieve the optimal solution (Time) and the value (Value)
of the integer solution found. The lower bound (LB) provided by the CG (ceiling
of the optimal solution value) is included in the table (the value is the same for all

configurations; only the normal configuration was unable to obtain an optimal solution
for instance P80_6 within the time limit of two hours). For the random configuration,
each instance was solved 20 times. In addition to the best value and its computational
time, the table also displays the average (Avg) and standard deviation (σ) of the
values and times of the runs.

Table 1. Results from the three configurations of the heuristic


                            Random                                Sequential          Normal
                   Time (s)                  Value
Instance     LB    Best     Avg      σ       Best     Avg     σ    Time (s)  Value    Time (s)  Value
P80_1       3512   419.0    534.6   158.5    3601   3679.8   34.7    337.9    3695      917.2   3716
P80_2       2703   150.4    145.0     5.7    2819   2823.2    2.8    143.2    2821      128.2   2830
P80_3       4573   271.1    275.3    45.0    4694   4701.2    5.5    260.3    4694      451.2   4697
P80_4       3566   971.3    827.6   225.6    3759   3761.6    3.1    433.1    3755      860.8   3776
P80_5       3465   535.5    768.6   387.2    3608   3612.2    1.7    766.3    3608      988.0   3617
P80_6       3576  1403.0   1249.6   237.8    3650   3655.2    2.8   2929.4    3666     7315.8   3679
P80_7       3703   765.6    761.9   120.5    3840   3886.5   13.8    768.6    3889      741.4   3895
P80_8       4555  2871.7   2901.5   512.4    4809   4813.5    2.7   4387.0    4813     1696.3   4812
P80_9       3501   323.7    356.7    29.4    3594   3603.2    8.8    383.0    3599      495.7   3611
P80_10      4005  1390.7   1398.3    64.5    4183   4305.0   52.2   1224.7    4269     1218.9   4268
Average            910.2    921.9                                   1163.4             1481.4

Under the Time columns, the last row presents the average time. The "normal"
configuration is penalized by instance P80_6, where the time limit of two hours was
reached before obtaining the optimal solution. The best values (time and value) are
displayed in bold. Generally, the computational times of the rotation heuristics are
better; however, for P80_8 the time of both is considerably higher when compared with
the "normal" one. The configurations with rotation were able to reach the best solutions
for all instances, particularly the random configuration, which also reduces the average
computational time by 39% relative to the normal configuration.
The heuristic solutions were compared with the solution value of the optimization
of the compact model using the CPLEX solver with the time limit of 24 hours. Table
2 presents the gaps between the best heuristic solutions and the best known solutions.
Only for the instances where the gap is marked in bold was the optimal solution
found by the CPLEX solver before the time limit. The gap of the solutions found by
our heuristic is, on average, 3.2%.

Table 2. Gap of the best heuristic integer solution to the best known solution
Instance Solution Gap
P80_1 3601 2.0%
P80_2 2819 3.9%
P80_3 4694 2.6%
P80_4 3755 5.2%
P80_5 3608 3.0%
P80_6 3650 1.4%
P80_7 3840 3.7%
P80_8 4809 4.9%
P80_9 3594 1.9%
P80_10 4183 3.5%

The previous results show that all the configurations are able to obtain good quality
rosters for the BDRP instances under test, and that the separation of the drivers into
category groups with the inclusion of the rotation procedure has a significant impact
on the column generation optimization time. In addition, Table 3 shows the impact on
the roster when the configuration is changed. For each configuration, the table
presents the average (Avg) units of overtime assigned to a member of the first group
(lower cost), the maximum difference (Δ) of overtime assigned between those drivers,
and the number of extra days-off counted in the schedules of the 9 members of the
group (days-off).
With the rotation procedures, more days-off are counted. However, it is observed
that, on average, additional units of overtime were assigned to the drivers, reaching
one additional unit when comparing the sequential and the normal heuristics. The
most important change is observed in the uniformity of the distribution of the
overtime: the random configuration reduces the difference of the normal configuration
by 5.6 units of time, and the sequential configuration reduces that value to less than
half.

Table 3. Comparison of drivers' schedules from the first group


Random Sequential Normal
Instance Avg Δ days-off Avg Δ days-off Avg Δ days-off
P80_1 90.6 28 12 91.2 17 9 88.9 40 10
P80_2 85.4 52 9 85.2 26 8 84.2 61 7
P80_3 91.1 6 9 90.9 9 9 90.6 11 9
P80_4 90.3 41 10 90.8 22 9 88.4 44 8
P80_5 93.3 24 9 93.3 13 9 92.3 29 9
P80_6 99.3 36 11 97.6 22 11 96.1 36 9
P80_7 78.4 21 9 81.7 8 9 81.2 11 9
P80_8 84.6 22 9 84.8 10 9 84.9 34 8
P80_9 80.7 54 9 80.8 27 7 79.4 59 4
P80_10 64.1 24 4 65.4 11 1 65.6 39 4
Average 85.8 30.8 9.1 86.2 16.5 8.1 85.2 36.4 7.7

5 Conclusions

In this paper, we presented a new heuristic capable of building good quality rosters for
the BDRP. The heuristic is integrated with the column generation exact optimization
method, using the information from the dual solutions.
In the BDRP, the objective is to define the schedules for all the drivers in the ros-
tering period considered, assuring the assignment of all the duties and optimizing the
use of each driver, reducing bus company costs.
In the proposed method, a decomposition model is implemented in a framework
and column generation is used to optimize it. The standard optimization of the sub-
problems in the column generation iterations is replaced by a global heuristic which
solves all the subproblems together. The heuristic is guided by the information from
the RMP solution, which sets the order in which duties are assigned, and also the
order in which the drivers are selected when assigning a new duty. Three configurations of
this heuristic are presented: the normal configuration makes use of the dual

information to guide the assignment of all the duties; the sequential and the random
configurations group the drivers by category, and implement a rotation of drivers
inside the groups (by 1 and a random number, respectively). The last two configura-
tions intend to obtain rosters with a better distribution of work among drivers and
more diversity of schedules.
Computational tests were made on a set of BDRP instances and the results presented.
The results show that the different configurations of the heuristic have an impact on
the performance of the column generation, and also that good quality rosters are
obtained by all configurations. The quality of the obtained rosters is evaluated by
comparison with the best known integer solutions, with an average gap of 3.2%. An
evaluation of the schedules of the first group of drivers shows that the rotation
procedure has an impact on the distribution of overtime among drivers, particularly
when the sequential configuration is used. Besides the better distribution of overtime,
the rotation configurations were able to obtain better solutions by increasing the
average overtime units assigned to the drivers of the first group (with lower cost), even
with the additional days-off counted. The additional overtime assigned to drivers of
the first group compensates for the extra days-off assigned.
Our heuristic with the variation configurations seems to work well on the BDRP in
most of the instances tested; however, it is not guaranteed that the rotation is able to
improve the performance of the CG or to obtain better solutions, as in instance P80_8,
where the computational time increased greatly when compared with the normal
configuration. If the solutions obtained by the heuristic do not include solutions
attractive to the column generation, the computational time can increase.
The proposed heuristic can be used with other problems, provided that there is a
heuristic to solve the subproblems and that it is possible to use it in an aggregated
way. The variation strategies used need to be tailored using knowledge about each
problem.
Future work will focus on tuning this heuristic to improve the column generation
performance and, if possible, obtain better integer solutions for the rostering problem.
We also intend to generate a search space composed of solutions provided by this
heuristic, so that the SearchCol concept can be followed and other metaheuristics
can explore the recombination of the obtained rosters (complete or partial) to get
closer to the optimal solutions. The application of the current approach to other
rostering problems is also being considered as future work, since only minor changes
are needed to adapt the general metaheuristic as well as the roster generation heuristic
proposed here.

Acknowledgments. This work is supported by National Funding from FCT - Fundação para a
Ciência e a Tecnologia, under the project: UID/MAT/04561/2013.

References
1. Ernst, A.T., Jiang, H., Krishnamoorthy, M., Sier, D.: Staff scheduling and rostering:
A review of applications, methods and models. European Journal of Operational Research
153, 3–27 (2004)
2. Van den Bergh, J., Beliën, J., De Bruecker, P., Demeulemeester, E., De Boeck, L.:
Personnel scheduling: A literature review. European Journal of Operational Research 226,
367–385 (2013)
3. Ernst, A.T., Jiang, H., Krishnamoorthy, M., Owens, B., Sier, D.: An Annotated Bibliogra-
phy of Personnel Scheduling and Rostering. Annals of Operations Research 127, 21–144
(2004)
4. Moz, M., Respício, A., Pato, M.: Bi-objective evolutionary heuristics for bus driver roster-
ing. Public Transport 1, 189–210 (2009)
5. Dorne, R.: Personnel shift scheduling and rostering. In: Voudouris, C., Lesaint, D., Owusu, G.
(eds.) Service Chain Management, pp. 125–138. Springer, Heidelberg (2008)
6. Burke, E.K., Kendall, G., Soubeiga, E.: A Tabu-Search Hyperheuristic for Timetabling
and Rostering. Journal of Heuristics 9, 451–470 (2003)
7. Respício, A., Moz, M., Vaz Pato, M.: Enhanced genetic algorithms for a bi-objective bus
driver rostering problem: a computational study. International Transactions in Operational
Research 20, 443–470 (2013)
8. Leone, R., Festa, P., Marchitto, E.: A Bus Driver Scheduling Problem: a new mathematical
model and a GRASP approximate solution. Journal of Heuristics 17, 441–466 (2011)
9. Barbosa, V., Respício, A., Alvelos, F.: A Hybrid Metaheuristic for the Bus Driver Rostering
Problem. In: Vitoriano, B., Valente, F. (eds.) ICORES 2013–2nd International Conference on
Operations Research and Enterprise Systems, pp. 32–42. SCITEPRESS, Barcelona (2013)
10. Alvelos, F., de Sousa, A., Santos, D.: Combining column generation and metaheuristics.
In: Talbi, E.-G. (ed.) Hybrid Metaheuristics, vol. 434, pp. 285–334. Springer,
Heidelberg (2013)
11. Lübbecke, M.E., Desrosiers, J.: Selected Topics in Column Generation. Oper. Res. 53,
1007–1023 (2005)
12. Puchinger, J., Raidl, G.R.: Combining metaheuristics and exact algorithms in combinatori-
al optimization: a survey and classification. In: Mira, J., Álvarez, J.R. (eds.) First Interna-
tional Work-Conference on the Interplay Between Natural and Artificial Computation.
Springer, Las Palmas (2005)
13. Nemhauser, G.L.: Column generation for linear and integer programming. Documenta
Mathematica Extra Volume: Optimization Stories, 65–73 (2012)
14. Dantzig, G.B., Wolfe, P.: Decomposition Principle for Linear Programs. Operations
Research 8, 101–111 (1960)
15. Cintra, G., Wakabayashi, Y.: Dynamic programming and column generation based ap-
proaches for two-dimensional guillotine cutting problems. In: Ribeiro, C.C., Martins, S.L.
(eds.) WEA 2004. LNCS, vol. 3059, pp. 175–190. Springer, Heidelberg (2004)
16. Yunes, T.H., Moura, A.V., de Souza, C.C.: Hybrid Column Generation Approaches for
Urban Transit Crew Management Problems. Transportation Science 39, 273–288 (2005)
17. dos Santos, A.G., Mateus, G.R.: General hybrid column generation algorithm for crew
scheduling problems using genetic algorithm. In: IEEE Congress on Evolutionary Compu-
tation. CEC 2009, pp. 1799–1806 (2009)
18. Barbosa, V., Respício, A., Alvelos, F.: Genetic Algorithms for the SearchCol++ framework:
application to drivers’ rostering. In: Oliveira, J.F., Vaz, C.B., Pereira, A.I. (eds.) IO2013 -
XVI Congresso da Associação Portuguesa de Investigação Operacional, pp. 38–47. Instituto
Politécnico de Bragança, Bragança (2013)
A Conceptual MAS Model for Real-Time Traffic Control

Cristina Vilarinho1, José Pedro Tavares1, and Rosaldo J.F. Rossetti2
1 CITTA, Departamento de Engenharia Civil, Faculdade de Engenharia da Universidade do Porto, Porto, Portugal
{cvilarinho,ptavares}@fe.up.pt
2 LIACC, Departamento de Engenharia Informática, Faculdade de Engenharia da Universidade do Porto, Porto, Portugal
[email protected]

Abstract. This paper describes the various steps to analyze and design a
multi-agent system for real-time traffic control at isolated intersections.
Control strategies for traffic signals are a highly important topic, studied by
many researchers over the last decades, due to their impacts on the economy,
the environment and society, affecting both people and freight transport. The
research target is to develop an approach for controlling traffic signals that
relies on flexibility and a maximal level of freedom in control, in which the
system is updated frequently to meet current traffic demand, taking into
account different traffic users. The proposed model was designed on the basis
of the Gaia methodology, introducing a new perspective in which each
isolated intersection is a multi-agent system in its own right.

Keywords: Multi-agent system · Traffic signal control · Isolated intersections

1 Introduction

Traffic signal control is considered a competitive traffic management strategy for
improving mobility and addressing environmental issues in urban areas [1]. Neverthe-
less, inefficient operation of traffic lights is a common problem that is certainly expe-
rienced by all drivers, passengers and pedestrians. This problem annoys road users
and negatively affects the local economy.
The research community has been focused on the optimization of traffic signal
plans. A traffic signal plan regulates traffic flow through an intersection. Permission
for one or more traffic streams to move through the intersection is granted during
green-light time intervals. Although there have been relatively successful efforts in
optimizing traffic control, these plans often exhibit some shortcomings during opera-
tion. This is mainly because traffic control systems are often blind to the surrounding
environment, missing the current traffic state, different traffic users and their needs.
In general, the traffic signal controls developed are characterized by a lack of flexibility
in their control systems, constraining the definition of new green-interval
values or new design structures. This rigidity constrains the effectiveness
of these strategies. Therefore, this work focuses on isolated intersections to have
more flexibility in operation because coordinated intersections have the drawback of
having a common cycle time set to meet the needs of the largest and most complex
intersection in a series, which signals at smaller intersections in the series are required
to follow. In addition, when the traffic flow pattern changes, the common cycle time
can only be updated slowly.
The main outcome of this research is the development of an approach for control-
ling traffic signals that relies on flexibility and a maximal level of control freedom in
which the system is updated frequently to match current traffic demand and maintain
awareness of the various traffic users. The use of a multi-agent system
(MAS) approach seems to be a step forward towards creating a more autonomous and
cooperative system for real-time control, without sacrificing the safety of road users or
compromising operation with a significant computational effort.
Researchers attempting to optimize traffic signal control have investigated a wide
range of approaches, but several operational challenges have not received sufficient
attention from the community. The main problems and challenges at an intersection
with which a traffic signal control should be prepared to cope with so as to have an
effective control strategy are the following: traffic congestion effect, traffic demand
fluctuation, hardware failure, incidents and the mix of different types of road users.
As described, this theme is a very complex system. One way to address the afore-
mentioned issues is to make the traffic control system more intelligent and flexible.

2 Literature Review

MASs have been suggested for many transportation problems such as traffic signal
control. Zheng, et al. [2] describe their autonomy, their collaboration, and their reac-
tivity as the most appealing characteristics for MAS application in traffic manage-
ment. The application of MASs to the traffic signal control problem is characterized
by decomposition of the system into multiple agents. Each agent tries to optimize its
own behavior and may be able to communicate with other agents. The communication
can also be seen as a negotiation in which agents, while optimizing their own goals,
can also take into account the goals of other agents. The final decision is usually a
trade-off between the agent’s own preferences against those of others. MAS control is
decentralized, meaning that there is not necessarily any central level of control and
that each agent operates individually and locally. The communication and negotiation
with other agents is usually limited to the neighborhood of the agent, increasing ro-
bustness [3]. Although there are many actors in a traffic network that can be consi-
dered autonomous agents [4] such as drivers, pedestrians, traffic experts, traffic lights,
traffic signal controllers, the most common approach is that in which each agent
represents an intersection control [3]. A MAS might have additional attributes that
enable it to solve problems by itself, to understand information, to learn and to eva-
luate alternatives. This section reviews a number of broad approaches in previous
research that have been used to create intelligent traffic signal controllers using
MASs. In some work [5-8] it was argued that the communication capabilities of MAS
can be used to accomplish traffic signal coordination. However, there is no consensus
on the best configuration for a traffic-managing MAS and its protocol [7]. To solve
conflicts between agents, in addition to communication approaches, work has been
done on i) hierarchical structure, so that conflicts are resolved at an upper level,
ii) agents learning how to control, iii) agents being self-organized.
Many authors make use of a hierarchical structure in which higher-level agents are
able to monitor lower level agents and intervene whenever necessary. In some ap-
proaches [6, 8, 9] there is no communication between agents at the same level. The
higher-level agents have the task of resolving conflicts between lower-level agents
which they cannot resolve by themselves. In approaches (ii) and (iii), agents need
time to learn or self-organize, which may be incompatible with the dynamics of the
environment. Agents learning to control (ii) is a popular approach related to control-
ling traffic signals. One or more agents learn a policy for mapping states to actions by
observing the environment and selecting actions; the reinforcement learning technique
is the most popular method used [4, 10, 11]. The approach of self-organizing agents
(iii) is a progressive system in which agents interact to communicate information and
make decisions. Agent behavior is not imposed by hierarchical elements but is
achieved dynamically during agent interactions creating feedback to the system [12].
Dresner and Stone [13] view cars as an enormous MAS involving millions of hete-
rogeneous agents. The driver agents approaching the intersection request the intersec-
tion manager for a reservation of “green time interval.” The intersection manager
decides whether to accept or reject requested reservations according to an intersection
control policy. Vasirani and Ossowski [14, 15] extended Dresner and Stone's approach
to networks of intersections. Their approach is market-based: driver
agents, i.e., buyers, trade with the infrastructure agents, i.e., sellers, in a virtual
marketplace, purchasing reservations to cross intersections. The drivers thus have an
incentive to choose alternatives to the shortest paths.
In summary, since the beginning of this century, interest in application of MAS to
traffic control has been increasing. Further, the promising results already achieved by
several authors have helped to establish that agent-based approaches are suitable for
traffic management and control. Most reviewed MASs have focused their attention on
network controllers, with or without coordination, rather than on isolated intersections.
Another issue is that traffic control approaches focus on private vehicles as the
major component of traffic, and may be missing important aspects of urban traffic
such as public transport and soft modes (pedestrians, bicycles).

3 Methodological Approach

The development of a MAS conceptual model for real-time traffic control at an iso-
lated intersection followed a methodology for agent-oriented analysis and design. In
this section an increasingly detailed model is constructed using Gaia [16, 17] as the
main methodology, complemented by concepts introduced by Passos, et al. [18].
The first step is an overview of the scenario description and system requirements. The
Gaia process starts with an analysis phase whose goal is to collect and establish the or-
ganization specifications. The output of this phase is the basis for the second phase,
namely the architectural design, and the third phase, which is a detailed design phase.

3.1 Scenario Description


The problem addressed is the control of a traffic signal at an isolated intersection at
which, depending on the intersection topology and the detected amounts of traffic of
various types of road users, the lights regulating traffic streams are to change color to
achieve a more efficient traffic management strategy.
The scenario of the proposed traffic signal control is as follows:

• At time X (e.g., each 5 min) or event Y (e.g., traffic conditions, new topology,
system failure), a request for a new traffic signal plan is created;
• All information about current topology and traffic conditions is updated to generate
new traffic data predictions for the movements of each traffic component. In this way
a new traffic signal plan is defined to meet the new intersection characteristics;
• During processing of the new traffic signal plan, if topology has changed, the stage
design is developed following the new topology;
• The traffic signal plan is selected based on criteria such as minimum delay; the
system then saves the traffic plan information (design, times) and implements it;
• During monitoring, current traffic data are compared with traffic predictions, the
topology is verified and data are analyzed by the auditor, which computes the ac-
tual level of service and informs the advisor of the results. Depending on the results,
the auditor decides whether it should make a suggestion to the traffic streams, such
as terminating or extending the current stage, or whether a new plan should be requested;
• Depending on the information received, traffic streams can continue with the traf-
fic signal plan or negotiate adjustments to it;

The system is responsible for defining and implementing a traffic signal plan as
well as deciding when to suspend it, in which case it initiates negotiation between
traffic streams to adjust the plan according to traffic flow fluctuations and characteris-
tics (e.g., traffic modes, priority vehicles), or even decides to design a new plan.
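To make the scenario more concrete, a minimal event-driven sketch of this control loop is given below; the Intersection class, its method names and the five-minute trigger are illustrative assumptions (the text above mentions 5 min only as an example) and are not part of the Gaia model itself.

```python
import random

class Intersection:
    # Toy stand-in for the MAS at one isolated intersection (names are assumptions).
    def poll_events(self):                   # monitor/auditor output
        return random.choice([[], ["traffic_event"], ["adjust_suggestion"]])
    def update_topology_and_traffic(self):   # refresh topology, traffic data, predictions
        pass
    def build_plan(self):                    # stage design, stage sequence, signal times
        return {"stages": ["S1", "S2"], "green": [30, 25]}
    def apply(self, plan):                   # traffic-stream agents switch their lights
        print("applying plan", plan)
    def negotiate_adjustment(self):          # streams negotiate extend/terminate a stage
        print("negotiating small adjustment")

PLAN_PERIOD = 300.0                          # assumed trigger: new plan request every 5 min

def control_loop(inter, steps=6):
    last = -PLAN_PERIOD                      # force a plan on the first step
    for step in range(steps):
        events, now = inter.poll_events(), step * 60.0
        if now - last >= PLAN_PERIOD or "traffic_event" in events:
            inter.update_topology_and_traffic()
            inter.apply(inter.build_plan())
            last = now
        elif "adjust_suggestion" in events:
            inter.negotiate_adjustment()

control_loop(Intersection())
```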
As input, the Gaia methodology uses a collection of requirements. The require-
ments can be collected through analyzing and understanding the scenario in which the
organizations are identified, as well as the basic interactions between them to achieve
their goals. For early requirements collection, it uses the Tropos methodology [19], in
which relevant roles, their goals and intentions, as well as their inter-dependencies are
identified and modeled as social actors with dependencies.

3.2 Analysis Phase


The goal of the analysis phase is to develop an overview of the system and capture its
structure. The division into sub-organizations helps to find system entities with spe-
cific goals that interact with other entities of the system and require competencies that
are not needed in other parts of the system. From the diagram of the early require-
ments (Fig.1), 7 actors were found whose goals, soft goals and dependencies are
described below.

Fig. 1. Actors and goal diagram for traffic signal control

The TrafficStreamProvider has the goal to design a traffic stream. Each traffic
stream is described by movements and by lanes assigned to each movement, including
information about traffic sensor locations. To achieve its goal, two soft goals were
defined: to respect the intersection topology and to keep the topology information
updated in case of topology changes (e.g. road works, accidents). The actor should
“provide all traffic stream information” to TrafficDataProvider.
The TrafficDataProvider has the main objective to collect information about traffic
data from sensors installed at a signalized intersection and aggregate data according to
the traffic stream information received. The goal is built upon four sub-goals: to keep
traffic data information updated, to minimize data processing time when dependent
actors are waiting for the information, to assume traffic data if no sensor is installed,
and to detect sensors that seem to act strangely. TrafficPredictor requests recent traffic data from this
actor and makes its own traffic predictions. Monitor/Advisor also requests traffic data
from this actor and uses them for early detection of possible problems and improve-
ments at the intersection control.
The TrafficPredictor has the main goal to generate a traffic data prediction for
each movement; this is to optimize signal control for imminent demand rather than
being reactive to current flow. The strategy may include traffic measurements from a
past time period and the current time and uses them to estimate the near future. The
generated traffic prediction should be: reliable for future traffic and comprehensive,
with total values and splits into traffic modes. The actor requests recent traffic data
from TrafficDataProvider and makes its own traffic predictions. TrafficSignalPlanner
requests this actor for recent traffic predictions and uses them to optimize the traffic
plan.
The TrafficSignalPlanner has the following objectives. Generate stage design:
search possible signal group sets that can run concurrently respecting a set of safety
constraints; generate stage sequence: once possible stage designs are defined, this
step compiles strategic groupings of stages to have signal plans designed; determine
traffic signal times: for each traffic signal plan, the green-interval durations, inter-
green and cycle lengths are calculated; and choose traffic signal plan, based on a
criterion or a weighted combination. Two soft goals were defined for the objective:
traffic signal plan selection is based on the best objective function and plan design
and timing should be conducted respecting some operational constraints, such as
maximum and minimum cycle lengths. The actor requests TrafficPredictor for recent
traffic predictions and uses these to optimize its traffic signal plan. It provides the
selected plan to TrafficStream to be applied. Advisor asks for a new plan search if the
current plan is not adequate to remain active. Finally, Monitor/Auditor receives traffic
planner information such as traffic predictions and the objective function so it can
monitor independently.
The TrafficStream has three main goals. Apply a traffic signal plan: each traffic
stream assumes a signal state (red, yellow or green) according to the plan or the current
actuation action, if one has been defined; negotiate actuation: traffic streams cooperate
to find possible actuation actions following the advisor's suggestions; and decide
actuation: traffic stream actors together decide an actuation action to implement. To
accomplish its goal, the actor intends: to verify transition to next stage and to satisfy
user beliefs about the traffic light to prevent frustration. The actor receives the se-
lected traffic signal plan from TrafficSignalPlanner to apply it and actuation sugges-
tions from Advisor to guide the negotiation phase. If negotiations are needed, Traffic
Stream actors discuss these among themselves.
The Advisor's two main objectives are: to evaluate the future of the plan, choosing a
possible action depending on information received from Monitor/Auditor (find a new
plan, adjust the current plan or continue the implementation); and to suggest an actuation
action: if it is decided to adjust the plan through actuation, the actor prepares a
recommendation to guide the actuation process. The suggest actuation action has a soft
goal defined: formulate a recommendation that will restrict the solution space of actu-
ation negotiation. The actor provides actuation suggestions to TrafficStream. It re-
quests a new plan search from TrafficSignalPlanner if the current plan is not adequate
to remain active. Monitor/Auditor sends monitor information to this actor.
The Monitor/Auditor's four main objectives are: verify topology, check if any topol-
ogy change occurred, and report it to Advisor if so; observe traffic data, “actual” and
prediction; observe objective function; and calculate the level of service of the intersec-
tion. The data acquired through monitoring are used to evaluate if Advisor should be
asked for any plan change. The objectives are complemented with three sub-goals: data
collection to keep information updated, track system performance and assist timely
decision-making by Advisor to exploit every opportunity to improve the intersection
system. The actor requests recent traffic data from TrafficDataProvider and receives
them for early detection of possible problems and improvements at the intersection
control. It receives traffic planner information such as traffic predictions and the objec-
tive function from TrafficSignalPlanner. It sends monitor information to Advisor.
Modeling the environment is one of the agent-oriented methodologies’ major activi-
ties. The environment model can be viewed in its simplest form as a list of resources
that the MAS can exploit, control or consume when working towards the accomplish-
ment of its goal. The resources can be information (e.g., a database) or physical entity
(e.g., a sensor). Six resources were defined for the proposed traffic signal control: topol-
ogy, traffic detector, traffic database, traffic prediction, traffic signal plan and traffic
light. The resources are identified by name and characterized by their types of actions.
A partial list of those resources is:

• Topology has the action to read and change when new topology is detected. The
resource contains information regarding intersection topology such as number of
traffic arms, their direction, number of approach lanes in each traffic arm, move-
ments assigned in each lane and traffic detector position;
• Traffic Detector is essential for the system because it contains all traffic data (read)
and also needs to be frequently updated (change) so it can correspond to the real
traffic demand. This makes it possible to know, for each detector: the current traffic
data in the lane, the number of users by type, vehicle occupancy, the traffic flow
distribution by movement, and lanes without a sensor or with equipment failure;
Complex scenarios such as this are very dynamic, so the approach presented by Passos,
Rossetti and Gabriel [18] extends the Gaia methodology with Business Process
Model and Notation (BPMN) to capture the model dynamics. A Business Process (BP)
collects related and structured activities that can be executed to satisfy a goal.
Fig. 2. Collaboration diagram of traffic signal control intersection



The diagram in Fig. 2 shows the interactions between the seven participants (actors
of Fig. 1) with message exchanges and includes tasks within participants, providing a
detailed visualization of the scenario. Their interactions with resources are also
present in the diagram.
The actors and goals in the diagrams of Fig. 1 and Fig. 2 help to identify the roles that
will build up the final MAS organization. The preliminary role model, defined first, is,
as the name implies, not a complete configuration at this stage, but it is appropriate
to identify system characteristics that are likely to remain. It identifies the basic skills,
functionalities and competences required by the organization to achieve its goals. For
traffic signal control 13 preliminary roles were defined. A partial list of those roles is:

• RequestTrafficSignalPlan role associated with creating all possible traffic signal


plans in order to select one (ChooseTrafficPlan) to be implemented.
• ChooseTrafficPlan role involves deciding on the best plan to choose based on some
criteria.
The goal of the preliminary interaction protocol is to describe the interactions between
the various roles in the MAS organization. Moreover, the interaction model
describes the characteristics and dynamics of each protocol (when, how, and by
whom a protocol is to be executed).

3.3 Design Phase


The goal of the analysis phase is to define the main characteristics and understand
what the MAS will have to be. In the design phase, the preliminary models must be
completed. The design phase usually detects missing or incomplete specifications or
conflicting requirements demanding a regression back to previous stages of the devel-
opment process.
From the analysis phase, the organization structure is presented in Fig.3. It is a cru-
cial phase and affects the following steps in MAS development. To represent the or-
ganizational structure, we have adopted a graphical representation proposed by Castro
and Oliveira [20] that uses the Gaia concept in UML 2.0 representation.

Fig. 3. Organizational structure for the whole system



There are three types of relationships: “depends on”, “controls” and “peer”. “De-
pends on” is a dependency relationship that means one role relies on resources or
knowledge from the other. “Controls” is an association relationship usually meaning
that one role has an authoritative relationship with the other role, controlling its ac-
tions. “Peer” is a dependency relationship also and usually means that both roles are
at the same level and collaborate to solve problems.
After achieving the structural organization, the roles and interactions of the prelim-
inary model can be fulfilled. To complete the role model, it is necessary to include all
protocols, the liveness and safety responsibilities. In Table 1, one of 13 roles is de-
scribed according to the role schema. For complete definition of interaction protocols,
they should be revised to respect the organizational structure. Table 2 shows the defi-
nition of the “InformPlanEvents" protocol using Gaia notation.

Table 1. Role Model example

Role: RosterPlanMonitor
Description: the role involves monitoring the traffic condition and the traffic signal plan for events related to: a substantial difference between the acceptable objective function value and the current value, or between the traffic flow prediction (total and by traffic mode) and the current flow; reaching some limit of a measure of effectiveness (such as maximum queue length); a maximum number of plan repetitions; a sensor system fault. After detecting one of these events, the RosterPlanMonitor role will request the EvaluateFutureOfPlan role to evaluate what should be done.
Protocols and activities: CheckForNewPlanEvents, UpdatePlanEventStatus
Send: reportEventStatus, requestPlanEvaluation; Receive: informPlanEvent, sendEvaluationPlanRequest
Permissions: Read Traffic Detector // to obtain information about the sensor system
Read Traffic Database // to obtain traffic information about the current condition
Read Traffic Signal Plan // to read and compare the plan prediction with the current state
Responsibilities:
Liveness: RosterPlanMonitor = (CheckForNewPlanEvents^ω . informsPlanEvent)^ω || (reportPlanEventStatus^ω . UpdatePlanEventStatus)^ω, where ω denotes indefinite repetition
Safety: successful_connection_with_TrafficDetector = true; successful_connection_with_TrafficDatabase = true; successful_connection_with_TrafficSignalPlan = true

Table 2. Protocol model example

Protocol schema: InformPlanEvents
Initiator Role: PlanApply
Partner Role: RosterPlanMonitor
Input: open position information
Description: after an event has been detected, it is necessary to analyse the adequacy of the plan; for that, it is necessary to send details about the position so that the plan with its current position can be generated.
Output: the plan is analyzed taking into account the current traffic information.
Activities: CheckForNewPlanEvents, UpdatePlanEventStatus, ReceiveMonitorInformation, SendRequest

To better clarify the organization with its roles and interactions, a final diagram is
presented in Fig. 4 with the full model, including all protocols and required services
that will be the basis for the roles that agents choose.

Fig. 4. Final diagram: roles, interactions and services

3.4 Detailed Design

The detailed design phase is the last step, and it is responsible for the most important
output: the full agent model definition for helping the actual implementation of
agents. The agent model identifies the agents from role-interaction analysis. Moreo-
ver, it includes a service model. The model design should try to reduce the model
complexity without compromising the organization rules. To present the agent model,
the dependency relations between agents, roles and services are presented in Fig. 5.

Fig. 5. Full Agent Model

The diagram above should be read as “the Traffic Signal Planner agent is responsible
for performing the service Make Plan Static.” Five types of agents were defined. “Traffic
Stream” has n agents, one for each traffic stream of the intersection, so it depends on
intersection topology. This means that the agent class “Traffic Stream” is defined to play
the roles PlanApply, NegotiateTrafficActuation and UpdatePlan, and there are between
one and n instances of this class in the MAS. After the completion of the design process, the
agent classes defined are ready to be implemented, according to the previous models.
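One simple way to carry the agent model into code is to record, for each agent class, the roles it plays and its expected number of instances; the snippet below is only an illustrative reading of Fig. 5, and every entry except the Traffic Stream roles and the Make Plan Static service (both named in the text above) is a placeholder rather than information taken from the diagram.

```python
from dataclasses import dataclass, field

@dataclass
class AgentClass:
    name: str
    roles: list = field(default_factory=list)      # Gaia roles the class plays
    services: list = field(default_factory=list)   # services it is responsible for
    instances: str = "1"                            # cardinality in the MAS

agent_model = [
    AgentClass("Traffic Stream",
               roles=["PlanApply", "NegotiateTrafficActuation", "UpdatePlan"],
               instances="1..n (one per traffic stream of the intersection)"),
    AgentClass("Traffic Signal Planner",
               services=["Make Plan Static"],       # service named in the text
               roles=["..."]),                      # remaining roles only shown in Fig. 5
]

for agent in agent_model:
    print(agent.name, "->", agent.roles, agent.services, "|", agent.instances)
```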

4 Conclusions

This paper presents the design of a conceptual model of a multi-agent architecture for
real-time signal control at isolated traffic intersections using Gaia as the main metho-
dology. The main idea is to make rational decisions about traffic stream lights such
that the control is autonomous and efficient under different conditions (e.g., topology,
traffic demand, traffic priority, and system failure). The traffic control of an isolated
intersection has the advantage that each intersection may have an independent control
not limited by neighbors’ control. This allows a control algorithm to be simpler than
one for coordinated intersections and more flexible to define plan design and times.
Comparing the proposed strategy with traditional approaches using MAS, it is
possible to find several differences. Traditional traffic control methods rely on each
agent controlling an intersection within the traffic network. The system usually has a
traffic signal plan defined a priori and the system controls how to perform small ad-
justments such as decreasing, increasing or advancing the green time interval of a
traffic stage. Research is being conducted on using MAS to coordinate several neigh-
boring agent controllers, in either a centralized or distributed system. Another feature
shared by traditional approaches is agent decision-making (action selection) based on
learning. From the result of each decision, the learning rule gives the probability with
which every action should be performed in the future.
As introduced before, the present approach is distinct from other works to the ex-
tent that each traffic stream is an agent and each signalized intersection builds upon
independent MASs. Thus, the multitude of agents designed for isolated intersections
create, manage and evolve their own traffic signal plans. Therefore, the proposed
multi-agent control brings the benefit of stage designs and sequences being formed
as needed instead of being established a priori. The system structure is flexible, and it
has the ability to adapt traffic control decisions to predictions and react to unexpected
traffic events.
The validation of this traffic control strategy will be performed using a state-of-the-
art microscopic traffic simulator such as, for instance, AIMSUN. The proposed model
was developed from scratch rather than by enhancing an existing model.
Finally, it is not our goal to present the process of designing and implementing a
MAS or to promote the use of Gaia; there is existing research that is much more adequate
for that. However, the methodology applied is well suited to the problem.

Acknowledgment. This project has been partially supported by FCT, under grant
SFRH/BD/51977/2012.

References
1. Park, B., Schneeberger, J.D.: Evaluation of traffic signal timing optimization methods us-
ing a stochastic and microscopic simulation program. Virginia Transportation Research
Council (2003)
2. Zheng, H., Son, Y., Chiu, Y., Head, L., Feng, Y., Xi, H., Kim, S., Hickman, M.: A Primer
for Agent-Based Simulation and Modeling in Transportation Applications. FHWA (2013)
3. McKenney, D., White, T.: Distributed and adaptive traffic signal control within a realistic
traffic simulation. Engineering Applications of Artificial Intelligence 26, 574–583 (2013)
4. Bazzan, A.L.C.: Opportunities for multiagent systems and multiagent reinforcement learn-
ing in traffic control. Auton. Agent. Multi-Agent Syst. 18, 342–375 (2009)
5. Katwijk, R., Schutter, B., Hellendoorn, H.: Look-ahead traffic adaptive control of a single
intersection – A taxonomy and a new hybrid algorithm (2006)
6. Choy, M., Cheu, R., Srinivasan, D., Logi, F.: Real-Time Coordinated Signal Control
Through Use of Agents with Online Reinforcement Learning. Transportation Research
Record: Journal of the Transportation Research Board 1836, 64–75 (2003)
7. Bazzan, A.L.C., Klügl, F.: A review on agent-based technology for traffic and transporta-
tion. The Knowledge Engineering Review 29, 375–403 (2013)
8. Hernández, J., Cuena, J., Molina, M.: Real-time traffic management through knowledge-
based models: The TRYS approach. ERUDIT Tutorial on Intelligent Traffic Management
Models, Helsinki, Finland (1999)
9. Roozemond, D.A., Rogier, J.L.: Agent controlled traffic lights. In: ESIT 2000, European
Symposium on Intelligent Techniques. Citeseer (2000)
10. Bazzan, A.L.C., Oliveira, D., Silva, B.C.: Learning in groups of traffic signals. Engineer-
ing Applications of Artificial Intelligence 23, 560–568 (2010)
11. Wiering, M., Veenen, J., Vreeken, J., Koopman, A.: Intelligent Traffic Light Control (2004)
12. Oliveira, D., Bazzan, A.L.C.: Traffic lights control with adaptive group formation based on
swarm intelligence. In: Dorigo, M., Gambardella, L.M., Birattari, M., Martinoli, A., Poli,
R., Stützle, T. (eds.) ANTS 2006. LNCS, vol. 4150, pp. 520–521. Springer, Heidelberg
(2006)
13. Dresner, K., Stone, P.: A Multiagent Approach to Autonomous Intersection Management.
J. Artif. Intell. Res. (JAIR) 31, 591–656 (2008)
14. Vasirani, M., Ossowski, S.: A market-inspired approach to reservation-based urban road traffic
management. In: Proceedings of 8th International Conference on AAMAS, pp. 617–624.
International Foundation for AAMS (2009)
15. Vasirani, M., Ossowski, S.: A computational market for distributed control of urban road traf-
fic systems. IEEE Transactions on Intelligent Transportation Systems 12, 313–321 (2011)
16. Zambonelli, F., Jennings, N.R., Wooldridge, M.: Developing multiagent systems: The
Gaia methodology. ACM T. Softw. Eng. Meth. 12, 317–370 (2003)
17. Wooldridge, M., Jennings, N.R., Kinny, D.: The Gaia methodology for agent-oriented
analysis and design. Auton. Agent. Multi-Agent Syst. 3, 285–312 (2000)
18. Passos, L.S., Rossetti, R.J.F., Gabriel, J.: An agent methodology for processes, the envi-
ronment, and services. In: IEEE Int. C. Intell. Tr., pp. 2124–2129. IEEE (2011)
19. Bresciani, P., Perini, A., Giorgini, P., Giunchiglia, F., Mylopoulos, J.: Tropos: An agent-
oriented software development methodology. Auton. Agent. Multi-Agent Syst. 8, 203–236
(2004)
20. Castro, A., Oliveira, E.: The rationale behind the development of an airline operations
control centre using Gaia-based methodology. International Journal of Agent-Oriented
Software Engineering 2, 350–377 (2008)
Prediction of Journey Destination in Urban Public Transport

Vera Costa1, Tânia Fontes1, Pedro Maurício Costa1, and Teresa Galvão Dias1,2
1 Department of Industrial Management, Faculty of Engineering, University of Porto, Porto, Portugal
[email protected]
2 INESC-TEC, Porto, Portugal

Abstract. In the last decade, public transportation providers have focused on
improving infrastructure efficiency as well as providing travellers with relevant
information. Ubiquitous environments have enabled traveller information sys-
tems to collect detailed transport data and provide information. In this context,
journey prediction becomes a pivotal component to anticipate and deliver rele-
vant information to travellers. Thus, in this work, to achieve this goal, three
steps were defined: (i) firstly, data from smart cards were collected from the
public transport network in Porto, Portugal; (ii) secondly, four different travel-
ler groups were defined, considering their travel patterns; (iii) finally, decision
trees (J48), Naïve Bayes (NB), and the Top-K algorithm (Top-K) were applied.
The results show that the methods perform similarly overall, but are better
suited for certain scenarios. Journey prediction varies according to several fac-
tors, including the level of past data, day of the week and mobility spatiotem-
poral patterns.

Keywords: Prediction · Journey destination · Urban public transports

1 Introduction

In the last decade, Urban Public Transport (UPT) systems have turned to Information
and Communication Technologies (ICT) for improving the efficiency of existing
transportation networks, rather than expanding their infrastructures [3, 7]. Public
transport providers make use of a wide range of ICT tools to adjust and optimise their
service, and plan for future development.
The adoption of smart cards, in particular, has not only enabled providers to access
detailed information about usage, mobility patterns and demand, but also contributed
significantly towards service improvement for travellers [21, 25]. For instance,
inferring journey transfers and destination based on historical smart card data has
allowed transportation providers to significantly improve their estimates of service
usage – otherwise based on surveys and other less reliable methods [8]. As a result,
UPT providers are able to adjust their service accordingly while reducing costs [2].
Furthermore, the combination of UPT and ICT has enabled the development of
Traveller Information Systems (TIS), with the goal of providing users with relevant
on-time information. Previous work has shown that TIS have a positive impact on
travellers. For instance, providing on-time information at bus stops can significantly
increase perception, loyalty and satisfaction [5].
The latest developments in ICT have paved the way for the emergence of ubiqui-
tous environments and ambient intelligence in UPT, largely supported by miniaturised
computer devices and pervasive communication networks. Such environments sim-
plify the collection and distribution of detailed real-time data that allow for richer
information and support the development of next-generation TIS [6, 20].
In this context, as transportation data is generated and demand for real-time infor-
mation increases, the need for contextual services arises for assisting travellers, identi-
fying possible disruptions and anticipating potential alternatives [20, 26].
A number of methods have been used for inferring journeys offline (e.g. [1,8,23]).
After the journeys are completed, the application of these methods can support different
analyses, such as patterns of behaviour (e.g. [13, 14]) and traveller segmentation
(e.g. [11, 12]). In contrast, little research has focused on real-time journey prediction (e.g.
[16]). Contextual services, however, require on-time prediction and, unless explicitly
stated by the user, the destination of a journey may not be known until alighting.
The prediction of journeys based on past data and mobility patterns is a pivotal
component of the next generation of TIS for providing relevant on-time contextual
information. Simultaneously, UPT providers benefit from up-to-date travelling infor-
mation, allowing them to monitor their infrastructures closely and take action.
An investigation of journey prediction is presented, based on a group of bus travel-
lers in Porto, Portugal. Specifically this research focuses on the following questions:

• Is it possible to predict a journey destination of UPT based on past usage? How do
past journeys impact the quality of these predictions?
• What is the variation in journey prediction for groups of travellers with different
mobility characteristics? Why is it so important to define such groups?
• Are there variations between groups of travellers over time, specifically for differ-
ent days of the week?

In order to answer these questions, this paper is structured as follows: Section 2
describes the data collection and the algorithms used in the work presented; Section 3
presents and discusses the results and the main findings obtained; final conclusions
and considerations are presented in Section 4.

2 Material and Methods

In order to predict the journey destinations for an individual traveller, three steps were
defined: (i) firstly, data from smart cards were collected and pre-processed from the
public transport network in Porto, Portugal (see section 2.1); (ii) secondly, four differ-
ent groups of users were defined, considering their travel patterns (see section 2.2);
(iii) finally, three different intelligent algorithms were assessed considering different
performance measures (see section 2.3). Figure 1 presents an overview of the overall
methodology applied to perform the simulations. While at this stage the analysis is
based on a set of simulations and historical data, the goal is to apply the method for a
timely prediction of destinations, which will be implemented in the scope of the
Seamless project [4]. Thus, the simulations presented in this paper enable the
evaluation of the importance of groups of travellers, and the best algorithm to use for
predicting journey destinations in a real-world environment.

Fig. 1. Methodology overview (data collection; data preprocessing; definition of groups; selection of random groups; training and test sets; model search with J48, NB and Top-K; repetition over all groups; model evaluation).



2.1 Data
The public transport network of the Metropolitan Area of Porto covers an area
of 1,575 km2 and serves 1.75 million inhabitants [10]. The network is
composed of 126 bus lines (urban and regional), 6 metro lines, 1 cable line, 3 tram
lines, and 3 train lines [24]. This system is operated by 11 transport providers, of
which Metro do Porto and STCP are the largest.
The Porto network is based on an intermodal and flexible ticket system: the An-
dante. Andante is an open zonal system, based on smart cards, that requires validation
only when boarding. A validated occasional ticket allows for unlimited travel within a
specified area and time period: 1 hour for the minimum 2-zone ticket, and longer as
the number of zones increases. Andante holders can use different lines and transport
modes in a single ticket.
In this work, two months of data, April and May of 2010, were used to perform the
simulations. Table 1 shows an example of the data collected for an individual traveller
during one week of April 2010. Journey ID is a unique identifier for each trip, sorted
in ascending order by the transaction time. For each traveller (i.e. for each Andante
smart card), the information related to the boarding time (first boarding on the route),
the line (or lines of each trip) and the stop (or stops of each trip) is available. Each trip
can have one or more stages. The first line of the table shows a trip with two stages:
the traveller first boards at stop 1716 on line 303 at 11:24, followed by stop 3175 on
line 302. Based on these data the route sequence can be rebuilt.

Table 1. Extracted trip chain information for an individual traveller during a week of April 2010.
Journey ID  Date  First boarding time of the route  Route sequence (Line ID)  Stop sequence (Stop ID)
1036866 10/04/2010 11:24 303 → 302 1716 → 3175
1036867 10/04/2010 16:27 203 1622
1036868 10/04/2010 23:14 200 1035
1036869 12/04/2010 09:05 402 1632
1036870 12/04/2010 12:42 203 1695
1036871 12/04/2010 13:44 203 1632
1036872 12/04/2010 19:45 303 1338
1036873 12/04/2010 22:29 400 → 400 1675 → 1689
1036874 13/04/2010 09:09 402 1632
1036875 13/04/2010 19:11 206 1338
1036876 14/04/2010 08:45 302 1632
1036877 14/04/2010 12:30 402 1338
1036878 14/04/2010 13:43 203 1632
1036879 14/04/2010 19:11 303 1338
1036880 15/04/2010 09:08 402 1632
1036881 15/04/2010 12:57 203 1695
1036882 15/04/2010 14:04 302 → 501 1632 → 1810
1036883 15/04/2010 20:52 303 1338

In order to pre-process the data, an inference algorithm [18] was used to identify the
journey destination. Since the Porto system is based on validation at the entrance only,
this pre-processing is required. In this method, the origin of a trip is assumed to be the
destination of the traveller's previous trip. However, due to the absence of information,
the results obtained with the application of this algorithm were partially restricted,
since data from only one transport provider, the STCP company, were available. Thus,
in order to minimize the error, only data from users with at least 80% of destinations
inferred with success and, on average, two or more validations per day were used. As
a result, the sample consists of 615,647 trips corresponding to 6865 different Andante
cards.
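A simplified sketch of this trip-chaining idea is shown below; it is not the inference algorithm of [18] itself (which is more elaborate), and the dictionary keys and the day-count argument are assumed names used only for illustration.

```python
def infer_destinations(trips):
    # trips: time-ordered list of dicts with an "origin" (boarding stop) for one
    # smart card. The destination of each trip is taken as the origin of the
    # traveller's next trip (simplified chaining, with no distance/time checks).
    chained = []
    for cur, nxt in zip(trips, trips[1:]):
        chained.append({**cur, "destination": nxt["origin"]})
    chained.append({**trips[-1], "destination": None})  # last trip left unresolved here
    return chained

def keep_card(trips, days_observed):
    # Selection rule quoted above: at least 80% of destinations inferred and,
    # on average, two or more validations per day.
    inferred = sum(1 for t in trips if t["destination"] is not None)
    return inferred >= 0.8 * len(trips) and len(trips) / days_observed >= 2

card = infer_destinations([{"origin": 1716}, {"origin": 1622}, {"origin": 1035},
                           {"origin": 1716}, {"origin": 1622}])
print(card, keep_card(card, days_observed=2))
```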
The data set consists of a set of descriptive attributes, of which three were used. The
first attribute represents the code of the origin bus stop. The second attribute identifies
the date, which represents the day of the week of each validation. The third represents
the inferred destination bus stop.

2.2 Definition of Groups


Sets of travellers were selected from the main dataset for predicting the destination of
a journey. Four different groups of travellers were defined with different mobility
characteristics. The first three groups (1, 2 and 3) are characterized by patterns of
mobility with different usage characteristics. The last group (4) is composed of
travellers without a discernible travel pattern. The spatiotemporal mobility patterns are
based on two characteristics: primary journeys and journey schedule. The primary
journeys describe spatial regularity, identifying the most frequent journeys in a given
route. The journey schedule assesses temporal regularity, based on the departure
times. Thus, we have:

• Group 1 (G1): includes individual travellers with a regular spatiotemporal pattern.


In this group, individuals with two primary journeys were selected (e.g.
home/work/home or home/school/home). These primary journeys represent, in av-
erage, 74% of the total journeys. Furthermore, to ensure the temporal travelling
regularity, a maximum departure time deviation of one hour was considered. These
restrictions resulted in a group of travellers who have a tendency for a rigid journey
schedule (e.g. professionals);
• Group 2 (G2): includes travellers with a regular spatial pattern but without tempo-
ral regularity. Similar to group one, travellers with two primary journeys (e.g.
home/work/home or home/school/home) were selected. These primary journeys
represent about 72% of the total journeys. In addition, to exclude temporal regular-
ity of the primary journeys, a departure time deviation greater than one hour was
considered. As a result, this group of travellers have a tendency towards a flexible
journey schedule (e.g. students);
• Group 3 (G3): includes travellers with a broader spatial regularity. In this group,
individuals with four primary journeys were considered (e.g. home/work/home and
work/gymnasium/home). These primary journeys represent about 79% of the total
journeys. In contrast to the previous two groups, temporal regularity was not taken
into account;
• Group 4 (G4): is characterized by a non-regular spatial mobility pattern. In this
group, the number of different routes is higher than 50% of the total number of
journeys (e.g. occasional travellers).
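As a rough illustration, the four group definitions above can be encoded as in the sketch below; the 70% threshold used to detect "primary journeys" is a simplifying assumption, whereas the one-hour departure deviation and the 50% distinct-routes rule come from the definitions themselves.

```python
from collections import Counter
from statistics import pstdev

def classify_traveller(journeys):
    # journeys: list of (origin, destination, departure_hour) tuples for one
    # traveller. Returns "G1".."G4" following the group definitions above.
    routes = Counter((o, d) for o, d, _ in journeys)
    n = len(journeys)
    if len(routes) > 0.5 * n:                     # mostly distinct routes: occasional
        return "G4"
    top2 = sum(c for _, c in routes.most_common(2)) / n
    top4 = sum(c for _, c in routes.most_common(4)) / n
    if top2 >= 0.7:                               # two primary journeys (assumed cut-off)
        devs = []
        for route, _ in routes.most_common(2):
            hours = [h for o, d, h in journeys if (o, d) == route]
            devs.append(pstdev(hours) if len(hours) > 1 else 0.0)
        return "G1" if max(devs) <= 1.0 else "G2"
    if top4 >= 0.7:                               # four primary journeys (assumed cut-off)
        return "G3"
    return "G4"

print(classify_traveller([(1632, 1338, 8.8), (1338, 1632, 18.1)] * 20))  # -> G1
```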

Table 2 shows the main characteristics of those groups.

Table 2. Characteristics of individual journeys considering different groups of travellers.


Group  Travellers (N)  Total journeys (X±SD)  Different routes (X±SD)  Top 2 journeys (%)  Top 4 journeys (%)  Primary journey schedule deviation (X±SD)  Share of each group in the population
G1 200 75.9±12.4 15.8±8.4 73.6% 81.1% 00:17±00:15 a 4.6%
G2 200 91.5±30.9 19.1±12.2 72.1% 79.3% 03:08±01:36 a 9.6%
G3 200 100.8±26.7 19.2±8.5 48.7% 79.1% 02:04±01:49 b 10.5%
G4 200 83.2±35.1 60.0±25.3 0.3% 0.5% - 12.6%
a for the top 2 trips; b for the top 4 trips.

2.3 Methods
In order to estimate the destination of each traveller, three different algorithms were
analysed: (i) decision trees (J48); (ii) Naïve Bayes (NB); and (iii) the Top-K
algorithm (Top-K).
Decision trees represent a supervised approach to classification. These algorithms
are a tree-based knowledge representation methodology, which are used to represent
classification rules in a simple structure where non-terminal nodes represent tests on
one or more attributes and terminal nodes reflect decision outcomes. The decision tree
approach is usually the most useful in classification problems [17]. With this tech-
nique, a tree is built to model the classification process. J48 is an implementation of a
decision tree algorithm in the WEKA system, used to generate a decision tree model
to classify the destination based on the attribute values of the available training data.
In R software, the RWeka package was used.
The Naïve Bayes algorithm is a simple probabilistic classifier that calculates a set
of probabilities by counting the frequency and combinations of values in a given data
set [19]. The probability of a specific feature in the data appears as a member in the
set of probabilities and is calculated by the frequency of each feature value within a
class of a training data set. The training dataset is a subset, used to train a classifier
algorithm by using known values to predict future, unknown values. The algorithm is
based on the Bayes theorem and assumes all attributes to be independent given the
value of the class variable. In this work the e1071 R package was used.
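The two classifiers can be approximated with scikit-learn as sketched below; note that the study used RWeka's J48 (a C4.5 implementation) and the e1071 naiveBayes in R, so DecisionTreeClassifier (CART) and CategoricalNB are only stand-ins, and the toy data are invented.

```python
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import CategoricalNB

# attributes used in the paper: origin stop and day of week; class: destination stop
X = [["1632", "Mon"], ["1338", "Mon"], ["1632", "Tue"], ["1338", "Tue"], ["1632", "Sat"]]
y = ["1338", "1632", "1338", "1632", "1695"]

enc = OrdinalEncoder()
Xe = enc.fit_transform(X).astype(int)            # categorical features as integer codes

tree = DecisionTreeClassifier().fit(Xe, y)       # CART stand-in for J48/C4.5
nb = CategoricalNB().fit(Xe, y)                  # stand-in for e1071::naiveBayes

query = enc.transform([["1632", "Mon"]]).astype(int)
print("tree:", tree.predict(query)[0], "| nb:", nb.predict(query)[0])
```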
The Top-K algorithm enables finding the most frequent elements or item sets
based on an increment counter [15]. The method is generally divided into counter-
based and sketch-based techniques. Counter-based techniques keep an individual
counter for a subset of the elements in the dataset, guaranteeing their frequency.
Sketch-based techniques, on the other hand, provide an estimation of all elements,
with a less stringent guarantee of frequency. Metwally proposed the Space-Saving
algorithm, a counter-based version of the Top-K algorithm that targets performance
and efficiency for large-scale datasets. This version of the algorithm maintains partial
information of interest, with accurate estimates of significant elements supported by a
lightweight data structure, resulting in memory saving and efficient processing. It
focuses on the influential nodes and discards less connected ones [22]. The main idea
behind this method is to have a set of counters that keep the frequency of individual
elements. Drawing a parallel with social network analysis, the algorithm proposed
by Sarmento et al. [22] was adapted: a journey is considered to be an edge, i.e. a
connection between two nodes (stops) A and B. The algorithm counts occurrences
of journeys. For each traveller, if a new journey is already being monitored, its counter
is updated; otherwise, the algorithm adds the new journey to its Top-K list. If the number
of unique journeys exceeds the 10*K monitored journeys, the algorithm applies the
Space-Saving replacement step.
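A minimal version of such a Space-Saving counter over journeys (edges between an origin and a destination stop) could look like the sketch below; the 10*K capacity follows the text, while the prediction helper at the end (most frequent monitored destination for a given origin) is an assumption about how the counters would be used for destination prediction.

```python
class SpaceSavingJourneys:
    # Space-Saving counters over journeys (origin, destination) for one traveller.
    def __init__(self, k):
        self.capacity = 10 * k          # number of monitored journeys, as in the text
        self.counts = {}                # journey -> estimated frequency

    def add(self, origin, destination):
        journey = (origin, destination)
        if journey in self.counts:
            self.counts[journey] += 1
        elif len(self.counts) < self.capacity:
            self.counts[journey] = 1
        else:                            # replace the least frequent monitored journey
            victim = min(self.counts, key=self.counts.get)
            self.counts[journey] = self.counts.pop(victim) + 1

    def predict_destination(self, origin):
        candidates = {j: c for j, c in self.counts.items() if j[0] == origin}
        return max(candidates, key=candidates.get)[1] if candidates else None

ss = SpaceSavingJourneys(k=2)
for o, d in [(1632, 1338), (1632, 1338), (1338, 1632), (1632, 1695)]:
    ss.add(o, d)
print(ss.predict_destination(1632))     # expected: 1338
```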
For each algorithm and group of travellers defined previously (see Section 2.2), 15
repetitions were performed. For each repetition, 30 travellers were randomly selected
from each group. Figure 2 shows the average number of journeys for the groups. In
each simulation, the test set size is always one day and corresponds to the day under
evaluation, i (ntest = 1 day), while the training set continuously grows with i (ntrain = i-1 day(s)).
Table 3 illustrates this procedure.

Fig. 2. Average number of journeys by each group of travellers (the asterisks mark 2010-04-27, an STCP strike day, and 2010-05-14, the day of the pope's visit to the city).

Table 3. Data sets used to train (t) and test (T).


Days
1 2 3 4 5 6 7 8 9 10 (…) n-1 n
i=1 T
i=2 t T
i=3 t t T
(…)
i=n t t t t t t t t t t (...) t T
NOTES: t: days of training set; T: days of the test set.
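The procedure of Table 3 can be written as a simple growing-window loop; `train_and_predict` below is a placeholder for any of the three methods described above, and the per-day data layout is an assumption made only for the sketch.

```python
from collections import Counter

def growing_window_evaluation(days, train_and_predict):
    # days[i] holds the journeys of day i+1 as (origin, day_of_week, destination).
    # For each day i, train on days 1..i-1 and test on day i, as in Table 3.
    scores = []
    for i in range(1, len(days)):
        train = [j for day in days[:i] for j in day]
        test = days[i]
        hits = sum(train_and_predict(train, (o, dow)) == dest for o, dow, dest in test)
        scores.append(hits / len(test) if test else float("nan"))
    return scores

def most_frequent(train, query):
    # placeholder model: most frequent destination seen from the query origin so far
    origin, _ = query
    seen = Counter(d for o, _, d in train if o == origin)
    return seen.most_common(1)[0][0] if seen else None

days = [[(1632, "Mon", 1338)], [(1632, "Tue", 1338)], [(1632, "Wed", 1695)]]
print(growing_window_evaluation(days, most_frequent))   # [1.0, 0.0]
```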

To evaluate the performance of the different algorithms, the Accuracy measure (1),
which represents the proportion of correctly identified results, was used. The basis for
this approach is the confusion matrix, a two-way table that summarizes the performance
of the classifier. Considering one of the classes as the positive (P) class and the other
as the negative (N) class, four quantities may be defined: the true positives (TP), the true
negatives (TN), the false positives (FP) and the false negatives (FN). Accuracy is then given by:

Accuracy = (TP + TN) / (TP + TN + FP + FN)        (1)

Nonetheless, due to the sparse usage of the transportation network by travellers, in
this work we have an imbalanced sample. In this case, the Accuracy measure by itself
is not the most appropriate in the presence of imbalanced data [9]. Therefore,
the F-score (2), which combines the Precision and Recall measures as a harmonic mean,
was also used. Precision gives an insight into how the classifier behaves in relation
to the positive class, measuring how many of the instances predicted as positive are in
fact positive, while Recall gives an insight into the classifier's performance on the
positive class, measuring how well the whole positive class is recognized. The F-score is
given by:
F-score = 2 · (Precision · Recall) / (Precision + Recall)        (2)

3 Results and Discussion

Table 4 shows the average Accuracy and F-score obtained for journey destination
prediction, by algorithm, group of travellers, and day of the week (weekday vs.
weekend). The analysis of the results reveals several similarities and differences
between the groups of travellers analysed.
Regarding the comparison of journey destination prediction between weekdays
and weekends, the results show small performance differences for the individuals
of Group 4: the Accuracy and F-score for weekdays are around 2% higher
than on weekends. Nevertheless, for the remaining groups (G1, G2 and G3), a clear
difference is observed between these two periods. For example, while during weekdays
the Accuracy for Group 1 is, on average, 77-79% (excluding disruptive events in
the city, namely 2010-04-27, an STCP strike day, and 2010-05-14, a papal visit to
the city, which were removed), during weekends the values fall to 47-49%.
The same trend is observed for the F-score (average values of 80-83% and 51-53%,
respectively). Therefore, high deviations are observed during weekends, which suggests
uncertainty in predicting for these days, associated with the lack of travelling
routines in these groups (G1, G2 and G3). The discrepancy between weekdays and
weekends dissipates as mobility patterns become less strict; at the same time,
as weekdays and weekends become indiscernible, the average prediction
performance decreases by between 6 and 40%.
Regarding the algorithms applied, similar results were found across them. For
this comparison, the first four weeks were disregarded to exclude the initial learning
period; thus, the last five weeks represent a more stable account of journey prediction.
On average, small performance differences were found during weekdays
(Accuracy: G1=2%, G2=3%, G3=2%, and G4=4%; F-score: G1=2%, G2=3%,
G3=3%, and G4=3%). During weekends, these differences are generally higher
(Accuracy: G1=7%, G2=4%, G3=4%, and G4=4%; F-score: G1=6%, G2=4%,
G3=5%, and G4=4%).

Table 4. Accuracy and F-score (%) average (X) and standard deviation (SD) obtained for the
prediction of the journey destination, by algorithm (J48, NB and Top-K), day (weekdays and
weekends), and group of individual travellers (G1, G2, G3 and G4).

                     J48                       NB                        Top-K
             X ± SD       min-max      X ± SD       min-max      X ± SD       min-max
Accuracy (%)
 Weekday G1  78.3 ± 5.6   27.0-89.8    77.2 ± 6.0   39.8-88.7    79.6 ± 6.0   38.8-89.4
         G2  74.3 ± 6.9   21.0-85.7    72.5 ± 7.4   28.7-83.7    75.2 ± 7.1   25.5-85.0
         G3  63.5 ± 5.2   17.5-74.2    63.1 ± 5.5   24.3-72.3    62.9 ± 5.7   24.2-74.1
         G4  21.1 ± 5.0    6.9-29.4    18.8 ± 5.0    8.4-27.0    19.2 ± 5.0    4.9-29.0
 Weekend G1  49.7 ± 21.9  30.2-62.6    47.3 ± 21.6  31.2-65.7    47.8 ± 22.4  34.2-63.8
         G2  65.0 ± 12.9  30.3-82.6    63.3 ± 13.2  40.2-76.7    65.3 ± 12.8  40.2-83.4
         G3  56.9 ± 15.1  27.8-73.9    59.0 ± 14.9  40.3-77.1    58.5 ± 15.6  39.1-72.5
         G4  19.5 ± 9.4    9.0-26.8    17.2 ± 8.8   10.8-25.9    15.8 ± 9.0    5.8-24.0
F-score (%)
 Weekday G1  81.8 ± 5.2   33.9-90.9    80.8 ± 5.3   55.9-90.3    83.3 ± 5.3   55.8-90.3
         G2  77.6 ± 6.7   28.2-87.7    75.4 ± 7.2   43.6-86.0    78.3 ± 7.0   38.8-88.1
         G3  66.3 ± 5.5   18.4-76.4    65.9 ± 5.7   33.1-74.9    66.0 ± 5.9   33.0-76.2
         G4  22.2 ± 5.8    7.7-31.2    20.8 ± 5.8   10.6-27.5    20.9 ± 5.7    7.3-30.9
 Weekend G1  52.9 ± 22.1  36.0-66.4    51.5 ± 21.9  32.6-70.0    52.4 ± 22.4  40.9-68.1
         G2  66.7 ± 12.9  37.2-81.5    65.4 ± 12.8  50.6-76.6    67.6 ± 12.6  50.9-81.7
         G3  58.4 ± 14.7  30.0-73.4    60.7 ± 14.5  38.9-76.7    60.2 ± 15.0  37.8-72.3
         G4  20.6 ± 10.2   8.8-30.3    18.9 ± 9.9   10.5-27.1    17.1 ± 9.3    5.5-26.2

Shading in the original table: >75%, 75-50%, <50%.


Removed days: 2010-04-27, STCP strike day; 2010-05-14, visit of pope in the city.

A detailed analysis of the Accuracy revealed that the three methods have different
performance levels, related to the spatiotemporal characteristics of the groups. For Group
1, while Top-K shows better performance in the first five weeks, the J48 method performs
better in the last four weeks; NB was better than the other two in only
17% of the days. In Group 2, the Top-K method performs better in 46% of the days,
followed by J48 with 41% and NB with 13%. In contrast to the previous two
groups, in Group 3 the NB method performs better in the first three weeks of
predictions and in 34% of the days overall, with the J48 method performing better in 46%
and Top-K in 20%. Interestingly, in the first three weeks NB and Top-K show very
similar performance, as do J48 and NB in the remaining weeks.
Similarly, in Group 4 the NB method performs best in the first two weeks, but only
in 17% of the days overall; Top-K performs better in only 7% of the days, and J48 in 76% of them.
With the exception of Group 4, the Accuracy and F-score measures show
similar results. However, in Group 4, the F-score measure reveals that both NB and
Top-K perform better in 20% of the days, and J48 in 60%. In addition, the F-score
tends to favour Top-K over J48 for Groups 2, 3 and 4. With the exception of the
mentioned differences, the similarity between the Accuracy and F-score measures
indicates robustness in the results obtained.
Figure 3 shows the average F-score for one algorithm, Top-K. In the first
1-2 weeks of prediction, the F-score values increase steeply and almost double for
Groups 1 and 2; after this period, they increase very slowly until weeks 5-6. The
exception is Group 4, which shows a slow growth tendency over the entire two-month period.

[Figure 3: daily F-score (0-100%) per group (G1-G4) over nine weeks.]
Fig. 3. F-score average for the prediction of the journey destination, by group of individual
travellers obtained with the application of the Top-K algorithm.

4 Conclusions
In this work, an investigation into journey prediction was performed, based on past
data and mobility patterns. Three different methods were used to predict journey destination
for four groups of travellers with different spatiotemporal characteristics. The
main findings, described in the previous section, provide answers to the
questions originally formulated, as follows:

• The results show that it is indeed possible to predict journey destination in
UPT based on past usage, with varying degrees of success depending on the
mobility patterns. In addition, the accuracy of journey predictions tends to
stabilize after two or more weeks of data (historical journeys), with considerable
differences between the groups;
• After the initial two weeks of prediction, the average Accuracy and F-score have
values around 70% for Groups 1 and 2, 60% for Group 3 and 20% for Group 4.
The performance of journey predictions seems to be directly related to the mobility
patterns, with stricter characteristics yielding higher prediction performance;
• Weekends present low mean and high standard deviation values of Accuracy
and F-score (between 6% and 40% lower than weekdays). This difference
increases in groups with more frequent journeys: Groups 1 and 2 have
low Accuracy and F-score on weekends but higher values on weekdays,
while the difference is negligible for Group 3 and non-existent in Group 4.

Even though the three methods present similar results overall, the analysis shows that
in certain scenarios they perform differently. The performance differences are
mainly related to the amount of historical data, the day of the week, and the travelling patterns.
Thus, journey prediction is impacted by a number of factors that inform the design
and implementation of TIS.

Future work will include a comparison of the classifiers used regarding processing
time and memory efficiency; these metrics were not addressed in this work due
to space restrictions. Further research is also needed on the characterization of
groups of travellers, in particular additional analysis to enable the discovery of further
typical traveller profiles. We hope to find and study these
new profiles with a larger dataset of users.

Acknowledgements. This work was performed under the project "Seamless Mobility"
(FCOMP-01-0202-FEDER-038957), financed by European Regional Development Fund
(ERDF), through the Operational Programme for Competitiveness Factors (POFC) in the Na-
tional Strategic Reference Framework (NSRF), within the Incentive System for Technology
Research and Development. The authors would also like to acknowledge the bus transport
provider of Porto, STCP, which provided travel data for the project.

References
1. Bagchi, M., White, P.R.: The potential of public transport smart card data. Transport Poli-
cy 12(5), 464–474 (2005)
2. Bera, S., Rao, K.V.: Estimation of origin-destination matrix from traffic counts: the state
of the art, European Transport\Trasporti Europei, ISTIEE, Institute for the Study of Trans-
port within the European Economic Integration, vol. 49, pp. 2–23 (2011)
3. Caragliu, A., Bo, C.D., Nijkamp, P.: Smart Cities in Europe. J. of Urban Technology
18(2), 65–82 (2011)
4. Costa, P.M., Fontes, T., Nunes, A.N., Ferreira, M.C., Costa, V., Dias, T.G., Falcão e Cunha, J.:
Seamless Mobility: a disruptive solution for public urban transport. In: 22nd ITS World Con-
gress, 5-9/10, Bordeaux (2015)
5. Dziekan, K., Kottenhoff, K.: Dynamic at-stop real-time information displays for public
transport: effects on customers. Transp. Research Part A 41(6), 489–501 (2007)
6. Foth, M., Schroeter, R., Ti, J.: Opportunities of public transport experience enhancements
with mobile services and urban screens. Int. J. of Ambient Computing and Intelligence
(IJACI) 5(1), 1–18 (2013)
7. Giannopoulos, G.A.: The application of information and communication technologies in
transport. European J. of Operational Research 152(2), 302–320 (2004)
8. Gordon, J.B., Koutsopoulos, H.N., Wilson, N.H.M., Attanucci, J.P.: Automated Inference
of Linked Transit Journeys in London Using Fare-Transaction and Vehicle Location Data.
Transp. Res. Record: J. of the Transportation Research Board 2343, 17–24 (2013)
9. He, H., Garcia, E.: Learning from imbalanced data. IEEE Transactions on Knowledge and
Data Engineering 21(9), 1263–1284 (2009)
10. INE (2013). https://www.ine.pt/. Instituto Nacional de Estatística I.P., Portugal
11. Kieu, L.M., Bhaskar, A., Chung, E.: Transit passenger segmentation using travel regularity
mined from Smart Card transactions data. In: Transportation Research Board 93rd Annual
Meeting. Washington, D.C., January 12–16, 2014
12. Krizek, J.J., El-Geneidy, A.: Segmenting preferences and habits of transit users and non-
users. Journal of Public Transportation 10(3), 71–94 (2007)
13. Kusakabe, T., Asakura, Y.: Behavioural data mining of transit smart card data: A data fu-
sion approach. Transp. Research Part C 46, 179–191 (2014)

14. Ma, X., Wu, Y.-J., Wanga, Y., Chen, F., Liu, J.: Mining smart card data for transit riders’
travel patterns. Transp. Research Part C 36, 1–12 (2013)
15. Metwally, A., Agrawal, D.P., El Abbadi, A.: Efficient computation of frequent and top-k
elements in data streams. In: Eiter, T., Libkin, L. (eds.) ICDT 2005. LNCS, vol. 3363,
pp. 398–412. Springer, Heidelberg (2005)
16. Mikluščák, T., Gregor, M., Janota, A.: Using neural networks for route and destination
prediction in intelligent transport systems. In: Mikulski, J. (ed.) TST 2012. CCIS, vol. 329,
pp. 380–387. Springer, Heidelberg (2012)
17. Nor Haizan, W., Mohamed,W., Salleh, M.N.M., Omar, A.H.: A Comparative Study of Re-
duced Error Pruning Method in Decision Tree Algorithms. In: International Conference on
Control System, Computing and Engineering (IEEE). Penang, Malaysia, November 25,
2012
18. Nunes, A., Dias, T.G., Cunha, J.F.: Passenger Journey Destination Estimation from Auto-
mated Fare Collection System Data Using Spatial Validation. IEEE Transactions on Intel-
ligent Transportation Systems. Forthcoming
19. Patil, T., Sherekar, S.: Performance Analysis of Naive Bayes and J48 Classification Algo-
rithm for Data Classification. Int. J. of Comp. Science and Applic. 5(2), 256–261 (2013)
20. Patterson, D.J., Liao, L., Gajos, K., Collier, M., Livic, N., Olson, K., Wang, S., Fox, D.,
Kautz, H.: Opportunity knocks: a system to provide cognitive assistance with transportation
services. In: Mynatt, E.D., Siio, I. (eds.) UbiComp 2004. LNCS, vol. 3205, pp. 433–450.
Springer, Heidelberg (2004)
21. Pelletier, M.P., Trépanier, M., Morency, C.: Smart card data use in public transit: A litera-
ture review. Transp Research Part C 19(4), 557–568 (2011)
22. Sarmento, R., Cordeiro, M., Gama, J.: Streaming network sampling using top-k networks.
In: Proceedings of the 17th International Conference on Enterprise Information Systems
(ICEIS 2015), to appear. INSTICC (2015)
23. Seaborn, C., Attanucci, J., Wilson, H.M.: Analyzing multimodal public transport journeys
in London with smart card fare payment data. Transp. Research Record: J. of the Transp.
Research Board 2121(1) (2009)
24. TIP (2015). http://www.linhandante.com/. Transportes Intermodais do Porto
25. Utsunomiya, M., Attanucci, J., Wilson, N.H.: Potential Uses of Transit Smart Card Regis-
tration and Transaction Data to Improve Transit Planning. Transp. Research Record: J. of
the Transp. Research Board 119–126 (2006)
26. Zito, P., Amato, G., Amoroso, S., Berrittella, M.: The effect of Advanced Traveller Infor-
mation Systems on public transport demand and its uncertainty. Transportmetrica 7(1),
31–43 (2011)
Demand Modelling for Responsive Transport
Systems Using Digital Footprints

Paulo Silva, Francisco Antunes(B) , Rui Gomes, and Carlos Bento

Faculdade de Ciências e Tecnologia da Universidade de Coimbra Pólo II,


Rua Sílvio Lima, 3030-790 Coimbra, Portugal
[email protected], {fnibau,ruig,bento}@dei.uc.pt

Abstract. Traditionally, travel demand modelling focused on long-term


multiple socio-economic scenarios and land-use configurations to esti-
mate the required transport supply. However, the limited number of
transportation requests in demand-responsive flexible transport systems
require a higher resolution zoning. This work analyses users' short-term
destination choice patterns, with a careful analysis of the available data
coming from various sources, such as GPS traces and social
networks. We use a Multinomial Logit Model, with a social component
for utility and characteristics, both derived from Social Network
Analysis. The results from the model show meaningful relationships between
distance and attractiveness for all the different alternatives, with the
variable distance being the most significant.

Keywords: Innovative transport modes · Public transport operations ·


Transport demand and behaviour · Urban mobility and accessibility

1 Introduction
Transportation systems are a key factor for economic sustainability and social
welfare, but providing quality public transportation may be extremely expensive
when demand is low, variable and unpredictable, as it is during some periods of the
day in urban areas. Demand Responsive Transportation (DRT) services try to
address this problem with routes and frequencies that may vary according to
the actual observed demand. However, in terms of financial sustainability and
quality level, the design of this type of services may be complicated.
Anticipating demand by studying users' short-term destination choice can
improve the overall efficiency and sustainability of the transport services. Tradi-
tionally, demand modelling focused on long-term socio-economic scenarios and
land-use to estimate the required level of supply. However, the limited number
of transportation requests in DRT systems does not allow the application of
traditional models. Also, DRTs require a higher resolution zoning, without which
the results can suffer from unacceptable inaccuracies. Information coming from various sources
should be used effectively in order to model demand for DRT trips.
The approach followed in this work analyses users' short-term destination
choice patterns, with a careful analysis of the available data coming from
various sources, such as GPS traces and social networks. The theory of


utility maximization, usually through discrete choice modelling, is often used to


study individual decision-making. We use the Multinomial Logit Model (MNL)
with a social component for utility and characteristics, both derived from Social
Network Analyses (SNA), where a network is constructed linking the nodes
(decision makers) that have social influence over one another (friendship), and
the strength of that influence. To measure the strength of different ties, mutuality,
propinquity, mutual friends and multiplexity factors were used.
We review the state of the art in the next section. The methodology is pre-
sented in Section 3 and the results in Section 4. The document ends with the
conclusions and possible future lines of work.

2 State of the Art


Urban movement profiling has usually relied on traditional survey methods,
which are expensive and time consuming and give planners only a picture of what
has already happened. In contrast, the wide deployment of pervasive computing devices
(cell phone, GPS devices and digital cameras) provide unprecedented digital
footprints, telling where and when people are. An emerging field of research
uses mobile phones for “urban sensing” [1]. Moreover, the past few years have
witnessed a huge increase in the adoption of social media and transportation
researchers have also realized the potential of SNA for demand modelling [2].
A growing research topic is understanding how trips, trip modes and trip
purposes can be derived from GPS data. For instance, [3] propose an approach
to predict both the intended destination and route of a person by exploiting
personal movement data collected by GPS. GPS data, however, have some
limitations: (1) GPS signals are usually blocked indoors, (2) GPS devices
may suffer interference near tall buildings, and (3) continuously collecting GPS
data may drain device energy quickly.
Social networks and human interactions are crucial not only for understand-
ing social activities, but also for travel patterns [2]. [4] connects travel with social
networks, arguing that daily life revolves around family, colleagues, friends and
shopping. [5] refers to conformity to social norms, implying that decision-
makers are more likely to choose a particular alternative if more peers have
already chosen the same alternative. The emergence of geolocated social media
seems a good opportunity to address SNA’s lack of geographic consideration.
For instance, [6] presents a technique to analyse large-scale geo-location data
from social media to infer individual activity patterns.
Previously, aggregate approaches, such as gravity or entropy models, were used
for travel demand modelling. These approaches were gradually replaced
by disaggregated models [7]. In discrete choice modelling, the effect of social
dimensions was first formalized for the binomial and the multinomial cases in [8]
and [9], respectively. Generically, the agents’ utility is formed by both private and
social components. The private component corresponds to the decision-makers
characteristics. The social component represents the strength of social utility and
the percentage of others in the neighbourhood selecting the same alternative in
the choice set [10].

3 Methodology
3.1 Data Gathering

We use GPS data traces provided by TU Delft from 80 individuals over the
course of four days, as well as data collected from social networks, namely Twitter,
Instagram and Foursquare. The data obtained are cleared of personal details so as to
ensure privacy. To get the geo-located points of interest, we use the Foursquare
API, extracting the 50 most popular venues, within a radius of 30 meters for
each given point, resulting in a total of 37506 venues, in 489 categories, with their
identification, geo-location and total number of check-ins made. The subscription
zone for Instagram had a radius of 5 kilometers from the city center. For Twitter,
we covered a bigger area in order to get Delft surroundings.

3.2 Social Network Analysis

Friendship. To identify friendships, we use the unique user identification
from each post and request the users whom that user follows and who follow him
back. The only friendships considered significant are those between users that
posted around Delft. The total number of friendships used is 35457. The discrete
choice model also has to take into account the strength of ties between users. To
measure tie strength, mutuality, propinquity, mutual friends and
multiplexity factors are used.

Detecting Important Locations. To build the MNL we also need to know each
user's home and work locations, since we are only interested in the user's movement
patterns before and after work hours. Home and work are the starting points from
which the distances to the points of interest are measured. To get these locations,
we use a clustering algorithm, namely DBSCAN [11].

3.3 Data Preparation

In the data set with the posts and associated venues, i.e., the choice set (CS),
there is a large amount of data with no use for us, as it does not provide useful
information (for instance, useless categories) or represent work or residential
places, for which the demand patterns are well established and can be met by
traditional transportation services. The data containing those specific categories
was erased from the choice set. Since the number of alternatives is quite large, we
grouped the venues into 6 main categories: Appointment (17%), Food (17%),
Bar (5%), Shop (24%), Entertainment (27%) and Travel (10%).
If we used these categories as our number of different alternatives for the MNL
model, we would only get results concerning each of those 6 alternatives, which
are quite generic. However, we want to use the model to predict probabilities
of destination choices with a higher resolution, so we generated data for all the
venues and then used those categories only to filter out unnecessary data.

3.4 Multinomial Logit Model

Our data corresponds to the observed choices of individuals, i.e., revealed preferences
data. For each dataset we have the number of alternatives selected in each
hour, which is our finite set of alternatives for each individual. The number of
alternatives and observations varies significantly across the hours. The variables
used for the data frame are:

– distance: the venue distance to the user's central point,
– check-ins: the total number of check-ins in each alternative for each user,
– friendship: the sum of the individual friendships for each alternative,
– choice: the alternative selection.

Since we cannot directly extract personal user information (e.g. age, gender),
our data does not contain individual-specific variables, and so the alternative-specific
variables have generic coefficients, i.e., the coefficients of the number
of check-ins, distance and friendship are the same for all alternatives.
Choice takes the values yes and no, depending on whether the alternative was chosen or not by the
user. To estimate the MNL we used the R statistics system with the mlogit
package. The following formula was used for our work,

mlogit(choice ∼ distance + friendship + attractiveness, CS)

where choice is the variable that indicates the choice made by each individual
among the alternatives, and distance, friendship and attractiveness are the
alternative-specific variables with generic coefficients from the choice set CS.

4 Results
We present the results and estimation parameters for one choice set, namely the
one representing the choices made at hour 21, which has 24 alternatives and 91
observations.
The model predictions are reasonably good when tested against the users'
observed choices. Table 1 presents the average probabilities returned by the
model against the observed frequencies. The results from the MNL model show
meaningful relationships between distance and attractiveness for all the different
alternatives, with distance being the most significant variable, i.e., longer distances
almost always reduce the attractiveness of a destination, all else being equal.
The same can be said for the attractiveness variable, but the friendship variable
does not have the same impact on the individual when choosing an alternative.
Table 2 illustrates these findings. To show the usefulness of the analyses
made, we feed the probabilities predicted by our model to a DRT simulator
developed in [12]. Figure 1 shows that most origins and destinations found for
the time period and travel objective considered lie outside the service area of
the different public transport modes (dotted lines) and DRT could satisfy this
demand (solid lines).

Table 1. Average probabilities returned by the model


Venue Freq. Avg. Prob.
Stadion Feijenoord 0.032967 0.04490835
EkoPlaza 0.065934 0.03791752
Station Den Haag HS 0.054945 0.05494505
La Mer 0.032967 0.03303908
Diner Company 0.032967 0.03777430
LantarenVenster 0.032967 0.04320755
Station Rotterdam Centraal 0.153846 0.12336128
BIRD 0.043956 0.03794951
Maassilo 0.043956 0.03530117
Emma 0.021978 0.02478593
Station Den Haag Centraal 0.054945 0.05070866
Lucent Danstheater 0.032967 0.02519137
Zaal 3 0.032967 0.03212339
De Banier 0.054945 0.09073518
Randstadrail javalaan 0.032967 0.02411359
Kot Treinpersoneel 0.032967 0.02666024
Spuimarkt 0.032967 0.02543518
Doerak 0.032967 0.04017886
Paard van Troje 0.032967 0.03294463
Ahoy Rotterdam 0.043956 0.03654922
Restaurant Meram 0.043956 0.03907711
Live Tv Show 0.021978 0.04634078
Stadskwekerij den haag 0.010989 0.03433138
Oudedijk 166 A2 0.032967 0.04312563

Table 2. Relationships between variables


Variables Estimate Std.error t-value p-value
distance -0.107414 0.022597 -4.7534 2.000e-06
friendship 0.094441 0.081570 1.1578 0.2469
attractiveness 0.342170 0.061907 5.5272 3.254e-08

Fig. 1. Simulation results

5 Conclusions

Traditionally, travel demand modelling focused on long-term socio-economic sce-


narios and land-use to estimate the required transport supply. However, the lim-
ited number of transportation requests in demand-responsive flexible transport
systems require a higher resolution zoning. We analysed users' short-term desti-
nation choice patterns, with a careful analysis of the available data coming from
GPS traces and social networks. We defined a Multinomial Logit Model (MNL),
with a social component for utility and characteristics, both derived from Social

Network Analyses. The low frequency of posts with identified locations for each
user made it difficult to generate a clear pattern for each user. Nevertheless,
the results from the model show meaningful relationships between distance and
attractiveness for all the different alternatives, with the variable distance being
the most significant.
Since the social network analysis done in this work does not produce
individual characteristics, such as age, gender and socio-economic status, it would be
interesting for future work to include data mining algorithms to extract some of those
values from tweets, and to add features specific to each venue, to better understand
the motivation behind the choices made.

References
1. Cuff, D., Hansen, M., Kang, J.: Urban sensing: Out of the woods. Communications
of the ACM 51(3), 24–33 (2008)
2. Carrasco, J., Hogan, B., Wellman, B., Miller, E.: Collecting social network data to
study social activity-travel behaviour: an egocentric approach. Environment and
Planning B: Planning and Design 35, 961–980 (2008)
3. Chen, L., Mingqi, L., Chen, G.: A system for destination and future route predic-
tion based on trajectory mining. Pervasive and Mobile Computing 6(6), 657–676
(2010)
4. Axhausen, K.: Social networks, mobility biographies, and travel: survey challenges.
Environment and Planning B: Planning and Design 35, 981–996 (2008)
5. Páez, A., Scott, D.: Social influence on travel behavior: a simulation example of
the decision to telecommute. Environment and Planning A 39(3), 647–665 (2007)
6. Hasan, S., Ukkusuri, S.: Urban activity pattern classification using topic mod-
els from online geo-location data. Transportation Research Part C: Emerging
Technologies 44, 363–381 (2014)
7. Ben-Akiva, M., Lerman, S.: Discrete Choice Analysis: Theory and Application to
Travel Demand (1985)
8. Brock, W., Durlauf, S.: Discrete Choice with Social Interactions. Review of
Economic Studies 68(2), 235–260 (2001)
9. Brock, W., Durlauf, S.: A multinomial choice model with neighborhood effects.
American Economic Review 92, 298–303 (2002)
10. Zanni, A.M., Ryley, T.J.: Exploring the possibility of combining discrete choice
modelling and social networks analysis: an application to the analysis of weather-
related uncertainty in long-distance travel behaviour. In: International Choice
Modelling Conference, Leeds, pp. 1–22 (2011)
11. Ester, M., Kriegel, H., Sander, J., Xu, X.: A density-based algorithm for discovering
clusters in large spatial databases with noise. In: Proceedings of 2nd International
Conference on Knowledge Discovery and Data Mining, pp. 226–231. AAAI Press
(1996)
12. Gomes, R., Sousa, J.P., Galvao, T.: An integrated approach for the design of
Demand Responsive Transportation services. In: de Sousa, J.F., Rossi, R. (eds.)
Computer-based Modelling and Optimization in Transportation. AISC, vol. 232,
pp. 223–235. Springer, Heidelberg (2014)
Artificial Life and Evolutionary
Algorithms
A Case Study on the Scalability of Online
Evolution of Robotic Controllers

Fernando Silva1,2,4(B), Luís Correia4, and Anders Lyhne Christensen1,2,3


1
BioMachines Lab, Lisboa, Portugal
[email protected], [email protected]
2
Instituto de Telecomunicações, Lisboa, Portugal
3
Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal
4
BioISI, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
[email protected]

Abstract. Online evolution of controllers on real robots typically


requires a prohibitively long evolution time. One potential solution is to
distribute the evolutionary algorithm across a group of robots and evolve
controllers in parallel. No systematic study on the scalability properties
and dynamics of such algorithms with respect to the group size has,
however, been conducted to date. In this paper, we present a case study
on the scalability of online evolution. The algorithm used is odNEAT,
which evolves artificial neural network controllers. We assess the scala-
bility properties of odNEAT in four tasks with varying numbers of sim-
ulated e-puck-like robots. We show how online evolution algorithms can
enable groups of different size to leverage their multiplicity, and how
larger groups can: (i) achieve superior task performance, and (ii) enable
a significant reduction in the evolution time and in the number of eval-
uations required to evolve controllers that solve the task.

Keywords: Evolutionary robotics · Artificial neural network · Evolu-


tionary algorithm · Online evolution · Robot control · Scalability

1 Introduction
Evolutionary computation has been widely studied and applied to synthesise
controllers for autonomous robots in the field of evolutionary robotics (ER).
In online ER approaches, an evolutionary algorithm (EA) is executed onboard
robots during task execution to continuously optimise behavioural control. The
main components of the EA (evaluation, selection, and reproduction) are per-
formed by the robots without any external supervision. Online evolution thus
enables addressing tasks that require online learning or online adaptation. For
instance, robots can evolve new controllers and modify their behaviour to
respond to unforeseen circumstances, such as changes in the task or in the envi-
ronment.



Research in online evolution started out with a study by Floreano and Mon-
dada [1], who conducted experiments on a real mobile robot. The authors success-
fully evolved navigation and obstacle avoidance behaviours for a Khepera robot.
The study was a significant breakthrough as it demonstrated the potential of
online evolution of controllers. Researchers then focused on how to mitigate the
issues posed by evolving controllers directly on real robots, especially the pro-
hibitively long time required [2]. Watson et al. [3] introduced an approach called
embodied evolution in which an online EA is distributed across a group of robots.
The main motivation behind the use of multirobot systems was to leverage the
potential speed-up of evolution due to robots that evolve controllers in parallel
and that exchange candidate solutions to the task.
Over the past decade, numerous approaches to online evolution in multirobot
systems have been developed. Examples include Bianco and Nolfi’s open-ended
approach for self-assembling robots [4], mEDEA by Bredeche et al. [5], and
odNEAT by Silva et al. [6]. When the online EA is decentralised and distributed
across a group of robots, one common assumption is that online evolution inher-
ently scales with the number of robots [3]. Generally, the idea is that the more
robots are available, the more evaluations can be performed in parallel, and the
faster the evolutionary process [3]. The dynamics of the online EA itself, and
common issues that arise in EAs from population sizing such as convergence rates
and diversity [7] have, however, not been considered. Furthermore, besides ad-
hoc experiments with large groups of robots, see [5] for examples, there has been
no systematic study on the scalability properties of online EAs across different
tasks. Given the strikingly long time that online evolution requires to synthesise
solutions to any but the simplest of tasks, the approach remains infeasible on
real robots [8].
In this paper, we study the scalability properties of online evolution of robotic
controllers. The online EA used in this case study is odNEAT [9], which opti-
mises artificial neural network (ANN) controllers. One of the main advantages of
odNEAT is that it evolves both the weights and the topology of ANNs, thereby
bypassing the inherent limitations of fixed-topology algorithms [9]. odNEAT is
used here as a representative efficient algorithm that has been successfully used
in a number of simulation-based studies related to adaptation and learning in
robot systems, see [6,8–11] for examples. We assess the scalability properties and
performance of odNEAT in four tasks involving groups of up to 25 simulated e-
puck-like robots [12]: (i) an aggregation task, (ii) a dynamic phototaxis task, and
(iii, iv) two foraging tasks with differing complexity. Overall, our study shows
how online EAs can enable groups of different size to leverage their multiplicity
for higher performance, and for faster evolution in terms of evolution time and
number of evaluations required to evolve effective controllers.

2 Online Evolution with odNEAT


This section provides an overview of odNEAT; for a comprehensive introduc-
tion see [9]. odNEAT is an efficient online neuroevolution algorithm designed

for multirobot systems. The algorithm starts with minimal networks with no
hidden neurons, and with each input neuron connected to every output neuron.
Throughout evolution, topologies are gradually complexified by adding new neu-
rons and new connections through mutation. In this way, odNEAT is able to find
an appropriate degree of complexity for the current task, and a suitable ANN
topology is the result of a continuous evolutionary process [9].
odNEAT is distributed across multiple robots that exchange candidate solu-
tions to the task. The online evolutionary process is implemented according to a
physically distributed island model. Each robot optimises an internal population
of genomes (directly encoded ANNs) through intra-island variation, and genetic
information between two or more robots is exchanged through inter-island migra-
tion. In this way, each robot is potentially self-sufficient and the evolutionary pro-
cess opportunistically capitalises on the exchange of genetic information between
multiple robots for collective problem solving [9].
During task execution, each robot is controlled by an ANN that represents
a candidate solution to a given task. Controllers maintain a virtual energy level
reflecting their individual performance. The fitness value is defined as the mean
energy level. When the virtual energy level of a robot reaches a minimum thresh-
old, the current controller is considered unfit for the task. A new controller is
then created via selection of two parents from the internal population, crossover
of the parents’ genomes, and mutation of the offspring. Mutation is both struc-
tural and parametric, as it adds new neurons and new connections, and optimises
parameters such as connection weights and neuron bias values.
odNEAT has been successfully used in a number of simulation-based stud-
ies related to long-term self-adaptation in robot systems. Previous studies have
shown: (i) that odNEAT effectively evolves controllers for robots that oper-
ate in dynamic environments with changing task parameters [11], (ii) that the
controllers evolved are robust and can often adapt to changes in environmental
conditions without further evolution [9], (iii) that robots executing odNEAT can
display a high degree of fault tolerance as they are able to adapt and learn new
behaviours in the presence of faults in the sensors [9], (iv) how to extend the algo-
rithm to incorporate learning processes [11], and (v) how to evolve behavioural
building blocks prespecified by the human experimenter [8,10]. Given previous
results, odNEAT is therefore used in our study as a representative online EA.
The key research question of our study is if and how online EAs can enable
robots to leverage their multiplicity. That is, besides performance and robust-
ness criteria, we are interested in studying scalability with respect to the group
size, an important aspect when large groups of robots are considered.

3 Methods

In this section, we define our experimental methodology, including the simulation


platform and robot model, and we describe the four tasks used in the study:
aggregation, phototaxis, and two foraging tasks with differing complexity.

3.1 Experimental Setup

We use JBotEvolver [13] to conduct our simulation-based experiments. JBotE-


volver is an open-source, multirobot simulation platform and neuroevolution
framework. In our experiments, the simulated robots are modelled after the e-
puck [12], a 7.5 cm in diameter, differential drive robot capable of moving at
speeds up to 13 cm/s. Each robot is equipped with infrared sensors that mul-
tiplex obstacle sensing and communication between robots at a range of up to
25 cm.1 The sensor and actuator configurations for the tasks are listed in Table 1.
Each sensor and each actuator are subject to noise, which is simulated by adding
a random Gaussian component within ± 5% of the sensor saturation value or of
the current actuation value.
The robot controllers are discrete-time ANNs with connection weights in the
range [-10,10]. odNEAT starts with simple networks with no hidden neurons,
and with each input neuron connected to every output neuron. The ANN inputs
are the readings from the sensors, normalised to the interval [0,1]. The output
layer has two neurons whose values are linearly scaled from [0,1] to [-1,1] to set
the signed speed of each wheel. In the two foraging tasks, a third output neuron
sets the state of a gripper. The gripper is activated if the output value of the
neuron is higher than 0.5, otherwise it is deactivated. If the gripper is activated,
the robot collects the closest resource within a range of 2 cm, if there is any.
Depending on the foraging task (see below), the robot may need to actively
select which type of resources to collect and to avoid other types.

Aggregation Task. In an aggregation task, dispersed robots must move close


to one another to form a single cluster. Aggregation combines several aspects of
multirobot tasks, including distributed individual search, coordinated movement,
and cooperation. Furthermore, aggregation plays an important role in robotics
because it is a precursor of other collective behaviours such as group transport
of heavy objects [14]. In our aggregation task, robots are evaluated based on
criteria that include the presence of robots nearby, and the ability to explore the
arena and move fast, see [9] for details. The initial virtual energy level E of each
controller is set to 1000 and limited to the range [0, 2000] units. At each control
cycle, the update of the virtual energy level, E, is given by:

ΔE/Δt = α(t) + γ(t)        (1)
where t is the current control cycle, α(t) is a reward proportional to the num-
ber n of different genomes received in the last P = 10 control cycles. Because
robots executing odNEAT exchange candidate solutions, the number of different

1 The original e-puck infrared range is 2-3 cm [12]. In real e-pucks, the liblrcom library,
see http://www.e-puck.org, extends the range up to 25 cm and multiplexes infrared
communication with proximity sensing.

Table 1. Controller details. Light sensors have a range of 50 cm (phototaxis task).


Other sensors have a range of 25 cm.

Aggregation task – controller details


Input neurons: 18
8 for IR robot detection
8 for IR wall detection
1 for energy level reading
1 for reading the number of
different genomes received
Output neurons: 2 Left and right motor speeds
Phototaxis task – controller details
Input neurons: 25
8 for IR robot detection
8 for IR wall detection
8 for light source detection
1 for energy level reading
Output neurons: 2 Left and right motor speeds
Foraging tasks – controller details
Input neurons: 25
4 for IR robot detection
4 for IR wall detection
1 for energy level reading
8 for resource A detection
8 for resource B detection
Output neurons: 3
2 for left and right motor speeds
1 for controlling the gripper

genomes received is used to estimate the number of robots nearby. γ(t) is a factor
related to the quality of movement computed as:

γ(t) = -1                 if vl(t) · vr(t) < 0
γ(t) = Ωs(t) · ωs(t)      otherwise        (2)

where vl(t) and vr(t) are the left and right wheel speeds, Ωs(t) is the ratio
between the average and maximum speed, and ωs(t) = √(vl(t) · vr(t)) rewards
controllers that move fast and straight at each control cycle.

Phototaxis Task. In a phototaxis task, robots have to search and move towards
a light source. Following [9], we use a dynamic version of the phototaxis task
in which the light source is periodically moved to a new random location. As a
result, robots have to continuously search for and reach the light source, which
eliminates controllers that find the light source by chance. The virtual energy

level E ∈ [0, 100] units, and controllers are assigned an initial value of 50 units.
At each control cycle, E is updated as follows:


ΔE/Δt = Sr        if Sr > 0.5
ΔE/Δt = 0         if 0 < Sr ≤ 0.5        (3)
ΔE/Δt = -0.01     if Sr = 0

where Sr is the maximum value of the readings from light sensors, between 0 (no
light) and 1 (brightest light). Light sensors have a range of 50 cm and robots are
therefore only rewarded if they are close to the light source. Remaining sensors
have a range of 25 cm.

Foraging Tasks. In a foraging task, robots have to search for and pick up
objects scattered in the environment. Foraging is a canonical testbed in cooper-
ative robotics domains, and is evocative of tasks such as toxic waste clean-up,
harvesting, and search and rescue [15].
We set up a foraging task with different types of resources that have to be
collected. Robots spend virtual energy at a constant rate and must learn to find
and collect resources. When a resource is collected by a robot, a new resource
of the same type is placed randomly in the environment so as to keep the num-
ber of resources constant throughout the experiments. We experiment with two
variants of a foraging task: (i) one in which there are only type A resources,
henceforth called standard foraging task, and (ii) one in which there are both
type A and type B resources, henceforth called concurrent foraging task. In the
concurrent foraging task, resources A and B have to be consumed sequentially.
That is, besides learning the foraging aspects of the task, robots also have to
learn to collect resources in the correct order. The energy level of each controller
is initially set to 100 units, and limited to the range [0,1000]. At each control
cycle, E is updated as follows:


ΔE/Δt = reward      if the right type of resource is collected
ΔE/Δt = penalty     if the wrong type of resource is collected        (4)
ΔE/Δt = -0.02       if no resource is consumed

where reward = 10 and penalty = -10. The constant decrement of 0.02 means
that each controller will execute for a period of 500 seconds if no resource is
collected since it started operating. Note that the penalty component applies
only to the concurrent foraging task. To enable a meaningful comparison of
performance when groups of different size are considered, the number of resources
of each type is set to the number of robots multiplied by 10.

3.2 Experimental Parameters and Treatments


We analyse the impact of the group size on the performance of odNEAT by
conducting experiments with groups of 5, 10, 15, 20, and 25 robots. For each

experimental configuration, we conduct 30 independent evolutionary runs. Each


run lasts 100 hours of simulated time. odNEAT parameters are set as in previous
studies [9], including a population size of 40 genomes per robot and a control
cycle frequency of 100 ms. Robots operate in a square arena surrounded by
walls. In the aggregation and phototaxis tasks, the area of the arena is increased
proportionally to the number of robots (5 robots: 9 m², 10 robots: 18 m², ...,
25 robots: 45 m²). Notice that if we maintained the same size of the environment,
comparisons would not be meaningful. For instance, in the aggregation task, with
the increasing density of robots in the environment, the task becomes easier
to solve simply because robots encounter each other more frequently. In the
phototaxis task, the number of light sources in the environment is also increased
proportionally to the number of robots.

4 Experimental Results and Discussion


In this section, we present our experimental results. We analyse: (i) the task per-
formance of controllers in terms of their individual fitness score, (ii) the number
of evaluations, that is, the number of controllers tested by each robot before
a solution to the task is found, and (iii) the corresponding evolution time. We
use the two-tailed Mann-Whitney U test to compute statistical significance of
differences between results because it is a non-parametric test, and therefore no
strong assumptions need to be made about the underlying distributions.

4.1 Quality of the Solutions and Population-Mixing


We first compare the individual fitness scores of the final controllers. In the
aggregation task and in the phototaxis task, groups of 5 robots are typically


Fig. 1. Distribution of the fitness score of the final controllers in: (a) aggregation task,
and (b) phototaxis task.

Table 2. Summary of the individual fitness score of final solutions in the two foraging
tasks.

Task Robots Mean Std. dev. Minimum Maximum


5 96.03 45.62 41.88 268.02
10 105.31 62.31 29.57 396.47
Standard foraging 15 107.98 119.79 33.64 981.34
20 112.06 109.29 36.69 968.27
25 136.37 158.39 39.03 994.47
5 104.02 73.11 32.62 459.83
10 112.02 115.61 39.67 949.35
Concurrent foraging 15 144.54 153.29 38.85 975.81
20 165.39 165.82 38.77 971.42
25 179.29 196.58 38.08 978.56

outperformed by larger groups (ρ < 0.001, see Fig. 1). In the phototaxis task,
groups of 25 robots also perform significantly better than groups with 20 robots
(ρ < 0.01). Specifically, the results suggest that a minimum of 10 robots is necessary
for high-performing controllers to be evolved in a consistent manner.
A summary of the results obtained in the two foraging tasks is shown in
Table 2. Given the dynamic nature of the task, especially as the number of robots
increases, the fitness score of the final controllers displays a high variance. The
results, however, further show that larger groups typically yield better performance
both in terms of the mean and of the maximum fitness scores, which is
an indication that decentralised online approaches such as odNEAT can indeed
capitalise on larger groups to evolve more effective solutions to the current task.
To quantify to what extent a robot depends on the candidate solutions
it receives from other robots, we analyse the origin of the information stored
in the population of each robot. In the phototaxis task, when capable solutions
have been evolved approximately 86.85% (5 robots) to 93.95% (25 robots) of
genomes maintained in each internal population originated from other robots,
whereas the remaining genomes stored were produced by the robots themselves
(analysis of the results obtained in the other tasks revealed a similar trend).
The final solutions executed by each robot to solve the task have on average
from 87.26% to 89.10% matching genes. Moreover, 39.73% (5 robots) to 47.70%
(25 robots) of these solutions have more than 90% of their genes in common. The
average weight difference between matching connection genes varies from 2.48 to
4.37, with each weight in [-10, 10], which indicates that solutions were refined by
the EA on the receiving robot. Local exchange of candidate controllers therefore
appears to be a crucial part of the evolutionary dynamics of decentralised online
EAs because it serves as a substrate for collective problem solving. In the fol-
lowing section, we analyse how the exchange of such information enables online
EAs to capitalise on increasingly larger groups of robots for faster evolution of
solutions to the task.


Fig. 2. Distribution of evaluations in: (a) aggregation task, (b) phototaxis task,
(c) standard foraging task, and (d) concurrent foraging task.

4.2 Evaluations and Time Analysis


The distribution of evaluations with respect to the group size is shown in Fig. 2.
In the aggregation task, the number of evaluations required to evolve solutions
to the task decreases as the group size is increased, and becomes significantly
lower when the group size is increased from 10 to 15 robots (ρ < 0.001). On
average, the number of evaluations decreases from 104 for groups of 5 robots
to 55 for groups of 25 robots. The mean evolution time is of 6.22 hours for
groups of 5 robots, 2.34 hours for 10 robots, 1.80 hours for 15 robots, 1.48 hours
for 20 robots, and 1.12 hours for 25 robots. Hence, adding more robots also
enables a significant reduction of the evolution time (ρ < 0.01 for every group
increment). With the increase in the size of the environment, there is a larger
area to search for other robots and to explore. Task conditions become more
challenging because, in relative terms, each robot senses a smaller portion of the
environment. Robots are, however, still able to evolve successful controllers in
fewer evaluations and less evolution time.

Fig. 3. Operation time of intermediate controllers in the concurrent foraging task. 67%
to 96% of intermediate controllers operate for a few minutes before they fail (not shown
for better plot readability).

The speed up of evolution with the increase of group size also occurs in the
phototaxis task. The number of evaluations is significantly reduced (ρ < 0.001)
with the increase of the group size from 5 to 10 robots (mean number of evalua-
tions of 39 and 14, respectively). The mean evolution time is of 39.16 hours for
groups of 5 robots, 9.51 hours for 10 robots, 7.20 hours for 15 robots, 6.30 hours
for 20 robots, and 5.27 hours for 25 robots. Similarly to the number of evalua-
tions, the evolution time yields on average a 4-fold decrease when the group is
enlarged from 5 to 10 robots (ρ < 0.001). Larger groups enable further improve-
ments (ρ < 0.001 for increases up to 20 robots, ρ < 0.01 when group size is
changed from 20 to 25 robots), but at comparatively smaller rates. Chiefly, the
results of the aggregation task and of the phototaxis task show quantitatively
distinct speed-ups of evolution when groups are enlarged.
With respect to the two foraging tasks, the distribution of the number of
evaluations shown in Fig. 2 is inversely proportional, with a gentle slope, to
the number of robots in the group. For both tasks, differences in the number
of evaluations are significant across all comparisons (ρ < 0.001). In effect, the
number of evaluations is reduced on average: (i) from 115 evaluations (5 robots)
to 15 evaluations (25 robots) in the standard foraging task, which corresponds
to a 7.67-fold decrease in terms of evaluations, and (ii) from 82 evaluations (5
robots) to 8 evaluations in the concurrent foraging task, which amounts to a
10.25-fold decrease. These results show that decentralised online evolution can
scale well in terms of evaluations, even when task complexity is increased.
Regarding the evolution time, results show a similar trend for both foraging
tasks. On average, the evolution time varies from approximately 35 and 36 hours
for groups of 5 robots to 21 and 23 hours for groups of 25 robots. That is, despite
significant improvements in terms of the number of evaluations, the evolution
time required to evolve the final controllers to the task is still prohibitively long.
This result is due to the controller evaluation policy. Online evolution approaches

typically employ a policy in which robots substitute controllers at regular time


intervals, see [5] for an example. This approach has been shown to lead to incon-
gruous group behaviour and to poor performance in collective tasks that explic-
itly require continuous collective coordination and cooperation [9]. odNEAT, on
the other hand, adopts a different approach by allowing a controller to remain
active as long as it is able to solve the task. A new controller is thus only synthe-
sised if the current one fails. As shown in Fig. 3 for the concurrent foraging task,
the evaluation policy results in intermediate controllers that operate for a signifi-
cant amount of time (the standard foraging task displays a similar trend). While
67% to 96% of intermediate controllers only operate for a few minutes (data
not shown for better plot readability), there are a few intermediate controllers
that operate up to 20 hours of consecutive time before they fail. Although such
controllers yield high fitness scores comparable to those of the final solutions,
typically 1% to 4% less, they delay the synthesis of more effective solutions.

5 Concluding Discussion and Future Work


In this paper, we presented a case study on the scalability properties and per-
formance of online evolutionary algorithms. We used odNEAT, a decentralised
online evolution algorithm in which robots optimise controllers in parallel and
exchange candidate solutions to the task. We conducted experiments with groups
of up to 25 e-puck-like robots [12] in four tasks: (i) aggregation, (ii) dynamic
phototaxis, and (iii, iv) two foraging tasks with differing complexity.
We showed that larger groups of robots typically enable: (i) superior task
performance in terms of the fitness score, and (ii) significant improvements both
in terms of the number of evaluations required to evolve solutions to the task and
of the corresponding evolution time. There are, however, specific conditions in
which intermediate controllers are able to operate up to 20 hours of consecutive
time. These controllers yield high performance levels as their fitness score is
typically 1% to 4% less than the fitness score of the final solutions. In addition,
while additional robots may further speed up evolution, there are specific group
sizes after which speed-ups are comparatively smaller. One key research question
regarding scalability is therefore how to best leverage all robots so that they
can learn appropriate behaviours, constitute differentiated groups, and perform
cooperative or competitive actions reflecting the structure of the task.
The immediate follow-up work includes studying novel evaluation policies
for online evolution of robotic controllers. Regarding odNEAT, the algorithm
typically enables a significant reduction in the number of evaluations as the
group increases. Hence, if intermediate controllers that operate for long periods
of time can be detected and discarded via, for instance, established methods
such as early stopping algorithms or racing techniques, there is the potential to
enable timely and efficient online evolution in real robots.
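
As an illustration of this idea (not part of odNEAT itself), such a detection mechanism could be sketched as follows; the window size, grace period and tolerance threshold are hypothetical choices, not values used in the experiments above.

```python
# Hypothetical sketch of an early-stopping check for long-lived intermediate
# controllers; window, grace period and tolerance are illustrative, not odNEAT values.
from collections import deque

class EarlyStopMonitor:
    """Flag a controller whose recent fitness stays well below the best fitness seen."""

    def __init__(self, window=50, grace_period=200, tolerance=0.95):
        self.recent = deque(maxlen=window)   # sliding window of fitness samples
        self.grace_period = grace_period     # samples to wait before judging
        self.tolerance = tolerance           # required fraction of the best fitness
        self.best_seen = float("-inf")
        self.samples = 0

    def update(self, fitness_sample):
        """Record one fitness sample; return True if the controller should be discarded."""
        self.samples += 1
        self.recent.append(fitness_sample)
        self.best_seen = max(self.best_seen, fitness_sample)
        if self.samples < self.grace_period:
            return False                     # still within the grace period
        recent_mean = sum(self.recent) / len(self.recent)
        return recent_mean < self.tolerance * self.best_seen
```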

Acknowledgments. This work was partly supported by FCT under


grants UID/EEA/50008/2013, SFRH/BD/89573/2012, UID/Multi/04046/2013, and
EXPL/EEI-AUT/0329/2013.

References
1. Floreano, D., Mondada, F.: Automatic creation of an autonomous agent: Genetic
evolution of a neural-network driven robot. In: 3rd International Conference on
Simulation of Adaptive Behavior, pp. 421–430. MIT Press, Cambridge (1994)
2. Matarić, M., Cliff, D.: Challenges in evolving controllers for physical robots.
Robotics and Autonomous Systems 19(1), 67–83 (1996)
3. Watson, R.A., Ficici, S.G., Pollack, J.B.: Embodied evolution: Distributing an evo-
lutionary algorithm in a population of robots. Robotics and Autonomous Systems
39(1), 1–18 (2002)
4. Bianco, R., Nolfi, S.: Toward open-ended evolutionary robotics: evolving elemen-
tary robotic units able to self-assemble and self-reproduce. Connection Science
16(4), 227–248 (2004)
5. Bredeche, N., Montanier, J., Liu, W., Winfield, A.: Environment-driven distributed
evolutionary adaptation in a population of autonomous robotic agents. Mathemat-
ical and Computer Modelling of Dynamical Systems 18(1), 101–129 (2012)
6. Silva, F., Urbano, P., Oliveira, S., Christensen, A.L.: odNEAT: An algorithm for
distributed online, onboard evolution of robot behaviours. In: 13th International
Conference on the Simulation and Synthesis of Living Systems, pp. 251–258. MIT
Press, Cambridge (2012)
7. De Jong, K.A.: Evolutionary computation: a unified approach. MIT Press,
Cambridge (2006)
8. Silva, F., Duarte, M., Oliveira, S.M., Correia, L., Christensen, A.L.: The case for
engineering the evolution of robot controllers. In: 14th International Conference
on the Synthesis and Simulation of Living Systems, pp. 703–710. MIT Press,
Cambridge (2014)
9. Silva, F., Urbano, P., Correia, L., Christensen, A.L.: odNEAT: An algorithm
for decentralised online evolution of robotic controllers. Evolutionary Computa-
tion (2015) (in press). http://www.mitpressjournals.org/doi/pdf/10.1162/EVCO_a_00141
10. Silva, F., Correia, L., Christensen, A.L.: Speeding Up online evolution of robotic-
controllers with macro-neurons. In: Esparcia-Alcázar, A.I., Mora, A.M. (eds.)
EvoApplications 2014. LNCS, vol. 8602, pp. 765–776. Springer, Heidelberg (2014)
11. Silva, F., Urbano, P., Christensen, A.L.: Online evolution of adaptive robot
behaviour. International Journal of Natural Computing Research 4(2), 59–77
(2014)
12. Mondada, F., Bonani, M., Raemy, X., Pugh, J., Cianci, C., Klaptocz, A., Magnenat,
S., Zufferey, J., Floreano, D., Martinoli, A.: The e-puck, a robot designed for
education in engineering. In: 9th Conference on Autonomous Robot Systems and
Competitions, pp. 59–65, IPCB, Castelo Branco (2009)
13. Duarte, M., Silva, F., Rodrigues, T., Oliveira, S.M., Christensen, A.L.: JBotE-
volver: A versatile simulation platform for evolutionary robotics. In: 14th Interna-
tional Conference on the Synthesis and Simulation of Living Systems, pp. 210–211.
MIT Press, Cambridge (2014)
14. Groß, R., Dorigo, M.: Towards group transport by swarms of robots. International
Journal of Bio-Inspired Computation 1(1–2), 1–13 (2009)
15. Cao, Y., Fukunaga, A., Kahng, A.: Cooperative mobile robotics: Antecedents and
directions. Autonomous Robots 4(1), 1–23 (1997)
Spatial Complexity Measure for Characterising
Cellular Automata Generated 2D Patterns

Mohammad Ali Javaheri Javid(B) , Tim Blackwell, Robert Zimmer,


and Mohammad Majid Al-Rifaie

Department of Computing, Goldsmiths, University of London,


London SE14 6NW, UK
{m.javaheri,t.blackwell,r.zimmer,m.majid}@gold.ac.uk

Abstract. Cellular automata (CA) are known for their capacity to gen-
erate complex patterns through the local interaction of rules. Often the
generated patterns, especially with multi-state two-dimensional CA, can
exhibit interesting emergent behaviour. This paper addresses quanti-
tative evaluation of spatial characteristics of CA generated patterns.
It is suggested that the structural characteristics of two-dimensional
(2D) CA patterns can be measured using mean information gain. This
information-theoretic quantity, also known as conditional entropy, takes
into account conditional and joint probabilities of cell states in a 2D
plane. The effectiveness of the measure is shown in a series of experiments
for multi-state 2D patterns generated by CA. The results of the experi-
ments show that the measure is capable of distinguishing the structural
characteristics including symmetry and randomness of 2D CA patterns.

Keywords: Cellular automata · Spatial complexity · 2D patterns

1 Introduction
Cellular automata (CA) are one of the early bio-inspired systems invented by von
Neumann and Ulam in the late 1940s to study the logic of self-reproduction in
a material-independent framework. CA are known to exhibit complex behaviour
from the iterative application of simple rules. The popularity of the Game of
Life drew the attention of a wider community of researchers to the unexplored
potential of CA applications, especially their capacity to generate complex
behaviour. The formation of complex patterns from simple rules, sometimes with
high aesthetic qualities, has contributed to the creation of many digital
art works since the 1960s. The most notable works are “Pixillation”, one of the
early computer generated animations [11], the digital art works of Struycken [10],
Brown [3] and evolutionary architecture of Frazer [5]. Furthermore, CA have been
used for music composition, for example, Xenakis [17] and Miranda [9].
Although classical one-dimensional CA with binary states can exhibit com-
plex behaviours, experiments with multi-state two-dimensional (2D) CA reveal
a very rich spectrum of symmetric and asymmetric patterns [6,7].


There are numerous studies on the quantitative [8] and qualitative


behaviour [14–16] of CA but they are mostly concerned with categorising the
rule space and the computational properties of CA. In this paper, we investigate
information gain as a spatial complexity measure of multi-state 2D CA patterns.
Although the Shannon entropy is commonly used to measure complexity, it fails
to accurately discriminate structurally different patterns in two dimensions. The
main aim of this paper is to demonstrate the effectiveness of information gain
as a measure of 2D structural complexity.
This paper is organised as follows. Section 2 provides formal definitions and
establishes notation. Section 3 demonstrates that Shannon entropy is an inade-
quate measure of 2D cellular patterns. In the framework of the objectives of this
study a spatial complexity spectrum is formulated and the potential of informa-
tion gain as a structural complexity measure is discussed. Section 4 gives details
of experiments that test the effectiveness of information gain. The paper closes
with a discussion and summary of findings.

2 Cellular Automata
This section serves to specify the cellular automata considered in this paper, and
to define notation.
A cellular automaton A is specified by a quadruple ⟨L, S, N, f⟩ where:

– L is a finite square lattice of cells (i, j).


– S = {1, 2, . . . , k} is the set of states. Each cell (i, j) in L has a state s ∈ S.
– N is the neighbourhood, as specified by a set of lattice vectors {ea }, a =
1, 2, . . . , N . The neighbourhood of cell r = (i, j) is {r +e1 , r +e2 , . . . , r +eN }.
A cell is considered to be in its own neighbourhood so that one of {ea } is the
zero vector (0, 0). With an economy of notation, the cells in the neighbour-
hood of (i, j) can be numbered from 1 to N ; the neighbourhood states of (i, j)
can therefore be denoted (s1 , s2 , . . . , sN ). Two common neighbourhoods are
the five-cell von Neumann neighbourhood {(0, 0), (±1, 0), (0, ±1)} and the
nine-cell Moore neighbourhood {(0, 0), (±1, 0), (0, ±1), (±1, ±1)}. Periodic
boundary conditions are applied at the edges of the lattice so that complete
neighbourhoods exist for every cell in L.
– f is the update rule. f computes the state s1 (t + 1) of a given cell
from the states (s1 , s2 , . . . , sN ) of cells in its neighbourhood: s1 (t + 1) =
f (s1 , s2 , . . . , sN ). A quiescent state sq satisfies f (sq , sq , . . . , sq ) = sq .

The collection of states for all cells in L is known as a configuration c. The


global rule F maps the whole automaton forward in time; it is the synchronous
application of f to each cell. The behaviour of a particular A is the sequence
c0 , c1 , c2 , . . . , ct−1 , where c0 is the initial configuration (IC) at t = 0.
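
For concreteness, a minimal sketch of one synchronous application of the global rule F, for a lattice with the five-cell von Neumann neighbourhood and periodic boundaries, is given below in Python; the toy majority rule is only an illustration, not one of the automata studied here.

```python
# Minimal sketch of a k-state 2D CA step with a five-cell von Neumann
# neighbourhood and periodic boundary conditions; the mapping of array axes to
# the (i, j) notation of the text is an arbitrary convention of this sketch.
import numpy as np

def step(config, f):
    """Synchronous global rule F: apply f(centre, right, left, up, down) to every cell."""
    right = np.roll(config, -1, axis=1)
    left  = np.roll(config,  1, axis=1)
    up    = np.roll(config, -1, axis=0)
    down  = np.roll(config,  1, axis=0)
    new = np.empty_like(config)
    for i in range(config.shape[0]):
        for j in range(config.shape[1]):
            new[i, j] = f(config[i, j], right[i, j], left[i, j], up[i, j], down[i, j])
    return new

# Toy update rule (illustrative only): each cell takes the majority state of its
# von Neumann neighbourhood, which includes the cell itself.
def majority_rule(c, r, l, u, d):
    states = [c, r, l, u, d]
    return max(set(states), key=states.count)

config = np.random.randint(0, 3, size=(16, 16))   # random IC, k = 3 states
config = step(config, majority_rule)
```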
CA behaviour is sensitive to the IC and to L, S, N and f . The behaviour is
generally nonlinear and sometimes very complex; no single mathematical analysis
can describe, or even estimate, the behaviour of an arbitrary automaton. The
vast size of the rule space, and the fact that this rule space is unstructured,

mean that knowledge of the behaviour of a particular cellular automaton, or even
of a set of automata, gives no insight into the behaviour of any other CA. In the
absence of any practical model to predict the behaviour of a CA, the only feasible
method is to run simulations. Fig. 1 illustrates some experimental configurations
generated by the authors to demonstrate the capabilities of CA in exhibiting
complex behaviour with visually pleasing qualities.

Fig. 1. Samples of multi-state 2D CA patterns

3 Spatial Complexity Measure of 2D Patterns


The introduction of information theory by Shannon provided a mathematical
model to measure the order and complexity of systems. Shannon’s information
theory was an attempt to address communication over an unreliable channel [12].
Entropy is the core of this theory [4]. Let 𝒳 be a discrete alphabet, X a discrete
random variable, x ∈ 𝒳 a particular value of X and P (x) the probability of x.
Then the entropy, H(X), is:

H(X) = − ∑_{x∈𝒳} P (x) log2 P (x)    (1)

The quantity H is the average uncertainty in bits, log2 (1/p), associated with
X. Entropy can also be interpreted as the average amount of information needed
to describe X. The value of entropy is always non-negative and reaches its max-
imum for the uniform distribution, log2 (|𝒳|):

0 ≤ H ≤ log2 (|𝒳|)    (2)

The lower bound of relation (2) corresponds to a deterministic variable (no


uncertainty) and the upper bound corresponds to a maximum uncertainty asso-
ciated with a random variable. Another interpretation of entropy is as a measure
of order and complexity. A low entropy implies low uncertainty so the message
is highly predictable, ordered and less complex. And high entropy implies a
high uncertainty, less predictability, highly disordered and complex. Despite the
dominance of Shannon entropy as a measure of complexity, it fails to reflect on
structural characteristics of 2D patterns. The main reason for this drawback is
that it only reflects on the distribution of the symbols, and not on their order-
ing. This is illustrated in Fig. 2 where, following [1], the entropy of 2D patterns

with various structural characteristics is evaluated. Fig. 2a-b are patterns with
ordered structures and Fig. 2c is a pattern with a repeated three-element structure
over the plane. Fig. 2d is a fairly structureless pattern.

Fig. 2. Measure of H for structurally different patterns with uniform distribution of
elements. Panels (a)–(d): H = 1.5850 in every case.

Fig. 2 clearly demonstrates the failure of entropy to discriminate structurally


different 2D patterns. In other words, entropy is invariant to spatial rearrange-
ment of composing elements. This is in contrast to our intuitive perception of
the complexity of patterns and is problematic for the purpose of measuring the
complexity of multi-state 2D CA behaviour.
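
This limitation is easy to reproduce numerically; the short sketch below (an illustration, not part of the original experiments) computes H of Eq. (1) for an ordered three-state pattern and for a random shuffle of the same cells, obtaining 1.5850 bits in both cases.

```python
# Sketch: Shannon entropy (Eq. 1) is invariant to spatial rearrangement.
import numpy as np

def shannon_entropy(pattern):
    _, counts = np.unique(pattern, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

ordered = np.tile(np.array([0, 1, 2]), (9, 3))            # 9x9 pattern of ordered columns
shuffled = np.random.permutation(ordered.ravel()).reshape(ordered.shape)

print(shannon_entropy(ordered))    # 1.5850 = log2(3)
print(shannon_entropy(shuffled))   # identical, although the structure is destroyed
```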
Taking into account our intuitive perception of complexity and structural
characteristics of 2D patterns, a complexity measure must be bounded by two
extreme points of complete order and disorder. It is reasonable to assume
that regular structures, irregular structures and structureless patterns lie
between these extremes, as illustrated in Fig. 3.

regular structure | irregular structure | structureless


order ←−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−→ disorder

Fig. 3. The spectrum of spatial complexity.

A completely regular structure is a pattern of high symmetry; an irregular
structure is a pattern with some sort of structure, but not as regular as a fully
symmetrical pattern; finally, a structureless pattern is a random arrangement
of elements.
A measure introduced in [1,2,13], known as information gain, has been
suggested as a means of characterising the complexity of dynamical systems and
of images. It measures the amount of information gained in bits when specifying
the value, x, of a random variable X given knowledge of the value, y, of another
random variable Y ,
Gx,y = − log2 P (x|y). (3)
P (x|y) is the conditional probability of a state x conditioned on the state y.
Then the mean information gain, GX,Y , is the average amount of information

gain from the description of the all possible states of Y :


 
GX,Y = ∑_{x,y} P (x, y) Gx,y = − ∑_{x,y} P (x, y) log2 P (x|y)    (4)

where P (x, y) is the joint probability, prob(X = x, Y = y). G is also known


as the conditional entropy, H(X|Y ) [4]. Conditional entropy is the reduction in
uncertainty of the joint distribution of X and Y given knowledge of Y , H(X|Y ) =
H(X, Y ) − H(Y ). The lower and upper bounds of GX,Y are
0 ≤ GX,Y ≤ log2 |𝒳|,    (5)
where y ∈ 𝒴.
In principle, G can be calculated for a 2D pattern by considering the distri-
bution of cell states over pairs of cells r, s,

Gr,s = − ∑_{sr ,ss} P (sr , ss ) log2 P (sr , ss )    (6)

where sr , ss are the states at r and s. Since |S|= N , Gr,s is a value in [0, N ].
In particular, horizontal and vertical near neighbour pairs provide four MIGs,
G(i,j),(i+1,j) , G(i,j),(i−1,j) , G(i,j),(i,j+1) and G(i,j),(i,j−1) . In the interests of nota-
tional economy, we write Gs in place of Gr,s , and omit parentheses, so that,
for example, Gi+1,j ≡ G(i,j),(i+1,j) . The relative positions for non-edge cells are
given by the matrix M :

              ⎡            (i,j+1)            ⎤
        M  =  ⎢  (i−1,j)    (i,j)    (i+1,j)  ⎥ .    (7)
              ⎣            (i,j−1)            ⎦

Correlations between cells on opposing lattice edges are not considered. Fig. 4
provides an example. The depicted pattern is composed of four different symbols
S = {light-grey,grey,white,black }. The light-grey cell correlates with two neigh-
bouring white cells (i + 1, j) and (i, j − 1). On the other hand, the grey cell
has four neighbouring cells of which three are white and one is black. The result
of this edge condition is that Gi+1,j is not necessarily equal to Gi−1,j . Differ-
ences between the horizontal (vertical) mean information rates reveal left/right
(up/down) orientation.
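
A possible implementation of these four directional measures is sketched below (an illustration in Python, not the authors' code): each cell is paired with its neighbour in one direction, pairs that would wrap around the lattice edges are skipped, and G is accumulated as in Eq. (4) with the cell state conditioned on the neighbouring state.

```python
# Sketch of a directional mean information gain (Eq. 4) for a 2D pattern:
# -sum over pairs of P(x, y) * log2 P(x | y), with x the cell state and y the
# state of its neighbour at offset (di, dj); edge wrap-around pairs are skipped.
import numpy as np
from collections import Counter

def mean_information_gain(pattern, di, dj):
    rows, cols = pattern.shape
    pair_counts, neigh_counts = Counter(), Counter()
    for i in range(rows):
        for j in range(cols):
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols:      # ignore opposing-edge correlations
                x, y = pattern[i, j], pattern[ni, nj]
                pair_counts[(x, y)] += 1
                neigh_counts[y] += 1
    total = sum(pair_counts.values())
    g = 0.0
    for (x, y), c in pair_counts.items():
        g -= (c / total) * np.log2(c / neigh_counts[y])
    return g

pattern = np.random.randint(0, 3, size=(64, 64))       # a random 3-state pattern
offsets = {"G(i+1,j)": (0, 1), "G(i-1,j)": (0, -1),    # axis-to-(i,j) mapping is a convention
           "G(i,j+1)": (-1, 0), "G(i,j-1)": (1, 0)}
for name, (di, dj) in offsets.items():
    print(name, round(mean_information_gain(pattern, di, dj), 4))   # each close to log2(3)
```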

Fig. 4. A sample 2D pattern

The mean information gains of the sample patterns in Fig. 2 are presented
in Fig. 5. The merits of G in discriminating structurally different patterns rang-
ing from the structured and symmetrical (Fig. 5a-b), to the partially structured

(Fig. 5c) and the structureless and random (Fig. 5d), are clearly evident. The
cells in the columns of pattern (a) are completely correlated. However knowledge
of cell state does not provide complete predictability in the horizontal direction
and, as a consequence, the horizontal G is finite. Pattern (b) has non-zero, and
identical G’s indicating a symmetry between horizontal and vertical directions,
and a lack of complete predictability. Analysis of pattern (c) is similar to (a)
except the roles of horizontal and vertical directions are interchanged. The four
Gs in the final pattern are all different, indicating a lack of vertical and hori-
zontal symmetry; the higher values show the increased randomness. Details of
calculations for a sample pattern are provided in the appendix.

Fig. 5. The comparison of H with measures of Gi,j for structurally different patterns:

            (a)       (b)       (c)       (d)
H           1.5850    1.5850    1.5850    1.5850
Gi,j+1      0         0.7564    0.9710    1.5188
Gi,j−1      0         0.7564    0.9710    1.5140
Gi−1,j      0.8000    0.7564    0         1.3565
Gi+1,j      0.8000    0.7564    0         1.3473

4 Experiments and Results


A set of experiments was designed to examine the effectiveness of G in discrim-
inating the particular patterns that are generated by a multi-state 2D cellular
automaton. The (outer-totalistic) CA is specified in Table 1. The chosen experi-
mental rule maps three states, represented by green, red and white; the quiescent
state is white.

Table 1. Specifications of experimental cellular automaton

L = 129 × 129 (16641 cells).
S = {0, 1, 2} ≡ {white, red, green}
N : von Neumann neighbourhood
f : S⁹ → S

                                  ⎧ 1   if s(i,j) (t) = 1, 2 and σ = 0–2
f (si,j )(t) = si,j (t + 1) =     ⎨ 2   if s(i,j) (t) = 2, 3 and σ = 1
                                  ⎪ 2   if s(i,j) (t) = 2 and σ = 2
                                  ⎩ 0   otherwise

where σ is the sum total of the neighbourhood states.

The experiments are conducted with two different ICs: (1) all white cells
except for a single red cell and (2) a random configuration with 50% white
quiescent states (8320 cells), 25% red and 25% green. The experimental rule
has been iterated synchronously for 150 successive time steps. Fig. 6 and Fig. 7
illustrate the space-time diagrams for a sample of time steps starting from single
and random ICs.
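
A sketch of this experimental setup is given below (illustration only): the lattice, the state set and the two ICs follow the description above, while the update rule itself is left as a stub to be completed from Table 1, and whether σ includes the cell's own state is left to the reader's interpretation of the neighbourhood definition.

```python
# Sketch of the experimental setup: lattice, states and the two ICs described above.
# The update rule f is left as a stub to be completed from Table 1.
import numpy as np

WHITE, RED, GREEN = 0, 1, 2
L = 129                                       # 129 x 129 lattice (16641 cells)

# IC 1: all white except a single red cell (placing it at the centre is an assumption).
single_ic = np.full((L, L), WHITE)
single_ic[L // 2, L // 2] = RED

# IC 2: approximately 50% white, 25% red, 25% green, placed at random.
random_ic = np.random.choice([WHITE, RED, GREEN], size=(L, L), p=[0.5, 0.25, 0.25])

def iterate(config, f, steps=150):
    """Synchronous iteration: sigma is the sum of the four orthogonal neighbour
    states under periodic boundaries (one reading of the neighbourhood sum)."""
    for _ in range(steps):
        sigma = (np.roll(config, 1, 0) + np.roll(config, -1, 0) +
                 np.roll(config, 1, 1) + np.roll(config, -1, 1))
        config = np.vectorize(f)(config, sigma)
    return config

# def f(state, sigma): ...                    # the cases of Table 1 go here
# final = iterate(random_ic, f)
```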

Fig. 6. Space-time diagram of the experimental cellular automaton for sample time
steps starting from the single cell IC.

The behaviour of cellular automaton from the single cell IC is a sequence


of symmetrical patterns (Fig. 6). The directional measurements of Gi,j for the
single cell IC start with Gi,j+1 = Gi,j−1 = Gi−1,j = Gi+1,j = 0.00094 and
H = 0.00093, and they attain Gi,j+1 = Gi,j−1 = Gi−1,j = Gi+1,j = 1.13110 and
H = 1.13714 at the end of the runs (see Figs. 8 and 11).

Fig. 7. Space-time diagram of the experimental cellular automaton for sample time
steps starting from the random IC.

The sequence of states can be analysed by considering the differences between


the up/down and left/right mean information gains, as defined by

ΔGi,j±1 = |Gi,j+1 − Gi,j−1 | (8)

ΔGi±1,j = |Gi+1,j − Gi−1,j |. (9)


For the single cell IC, ΔGi,j±1 and ΔGi±1,j are constant for the 150 time
steps (ΔGi,j±1 = ΔGi±1,j = 0). This indicates the development of the symmet-
rical patterns along the up/down and left/right directions.
The behaviour of cellular automaton from the random IC is a sequence of
irregular structures (Fig. 7). The formation of patterns with local structures has
reduced the values of Gi,j until a stable oscillating pattern is attained (Figs 7, 9).
This is an indicator of the development of irregular structures. However the
patterns are not random patterns since Gi,j ≈ 1.1 is less than the maximum
three-state value log2 (3) = 1.5850 (see Eq. 5). Mean information rate differences

ΔGi,j±1 and ΔGi±1,j for both ICs are plotted in Fig. 10. The structured but
asymmetrical patterns emerging from the random start are clearly distinguished
from the symmetrical patterns of the single cell IC.

Fig. 8. Measurements of H, Gi,j+1 , Gi,j−1 , Gi+1,j ,Gi−1,j for 150 time steps starting
from the single cell IC.

Fig. 9. Measurements of H, Gi,j+1 , Gi,j−1 , Gi+1,j ,Gi−1,j for 150 time steps starting
from the random IC.

These experiments demonstrate that a cellular automaton rule seeded with


different ICs leads to the formation of patterns with structurally different char-
acteristics. The gradient of the mean information rate along lattice axes is able
to detect the structural characteristics of patterns generated by this particular
multi-state 2D CA. From the comparison of H with ΔGi,j±1 and ΔGi±1,j in
the set of experiments, it is clear that entropy fails to discriminate between the
diversity of patterns that can be generated by various CA.

Fig. 10. Plots of ΔGi,j±1 and ΔGi±1,j for two different ICs

             (a)          (b)          (c)          (d)
             t = 0        t = 150      t = 0        t = 150
H            0.00093      1.13714      1.50002      1.09465
Gi,j+1       0.00094      1.13110      1.49924      1.08655
Gi,j−1       0.00094      1.13110      1.49972      1.08613
ΔGi,j±1      0            0            0.00048      0.00042
Gi−1,j       0.00094      1.13110      1.50023      1.08318
Gi+1,j       0.00094      1.13110      1.49974      1.08308
ΔGi±1,j      0            0            0.00049      0.00010

Fig. 11. Comparison of the cellular automaton's H with the four directional measures
Gi,j , ΔGi,j±1 and ΔGi±1,j , starting from the single cell IC (a, b) and the random IC (c, d).

5 Conclusion

Cellular automata (CA) are one of the early bio-inspired models of self-
replicating systems and, in 2D, are powerful tools for pattern generation.
Indeed, multi-state 2D CA can generate many interesting and complex patterns
with various structural characteristics. This paper considers an information-
theoretic classification of these patterns.
Entropy, which is a statistical measure of the distribution of cell states, is not
in general able to distinguish these patterns. However mean information gain,
as proposed in [1,2,13], takes into account conditional and joint probabilities

between pairs of cells and, since it is based on correlations between cells, holds
promise for pattern classification.
This paper reports on a pair of experiments for two different initial con-
ditions of an outer-totalistic CA. The potential of mean information gain for
distinguishing multi-state 2D CA patterns is demonstrated. Indeed, the mea-
sure appears to be particularly good at distinguishing symmetry from non-random
non-asymmetric patterns.
Since CA are one of the generative tools in computer art, means of evaluating
the aesthetic qualities of CA generated patterns could make a substantial con-
tribution towards further automation of CA art. This is the subject of on-going
research.

Appendix
In this example the pattern is composed of two different cells S = {white, black}
where the set of permutations with repetition is {ww, wb, bb, bw}. Considering
the mean information gain (Eq. 4) and given the positional matrix M (Eq. 7),
the calculations can be performed as follows:

white − white:
P (w, s(i,j+1) ) = 5/6
P (w|w(i,j+1) ) = 4/5
P (w, w(i,j+1) ) = 5/6 × 4/5 = 2/3
G(w, w(i,j+1) ) = −(2/3) log2 (4/5) = 0.2146 bits

black − black:
P (b, s(i,j+1) ) = 1/6
P (b|b(i,j+1) ) = 1/1
P (b, b(i,j+1) ) = 1/6 × 1/1 = 1/6
G(b, b(i,j+1) ) = −(1/6) log2 (1) = 0 bits

white − black:
P (w, s(i,j+1) ) = 5/6
P (w|b(i,j+1) ) = 1/5
P (w, b(i,j+1) ) = 5/6 × 1/5 = 1/6
G(w, b(i,j+1) ) = −(1/6) log2 (1/5) = 0.3869 bits

black − white:
P (b, s(i,j+1) ) = 1/6
P (b|w(i,j+1) ) = 0/1
P (b, w(i,j+1) ) = 1/6 × 0 = 0
G(b, w(i,j+1) ) = 0 bits

G = G(w, w(i,j+1) ) + G(w, b(i,j+1) ) + G(b, b(i,j+1) ) + G(b, w(i,j+1) ) = 0.6016 bits
In the white − white case, G measures the uniformity and spatial property, where
P (w, s(i,j+1) ) is the joint probability that a cell is white and has a neighbouring
cell at its (i, j + 1) position, P (w|w(i,j+1) ) is the conditional probability that a
cell is white given that it has a white neighbouring cell at its (i, j + 1) position,
P (w, w(i,j+1) ) is the joint probability that a cell is white and has a white
neighbouring cell at its (i, j + 1) position, and G(w, w(i,j+1) ) is the information gain
in bits from specifying a white cell that has a white neighbouring cell at its (i, j + 1)
position. The same calculations are performed for the remaining cases: black-black,
white-black and black-white.
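
The arithmetic above can be checked directly; the following short verification (ours, not the authors') sums the four contributions of Eq. (4) from the probabilities listed above.

```python
# Numerical check of the appendix example: G = -sum of P(x, y) * log2 P(x | y)
# over the four pair classes, using the probabilities listed above.
from math import log2

g = (-(2/3) * log2(4/5)     # white-white: P(x,y) = 2/3, P(x|y) = 4/5 -> 0.2146 bits
     - (1/6) * log2(1/1)    # black-black: contributes 0 bits
     - (1/6) * log2(1/5)    # white-black: 0.3869 bits
     + 0.0)                 # black-white: joint probability is 0
print(round(g, 4))          # 0.6016 bits
```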

References
1. Andrienko, Y.A., Brilliantov, N.V., Kurths, J.: Complexity of two-dimensional pat-
terns. Eur. Phys. J. B 15(3), 539–546 (2000)
2. Bates, J.E., Shepard, H.K.: Measuring complexity using information fluctuation.
Physics Letters A 172(6), 416–425 (1993)
3. Brown, P.: Stepping stones in the mist. In: Creative Evolutionary Systems,
pp. 387–407. Morgan Kaufmann Publishers Inc. (2001)
4. Cover, T.M., Thomas, J.A.: Elements of Information Theory (Wiley Series in
Telecommunications and Signal Processing). Wiley-Interscience (2006)
5. Frazer, J.: An evolutionary architecture. Architectural Association Publications,
Themes VII (1995)
6. Javaheri Javid, M.A., Al-Rifaie, M.M., Zimmer, R.: Detecting symmetry in cel-
lular automata generated patterns using swarm intelligence. In: Dediu, A.-H.,
Lozano, M., Martı́n-Vide, C. (eds.) TPNC 2014. LNCS, vol. 8890, pp. 83–94.
Springer, Heidelberg (2014)
7. Javaheri Javid, M.A., te Boekhorst, R.: Cell dormancy in cellular automata. In:
Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds.) ICCS 2006.
LNCS, vol. 3993, pp. 367–374. Springer, Heidelberg (2006)
8. Langton, C.G.: Studying artificial life with cellular automata. Physica D: Nonlinear
Phenomena 22(1), 120–149 (1986)
9. Miranda, E.: Composing Music with Computers. No. 1 in Composing Music with
Computers. Focal Press (2001)
10. Scha, I.R.: Kunstmatige Kunst. De Commectie 2(1), 4–7 (2006)
11. Schwartz, L., Schwartz, L.: The Computer Artist’s Handbook: Concepts, Tech-
niques, and Applications. W W Norton & Company Incorporated (1992)
12. Shannon, C.: A mathematical theory of communication. The Bell System Technical
Journal 27, 379–423, 623–656 (1948)
13. Wackerbauer, R., Witt, A., Atmanspacher, H., Kurths, J., Scheingraber, H.:
A comparative classification of complexity measures. Chaos, Solitons & Fractals
4(1), 133–173 (1994)
14. Wolfram, S.: Statistical mechanics of cellular automata. Reviews of Modern Physics
55(3), 601–644 (1983)
15. Wolfram, S.: Universality and complexity in cellular automata. Physica D:
Nonlinear Phenomena 10(1), 1–35 (1984)
16. Wolfram, S.: A New Kind of Science. Wolfram Media Inc. (2002)
17. Xenakis, I.: Formalized music: thought and mathematics in composition.
Pendragon Press (1992)
Electricity Demand Modelling
with Genetic Programming

Mauro Castelli1 , Matteo De Felice2 , Luca Manzoni3(B) ,


and Leonardo Vanneschi1
1
NOVA IMS, Universidade Nova de Lisboa, 1070-312 Lisboa, Portugal
{mcastelli,lvanneschi}@novaims.unl.pt
2
Energy and Environment Modeling Technical Unit (UTMEA), ENEA,
Casaccia Research Center, 00123 Roma, Italy
[email protected]
3
Dipartimento di Informatica, Sistemistica E Comunicazione,
Università Degli Studi di Milano Bicocca, 20126 Milano, Italy
[email protected]

Abstract. Load forecasting is a critical task for all the operations of


power systems. Especially during hot seasons, the influence of weather
on energy demand may be strong, principally due to the use of air con-
ditioning and refrigeration. This paper investigates the application of
Genetic Programming on day-ahead load forecasting, comparing it with
Neural Networks, Neural Networks Ensembles and Model Trees. All the
experimentations have been performed on real data collected from the
Italian electric grid during the summer period. Results show the suitabil-
ity of Genetic Programming in providing good solutions to this problem.
The advantage of using Genetic Programming, with respect to the other
methods, is its ability to produce solutions that explain data in an intu-
itively meaningful way and that could be easily interpreted by a human
being. This fact allows the practitioner to gain a better understanding
of the problem under exam and to analyze the interactions between the
features that characterize it.

1 Introduction

Load forecasting is the task of predicting the electricity demand on different time
scales, such as minutes (very short-term), hours/days (short-term), and months
and years (long-term). This information has to be used to plan and schedule
operations on power systems (dispatch, unit commitment, network analysis) in
a way to control the flow of electricity in an optimal way, with respect to various
aspects (quality of service, reliability, costs, etc). An accurate load forecasting
has great benefits for electric utilities, and both negative and positive errors lead
to increased operating costs [10]. Overestimating the load leads to unnecessary
energy production or purchase while, on the contrary, underestimation causes
unmet demand with a higher probability of failures and costly operations. Several
factors influence electricity demand: day of the week and holidays (the so-called


“calendar effects”), special or unusual events, and weather conditions. In warm


countries, the last factor is particularly critical during summer, when the use of
refrigeration, irrigation and air conditioning becomes more common than in the
rest of the year.
Most of the methods used for load forecasting are time-series approaches,
like Box-Jenkins models, or artificial intelligence methods, like Neural Networks
(NN). There is a large literature on the use of computer science for load fore-
casting; a review of this literature is provided in Section 2. In recent decades,
techniques based on Computational Intelligence (CI) methods have been pro-
posed to overcome the most common problems of traditional methods, especially
in the most difficult scenarios. These techniques have demonstrated their effectiveness
in several cases, often becoming a valid alternative to conventional methods.
One such CI method is Genetic Programming (GP) [11,21]. GP has several advan-
tages over other machine learning methods, including the ones that have been
considered in this work, namely Neural Networks and Model Trees. In particular,
one interesting feature is the ability of GP to produce human-readable solutions.
This property may allow an in depth analysis of the features that characterize
a specific problem. It is important to underline that GP trees are usually very
large and have significant redundancy (introns) even with parsimony measures.
Thus, they can be very difficult to interpret. However, these trees can be usu-
ally simplified, producing a compact and readable model. In this work, the term
human-readable is related to the fact that GP produces a model that represents
interactions between variables, and that can be used for a better understand-
ing of the problem under examination, a property that is particularly useful in
real-life problems. This property is not true when, for instance, Neural Networks
are considered. In fact NNs produce numerical matrices of weights, which does
not facilitate the practitioner’s task of obtaining a better understanding of the
relations between the features that characterize a specific problem. In this work
we use GP for the load forecast problem during summer period, and we provide
a comparison between GP and three different machine learning techniques on
this problem.
The paper is organized as follows: Section 2 introduces the load forecasting
problem and gives an overview of the state of the art approaches to deal with this
problem. Section 3 briefly presents the techniques considered in this paper: neural
networks, neural networks ensemble, model trees and GP. Section 4 describes
the experimental phase, reporting the experimental settings and discussing the
results. Section 5 concludes the paper and summarizes the results of this work.

2 Short-Term Load Forecasting


Load forecasting is commonly defined “short-term” when the prediction horizon
is from one hour to one week. For this kind of problem various factors might be
considered, such as weather data or, more generally, all the factors influencing
the load pattern (e.g. day of the week). Load data normally exhibit seasonality:
the load at time t tends to be similar to the load at time t − k with k usually

representing a day, a week or a month, depending on the dataset. In this paper,


we are focusing on daily average load and we can observe that the load at the
day t is usually similar to the one at the same day of the previous week (i.e.,
weekly seasonality), with the exception of holidays and weekends, when the load
pattern is usually more unpredictable.

2.1 Forecasting Methods: State of the Art


Different methods have been used to cope with load forecasting, mainly clas-
sical statistical methods and machine learning techniques. The first approach,
time series analysis, consists of the determination of the relationship between
process input and output with a linear model. To do that, observations that
are assumed to be equispaced discrete-time samples are used. There is a wide variety
of models to deal with this problem, the most popular in engineering applica-
tions are probably autoregression (AR) and moving-average (MA) models with
their combinations (ARMA, ARIMA, etc) and a common methodology is the
iterative one proposed by Box & Jenkins [4]. Since Park’s paper in 1991 [20],
Neural Networks have been widely applied to the load forecasting problem. The
main advantage of NNs is their implicit nonlinearity which can potentially allow
modeling complex dynamics. On the other hand, NNs present numerous param-
eters that are usually tuned with empirical approaches. Furthermore, the com-
putational needs of NNs become expensive with high-dimensional problems and large
datasets. Some reviews on the application of NNs can be found in [1,7,9]. Model
trees, which we will introduce in the next section, have been used less frequently
and, to the best of our knowledge, there are only few works using them for
forecasting [19,24,25].
While GP has been used [3,16] to tackle the load forecasting problem, it is not
as widely applied as other CI approaches. This is probably due to the
fact that GP is computationally expensive; in particular, the evaluation of can-
didate solutions requires more time than in other machine learning tech-
niques. On the other hand, differently from other machine learning techniques
like Neural Networks, GP is able to produce solutions that are easier to read
and interpret by humans [11], a feature that can be particularly useful in some
applications. Moreover, like other Evolutionary Algorithms, the practitioner may
design a specific fitness function in order to make the algorithm focus on a
specific problem feature [14]. While this makes the whole task “technically” more
difficult, on the other hand the algorithm becomes more versatile: depending on
how the fitness function is defined (or functions if multi-objective optimization
is considered), it is possible to direct GP towards different parts of the solution
space. In other words, it is possible to steer GP to particular behaviors, that are
much more difficult to obtain with other machine learning techniques. Hence, it
can be used not only to produce a forecasting model, but also to gain some
hints on the role of the variables that influence the forecasting model itself. As
reported in [12], GP has been used to produce many instances of results that
are competitive with human-produced results. These human-competitive results
come from a wide variety of fields, including quantum computing circuits [2],

analog electrical circuits [14], antennas [18], mechanical systems [17], photonic
systems [22], optical lens systems [13] and sorting networks [15].

3 Considered Techniques for the Load Forecasting


Problem

This section briefly introduces the techniques used in this paper: neural networks,
model trees, and genetic programming.

3.1 Neural Networks

Neural Networks are a non-linear statistical modeling tool. In this work we


use feed-forward neural networks trained with the Levenberg-Marquardt back-
propagation training algorithm, implemented in MATLAB. In addition to single
neural networks, we use an ensemble of 100 neural networks with their outputs
combined using an arithmetic mean. Neural network ensembles were intro-
duced in [8] and this approach has already been used for load forecasting in [5].
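
As a minimal illustration of the combination step (the training of the individual networks is omitted and the arrays below are placeholders), the ensemble forecast is simply the arithmetic mean of the member outputs:

```python
# Sketch of the ensemble combination: the test-set outputs of the individual
# networks are averaged arithmetically; the arrays below are placeholders.
import numpy as np

n_networks, n_test = 100, 90
member_outputs = np.random.rand(n_networks, n_test) * 4e4   # stand-in for trained NN outputs
targets = np.random.rand(n_test) * 4e4                      # stand-in for observed loads

ensemble_forecast = member_outputs.mean(axis=0)             # one combined forecast per sample
rmse = np.sqrt(np.mean((ensemble_forecast - targets) ** 2))
```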

3.2 Model Trees

Model trees (M5 system) can be considered an extension of regression trees


introduced by Quinlan in 1992 [23]. A regression tree is a type of prediction tree
where in each leaf (terminal node) there is a zero-order model (i.e., constant
value) predicting the target variable. On the other hand, trees built with the M5
algorithm [23] have at their leaves a multivariate linear model (i.e., first-order
model). This linear model is the one that best fits those training points that
satisfy the conditions represented in the internal nodes that are on the path
from the root to the linear model itself. In these binary trees each internal node
represents a “rule” that defines which sub-branch of the tree we have to use to
make a prediction on a particular case. Model trees are basically a combination
between conventional regression trees and linear regression models. They are
particularly useful in the cases where a single global model is hard to obtain.
Their ability in partitioning the sample space allows finding the part of the data
space where a linear model best fits. The advantages of this technique are sev-
eral: easiness of implementation, possibility to cope with not-smooth regression
surfaces and existence of fast and reliable learning algorithms. In this work, we
used a MATLAB implementation of the M5 algorithm1 . For a description of the
algorithm we refer to the Quinlan original paper [23] and to the improvement
made by Wang [26].

1
M5PrimeLab is an open source toolbox for MATLAB/Octave available at http://
www.cs.rtu.lv/jekabsons/regression.html
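
To illustrate the idea (the splits and coefficients below are invented, not produced by M5), a model tree routes a sample through threshold tests on single attributes and applies the linear model stored in the reached leaf:

```python
# Schematic model-tree prediction: internal nodes test one attribute against a
# threshold, leaves hold multivariate linear models. All splits and coefficients
# here are invented for illustration; they are not M5 output.

def leaf(intercept, coeffs):
    # coeffs maps an attribute index to its coefficient in the leaf's linear model
    return lambda x: intercept + sum(c * x[i] for i, c in coeffs.items())

def split(attr, threshold, left, right):
    return lambda x: left(x) if x[attr] <= threshold else right(x)

tree = split(8, 24.0,                                    # e.g. temperature at day t
             leaf(10_000.0, {6: 0.9}),                   # mild days: mostly yesterday's load
             leaf(-5_000.0, {6: 0.9, 8: 1_200.0}))       # hot days: add a temperature term

sample = [0.0] * 9
sample[6], sample[8] = 33_000.0, 28.5                    # load at t-1 and temperature at t
print(tree(sample))
```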

3.3 Genetic Programming

GP is a machine learning technique inspired by Darwin’s theory of evolution. In


GP terminology [11] each candidate solution is called an individual and the qual-
ity of the solution is called its fitness. Fitness is a function that associates a real
number to each possible individual. In minimization (respectively maximization)
problems the objective is to find the solution with the minimal (respectively max-
imal) fitness (or a good-enough approximation of it). The GP algorithm is an
iterative process (every iteration is called generation) that explores the search
space to find good individuals (solutions) by evolving a population (set of pos-
sible solutions). In doing this exploration GP uses several genetic operators to
mimic the natural evolution process: selection, mutation, and crossover. In GP
individuals are traditionally represented as LISP-like trees. The selection oper-
ator selects individuals as a function of their fitness, with better solutions having
a higher probability of being selected. The selection operator is the only GP oper-
ator that works on fitness. The two operators that can change the structure of
GP individuals are mutation and crossover. Mutation can be defined as a random
manipulation that operates on only one individual. The aim of mutation is to
avoid local optima and to move the search to new areas of the search space [11].
This operator selects a point in the GP tree randomly and replaces the existing
sub-tree at that point with a new randomly generated sub-tree. The crossover
operator combines the genetic material of two parents by swapping a sub-tree
of one parent with a part of the other. The crossover operator is used to com-
bine the pairs of selected individuals (parents) to create new individuals that
potentially have a higher fitness than either of their parents. For a complete
description of GP and its ability to solve real-world problems, the reader is referred
to [11,21].
In GP every internal node is a function whose arguments are its children
(that can also be other functions). Leaf nodes are constants or variables. Thus,
a GP individual is a mathematical function and the sequence of composition
of “basic functions” needed to produce it. Thus, it is important to underline
that, differently from the aforementioned model trees, GP produces a solution
that represents an algorithm to solve a particular problem and not a method to
partition the sample space. In this work we used our C++ implementation of
GP.
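
For illustration (this mirrors the general description above, not the authors' C++ code), a GP individual can be represented as a nested expression and evaluated against the data with an RMSE fitness; the protected-division convention used below is one common choice.

```python
# Sketch of a GP individual as a nested expression and its RMSE fitness.
# This mirrors the general description above, not the authors' C++ implementation;
# the protected-division convention (return 1 when dividing by 0) is one common choice.
import math

def pdiv(a, b):
    return a / b if b != 0 else 1.0

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": pdiv}

def evaluate(node, x):
    """node is ('x', i) for variable xi, a constant, or (op, left, right)."""
    if isinstance(node, tuple):
        if node[0] == "x":
            return x[node[1]]
        op, left, right = node
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return float(node)

def rmse(individual, dataset):
    errors = [evaluate(individual, x) - y for x, y in dataset]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Example individual encoding x6 + (x8 - x7) * 100.0
ind = ("+", ("x", 6), ("*", ("-", ("x", 8), ("x", 7)), 100.0))
```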

4 Experiments

In this work we compare the performances of different machine learning tech-


niques on the electric load forecasting problem using Italian national grid data
provided by TERNA2 . We collected the daily average load y during the working
days in June and July for the years 2003-2009, for a final data set of 300 samples.
2
TERNA is the owner of the Italian transmission grid and the responsible for energy
transmission and dispatching. Real-time data about electricity demand are available
on their homepage www.terna.it

The aim of this forecasting task is to predict the load at time t (yt ) given information
up to day t − 1 (one-day-ahead forecasting), using the past samples of the load
and the information provided by temperature. We built a data set with 9 input
variables: x0 , x1 , . . . , x6 representing the daily load for each past day. The vari-
able x0 refers to the load at time t − 7 while x6 to the load at time t − 1. The
value x7 is the daily average temperature (Celsius degrees) at day t − 1 while x8
is the daily average temperature the same day of the forecast.
Temperature data have been obtained with an average of all the data avail-
able in Italy provided by the ECMWF (European Centre for Medium-Range
Weather Forecasts) ERA-Interim reanalysis [6]. For the variable x8 , we assume
to have a perfect forecast for the day t and hence we use the observed data.
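
A sketch of this construction is shown below (the load and temperature series are placeholders, and the restriction to working days of June and July is omitted):

```python
# Sketch of the dataset construction: for each target day t the inputs are the
# seven previous daily loads (x0..x6), the temperature at t-1 (x7) and the observed
# temperature at t (x8); the target is the load at t. The series are placeholders.
import numpy as np

load = np.random.rand(300) * 4e4              # placeholder daily average load
temperature = 20 + np.random.rand(300) * 15   # placeholder daily average temperature

X, y = [], []
for t in range(7, len(load)):
    features = list(load[t - 7:t])            # x0 = load(t-7), ..., x6 = load(t-1)
    features.append(temperature[t - 1])       # x7: temperature at day t-1
    features.append(temperature[t])           # x8: temperature at day t (assumed known)
    X.append(features)
    y.append(load[t])
X, y = np.array(X), np.array(y)               # samples with 9 input variables each
```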

4.1 Experimental Settings


As explained before, the techniques we have considered are the standard GP
algorithm, back propagation Neural Networks, and M5 Model Trees.
Training and test sets have been obtained by randomly splitting the dataset
described in the previous section. In particular, 50 different partitions of the
original dataset, each with its same size, have been considered. In each partition 70%
of the data have been randomly selected (without replacement) with uniform
probability and inserted into the training set, while the remaining 30% form the
test set (i.e., they are not used during the training phase). For each partition
a total of 50 runs were performed with each technique (2500 runs in total).
This setup has been implemented in order to perform a fair comparison between
different techniques. In fact, unlike the M5 algorithm, GP and NNs are stochastic
methods and thus their performances can be influenced by initial conditions.
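
A sketch of this evaluation protocol (illustration only) is given below:

```python
# Sketch of the evaluation protocol: 50 random 70/30 train/test partitions of the
# 300 samples, each reused for 50 runs of the stochastic techniques.
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_partitions, runs_per_partition = 300, 50, 50

partitions = []
for _ in range(n_partitions):
    idx = rng.permutation(n_samples)
    cut = int(0.7 * n_samples)
    partitions.append((idx[:cut], idx[cut:]))          # (training indices, test indices)

# for train_idx, test_idx in partitions:
#     for run in range(runs_per_partition):
#         ...train on the training indices, record the RMSE on the test indices...
```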

Neural Networks. Feed-forward neural networks with 9 inputs, 3 hidden neu-


rons (with hyperbolic tangent transfer function) and 1 output have been used.
The number of hidden neurons has been chosen after a set of preliminary tests.
For each dataset we created 100 neural networks with different initial weights
and, after the training phase, their output has been combined using an arith-
metic mean and hence evaluated on the test set. For investigation purposes, in
this paper we present both the results for the ensemble and for the best-training
NN, i.e., the NN with the lowest error during the training phase.

M5 Model Trees. In this case we performed a single execution for each dataset;
in fact, the M5 algorithm is not stochastic, so there is no need to perform mul-
tiple runs. We set the minimum number of training data cases represented
by each leaf to 10; this value has been selected after a set of exploratory tests.

Genetic Programming. Regarding the experimental settings related to stan-


dard GP, all the runs used populations of 100 individuals allowed to evolve for
100 generations. Tree initialization was performed with the Ramped Half-and-
Half method [21] with a maximum initial depth of 6. The function set contained

the four binary operators +, −, ∗, and / protected as in [21]. The terminal set
contained 9 variables and 100 random constants randomly generated in the range
[0, 40000]. This range has been chosen considering the magnitude of the values
at stake in the considered application. Because the cardinalities of the function
and terminal sets were so different, we have explicitly imposed functions and
terminals to have the same probability of being chosen when a random node is
needed. The reproduction (replication) rate was 0.1, meaning that each selected
parent has a 10% chance of being copied to the next generation instead of being
engaged in breeding. Standard tree mutation and standard crossover (with uni-
form selection of crossover and mutation points) were used with probabilities of
0.1 and 0.9, respectively. Recall that crossover and mutation are applied
only to individuals not selected for replication. The new random branch
created for mutation has maximum depth 6. Selection for survival was elitist,
with the best individual preserved in the next generation. The selection method
used was tournament selection with size 6. The maximum tree depth is 17. This
depth value is considered the standard value for this parameter [11]. Despite
the higher number of parameters, parameter tuning in GP was not particularly
problematic with respect to the other methods since there are some general rules
(e.g., low mutation and high crossover rate) that provide a good starting point
for setting the parameters.

4.2 Results
We outline here the results obtained from the experiments performed on the
50 different partitions of the dataset.
The objective of the learning process is to minimize the root mean squared
error (RMSE) between outputs and targets. For each partition of the dataset, we
collected the RMSE on test set of the best individual produced at the end of the
training process. Thus, we have 50 values for each partition and we considered the
median of these 50 values. The median was preferred over the arithmetic mean
due to its robustness to outliers. Repeating this process with all the considered
50 partitions results in a set of 50 values. Each value is the median of the error on
test set at the end of the learning process, for a specific partition of the dataset.
Table 1 reports median and standard deviation of all the median errors
achieved considering all the 50 partitions of the dataset for the considered tech-
niques. The same results are shown with a boxplot in Fig. 1(a). Denoting by IQR
the interquartile range, the ends of the whiskers represent the lowest datum still
within 1.5·IQR of the lower quartile, and the highest datum still within 1.5·IQR
of the upper quartile. Errors for each dataset are shown in Fig. 1(b) where it is
particularly visible that GP and M5 have similar performances.
GP is the best performer, considering both the median and the standard
deviation. To analyze the statistical significance of these results, a set of statis-
tical tests has been performed on the resulting median errors. The Kolmogorov-
Smirnov test shows that the data are not normally distributed hence a rank-
based statistic has been used. The Mann Whitney rank-sum test for pairwise
data comparison is used under the alternative hypothesis that the samples do

Table 1. Median and standard deviation of median test errors of the dataset’s parti-
tions for the considered techniques.

Technique                 Median [kW]    Standard deviation (σ)
Neural Networks (best)    27.599         6.731
Ensemble                  24.887         7.467
Genetic Programming       22.993         4.622
M5 Model Trees            23.122         4.730

not have equal medians. The p-values obtained are 3.9697 · 10⁻⁷ when GP is
compared to Neural Networks, 0.0129 when it is compared to a Neural Network
ensemble and 0.9862 when GP is compared to M5 trees. Therefore, when using
a significance level α = 0.05 with a Bonferroni correction for the value α, we
obtain that in the first two cases GP produces fitness values that are significantly
lower (i.e., better) than the other methods, but the same conclusion cannot be
reached when comparing to M5 (the p-value is equal to 0.9862). Results of Mann
Whitney test are summarized in Table 2.
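
The statistical comparison can be reproduced with standard library routines; the sketch below uses SciPy's rank-sum test on placeholder arrays standing in for the 50 median errors per technique.

```python
# Sketch of the pairwise Mann-Whitney rank-sum comparison with a Bonferroni
# correction; the arrays are placeholders for the 50 median test errors per method.
import numpy as np
from scipy.stats import mannwhitneyu

errors = {
    "NNs":    np.random.normal(27.6, 6.7, 50),
    "NN Ens": np.random.normal(24.9, 7.5, 50),
    "GP":     np.random.normal(23.0, 4.6, 50),
    "M5":     np.random.normal(23.1, 4.7, 50),
}

methods = list(errors)
n_pairs = len(methods) * (len(methods) - 1) // 2
alpha = 0.05 / n_pairs                                  # Bonferroni-corrected level

for i, a in enumerate(methods):
    for b in methods[i + 1:]:
        _, p = mannwhitneyu(errors[a], errors[b], alternative="two-sided")
        print(f"{a} vs {b}: p = {p:.4g} ({'significant' if p < alpha else 'n.s.'})")
```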
For a better understanding of the dynamics of the evolutionary process for this
particular real-world problem, in Fig. 2 the median of the test fitness generation
by generation is reported.

Table 2. p-values of the Mann Whitney rank-sum test. Values in bold cannot reject
the null hypothesis at the 5% significance level, meaning that the difference in errors
is not significant.

p-value     NNs              NN Ens.    GP               M5
NNs         -                0.0074     3.9697 · 10⁻⁷    1.0069 · 10⁻⁷
NN Ens.     0.0074           -          0.0129           0.0182
GP          3.9697 · 10⁻⁷    0.0129     -                0.9862
M5          1.0069 · 10⁻⁷    0.0182     0.9862           -

4.3 Analysis of Results


We applied four different techniques to the load forecasting problem and the
solutions achieved have different forms. As stated before, GP and M5 produce,
differently from NNs, solutions that can be easily interpreted. Hence, in this
section we want to analyze the structure of the best solutions returned by these
two techniques.
We obtained 50 model trees with the M5 algorithm, one for each testing
dataset. We analyzed all the linear models present in the tree leaves with the
aim of understanding which variables are used most. We found that the variable
x8 is used 55 times. Moreover, the variables from x7 to x4 are used respectively 45,
55, 3 and 3 times. The variable x0 is used 9 times and x2 only once. Then,
as expected, the most commonly used variables are the temperatures and the day-before

Fig. 1. Summary of test errors: (a) test RMSE (×10⁴) per method; (b) test RMSE per
dataset (sorted by RMSE) for NNs, NN Ens, GP and M5.

Fig. 2. Median of test fitness (test error, ×10⁴) generation by generation. 50 independent
runs have been considered.

load. In particular, x8 is the temperature on the day of the forecast, x7 is the
temperature on the day before the forecast and x6 is the energy consumption
on the day before the forecast. Median values for the coefficients of x6 , x7 and x8
are respectively 0.9164, −9435 and 10057 (relative standard deviations 17.6%,
20.7% and 25.5%). In Figure 3 a sample model tree is shown; it consists of
five linear models, of which three are constants. Considering the best individuals
obtained by GP in all the considered partitions of the dataset, we can observe
that all of them show a common structure. In particular, best individuals have
this structure:
x6 + f (x7 , x8 )
where x6 is the energy consumption at time t − 1 and f is a polynomial (that
we call “correction”) that only considers the temperatures at time t − 1 and t.
Below we report two (simplified) individuals returned by GP:

Fig. 3. A sample model tree, with splits on x1 ≤ 66267, x9 ≤ 22.76, x7 ≤ 111090 and
x7 ≤ 83770, and five leaves L1–L5. In brackets the number of data samples represented
by the specific linear model is given:
• L1 (128): y = −69185 + 0.718 x7 + 3642 x9
• L2 (11):  y = 36364
• L3 (21):  y = 81604
• L4 (22):  y = 101605
• L5 (23):  y = −205637 + 13344 x9

I1 : x6 + x7 − x8 + 4 x7 · (2 x7 + (−x7 + x8 ) · (x7 + 6 x8 )) − 3,968.452198

I2 : x6 + x7 − x8 + (x7 − x8 ) · x8 + x7² x8 · (−x7 + x8 ) + 1,507.852134

As can be seen, GP returns solutions that can be understood and
interpreted more easily than those of Neural Networks and M5 Model Trees. In fact, with
respect to the latter, GP does not provide a tree with decision nodes and a linear
model in every leaf. Instead, it provides a more compact non-linear model and
it also performs an automatic feature selection; in fact, the final model contains
only three variables. Furthermore, the form of the model (x6 + f (x7 , x8 )) is
preserved in all the solutions produced during the performed runs. We also want
to point out that, while many standard techniques used for prediction are limited
either in the form of the solution presented or in the legibility of the solutions,
GP suffers from neither limitation. In fact, the expressiveness of GP is limited only by
the choice of functional and terminal symbols and by the size of the trees.
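
The two simplified individuals reported above can be evaluated directly; the small sketch below computes their forecasts for an arbitrary pair of temperatures (the numeric inputs are illustrative and expressed in the units of the dataset).

```python
# Direct evaluation of the two simplified GP individuals reported above; both share
# the structure x6 + f(x7, x8). Input values below are arbitrary illustrations.

def correction_i1(x7, x8):
    return x7 - x8 + 4 * x7 * (2 * x7 + (-x7 + x8) * (x7 + 6 * x8)) - 3968.452198

def correction_i2(x7, x8):
    return x7 - x8 + (x7 - x8) * x8 + x7**2 * x8 * (-x7 + x8) + 1507.852134

def forecast(x6, x7, x8, correction):
    return x6 + correction(x7, x8)             # previous-day load plus temperature correction

x6, x7, x8 = 33_000.0, 24.0, 25.0              # load at t-1, temperatures at t-1 and t
print(forecast(x6, x7, x8, correction_i1))
print(forecast(x6, x7, x8, correction_i2))
```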
Figure 4 reports the polynomial P obtained considering the average correc-
tion of the 50 best individuals:

P (x7 , x8 ) = (1/50) ∑_{i=1}^{50} fi (x7 , x8 )    (1)

The bigger the difference between x7 and x8 , the bigger the “correc-
tion”. This result matches our intuition regarding the energy demand pattern: if on
a particular day D, with a temperature equal to T , the energy consumption is
E, we expect that on day D + 1 the energy consumption will be greater than E

Fig. 4. Average correction considering the best individuals of all the considered
datasets. x7 is the temperature at time t − 1 while x8 is the temperature at time
t. The black points represent the pairs of temperatures present in the dataset.

if the temperature is higher than T while we expect to have a lower energy


consumption otherwise. As we stated before, the electricity demand in Italy
during summer is strongly affected by the use of the air conditioning systems.
Having a model that is easily understandable and that respects our intuition
of the problem (we could informally say a model that “makes sense”) allows a
manual validation by the end user. This is a crucial issue in convincing practitioners
to really use it for their applications. In our opinion, the GP models are the only
ones (among the studied techniques) that have this characteristic of intuitiveness
and practical consistency with our interpretation of the problem. Basically, GP
is telling us that the energy consumption on a given day is the same as that of
the previous day plus (or minus) a given quantity that directly depends on the
temperature increase (or decrease) between that day and the previous one. Any
practitioner would understand and trust such a model as a reliable explanation
of the data. In fact, GP performs an effective (and intuitively meaningful) auto-
matic feature selection that results in the preservation of only three variables.
Differently, NNs always use all the provided variables in their solutions unless
an appropriate feature selection algorithm is used in a preprocessing phase. M5
Model trees also use a different subset of variables in each of the linear mod-
els that are present in their leaves while GP always uses the same 3 features,
consistently in all the best individuals found in all the runs.
5 Conclusions
Energy load forecasting can provide important information regarding future
energy consumption. In this work, a genetic programming based forecasting method
has been presented and compared with other machine learning techniques.
Experimental results show that GP and M5 perform similarly, with a difference that
is not statistically significant, while neural networks and the ensemble method
returned results of poorer quality. Unlike the other methods, GP produces solutions
that explain the data in a simple and intuitively meaningful way and can be easily
interpreted. Hence, GP has been used in this work not only to build a model, but
also to evaluate the effect of the model's parameters: for the considered problem,
GP highlighted the relationship between energy consumption and external
temperature. Moreover, GP demonstrated the ability to perform feature selection
automatically and effectively, using a more compact set of variables than the other
techniques and always the same limited set of variables in all the returned
solutions. With the other machine learning techniques, the analysis of the solutions
is much more difficult and less intuitive. Future work may consider longer
forecasting periods, possibly with the use of temperature forecast data.

The Optimization Ability of Evolved Strategies

Nuno Lourenço1(B) , Francisco B. Pereira1,2 , and Ernesto Costa1


1 CISUC, Department of Informatics Engineering, University of Coimbra,
Polo II - Pinhal de Marrocos, 3030 Coimbra, Portugal
{naml,xico,ernesto}@dei.uc.pt
2 Instituto Politécnico de Coimbra, ISEC, DEIS, Rua Pedro Nunes,
Quinta da Nora, 3030-199 Coimbra, Portugal

Abstract. Hyper-Heuristics (HH) is a field of research that aims to


automatically discover effective and robust algorithmic strategies by
combining low-level components of existing methods and by defining
the appropriate settings. Standard HH frameworks usually comprise two
sequential stages: Learning is where promising strategies are discovered;
and Validation is the subsequent phase, in which the best learned
strategies are applied to unseen optimization scenarios, thus assessing
their generalization ability.
Evolutionary Algorithms are commonly employed by the HH learning
step to evolve a set of candidate strategies. In this stage, the algorithm
relies on simple fitness criteria to estimate the optimization ability of
the evolved strategies. However, the adoption of such basic conditions
might compromise the accuracy of the evaluation and it raises the ques-
tion whether the HH framework is able to accurately identify the most
promising strategies learned by the evolutionary algorithm. We present
a detailed study to gain insight into the correlation between the opti-
mization behavior exhibited in the learning phase and the correspond-
ing performance in the validation step. Specifically, we investigate
whether the most promising strategies identified during learning maintain
their good performance when generalizing to unseen optimization
scenarios. The analysis
of the results reveals that simple fitness criteria are accurate predictors
of the optimization ability of evolved strategies.

1 Introduction
Hyper-Heuristics (HH) is a field of research that aims to automatically discover
effective and robust optimization algorithms [1]. HH frameworks can generate
metaheuristics for a given computational problem either by selecting/combining
low-level heuristics or by designing a new method based on components
of existing ones. In [1], Burke et al. have presented a detailed discussion of
these HH categories, complemented with several representative examples. HH
are commonly divided in two sequential stages: Learning is where the strate-
gies are automatically created, whilst, in Validation, the most promising learned
solutions are applied to unseen and more challenging scenarios.
Evolutionary Algorithms (EAs) are regularly applied as HH search engines
to learn effective algorithmic strategies for a given problem or class of related

problems. Each strategy created by the EA needs to be evaluated to estimate


its optimization ability. Specifically, every individual from the population is
applied to an instance of the problem under consideration and the quality of
the solutions found is used as an estimator of its optimization ability. To keep
the computational effort at a reasonable level, the evaluation step of the EA
search engine relies on small instances and simplified fitness criteria. However,
it is known that the training conditions impact the properties of the algorithms
being evolved [5–7,12] and it is not clear if the limited evaluation conditions
adopted by HH frameworks compromise the accurate identification of the best
optimization strategies. In this paper we address this question by investigating
if the fitness criteria used in learning provide enough information to identify the
most effective and robust strategies.
The study is performed with a well-known Grammatical Evolution (GE) [9] HH
framework, originally proposed in 2012 [14]. This computational model is able to
automatically generate complete Ant Colony Optimization (ACO) [2] algorithms
that can effectively solve different traveling salesperson problem (TSP) instances.
It allows for a flexible definition of the components and settings to be used by the
algorithm, as well as its general structure. Results presented in the aforementioned
reference confirm that this HH framework is able to evolve novel ACO architec-
tures, competitive with state-of-the-art human designed variants.
The analysis presented in this paper reveals that the most promising strate-
gies discovered in the learning phase tend to maintain their good performance in
the validation step. This outcome supports the adoption of simple and somewhat
inaccurate fitness criteria in the learning phase, as this does not compromise the
ability of the HH framework to identify the most effective and robust optimiza-
tion strategies. Additionally, we investigate the existence of overfitting in the
learning phase, i.e., the over-interpretation of relationships that only occur in
the learning data. Preliminary results suggest that learning is able to avoid the
overfitting of the data.
The paper is structured as follows: Section 2 describes the general properties
of the HH framework adopted in this work, whereas section 3 reviews the main
features of the ACO HH model used in the experiments. Section 4 contains the
experimental setup and presents an empirical study to assess the optimization
ability of the learned strategies. Finally, Section 5 gathers the main conclusions
and suggests some ideas for future work.

2 Hyper-Heuristics

HH is a recent area of research that addresses the construction of specific, high-


level, heuristic problem solvers, by searching the space of possible low-level
heuristics for the particular problem one wants to solve. HH can be divided
in two major groups [1]: the selection group comprises the search for the best
sequence of low-level heuristics, selected from a set of predefined methods usually
applied to a specific problem; the other group includes methods that promote
the creation of new heuristics. In the latter case, the HH iteratively learns a novel
algorithm, which is then applied to solve the problem at hand. During this process,
the HH is usually guided by feedback obtained through the execution of each
candidate solution on simple instances of the problem under consideration.

Fig. 1. Hyper-Heuristic Framework Architecture
Genetic Programming (GP), a branch of EAs, has been increasingly adopted
as the HH search engine to learn effective algorithmic strategies [10]. In recent
years, Grammatical Evolution (GE) [9], a linear form of GP, has received
increasing attention from the HH community since it allows for a straightforward
enforcement of semantic and syntactic restrictions, by means of a grammar.

2.1 Framework

In this work a two phase architecture is adopted (see Fig. 1). In the first phase,
Learning, a GE-based HH will construct algorithmic strategies. GE is a GP
branch that decouples the genotype from the phenotype by encoding solutions
as a linear string of integers. The evaluation of an individual requires the appli-
cation of a genotype-phenotype mapping that decodes the linear string into a
syntactically legal algorithmic strategy, by means of a grammar. GE grammars
are composed of a set of production rules written in Backus-Naur form,
defining the general structure of the programs being evolved and also the com-
ponents that can be selected to build a given strategy (consult [9] for details
concerning GE algorithms).
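As a minimal illustration of this genotype-phenotype mapping (the toy grammar below is our own example, far simpler than the one actually used), a linear string of integers can be decoded with the standard GE rule choice = codon mod number_of_productions:

TOY_GRAMMAR = {
    "<strategy>": [["<rule>", " with evaporation ", "<rho>"]],
    "<rule>": [["random-proportional"], ["q-selection"]],
    "<rho>": [["0.1"], ["0.5"], ["0.9"]],
}

def decode(codons, symbol="<strategy>"):
    # Leftmost expansion of non-terminals, consuming one codon per choice.
    out, stack, i = [], [symbol], 0
    while stack:
        sym = stack.pop(0)
        if sym not in TOY_GRAMMAR:                 # terminal symbol: emit it
            out.append(sym)
            continue
        productions = TOY_GRAMMAR[sym]
        choice = codons[i] % len(productions)      # standard GE mapping rule
        i += 1
        stack = list(productions[choice]) + stack
    return "".join(out)

print(decode([7, 2, 11]))   # -> "random-proportional with evaporation 0.9"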
The quality of a strategy generated by the GE should reflect its ability to
solve a given problem. During evolution, each GE solution is applied to a pre-
determined problem instance and its fitness corresponds to the quality of the
best solution found. Given this modus operandi, the GE evaluation step is a
computationally intensive task. To prevent the learning process from taking an
excessive amount of time, some simple evaluation conditions are usually defined:
i) one single and small problem instance is used to assign fitness; ii) only one
run is performed; iii) the number of iterations is kept low. Clearly, the adoption
of such simple conditions might compromise the results by hindering differences
between competing strategies, leading to an inaccurate assessment of the real
optimization ability of evolved solutions. The experiments described in section 4
aim to gain insight into this situation.
The second phase of the HH framework is Validation. The most promising


strategies (BAlg) identified in the previous step, i.e., those that obtained better
fitness in the learning task, are applied to unseen scenarios to confirm their
effectiveness and robustness.

2.2 Related Work


Recent works have shown that the conditions used in the learning phase influence
the structure of algorithmic strategies that are being evolved. In [13], Smit et al.
shows that using different performance measures like mean best fitness or success
rate may yield very different algorithmic strategies. Lourenço et al. [5] presented
a study on how a GE-based HH to evolve full-fledged EAs is affected by the
learning conditions used to evaluate the quality of the algorithm. More precisely,
they investigated how different population sizes and/or number of generations
influenced the components that were selected by the HH to build the EA. Later,
in [6] they presented a HH to learn selection strategies for EAs, and showed
that the levels of selective pressure would depend on the EA where the strategy
was inserted. In [7], Martin et al. evolved Black-Box Search Algorithms (BBSAs)
and showed that using multiple instances of the problem affects the algorithmic
structure of the strategies.
In [3], Eiben et al. discuss how evolved algorithms should be selected, and
identify robustness as a key factor in determining the quality of algorithms.
Robustness is related to performance and its variance across some dimension;
one such dimension is the range of problems (or problem instances) that the
algorithm can tackle. Based on this, they define two properties: fallibility, which
indicates that an algorithm can clearly fail on some specific problems; and
applicability, which indicates the range of problems that the algorithm can
successfully tackle. Note that applicability depends on a certain performance
threshold T. An algorithm is robust if it performs well across several problems
(high applicability) and if it has small performance variance.
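As a concrete (and deliberately simple) reading of these definitions, not the exact formalisation of [3], the sketch below computes applicability and the fallible instances of an algorithm from per-instance error values and a performance threshold T:

def applicability(errors, threshold):
    # Fraction of instances solved within the performance threshold T
    # (errors could be, e.g., normalised distances to the optimum).
    return sum(e <= threshold for e in errors) / len(errors)

def fallible_instances(errors, threshold):
    # Instances on which the algorithm clearly fails (exceeds T).
    return [i for i, e in enumerate(errors) if e > threshold]

errors = [0.01, 0.02, 0.40, 0.03]                   # hypothetical results on 4 instances
print(applicability(errors, threshold=0.05))        # 0.75
print(fallible_instances(errors, threshold=0.05))   # [2]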

3 Design of Ant Algorithms with Grammatical Evolution

The HH framework, originally proposed by Tavares et al. [14] to evolve full-


fledged ACO algorithms, will be used as the testbed for our experiments. Ant
Colony Optimization (ACO) algorithms are a set of population-based methods,
loosely inspired by the behaviour of ant foraging [2]. Following the original Ant
System (AS) algorithm proposed by Marco Dorigo in 1992, many other vari-
ants and extensions have been described in the literature. To help researchers
and practitioners to select and tailor the most appropriate variant to a given
problem, several automatic ACO design frameworks have been proposed in the
last few years [4,11,14]. The production set of the above mentioned framework
defines the general architecture of an ACO-like algorithm, comprising an ini-
tialisation step followed by an optimization cycle. The first stage initialises the
pheromone matrix and other settings of the algorithm. The main loop consists
of the building of the solutions, pheromone trail update and daemon actions.
Each component contains several alternatives to implement a specific task. As an
example, the decision policy adopted by the ants to build a trail can be either the
random proportional rule used by AS methods or the q-selection pseudorandom
proportional rule introduced by the Ant Colony System (ACS) variant. If the
last option is selected, the GE engine also defines a specific value for the q-value
parameter. The grammar allows the replication of all main ACO algorithms,
such as AS, ACS, Elitist Ant System (EAS), Rank-based Ant System (RAS),
and Max-Min Ant System (MMAS). Additionally, it can generate novel combi-
nations of blocks and settings that define alternative ACO algorithms. Results
presented in [14] show that the GE-HH framework is able to learn original ACO
architectures, different from standard strategies. Moreover, results obtained in
validation instances reveal that the evolved strategies generalize well and are
competitive with human-designed variants (consult the aforementioned reference
for a detailed analysis of the results).
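For concreteness, the two decision policies mentioned above can be written as follows (a sketch in their usual textbook form; the parameter values alpha, beta, q0 and the toy pheromone/heuristic dictionaries are illustrative assumptions, not settings taken from the framework):

import random

def random_proportional(candidates, tau, eta, alpha=1.0, beta=2.0):
    # AS rule: choose the next city with probability proportional to tau^alpha * eta^beta.
    weights = [tau[c] ** alpha * eta[c] ** beta for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

def q_selection(candidates, tau, eta, q0=0.9, alpha=1.0, beta=2.0):
    # ACS pseudorandom proportional rule: with probability q0 exploit greedily,
    # otherwise fall back to the random proportional rule.
    if random.random() < q0:
        return max(candidates, key=lambda c: tau[c] ** alpha * eta[c] ** beta)
    return random_proportional(candidates, tau, eta, alpha, beta)

tau = {1: 0.5, 2: 0.2, 3: 0.9}     # toy pheromone values for three candidate cities
eta = {1: 1.0, 2: 2.0, 3: 0.5}     # toy heuristic values (e.g. inverse distances)
print(q_selection([1, 2, 3], tau, eta))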

4 Experimental Analysis
Experiments described in this section aim to gain insight into the capacity of
the GE-based HH to identify the most promising solutions during the learning
step. Specifically, we determine the relation between the quality of the strategies
as estimated by the GE and their optimization ability when applied to unseen
and harder scenarios. Such a study will provide valuable information about the
capacity of the GE to build and identify strategies that are robust, i.e., highly
applicable and with small fallibility.
In practical terms, we take all strategies belonging to the last generation of
the GE and rank them by the fitness obtained in the learning evaluation instance.
Since the GE relies on a steady-state replacement method, the last generation
contains the best optimization strategies identified during the learning phase.
Then, these strategies are applied to unseen instances and ranked again based
on the new results achieved. The comparison of the ranks obtained in different
phases will provide relevant information in what concerns the generalization
ability of the evolved strategies.

Table 1. GE Learning Parameters: adapted from [14]

Runs 30
Population Size 64
Generations 40
Individual Size 25
Wrapping No
Crossover Operator One-Point with a 0.7 rate
Mutation Operator Integer-Flip with a 0.05 rate
Selection Tournament with size 3
Replacement Steady State
Learning Instances pr76, ts225
The GE settings used in the experiments are depicted in Table 1. The pop-
ulation size is set to 64 individuals, each one composed of 25 integer codons,
which is an upper bound on the number of production rules needed to generate
an ACO strategy using the grammar from [14]. As this grammar does not contain
recursive production rules, it is possible to determine the maximum number of
values needed to create a complete phenotype. Also, wrapping is not necessary
since the mapping process never goes beyond the end of the integer string.
We selected several TSP instances from the TSPLIB1 for the experimental
analysis. Two different instances were selected to learn the ACO strategies: pr76
and ts225 (the numerical values represent how many cities the instance has).
Each ACO algorithm encoded in a GE solution is executed once, for 100
iterations. The fitness assigned to the strategy corresponds to the quality of the
best solution found. The strategies encode all the settings required to run the ACO
algorithm, with the exception of the colony size, which is set to 10% of the number
of cities (truncated to the closest integer).
In what concerns the validation step, the best ACO strategies are applied
to four different TSP instances: lin105, pr136, pr226, and lin318. In this phase, all
ACO algorithms are run 30 times and the number of iterations is increased
to 5000. The size of the colony is the same (10% of the size of the instance being
optimised). Table 2 summarises the parameters used. In both phases, the results
are expressed as a normalised distance to the optimum.
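The normalised distance is presumably the usual relative gap; under that assumption it reduces to a one-line helper:

def normalised_distance(cost, best_known):
    # Relative gap between the cost found and the best-known (optimal) cost.
    return (cost - best_known) / best_known

print(normalised_distance(21500, 20000))   # 0.075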

Table 2. ACO Validation Parameters

Runs 30
Iterations 5000
Colony Size 10% of the Instance Size
Instances lin105, pr136, pr226, lin318

Fig. 2 displays the ranking distributions of the best ACO strategies learned
with the pr76 instance. The 4 panels correspond to the 4 different validation
instances. Each solution from the last GE generation is identified using an integer
from 1 to 64, displayed in the horizontal axis. These solutions are ranked by
the fitness obtained in training (solution 1 is the best strategy from the last
generation, whilst solution 64 is the worst). The vertical axis corresponds to the
position in the rank. Small circles highlight the learning rank and, given the
ordering of the solutions from the GE last generation, we see a perfect diagonal
in all panels. The small triangles identify the ranking of the solutions achieved
in the 4 validation tasks (one on each panel). Ideally, these rankings should
be identical to the ones obtained in training, i.e., the most promising solutions
identified by the GE would be those that generalize better to unseen instances.
An inspection of the results reveals an evident correlation between the behav-
ior of the strategies in both phases. An almost perfect line of triangles is visible in
1 http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/
Fig. 2. Ranking distribution of the best ACO strategies discovered with the pr76
learning instance. Panels: (a) lin105, (b) pr136, (c) pr226, (d) lin318.

Fig. 3. Ranking distribution of the best ACO strategies discovered with the ts225
learning instance. Panels: (a) lin105, (b) pr136, (c) pr226, (d) lin318.
the 4 panels, confirming that the best strategies from training maintain their good
performance in validation. This trend is visible across all the validation instances
and shows that, with the pr76 instance, training is accurately identifying the
more robust and effective ACO strategies.
Fig. 3 displays the ranking distributions of the best ACO strategies learned
with the ts225 instance. Although the general trend is maintained, a close inspec-
tion of the results reveals some interesting disagreements. The best ACO strate-
gies learned with ts225 tend to have a modest performance when applied to small
validation instances, such as lin105 and pr136. On the contrary, they behave well
on larger instances (see, e.g., the results obtained with the validation instance
from panel d)). This outcome confirms that the training conditions impact the
structure of the evolved algorithmic strategies, which is in agreement with other
findings reported in the literature [5,13]. The ts225 instance is considered a hard
TSP instance [8] and, given the results displayed in Fig. 3, it promotes the evo-
lution of ACO strategies particularly suited for TSP problems with a higher
number of cities. In the remainder of this section we present some additional
results that help gain insight into these findings.
To confirm the correlation between learning and validation, we computed
the Pearson correlation coefficient between the rankings obtained in each
phase. This coefficient ranges between -1 and 1, where -1 identifies a completely
negative correlation and 1 highlights a total correlation (the best strategies in
learning are the best in validation). The results obtained are presented in Table 3.
Columns contain instances used in learning, whilst rows correspond to validation
instances. The values from the table confirm that there is always a clearly posi-
tive correlation between the two phases, i.e., the quality obtained by a solution
in learning is an accurate estimator of its optimization ability. The lowest val-
ues of the Pearson coefficient are obtained by strategies learned with the ts225
instance and validated in small TSP problems, confirming the visual inspection
of Fig. 3. In this correlation analysis we adopted a significance level of α = 0.05.
All the p-values obtained were smaller than α, thus confirming the statistical
significance of the study.
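This analysis is straightforward to reproduce with SciPy; in the sketch below the two fitness arrays are hypothetical stand-ins for the 64 learning and validation scores of one run:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
learning_fitness = rng.random(64)                              # hypothetical GE fitness values
validation_fitness = learning_fitness + 0.1 * rng.random(64)   # hypothetical validation scores

# Rank the 64 strategies in each phase (rank 1 = lowest cost = best)
learning_rank = stats.rankdata(learning_fitness)
validation_rank = stats.rankdata(validation_fitness)

# Pearson correlation between the two rankings (alpha = 0.05 as in the paper)
r, p_value = stats.pearsonr(learning_rank, validation_rank)
print(r, p_value, p_value < 0.05)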
To complement the analysis, we present in Fig. 4 the absolute performance
of the best learned ACO strategies in the 4 selected validation instances. Each
panel comprises one of the validation scenarios and contains a comparison
between the optimization performance of strategies evolved with different learn-
ing instances (black mean and error bars are from ACO strategies trained with
the pr76 instances, whilst the grey are from algorithms evolved with the ts225
instance). In general, for all panels and for strategies evolved with the two train-
ing instances, the deviation from the optimum increases with the training rank-
ing, confirming that the best algorithms from phase 1 are those that exhibit a
better optimization ability. However, the results reveal an interesting pattern
in what concerns the absolute behavior of the algorithms. For the smaller val-
idation instances (lin105 and pr136 in panels a) and b)), the ACO strategies
evolved with the smaller learning instance achieve a better performance. On the
contrary, ACO algorithms learned with the ts225 instance are better equipped to
handle the largest validation problem (lin318 in panel d)). This is another piece
of evidence that confirms the impact of the training conditions on the structure
of the evolved solutions. A detailed analysis of the algorithmic structure reveals
that the pr76 training instance promotes the appearance of extremely greedy
ACO algorithms (e.g., they tend to have very low evaporation levels), partic-
ularly suited for the quick optimization of simple instances. On the contrary,
strategies evolved with the ts225 training instance strongly rely on full evapo-
ration, thus promoting the appearance of methods with increased exploration
ability, particularly suited for larger and harder TSP problems.

Fig. 4. MBF of the best evolved ACO strategies in the 4 validation instances:
(a) lin105, (b) pr136, (c) pr226, (d) lin318. Black symbols identify results from
strategies learned with the pr76 instance and grey symbols correspond to results
from strategies obtained with the ts225 instance.

4.1 Measuring Overfitting

To complete our analysis we investigate the evolution of overfitting while learn-


ing ACO strategies. To estimate the occurrence of overfitting we selected one
additional instance for each training scenario, with the same size as the instance
used in training (eil76 and tsp225, respectively). In each GE generation, the
current best ACO strategy is applied to this new test instance and the quality
of the obtained solution is recorded (this value is never used for training).
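A minimal sketch of this check, assuming per-generation records of the best strategy's quality on the learning and on the held-out test instance (lower is better), could look as follows; the window size is an arbitrary choice made for illustration:

def overfitting_signal(learning_mbf, testing_mbf, window=5):
    # Flag overfitting when learning quality keeps improving while test quality degrades.
    if len(learning_mbf) < 2 * window or len(testing_mbf) < 2 * window:
        return False
    recent_learn = sum(learning_mbf[-window:]) / window
    earlier_learn = sum(learning_mbf[-2 * window:-window]) / window
    recent_test = sum(testing_mbf[-window:]) / window
    earlier_test = sum(testing_mbf[-2 * window:-window]) / window
    return recent_learn < earlier_learn and recent_test > earlier_test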
Figs. 5 and 6 present the evolution of the Mean Best Fitness (MBF) during
the learning phase, respectively for the pr76 and ts225 instances. Both figures
contain two panels: panel a) exhibits the evolution of the MBF measured by
the learning instance, which corresponds to the value used to guide the GE
exploration; panel b) displays the MBF obtained with the testing instance and
it is only used to detect overfitting.
The results depicted in panels 5a and 6a show that the HH framework
gradually learns better strategies. A brief perusal of the MBF evolution reveals
a rapid decrease in the first generations, followed by a slower convergence. This
is explained by the fact that in the beginning of the evolutionary process the
GE combines different components provided by the grammar to build a robust
strategy, whilst at the end it tries to fine-tune the numeric parameters. The
search for a meaningful combination of components has a stronger impact on
fitness than modifying numeric values.
Overfitting occurs when the fitness of the learning strategies keeps improving,
whilst it deteriorates in testing. Panels 5b and 6b show the MBF for the testing
step. An inspection of the results shows that it tends to decrease throughout the
evolutionary run. This shows that the strategies being evolved are not becoming
overspecialized, i.e., they maintain the ability to solve instances different from
the ones used in training.

Fig. 5. Evolution of the MBF for the pr76 learning instance and the corresponding
eil76 testing instance. Panels: (a) learning fitness, (b) testing fitness.

Table 3. Pearson correlation coefficients


pr76 ts225
lin105 0.98 0.81
pr136 0.90 0.90
pr226 0.97 0.95
lin318 0.95 0.98
Fig. 6. Evolution of the MBF for the ts225 learning instance and the corresponding
tsp225 testing instance. Panels: (a) learning fitness, (b) testing fitness.

5 Conclusions
HH is an area of research that aims to automate the design of algorithmic
strategies by combining low-level components of existing methods. Most HH
frameworks are divided into two phases. The first phase, Learning, is where the
strategies are built and evaluated. Afterwards, the robustness of the best solutions
is validated in unseen scenarios. Usually, researchers select the best learned
strategies based only on simple and somewhat inaccurate criteria. Given this
situation, there is a risk of failing to identify the most effective learned algorithmic
strategies.
In this work we studied the correlation between the quality exhibited by
strategies during learning and their effective optimization ability when applied
to unseen scenarios. We relied on an existing GE-based HH to evolve full-fledged
ACO algorithms to perform the analysis. Results revealed a clear correlation
between the quality exhibited by the strategies in both phases. As a rule, the
most promising algorithms identified in learning generalize better to unseen val-
idation instances. This study provides valuable guidelines for HH practitioners,
as it suggests that the limited training conditions do not seriously compromise
the identification of the algorithmic strategies with the best optimization abil-
ity. The outcomes also confirmed the impact of the training conditions on the
structure of the evolved solutions. Training with small instances promotes the
appearance of greedy optimization strategies particularly suited for simple prob-
lems, whereas larger (and harder) training cases favor algorithmic solutions that
excel in more complicated scenarios. Finally, a preliminary investigation revealed
that training seems to be overfitting free, i.e., the strategies being learned are
not becoming overspecialized to the specific instance used in the evaluation.
There are several possible extensions to this work. In the near future we aim
to validate the correlation study under alternative training evaluation conditions.
Also, a complete understanding of overfitting is still in progress and we will
extend this analysis to a wider range of scenarios (e.g., the size, structure and
the number of instances used for testing might influence the results). Finally,
we will investigate whether the main results hold for different HH frameworks.

Acknowledgments. This work was partially supported by Fundação para a Ciência


e Tecnologia (FCT), Portugal, under the grant SFRH/BD/79649/2011.

References
1. Burke, E.K., Gendreau, M., Hyde, M., Kendall, G., Ochoa, G., Özcan, E., Qu, R.:
Hyper-heuristics: A survey of the state of the art. Journal of the Operational
Research Society 64(12), 1695–1724 (2013)
2. Dorigo, M., Stützle, T.: Ant Colony Optimization. Bradford Company, Scituate
(2004)
3. Eiben, A., Smit, S.: Parameter tuning for configuring and analyzing evolutionary
algorithms. Swarm and Evolutionary Computation 1(1), 19–31 (2011)
4. López-Ibáñez, M., Stützle, T.: Automatic configuration of multi-objective
ACO algorithms. In: Dorigo, M., Birattari, M., Di Caro, G.A., Doursat, R.,
Engelbrecht, A.P., Floreano, D., Gambardella, L.M., Groß, R., Şahin, E.,
Sayama, H., Stützle, T. (eds.) ANTS 2010. LNCS, vol. 6234, pp. 95–106. Springer,
Heidelberg (2010)
5. Lourenço, N., Pereira, F.B., Costa, E.: The importance of the learning conditions
in hyper-heuristics. In: Proceedings of the 15th Annual Conference on Genetic and
Evolutionary Computation, GECCO 2013, pp. 1525–1532 (2013)
6. Lourenço, N., Pereira, F., Costa, E.: Learning selection strategies for evolutionary
algorithms. In: Legrand, P., Corsini, M.-M., Hao, J.-K., Monmarché, N., Lutton, E.,
Schoenauer, M. (eds.) EA 2013. LNCS, vol. 8752, pp. 197–208. Springer, Heidelberg
(2014)
7. Martin, M.A., Tauritz, D.R.: A problem configuration study of the robustness of
a black-box search algorithm hyper-heuristic. In: Proceedings of the 2014 Confer-
ence Companion on Genetic and Evolutionary Computation Companion, GECCO
Comp. 2014, pp. 1389–1396 (2014)
8. Merz, P., Freisleben, B.: Memetic algorithms for the traveling salesman problem.
Complex Systems 13(4), 297–346 (2001)
9. O’Neill, M., Ryan, C.: Grammatical evolution: evolutionary automatic program-
ming in an arbitrary language, vol. 4. Springer Science (2003)
10. Pappa, G.L., Freitas, A.: Automating the Design of Data Mining Algorithms:
An Evolutionary Computation Approach, 1st edn. Springer Publishing Company,
Incorporated (2009)
11. Runka, A.: Evolving an edge selection formula for ant colony optimization. In:
Proceedings of the GECCO 2009, pp. 1075–1082 (2009)
12. de Sá, A.G.C., Pappa, G.L.: Towards a method for automatically evolving bayesian
network classifiers. In: Proceedings of the 15th Annual Conference Companion on
Genetic and Evolutionary Computation, pp. 1505–1512. ACM (2013)
13. Smit, S.K., Eiben, A.E.: Beating the world champion evolutionary algorithm
via revac tuning. In: IEEE Congress on Evolutionary Computation (CEC) 2010,
pp. 1–8. IEEE (2010)
14. Tavares, J., Pereira, F.B.: Automatic design of ant algorithms with grammatical
evolution. In: Moraglio, A., Silva, S., Krawiec, K., Machado, P., Cotta, C. (eds.)
EuroGP 2012. LNCS, vol. 7244, pp. 206–217. Springer, Heidelberg (2012)
Evolution of a Metaheuristic for Aggregating
Wisdom from Artificial Crowds

Christopher J. Lowrance(B) , Omar Abdelwahab, and Roman V. Yampolskiy

Department of Computer Engineering and Computer Science,


University of Louisville, Louisville, KY 40292, USA
{chris.lowrance,omar.abdelwahab,roman.yampolskiy}@louisville.edu
https://louisville.edu/speed/computer

Abstract. Approximation algorithms are often employed on hard opti-


mization problems due to the vastness of the search spaces. Many approx-
imation methods, such as evolutionary search, are often indeterminate
and tend to converge to solutions that vary with each search attempt.
If multiple search instances are executed, then the wisdom among the
crowd of stochastic outcomes can be exploited by aggregating them to
form a new solution that surpasses any individual result. Wisdom of arti-
ficial crowds (WoAC), which is inspired by the wisdom of crowds phe-
nomenon, is a post-processing metaheuristic that performs this function.
The aggregation method of WoAC is instrumental in producing results
that consistently outperform the best individual. This paper extends
the contributions of existing work on WoAC by investigating the perfor-
mance of several aggregation methods. Specifically, existing and newly
proposed WoAC aggregation methods are used to synthesize parallel
genetic algorithm (GA) searches on a series of traveling salesman prob-
lems (TSPs), and the performance of each approach is compared. Our
proposed method of weighting the input of crowd members and incre-
mentally increasing the crowd size is shown to improve the chances of
finding a solution that is superior to the best individual solution by 51%
when compared to previous methods.

Keywords: Wisdom of crowds · Evolutionary combinatorial


optimization · Aggregation metaheuristic · Parallel search aggregation

1 Introduction
Computing the optimal solution using an exhaustive search becomes intractable
as the size of the problem grows for computationally hard (NP-hard) prob-
lems [1]. Consequently, heuristics and stochastic search algorithms are commonly
used in an effort to find reasonable approximations to difficult problems in a poly-
nomial time [1,2]. These approximation algorithms are often incomplete and can
produce indeterminate results that vary when repeated on larger search spaces
[3,4]. The variance produced by these types of searches, assuming several search
attempts have been made, can be exploited in a collaborative effort to form better

approximations to difficult problems. The post-processing metaheuristic known


as the Wisdom of Artificial Crowds (WoAC) is based on this concept [1,5]. In
essence, the WoAC algorithm operates on a group of indeterminate search out-
comes from the same problem space, and then aggregates them with the goal of
forming a solution that is superior to any individual outcome within the group.
It is inspired by a more widely known sociological phenomenon referred to as
the Wisdom of Crowds (WoC) [6], and likewise, seeks to perform reasoning and
aggregation based on commonality observed within a group of converged and
best-fit search outcomes.
WoAC operates on a pool of converged search results that can be obtained via
any indeterminate search method. Evolutionary algorithms, such as the genetic
algorithm (GA) used in this paper, are well-suited for WoAC because they are
easily parallelizable [3]. Multiple search instances can be conducted in paral-
lel, which can speed up the process of generating a pool of best-fit candidate
approximations.
The means of aggregating the group of best-fit solutions is the essence of
WoAC, and this process is critical to the goal of producing a solution that is
superior to any individual contributor within the crowd of possible solutions.
As a result, this paper focuses on evaluating several means of WoAC aggre-
gation. The primary contribution of this paper includes the introduction and
evaluation of new methods for performing WoAC aggregation. We show that
our proposed aggregation algorithm improves the consistency of forming a supe-
rior solution when compared to existing methods. Our findings also highlight
some key factors that were previously unreported and that heavily influence the
performance of WoAC. These factors include the importance of weighting the
crowd members based on fitness and iteratively attempting aggregation after
each new member has been added to the crowd.
This paper utilizes the well-known TSP as the combinatorial optimization
problem for evaluating the aggregation components of the WoAC algorithm. As
our search method, we employ a GA for generating pools of possible TSP solu-
tions. We selected these options because of their familiarity among the research
community, and as a result, abstract details about the fundamentals of the TSP
and the GA. Instead, we focus our attention on the WoAC algorithm and note
that the metaheuristic is applicable to a wide range of optimization problems
and indeterminate search techniques other than the TSP and GA, respectively.
The remainder of this paper is organized as follows. Section 2 reviews the
related research involving aggregation methods for WoC and WoAC. Subse-
quently, in Section 3, we review the existing WoAC algorithm, and also, provide
the details on the newly proposed means of aggregation. Afterwards, we evaluate
the experimental results in Section 4, and finally, we provide our conclusions and
future work in Section 5.

2 Related Work
The concept of applying WoC to the TSP has shown promising results in several
works. Yi and Dry aggregated human-generated TSP responses and demonstrated
that it is possible to generate an aggregate response that is superior to any individ-


ual [7]. This concept was extended to computer-generated TSP approximations
by Yampolskiy and El-Barkouky [1] and was coined WoAC. In that study, 90% of
the 20 contributors must agree on a TSP connection before it is kept as part of
the aggregate. In another study [5], Yampolskiy, Ashby, and Hassan remove the
90% agreement stipulation and have the group of 10 GA outcomes vote with equal
weights similar to [7].
Other researchers applied the concept of WoC with varying levels of suc-
cess to other applications [8–12], but none of these studies compared differ-
ent methods of aggregation, nor weighted the input of crowd members. Velic,
Grzinic, and Padavic applied the WoC concept in a stock market prediction
algorithm [12]. The algorithm at times produced results reflective of the group-
think phenomenon, where less knowledgeable contributors negate the influence
of highly experienced experts and negatively impact the outcome. In another
study, Moore and Clayton tested the effect of WoC in detecting phishing web-
sites and found that inexperienced users, who frequently made mistakes, often
voted similarly [11]. Kittur and Kraut leveraged WoC in managing Wikipedia
content and showed that coordination between writers on Wikipedia is vital
for content quality, especially as the crowd size grows [10]. Yu et al. proposed a
WoC-based algorithm for traffic route planning and used a non-Markovian aggre-
gation tree for fusing the results from route planning agents [9]. In [8], Hoshen,
Ben-Artzi, and Peleg proposed a means to combine multiple video streams into
an improved video using the WoC concept. Lastly, WoAC has also been applied
to a number of computer games [13–16].
Based on research using human groups, Wagner and Suh concluded that
the size of the crowd influences the performance of WoC, and they found that
improvement saturates as the crowd size grows beyond a certain size [17]. We
suspect that crowd size also affects the performance of artificial crowds. Hence,
this dynamic is investigated by our work, unlike the aforementioned studies.
Another dynamic that affects WoAC performance is the weight given to indi-
vidual contributors. The related works of this section generally used an aggrega-
tion method that provided equal weight to each crowd member. When the worth
of a contribution from an agent cannot be evaluated, then providing equal weight
might be the only prudent option. However, in most combinatorial optimization
problems, search results can be evaluated according to their performance with
respect to other potential solutions. Giving more weight to better-performed
members may be an important aspect to avoiding common mistakes (i.e. subop-
timal choices) taken by several members of the crowd. In other words, problems,
similar to the groupthink phenomenon observed in [12] and the common mistakes
observed in [11], could be mitigated by favoring better performers and suppress-
ing less fit contributors. The concept of merging multiple hypotheses using weight
assignments has also been explored in ensemble learning. For instance, Puuronen
et al. assigned different weights to the outputs of component classifiers based on
their predicted classification error, and used a stacked (i.e. second-level) clas-
sifier to determine the final output based on the weighted votes received from
the component classifiers [17]. However, the weighting techniques explored in


this paper are generated differently than in ensemble learning, and we apply the
concept in the application domain of optimization, not supervised learning.
A final factor that likely impacts the performance of WoC, but was not inves-
tigated in the previous works, is the level of difficulty involved in the problem.
As noted in a study using a group of human responses to various questions [18],
the performance of WoC suffers as the task difficulty increases. We posit that
the same holds true for artificial crowds; we investigate this issue more closely
in Section 4 and show how weighting opinions can mitigate it.

3 WoAC Algorithm Description


The original WoAC algorithm proposed in [5,7] constructs new approxima-
tions to optimization problems based on the trends and commonality observed
among the pool of candidate solutions. The identification of popular search selec-
tions (i.e. edge connections) is accomplished through a histogram matrix, which
reflects the frequency of every edge option. In graph-based problems, the algo-
rithm consists of scanning each individual graph, and then, recording its specific
edge selections by updating the frequency counts stored in the matrix.
Similarly, we use an n x n matrix that serves as a histogram for recording the
frequency of TSP edge occurrences witnessed while examining each contributing
proposal within the crowd; the variable, n, corresponds to the number of nodes
in the TSP. Each position in the matrix (i.e. row and column combination)
corresponds to a possible edge connection between nodes. The initial step of
the WoAC algorithm is to review every search result within the crowd pool and
update the histogram accordingly based on the observed edge connections of each
solution. Once every search result is reviewed and the histogram matrix is fully
constructed, the positions with higher values correspond to the more common
selections made by the crowd members; hence, based on the WoC principle, such
node connections tend to be wise choices that should be a part of the new route.
After the histogram is constructed, the WoAC algorithm then proceeds to
build the new solution. For the TSP, this is accomplished by choosing a starting
node and then searching the histogram’s entire row or column corresponding to
this particular node. The matrix position that has the highest value is selected
as the adjacent node. This process is repeated for every adjacent node until the
Hamiltonian cycle is complete. If a node is already in the newly constructed path,
then the next highest occurrence is selected and so on. The objective function
(i.e. spatial information) is only referenced if all options in the histogram have
been exhausted, meaning that the crowd’s preferred choices have already been
selected as part of the newly constructed graph. In this case, a greedy heuristic
is used to find the nearest node as the next destination based on the objective
function. Finally, every node is attempted as the starting node and the route
yielding the lowest cost is chosen as the WoAC solution.
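One way to implement the construction just described (with equal weights, as in the original WoAC) is sketched below in Python; the distance matrix dist is a hypothetical input used only for the greedy fallback and for costing the candidate cycles:

import numpy as np

def build_histogram(tours, n):
    # Count how often each undirected edge appears in the crowd of tours.
    hist = np.zeros((n, n))
    for tour in tours:
        for a, b in zip(tour, tour[1:] + tour[:1]):   # close the Hamiltonian cycle
            hist[a, b] += 1
            hist[b, a] += 1
    return hist

def build_tour(hist, dist, start):
    # Follow the crowd's most frequent edges; fall back to the nearest unvisited node.
    n = len(dist)
    tour, visited = [start], {start}
    while len(tour) < n:
        current = tour[-1]
        choices = [j for j in range(n) if j not in visited]
        popular = [j for j in choices if hist[current, j] > 0]
        nxt = (max(popular, key=lambda j: hist[current, j]) if popular
               else min(choices, key=lambda j: dist[current][j]))   # greedy fallback
        tour.append(nxt)
        visited.add(nxt)
    return tour

def tour_cost(tour, dist):
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def woac_aggregate(tours, dist):
    # Try every starting node and keep the cheapest aggregated cycle.
    n = len(dist)
    hist = build_histogram(tours, n)
    return min((build_tour(hist, dist, s) for s in range(n)),
               key=lambda t: tour_cost(t, dist))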
3.1 Modifications to the WoAC Algorithm

We propose extending the aforementioned WoAC algorithm and incorporating


two primary modifications. The first modification deals with weighting the sug-
gestions from contributors based on the range of fitness observed in the crowd
pool, instead of treating them equally as previously described. The new weight-
ing process can easily be accomplished with minimal processing because the
fitness scores of the candidates are already known from the convergence of each
preliminary search. We propose two new types of weighted aggregation meth-
ods and later compare them to the equal weight distribution method of [5,7].
In order to support these new means of weighting contributions, we modify the
WoAC algorithm to maintain three separate histograms: one for each method of
weighting.
The other alteration we propose to WoAC is to vary the crowd size by incre-
mentally building and evaluating the weighted histograms. In other words, the
modified WoAC algorithm would review the edge selections of an individual solu-
tion, then update each weighted histogram based on its fitness score with respect
to the group, and afterwards, iterate through the WoAC aggregation process.
This process is repeated for every member in the crowd pool. In contrast, the
previous version of WoAC selects some arbitrary crowd size a priori, and then,
compiles their selection choices in a single histogram matrix before running the
final aggregation process only once. Instead, we effectively vary the crowd size
and build a new WoAC aggregate after every individual member is added to each
histogram matrices. Once all crowd members have been considered, the lowest
cost solution from all WoAC aggregation attempts would be propagated as the
algorithm output.

Percentage Weight. The first considered aggregation alternative to the equal


weight method used in [5,7] is a simple percentage weight. The new method
assigns a weight to each edge selection of a contributor based on its objective
function ranking among its peers in the crowd pool. Assuming that minimal cost
is desirable, we can represent the fitness of each candidate using

c = {c1, c2, ..., cn}    (1)

where c is the cost vector (i.e. array) that contains the fitness associated with
each individual crowd member and n represents the total number of crowd mem-
bers. Given the range of costs associated with the candidates, we can formulate
a cost-distance using
di = (ci − min(c)) / (max(c) − min(c))    (2)
where di is the cost-distance ratio for the i th member of the crowd. This calcu-
lation provides a ratio or a percentage of how far away a candidate’s solution
is from the best agent in the crowd with respect to the worst. This metric
varies from 0 to 1, and as an individual's fitness score approaches the best, the
metric approaches zero. Finally, this ratio is transformed into a weight that
reflects the candidate's proximity to the best agent:

wpi = 1 − di (3)

where wpi is the percentage weight assigned to the i-th candidate's edge
contributions, which are stored in the histogram dedicated to the percentage weight.
Using this approach, the best GA candidate solution among the crowd is given
the full weight of one, while the worst candidate is effectively ignored by assigning
it a weight of zero.

Exponential Weight. In order to provide more weight to better-performing


candidates and more rapidly diminish the contributions of those less favorable
solutions, an exponential weighting algorithm for WoAC was also investigated.
Specifically, the weight of individual contributions was based on the exponential
function
wei = e^(−xi)    (4)
where wei is the exponential weight of the i-th candidate in the crowd of suggested
solutions and xi is a constant associated with di above, which is a measure of the
candidate's distance from the best option in the crowd. Specifically, the constant xi
is obtained by multiplying di by another constant, m, which maps di to a range of
values suitable for generating a weighting range from 0 to 1 using the exponential
function. In the following equation,
xi = di ∗ m (5)
m is the constant that maps the ratio di to a range of values between 0 and m.
Based on (5), the best option within the crowd would be assigned 0 because di
would be 0, and the worst-fit candidate would be assigned m because di would be
1. Therefore, the weight assigned to an individual solution's contribution to the
WoAC aggregate, as described in (4), would be at most 1, while options with a
higher cost (i.e. less fit) are assigned a weight smaller than 1. In this paper, we let
m = 5, as e^−5 is approximately zero, yielding a weighting factor that ranges from
1 to 0, with exponential decay as the candidate solution moves away from the top
candidate.
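Equations (2)-(5) translate directly into code; the minimal sketch below computes both proposed weights for a crowd of tour costs and is meant to be read alongside the pseudocode that follows:

import math

def crowd_weights(costs, m=5.0):
    # Percentage and exponential weights for each crowd member (Eqs. 2-5);
    # the best (lowest-cost) member receives weight 1 under both schemes.
    best, worst = min(costs), max(costs)
    spread = (worst - best) or 1.0                     # avoid division by zero on ties
    d = [(c - best) / spread for c in costs]           # Eq. (2)
    w_percentage = [1.0 - di for di in d]              # Eq. (3)
    w_exponential = [math.exp(-di * m) for di in d]    # Eqs. (4)-(5)
    return w_percentage, w_exponential

print(crowd_weights([549, 560, 575]))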

Pseudocode for the Modified WoAC

BEGIN
1 C1,C2,...,Cn = gather_crowd(n) //Perform n independent searches
2 best_performer = MAX(C1,C2,...,Cn) // Most-fit crowd member (lowest tour cost)
3 worst_performer = MIN(C1,C2,...,Cn) // Least-fit crowd member (highest tour cost)
4 FOR i = 1 to n // Iterate through all crowd members
5 d = calc_dist_ratio(C(i), best_performer, worst_performer)
// see eqn. (2)
6 p(i) = calc_percent_weight(d) // see eqn. (3)


7 e(i) = calc_exp_weight(d) // see eqns. (4) & (5)
8 FOR j = 1 to edge_count // Iterate through entire graph
9 row,column = identify_connected_vertices(C(i),j)
// given the current edge in cycle i,
// return the connected vertices
10 update_equal_histogram(row,column)
// update matrix position corresponding to
// connected vertices with: cnt = cnt + 1
11 update_percent_histogram(row,column,p(i))
// update matrix position corresponding to
// connected vertices with: cnt = cnt + p(i)
12 update_exp_histogram(row,column,e(i))
// update matrix position corresponding to
// connected vertices with: cnt = cnt + e(i)
13 ENDFOR
14 equal_graph = build_graph(equal_histogram)
15 percent_graph = build_graph(percent_histogram)
16 exp_graph = build_graph(exp_histogram)
17 current_best = MIN(equal_graph,percent_graph,exp_graph)
// find the least expense cycle from the 3 options
18 IF (current_best < overall_best) then
overall_best = current_best
// if new WoAC cycle is better than all previous,
// then store it
19 ENDIF
20 ENDFOR
END

4 Experimental Evaluation
4.1 Evaluation Overview
The modified WoAC algorithm was repetitively tested to evaluate its perfor-
mance. Every trial run of the algorithm consisted of the following two-step pro-
cess. First, a crowd (i.e. pool) of approximations was generated for a specific
TSP by instantiating 30 GA searches. After the searches converged, the post-
processing metaheuristic was executed according to the procedure outlined in
Section 3. The best aggregate solutions of each method, as well as their respec-
tive crowd sizes, were logged for statistical purposes.
Before reviewing the evaluation statistics, we will provide some preliminary
information about the testing environment. The evaluation process was repeated
on four different TSP datasets, and a total of 100 trials were executed on each.
The TSP datasets were randomly generated using Concorde [19], and the sizes
and optimal (i.e. best-known) costs of each are displayed in Table 1. The optimal
costs were obtained using the TSP solver in Concorde.
Table 1. Sizes and Optimal Costs of the TSP Datasets

Num. of Nodes    Best-known Cost
44               549
77               707
97               794
122              868

The parameters of the GA used on the TSP datasets are outlined in Table 2.
A total of 30 parallel instances of the GA were allowed to search and converge
before executing the WoAC algorithm. The GA generally produced crowd mem-
bers (i.e. candidate approximations) that were near, but slightly suboptimal to
the costs generated by Concorde. Therefore, there was opportunity for the WoAC
algorithm to improve upon the pool of candidate solutions and aggregate them
to form a new solution closer to the best-known optimum.

Table 2. Parameters of the Genetic Algorithm

GA Parameter Setting
Population Size 20
Parent Selection Fitness Tournament (uniformly random among top 5)
Crossover Operator Single-point (uniformly random)
Mutation Operator Combination of Two Mutation Steps:
1. Uniformly random 1% mutation
2. Greedy custom - adjacent node swap until improved

As an illustrative example of the evaluation procedure, Fig. 1 shows the evo-


lutionary development of a crowd of GAs that worked on the 44 node TSP. After
the convergence of all 30 search instances, the WoAC algorithm was initiated.
The percentage and exponential weights assigned to the crowd members as part
of the modified WoAC are also shown in Fig. 1. Together the plots indicate
that the crowd had some diversity in opinions (i.e. edge selections), which is an
important aspect in attempting to recombine these opinions into a unique aggre-
gate. Some level of diversity is important, but the original WoAC algorithm is
fundamentally based on making decisions by group consensus. The newly proposed
weighting concept shifts this bias away from pure group consensus and instead
gives greater consideration to those contributors known to be wiser. The goal of
the evaluation was to distinguish between these
different approaches and to identify the most effective means of aggregating the
crowd members to form superior solutions.

4.2 Evaluation Analysis


First, we will focus on the impact that weighting crowd members had on improv-
ing the chances of surpassing the best GA. The success rates of the different

Fig. 1. The plot on the left shows the convergence of 30 independent instances of the
same GA searching for the optimum tour on a TSP. The plot on the right corresponds to
the exponential (dotted ) and percentage (square) weights assigned to the 30 converged
GA outcomes as part of the modified WoAC algorithm.

Table 3. Success Rate of the Modified WoAC Aggregation Methods

Percent Success in Surpassing Best GA


Dataset Equal Percentage Exponential Combined
TSP 44 22% 36% 36% 51%
TSP 77 52% 60% 48% 77%
TSP 97 4% 9% 32% 42%
TSP 122 5% 7% 33% 41%
All (Mean) 20.8% 28% 37.3% 52.8%

weighting techniques are summarized in Table 3. The percentages are based


on the number of times the aggregation methods surpassed the best GA search,
given that 100 evaluations were performed on each TSP dataset. From the table,
it is evident that the exponential weighting method outperformed the other
methods more consistently. On the other hand, the reliability of the equal- and
percentage-weight techniques is shown to decrease as the TSPs become larger
(i.e. more challenging). A comparison of Tables 1 and 4 shows that the GA crowd
pools for the larger datasets are farther from the global optimum and more spread
apart (i.e. diverse). Therefore, the crowd was generally noisier
(i.e. possessed less consensus) in these cases, which appeared to cause problems
for the equal- and percentage-weighting schemes. These methods do not appear
to provide a strong enough distinction between the candidates’ performances
in these noisy environments. In contrast, the exponential technique favors the
wiser contributors more strongly, while also diminishing the influence of the
weaker candidates. Overall, the results show that biasing the input
of stronger contributors is critical to improving the success rate of the WoAC
algorithm.

Table 4. Mean Costs (μ) and Standard Deviations (σ ) of Crowd Members and WoAC
Aggregates

Dataset    Crowd of GAs μ (σ)   Best GA μ (σ)   Equal μ (σ)     Percentage μ (σ)   Exponential μ (σ)
TSP 44     613.3 (26.5)         566.4 (7.3)     576.3 (12.6)    570.8 (10.9)       565.1 (9.1)
TSP 77     827.2 (37.2)         761.6 (12.5)    760.3 (19.5)    755.5 (18.3)       757.8 (16.4)
TSP 97     944.0 (41.0)         868.6 (14.8)    915.0 (28.7)    895.8 (23.9)       873.9 (22.0)
TSP 122    1044.9 (45.5)        963.2 (13.7)    1007.8 (30.2)   994.3 (24.0)       969.5 (21.8)

The other dynamic investigated during the evaluation of the modified WoAC
algorithm was the concept of varying the crowd size and incrementally adding
new opinions (i.e. approximations) to the crowd one-at-a-time. In the original
WoAC algorithm [1,5], a fixed crowd size was determined a priori and all opin-
ions were aggregated only once after considering the votes from all contributors.
The success rate for the equal-weight technique in Table 3 was based on varying
the crowd size; however, if the crowd size had instead been fixed at 30, the mean
success rate of the equal-weight technique would drop to 1.8%, given the results
from all the TSP trials (i.e. 400 experiments). To better visualize the
impact of crowd size, Fig. 2 shows a histogram of the number of members in
the crowd when the equal-weight technique successfully surpassed the best GA
during the 400 trials. It indicates that the number of opinions needed to outper-
form the best GA is unpredictable and should not be fixed a priori ; therefore,
varying the crowd size is effective at mitigating this challenge and improving the
success rate of the metaheuristic.

[Figure: "Histogram Showing the Effect of Crowd Size"; x-axis: crowd size when the equal-weight method surpassed the best GA (0 to 30); y-axis: number of occurrences.]

Fig. 2. The crowd size at the time when the equal-weight technique surpassed the best
GA. This statistic, which is based on 400 trials from all TSP datasets, is plotted as a
histogram.

In summary, the experimental results show that weighting the opinions of


crowd members and varying the crowd size are vital to the success of WoAC.
For instance, if we consider that the original configuration of the algorithm (i.e.
fixed crowd size and equal-weight) succeeded 1.8% of the time, and that the
combined success rate of the modified WoAC (i.e. all weighting techniques and
variable crowd size) was 52.8% (see Table 3), then the new approach improved
the algorithm's success rate by 51 percentage points.

4.3 Conclusion and Future Work

This paper investigated ways of improving the WoAC metaheuristic, which


aggregates collective searches in an attempt to form superior approximations to
computationally-hard problems. Specifically, we explored the heuristic’s means
of aggregation and discovered the critical factors that influence its performance.
As a result, we proposed two beneficial modifications to the algorithm, which
significantly improved its performance in surpassing the best candidate within
the crowd. These modifications include iteratively adjusting the crowd size and
intelligently weighting the input of the crowd members based on their fitness
within the pool.
The crowds (i.e. converged searches) generated in this paper were collected
using several parallel instances of the same genetic algorithm (GA). In the
future, we are interested in using non-uniform GAs as part of the crowd gather-
ing process. Such an approach could introduce more diversified opinions within
the crowd and provide the potential for increased performance. However, with
increased diversity, we suspect that assigning weights to the better-performing
outcomes would become even more important in order to dampen the noise in
the crowd caused by less effective contributors. We are also interested in applying
WoAC to other computationally hard problems and comparing its performance
to other metaheuristics.

References
1. Yampolskiy, R.V., El-Barkouky, A.: Wisdom of artificial crowds algorithm for solv-
ing NP-hard problems. International Journal of Bio-Inspired Computation 3(6),
358–369 (2011)
2. Collet, P., Rennard, J.-P.: Stochastic optimization algorithms (2007). arXiv
preprint arXiv:0704.3780
3. Hoos, H.H., Stützle, T.: Stochastic search algorithms, vol. 156. Springer (2007)
4. Kautz, H.A., Sabharwal, A., Selman, B.: Incomplete Algorithms. Handbook of
Satisfiability 185, 185–204 (2009)
5. Yampolskiy, R.V., Ashby, L., Hassan, L.: Wisdom of Artificial Crowds - A Meta-
heuristic Algorithm for Optimization. Journal of Intelligent Learning Systems and
Applications 4, 98 (2012)
6. Surowiecki, J.: The wisdom of crowds. Random House LLC (2005)
7. Yi, S.K.M., Steyvers, M., Lee, M.D., Dry, M.: Wisdom of the Crowds in Traveling
Salesman Problems. Memory and Cognition 39, 914–992 (2011)

8. Hoshen, Y., Ben-Artzi, G., Peleg, S.: Wisdom of the crowd in egocentric video
curation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition
Workshops (CVPRW), pp. 587–593, June 23–28, 2014
9. Jiangbo, Y., Kian Hsiang, L., Oran, A., Jaillet, P.: Hierarchical Bayesian nonpara-
metric approach to modeling and learning the wisdom of crowds of urban traf-
fic route planning agents. In: 2012 IEEE/WIC/ACM International Conferences
on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 478–485,
December 4–7, 2012
10. Kittur, A., Kraut, R.E.: Harnessing the wisdom of crowds in wikipedia: quality
through coordination. Paper presented at the Proceedings of the 2008 ACM con-
ference on Computer supported cooperative work, San Diego, CA, USA
11. Moore, T., Clayton, R.C.: Evaluating the wisdom of crowds in assessing phishing
websites. In: Tsudik, G. (ed.) FC 2008. LNCS, vol. 5143, pp. 16–30. Springer,
Heidelberg (2008)
12. Velic, M., Grzinic, T., Padavic, I.: Wisdom of crowds algorithm for stock mar-
ket predictions. In: Proceedings of the International Conference on Information
Technology Interfaces, ITI, pp. 137–144 (2013)
13. Ashby, L.H., Yampolskiy, R.V.: Genetic algorithm and wisdom of artificial crowds
algorithm applied to light up. In: 2011 16th International Conference on Computer
Games (CGAMES), pp. 27–32, July 27–30, 2011
14. Hughes, R., Yampolskiy, R.V.: Solving Sudoku Puzzles with Wisdom of Artificial
Crowds. International Journal of Intelligent Games and Simulation 7(1), 6 (2013)
15. Khalifa, A.B., Yampolskiy, R.V.: GA with Wisdom of Artificial Crowds for Solving
Mastermind Satisfiability Problem. International Journal of Intelligent Games and
Simulation 6(2), 6 (2011)
16. Port, A.C., Yampolskiy, R.V.: Using a GA and wisdom of artificial crowds to solve
solitaire battleship puzzles. In: 2012 17th International Conference on Computer
Games (CGAMES), pp. 25–29, July 30, 2012-August 1, 2012
17. Puuronen, S., Terziyan, V., Tsymbal, A.: A dynamic integration algorithm for an
ensemble of classifiers. In: Ra, Z., Skowron, A. (eds.) Foundations of Intelligent
Systems. Lecture Notes in Computer Science, vol. 1609, pp. 592–600. Springer,
Berlin Heidelberg (1999)
18. Wagner, C., Ayoung, S.: The wisdom of crowds: impact of collective size and exper-
tise transfer on collective performance. In: 2014 47th Hawaii International Confer-
ence on System Sciences (HICSS), pp. 594–603, January 6–9, 2014
19. Concorde TSP Solver. http://www.math.uwaterloo.ca/tsp/concorde/index.html
The Influence of Topology in Coordinating
Collective Decision-Making in Bio-hybrid
Societies

Rob Mills(B) and Luís Correia

BioISI – Biosystems and Integrative Sciences Institute,


Faculty of Sciences, University of Lisbon, Lisbon, Portugal
[email protected]

Abstract. Collective behaviours are widespread across the animal king-


dom, many of which result from self-organised processes, making it diffi-
cult to understand the individual behaviours that give rise to such results.
One method to improve our understanding is to develop bio-hybrid soci-
eties, in which robots and animals interact, combining elements whose
behaviours are under our control (robots) with elements that are not
(animals). Recent work has shown that a bio-hybrid society comprising
simulated robots and honeybees is able to reach collective decisions that
are the product of self-organisation among the robots and the bees, and
that these decisions can be coordinated across multiple groups that reside
in distinct habitats via robot–robot communication. Here we examine
how sensitive the collective decision-making is to the specific topologies of
information sharing in such bio-hybrid societies, using agent-based simu-
lation modelling. We find that collective decision-making across multiple
groups occupying distinct habitats is possible for a range of inter-habitat
interaction topologies, where the rate of coordinated outcomes has a pos-
itive relationship with the number of inter-habitat links. This indicates
that system-wide coordination states are relatively robust and do not
require as strong inter-habitat coupling as had previously been used.

Keywords: Collective behaviour · Mixed animal-robot societies

1 Introduction

Social living is integral to organisms across many magnitudes of scale and com-
plexity, from bacterial biofilms [1] to primates [2], and such societies frequently
exhibit behaviours at the level of the collective, such as moving together by fol-
lowing a leader [3] or self-organised aggregation [4]. Many social animals and
behaviours have a substantial impact on humanity, both beneficial (e.g., pol-
lination) and detrimental (e.g., spread of disease). Since collective behaviours
can emerge from a combination of self-organised interactions, it can be prob-
lematic to understand what triggers, modulates, or suppresses their emergence.
One emerging methodology used to examine collective behaviours is to develop


bio-hybrid societies, in which robots are integrated into the animal society [5,6].
In so doing, such an approach allows direct testing of hypotheses regarding indi-
vidual behaviours and how they are modulated by the group context (for exam-
ple, confirming a hypothesised behaviour by showing that a collective behaviour
is not changed when some animals are substituted by robots [6]). Alternatively,
it becomes possible to use robot behaviours that can manipulate the overall
collective behaviours [5].
Our research aims to develop bio-hybrid societies, ultimately comprising mul-
tiple species that interact with robots, which thus form an interface between ani-
mals that need not naturally share a habitat. Interfacing in this manner has the
added advantage that we can monitor precisely what information is exchanged
between animal groups (and permits experiments that attenuate or amplify spe-
cific information types). To move towards addressing this overarching aim, here
we use a simplified system that comprises multiple populations of the same
species, and we examine this using individual-based simulation modelling.
In recent work, it has been shown that juvenile honeybees interacting with
robots can reach collective decisions jointly with those robots, and moreover,
that such collective decisions can be coordinated across multiple populations of
animals that reside in distinct habitats [7]. This work showed that robots using
cross-inhibition and local excitation led to high levels of collective decision-
making. However, it only compared ‘all-or-nothing’ coupling between the two
habitats. In this paper, we examine the sensitivity of decision-making and coor-
dination of those decisions across arenas, with respect to the inter-robot commu-
nication topology. We find that even relatively sparse numbers of links between
habitats can be sufficient to coordinate outcomes across those habitats. These
findings improve our understanding about the interactions that are sufficient to
coordinate behaviours among separated groups of animals, and the limits that
can be tolerated.

1.1 Related Work


Individual-based modelling and embodiment of behaviour in robots are becoming
increasingly used as tools for studying animal behaviour, at both the individual
level [8] and the collective level [9,10]. Moreover, rather than merely attempt
to replicate (or to abstract) a behaviour under study with robots, using robots
to interact with animals can enable investigation of social behaviours in a more
direct way [6]. For instance, [3] present a programmable fish that is used to
investigate leadership among fish movement. [5] present robot cockroaches that
are accepted by the insects, and together exhibit group dynamics equivalent to
wholly animal societies. This work also showed that by changing the behaviour
of the robots they could modulate the overall collective decisions reached.
Honeybees are another social insect that aggregate in environmental regions
that they favour: [11] studies thermal conditions (rather than the light level
preferences in cockroaches). Individual animals do not systematically select these
preferred regions however: it is a collective decision that depends on being part
of a group of a sufficient size [12].

Fig. 1. A preliminary experiment with a hybrid society comprising robots, real bees, virtual bees and simulated robots, yielding collective decisions among virtual and real bees.
Fig. 2. How we split and name the arenas into zones during the analysis.

Our recent work [7] has examined how collective decisions can be reached by
hybrid animal–robot societies, with individual-based simulation following pre-
liminary work that coupled real bees to simulated bees via physical and sim-
ulated robots (see Fig. 1). This work uses robots that are able to manipulate
key environmental variables for honeybees, including the temperature, light, and
vibration in the vicinity of the robot [13], as well as being able to detect the pres-
ence of bees nearby. Using two robots, when we introduced a positive feedback
loop between the heat each robot emits and the presence of bees nearby, the ani-
mals make a collective decision by aggregating around one of the robots. There
was nothing to discriminate between the robots to start with, but the action of
the bee population breaks symmetry – initially by chance, but reinforced by one
or other robot. Moreover, we also showed that collective decisions made in two
separate arenas, each with a population of bees and two robots, can be coordi-
nated when the robots share task-specific information with another robot in the
other arena. The current paper builds on these results, examining the influence
of the inter-robot links used to couple two arenas of simulated honeybees.

2 Methods
We use a real-time platform for 2D robot simulation1 to simulate the inter-
action of bees and robots. We model both bees and robots as agents in this
world, making use of a basic motile robot for the bees, and use a fixed robot
with a customised model that corresponds to the bespoke robots designed in
our laboratory-based work [13]. While simulation modelling cannot fully replace
reality, it does allow us to explore relationships between key micro-level mecha-
nisms and how these can give rise to observed macro-level dynamics. The simu-
lator design enables execution of the exact same robot controllers in simulation
¹ Enki – an open source fast 2D robot simulator: http://home.gna.org/enki/

and the physical robots, adding substantial value to the resolution of models
employed, within the larger cycle of modelling and empirical work.
We model juvenile honeybee behaviour using Beeclust [14]. This is a social
model that results in aggregations in zones of highly favoured stimulus. This
model was developed based on observations of honeybees: specifically, that they
exhibit a preference to aggregate in regions with temperatures in the range
34◦ C–38◦ C; that groups of bees are able to identify optimal temperature zones,
but individual bees do not do so; and that specific inter-animal chemical cues
(e.g., pheromones) have not been shown to be important in this collective aggre-
gation [14]. It has previously been used to illustrate light-seeking behaviour in a
swarm of robots [9]. Here we simulate the bees in a thermal environment.
The robot and bee models used in this work are the same as those in [7] and
for completeness we describe them fully in the remainder of this section.

2.1 Bee Model


There are two main phases in the bee behaviour: (i) random exploration; and
(ii) pausing; in addition to obstacle avoidance (this interrupts (i)). When a bee
encounters or collides with another bee, it enters the pausing state, and remains
paused for a duration proportional to the local temperature (warmer regions
yield longer pauses). There is a positive feedback loop between pause duration
and the chance of further exploring bees encountering a bee in a given location,
and thus an aggregation in a warm zone can undergo amplification.
We model bees that can discriminate between conspecifics and inanimate
obstacles at close proximity, using infra-red (IR) sensors that provide a distance
d and object type y. The bees have three sensors l, f, r that are oriented at −45◦ ,
0, +45◦ relative to the bee’s bearing. The behaviour is defined as follows.
loop:
1. delay(dt)
2. ((y_l, y_f, y_r), (d_l, d_f, d_r)) ← read_sensors(l, f, r)
3. if ∃ i, (d_i < 0.5 ∧ y_i = bee), for i ∈ [l, f, r]:
   (a) stop()
   (b) T ← measure temperature
   (c) t_wait ← compute_wait(T)
   (d) sleep(t_wait)
   (e) random_turn()
4. else if ∃ i, (d_i < 0.5 ∧ y_i = obstacle), for i ∈ [l, f, r]:
   random_turn()
5. else:
   forwards()
end loop
We use data observed in juvenile honeybees (collected and analysed by Univ.
Graz, fitted with a hill function) to define the compute_wait(T) mapping:

t_wait = [ (a + b · T)^c / ( (a + b · T)^c + d^c ) ] · e + f ,    (1)

where a = 3.09, b = −0.0403, c = −28.5, d = 1.79, e = 22.5, and f = 0.645. It is


similar to a sigmoid, with low waiting times (∼ 1 s) for T< 25◦ C and high waiting
times (∼ 25 s) for T = 38◦ C.
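For reference, a minimal Python sketch of this waiting-time mapping, using the fitted constants quoted above (the function name compute_wait follows the pseudocode; everything else is illustrative):

def compute_wait(T, a=3.09, b=-0.0403, c=-28.5, d=1.79, e=22.5, f=0.645):
    """Hill-function mapping from local temperature T (deg C) to pause duration in s, eq. (1)."""
    num = (a + b * T) ** c
    return num / (num + d ** c) * e + f

With these constants, compute_wait(25.0) returns roughly 1 s and compute_wait(38.0) roughly 23 s, matching the low- and high-temperature behaviour described above.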

2.2 Robot Controller

In our research we employ custom-designed robotic devices which are able to


generate stimuli of several modalities that the animals are sensitve to, includ-
ing heat and light; and the robots an also sense various environmental factors
(e.g., temperature, IR) [13]. The robots occupy fixed positions within the exper-
imental arenas to interact with the animals. In this paper we use the robots’
thermal actuators and 6 IR proximity sensors. The robots can also communi-
cate with specific neighbours (the topologies are described below) The following
program defines how the robots determine their temperatures through time.
At initialisation: set vector m of length mmax to zero for m[0]..m[mmax − 1].
For each timestep, t:
1. draw ← count IR sensors above their threshold
2. m[mod(t, mmax )] ← saturate(draw )
3.  neighbours)
send(m,
4. if mod(t, tupdate ) = 0:
(a) dx ← receive message(s) from neighbour(s)
(b) Tnew ← density to heat(m,  dx )

 is a time-
where draw is a raw estimate of local bee density in a given timestep, m
averaged estimate computed as the mean value of the memory vector m, dx is a
vector of density estimates received from other robots in the interaction neigh-
bourhood. In this paper, robots have zero, one, or two neighbours depending on
the specific topology under test.
We use saturate(s) = min(4, s) in this study. The density to heat function
maps the time-averaged detection count to an output temperature via a linear
transformation, and is parameterised to allow for different topologies examined.

Each involved robot x makes a contribution c_x = (d̄_x / 4) · (T_max − T_min),
i.e., its (normalised) density estimate scaled into the robot's temperature range.
These contributions are combined as a weighted sum:

T_new = T_min + Σ_{x∈{l,r,c}} c_x w_x ,    (2)

where the relative weights of each robot's contribution depend on the specific
setup. Each robot can be influenced by the local environment cl , cross-inhibitory
signals from a competitor cc , and collaborative signals from a specific remote
robot cr . In this paper we use cross-inhibition wc = −0.5 throughout. When a
robot has an incoming collaborative link then we set wl = wr = 0.5, and otherwise
set wl = 1 and wr = 0. The topologies tested are shown below.
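As a sketch of this temperature update (a simplified reconstruction under the weight settings quoted above, not the robots' actual firmware; the helper names contribution and density_to_heat are ours):

T_MIN, T_MAX = 28.0, 38.0  # degrees C, as given in Section 2.4

def contribution(d_bar):
    """Scale a saturated, time-averaged density estimate (0..4) into the temperature range."""
    return (d_bar / 4.0) * (T_MAX - T_MIN)

def density_to_heat(d_local, d_competitor=0.0, d_remote=None, w_c=-0.5):
    """Weighted combination of local, collaborative and cross-inhibitory terms, eq. (2)."""
    if d_remote is None:        # no incoming collaborative link: w_l = 1, w_r = 0
        w_l, w_r, d_remote = 1.0, 0.0, 0.0
    else:                       # incoming collaborative link: w_l = w_r = 0.5
        w_l = w_r = 0.5
    return (T_MIN
            + w_l * contribution(d_local)
            + w_r * contribution(d_remote)
            + w_c * contribution(d_competitor))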

2.3 Measuring Collective Decisions


Characterising behaviours at the group level is not as clear cut as for decisions
at the individual level. However, by using binary choice assays where individuals
can exhibit a clear choice, we can use statistical tests to formally quantify when
the frequency of choices differs significantly from an accidental outcome. We
follow the methodology of [5] and [12] that uses the binomial test to formally
quantify when a collective decision is reached. Extending the setup to include
multiple binary choices affords these benefits of quantifiable behaviours at the
level of collective while also admitting more complex environments.
Specifically, at the end of each experiment, we divide the location of the bees
into two different zones within each arena, such that one of the two robots has
‘won’ the competition for that bee (see Fig. 2). We define the null hypothesis
to be that the bees made their choice at random and without bias. When the
outcome differs significantly from this, we consider it a collective decision (CD).
Since here we concentrate on the ability to coordinate the collective decisions
(CCD) reached in two arenas, we also consider a test that lumps together all bees
from both populations, as if they were in a single group. We use the binomial
test to quantify when such outcomes are significant within a single experiment.
We also apply χ2 tests and binomial tests across a set of repeats and between
different conditions, to verify when outcomes are significantly different.
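As a concrete illustration (a sketch only, with invented counts; the paper's exact counting conventions are not reproduced here), flagging a collective decision in one arena could be done with SciPy's two-sided binomial test:

from scipy.stats import binomtest

def collective_decision(n_east, n_west, alpha=0.05):
    """True if the split between the two zones deviates significantly from
    the null hypothesis of an unbiased 50/50 choice."""
    n = n_east + n_west
    if n == 0:
        return False
    return binomtest(n_east, n, p=0.5, alternative='two-sided').pvalue < alpha

print(collective_decision(11, 1))   # e.g. 11 of 12 bees in the East zone -> True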

2.4 Parameters and Setup


With our choices, we aim to reflect key conditions used in our animal-based
experiments, such as the arena and robot setup, and the temperatures used are
in a range that is relevant for the animals without harming them.
In the experiments below, Tmin = 28◦ C and Tmax = 38◦ C; the ambient tem-
perature is 27◦ C. The modelled bees measure 13.5 mm × 5 mm (based on
our measurements), and detection range is 5 mm. Memory length mmax = 18,
tupdate = 3 s, tresample = 0.5 s. Since other methods did not provide more accurate
estimates in preliminary testing we use a simple time average across the whole
memory m. To facilitate the observation of binary choices, the arenas used are
rectangular with rounded ends (see Fig. 1). They have dimensions 210×65 mm
(internal). The two robots are positioned 60 mm either side of the centre, on the
midline. Bees are initialised with random position and orientation.

3 Simulation Experiments
This paper aims to examine the sensitivity of coordinating collective decision-
making between arenas as a function of inter-arena communication links. To
address this aim, we examine a range of different topologies within the limits
examined in prior work, employing a basic setup that uses two identical arenas,
each comprising a population of bees and two robots. We vary the inter-arena
links, keeping the link weights positive where present. We use six different topolo-
gies that vary in the number and direction of coupling that they provide between

[Fig. 3 panels: (a) t-a, (b) t-b, (c) t-c, (d) t-d, (e) t-e, (f) t-f]

Fig. 3. Topologies tested in the multi-arena experiments. Solid lines indicate positive
contributions and dashed lines indicate negative contributions, with respect to the
receiving robot. From top-left to bottom-right, the two arenas become more loosely
coupled.

the two arenas. Fig. 3 shows these topologies. Moving from top-left to bottom-
right, t-a has the strongest coupling; t-b and t-c have some reciprocal paths;
t-d and t-e have links in one direction only. t-f is the other extreme without
any between-arena links, which we use to establish a baseline for the other out-
comes. These motifs give broad coverage of the space and while other topologies
are possible with more links or more classes of link, our motivation to understand
sparser networks is better served by these networks with fewer rather than more
inter-arena links.
Our prior work showed strong coordination under (a) and confirmed that
the absence of links (f) does not lead to coordination [7]. Intuitively, we expect
a weaker ability to coordinate as the links become sparser; however, we do not
know what the limits are or how gracefully the system will degrade.
We run 50 independent repeats for each of the six topologies, each experiment
lasting for 15 mins. Fig. 5 shows the frequency of statistically significant collective
decisions made for each of the topologies. Fig. 4 provides a slightly different view of
the experiments by showing the mean percentage of bees that were present in the East
zone during the last 120 s. All 50 repeats in each topology have a point plotted
in this graph, and while it is not always the case that the points in the extremes
correspond to a significant collective decision at the time of measurement, the
two views are strongly linked.
Considering the distributions shown in Fig. 5, we perform the following statis-
tical tests to identify the collective decision-making and coordination that arises

Fig. 4. Final states of each of the topologies, with one point shown per experiment.
When the system has coordinated the decisions made in each arena, the points only
appear in two of the four corners (indicating that the decisions are mutually con-
strained).

under each topology. Using a χ2 test with a null model of equal likelihood for
each of the four possible decision pairings, topologies t-e and t-f do not deviate
significantly from the null model (ρ > 0.05). The other four topologies all devi-
ate significantly, i.e., they exhibit coordination between the two arenas. We also
compare the overall rate of collective decision-making across different topologies.
The three cases that use two links (t-b, t-c, t-d) have similar ability to induce
coordinated collective decisions as t-a (binomial test, ρ > 0.05). Comparing the
rate of collective decision within the two arenas separately (i.e., any of the four
outcomes), t-b, t-c, t-e have significantly lower rates than t-a (binomial test,
ρ < 0.05); however, although the t-d rate is lower, it is not significantly lower
(ρ > 0.05). None of the topologies exhibit a bias towards either coordinated out-
come (binomial test, ρ > 0.05), with the exception of t-d (ρ < 0.05).
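For illustration, a minimal version of the χ² test over the four decision pairings might be written as follows (the observed counts below are invented, not taken from the experiments):

from scipy.stats import chisquare

# Observed counts of the four pairings (EE, EW, WE, WW) over 50 repeats -- invented numbers
observed = [21, 3, 4, 22]

# Null model: all four pairings equally likely (uniform expected frequencies by default)
stat, p_value = chisquare(observed)
coordinated = p_value < 0.05   # a significant deviation suggests coordination between arenas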
Overall, these results show that: (i) All topologies with two or more links
coupling the two arenas are able to coordinate the decision-making. t-e, the
sparsest topology, is not able to reliably coordinate the decision-making (nor
is the unlinked case t-f but of course this is to be expected). (ii) Most of the
sparser topologies are less frequently able to induce collective decision-making
than the most tightly linked case t-a. t-d is a marginal exception in this regard.
(iii) t-d exhibits some anomalies regarding a bias towards the WW outcome over
the EE outcome. Given the absence of bias in the model or the topology, this is
somewhat surprising and requires further investigation to identify the source of
this bias.

Fig. 5. Frequency of runs with significant collective decisions, from 50 repeats. Values in
brackets indicate the frequency of coordinated collective decisions made, i.e., lumping
together both populations. Other values indicate assessment of the two populations
separately.

To obtain a better understanding of the differences between the different


conditions, we inspect some example trajectories (Fig. 6), which illustrate some
of the issues faced by different topologies. (The runs are characteristic of several
seen in each case, but frame (c) was selected to highlight a difficulty rather than
be the most frequent trajectory type). These figures divide the bee locations into
two zones per arena, one for each robot (see Fig. 2). We compute the average
percentage of bees in the East zone during each period (here, 30 s), and addition-
ally, apply a binomial test to quantify whether a significant collective decision
is reached at the end of each period (for ρ < 0.05). In frame (a), we see that the
strong coupling of t-a results in tight changes in bee location in each arena.
In frame (b), the two populations initially move towards opposite ends, but as
the decision in the South arena solidifies, it is able to coordinate this decision
with the North arena. The progression of the two populations is not typically as
tightly in lock-step as for t-a, with reciprocal feedback, but the topology does
result in coordination with a high frequency. In frame (c) there is only one link
(t-e), and it is far slower for the South arena to exert a coordinating influence
over the North arena. In this case it was able to coordinate within 15 mins, but
in other cases different decisions are reached in each arena and the single link is
not always able to overturn the result.

(a) topology t-a (strongest links)

(b) topology t-d (two links S→N)

(c) topology t-e (a single link S→N)

Fig. 6. Example trajectories showing a fraction of bees in East half of each arena, and
annotated for where the collective decisions are made with thick, solid lines.

4 Discussion and Future Work


The different topologies vary in strength of coupling between the two arenas, and
this also results in varying abilities to coordinate the collective decisions reached
within those arenas. With four links in two reciprocal pairs (t-a), the activity in
the two arenas is tightly coupled and reliably results in coordination across both
bee populations. With two links forming a reciprocal network (t-b, t-c), the
two arenas still coordinate frequently, although to a lesser extent than for t-a.
Using two links in the same direction (t-d) is able to couple the decisions made
at a slightly higher rate than the other two-link topologies, although clearly the

single direction of information-sharing may not always be appropriate. Using


only a single link (t-e) is not sufficient coupling to significantly coordinate the
decisions made across the two arenas.
In summary, we find that the topology influences the ability of multiple
populations to coordinate their activities, but also that relatively few links are
still sufficient to result in some coordination. This is welcome news, since it
suggests that such distributed collective systems will exhibit favourable tolerance
to failures in links (provided some ability to detect those failures – for instance,
monitoring an absence of messages received over a given link could be used to
adapt the robot’s behaviour). Moreover, as we begin to consider larger systems,
sparse networks of localised interactions are generally preferable.
The scenarios modelled above all have static background environments. We
are also interested in how dynamic environments (e.g., exogenous shocks) affect
coordination ability. This introduces the question of whether the overall system can
restore a coordinated state following disruption, which may more strongly discriminate
between different topologies. Perturbations could be due to an exogenous heat
source, or another stimulus modality that modulates the bee behaviour (for
example, it is thought that vibrations can act as stop signals). Interestingly, when
considering a multi-species bio-hybrid system, the source of variability could also
be endogenous, and the role of the robots in enabling coordination would be even
greater. In any of these more dynamic scenarios, it would become important to
measure the speed of coordinated decision-making, as well as the longevity (see
e.g., [15]). A further open question relates to the advantages and tradeoffs within
an extended system comprising more sub-groups of animals (each of which offers
some distributed ‘memory’ of a coordinated decision): more smaller groups may
be better able to retain a decision; but dividing into groups that are too small
will likely degrade the ability of each group to form a decision in the first place. In
more complex scenarios such as these we aim to investigate how further adaptive
mechanisms within the robot network can improve efficiency.
In this paper we have investigated how inter-robot interaction topology influ-
ences the ability of system-level coordinated decisions. We have shown that the
rate of coordination does depend on topology, but also that the basin of attrac-
tion for coordinated states is relatively robust. In general, better understanding
the limits and affordances of interactions in such systems will enable the devel-
opment of more capable mixed animal–robot societies.

Acknowledgments. This work is supported by: EU-ICT project “ASSISI|bf”


no. 601074, and by centre grant (to BioISI, ref: UID/MULTI/04046/2013), from
FCT/MCTES/ PIDDAC, Portugal.

References
1. Nadell, C.D., Xavier, J.B., Foster, K.R.: The sociobiology of biofilms. FEMS Micro-
biol. Rev. 33(1), 206–224 (2009)
2. King, A.J., Cowlishaw, G.: Leaders, followers, and group decision-making.
Commun. Integr. Biol. 2(2), 147–150 (2009)

3. Faria, J.J., Dyer, J.R., Clément, R.O., Couzin, I.D., Holt, N., Ward, A.J.,
Waters, D., Krause, J.: A novel method for investigating the collective behaviour
of fish: introducing ‘robofish’. Behav. Ecol. Sociobiol. 64(8), 1211–1218 (2010)
4. Parrish, J.K., Edelstein-Keshet, L.: Complexity, pattern, and evolutionary trade-
offs in animal aggregation. Science 284(5411), 99–101 (1999)
5. Halloy, J., Sempo, G., Caprari, G., Rivault, C., Asadpour, M., Tâche, F.,
Said, I., Durier, V., Canonge, S., Amé, J.M., et al.: Social integration of robots
into groups of cockroaches to control self-organized choices. Science 318(5853),
1155–1158 (2007)
6. De Schutter, G., Theraulaz, G., Deneubourg, J.L.: Animal-robots collective intel-
ligence. Ann. Math. Artif. Intel. 31(1–4), 223–238 (2001)
7. Mills, R., Zahadat, P., Silva, F., Miklic, D., Mariano, P., Schmickl, T., Correia, L.:
Coordination of collective behaviours in spatially separated agents. In: Procs.
ECAL (2015)
8. Webb, B.: Can robots make good models of biological behaviour? Behav. Brain.
Sci. 24(06), 1033–1050 (2001)
9. Kernbach, S., Thenius, R., Kernbach, O., Schmickl, T.: Re-embodiment of hon-
eybee aggregation behavior in an artificial micro-robotic system. Adapt. Behav.
17(3), 237–259 (2009)
10. Campo, A., Garnier, S., Dédriche, O., Zekkri, M., Dorigo, M.: Self-organized dis-
crimination of resources. PLoS ONE 6(5), e19888 (2011)
11. Grodzicki, P., Caputa, M.: Social versus individual behaviour: a comparative app-
roach to thermal behaviour of the honeybee (Apis mellifera L.) and the american
cockroach (Periplaneta americana L.). J. Insect. Physiol. 51(3), 315–322 (2005)
12. Szopek, M., Schmickl, T., Thenius, R., Radspieler, G., Crailsheim, K.: Dynamics
of collective decision making of honeybees in complex temperature fields. PLoS
ONE 8(10), e76250 (2013)
13. Griparic, K., Haus, T., Bogdan, S., Miklic, D.: Combined actuator sensor unit for
interaction with honeybees. In: Sensor Applications Symposium (2015)
14. Schmickl, T., Thenius, R., Moeslinger, C., Radspieler, G., Kernbach, S.,
Szymanski, M., Crailsheim, K.: Get in touch: cooperative decision making based on
robot-to-robot collisions. Auton. Agent. Multi. Agent. Syst. 18(1), 133–155 (2009)
15. Gautrais, J., Michelena, P., Sibbald, A., Bon, R., Deneubourg, J.L.: Allelomimetic
synchronization in merino sheep. Anim. Behav. 74(5), 1443–1454 (2007)
A Differential Evolution Algorithm
for Optimization Including Linear
Equality Constraints

Helio J.C. Barbosa1,2(B) , Rodrigo L. Araujo2 , and Heder S. Bernardino2


1
Laboratório Nacional de Computação Científica, Petrópolis, RJ, Brazil
[email protected]
http://www.lncc.br/~hcbm
2
Universidade Federal de Juiz de Fora, Juiz de Fora, MG, Brazil
[email protected], [email protected]
http://www.lncc.br/~hedersb

Abstract. In this paper a differential evolution technique is proposed in


order to tackle continuous optimization problems subject to a set of lin-
ear equality constraints, in addition to general non-linear equality and
inequality constraints. The idea is to exactly satisfy the linear equal-
ity constraints, while the remaining constraints can be dealt with via
standard constraint handling techniques for metaheuristics. A procedure
is proposed in order to generate a random initial population which is
feasible with respect to the linear equality constraints. Then a muta-
tion scheme that maintains such feasibility is defined. The procedure is
applied to test-problems from the literature and its performance is also
compared with the case where the constraints are handled via a selection
scheme or an adaptive penalty technique.

1 Introduction

Constrained optimization problems are common in many areas, and due to the
growing complexity of the applications tackled, nature-inspired metaheuristics
in general, and evolutionary algorithms in particular, are becoming increasingly
popular. That is due to the fact that they can be readily applied to situations
where the objective function(s) and/or constraints are not known as explicit
functions of the decision variables, and when potentially expensive computer
models must be run in order to compute the objective function and/or check the
constraints every time a candidate solution needs to be evaluated.
As move operators are usually blind to the constraints (i.e. when operating
upon feasible individuals they do not necessarily generate feasible offspring) stan-
dard metaheuristics must be equipped with a constraint handling technique. In
simpler situations, repair techniques [18], special move operators [19], or special
decoders [9] can be designed to ensure that all candidate solutions are feasible.
We will not attempt to survey the current literature on constraint handling here,
and the reader is referred to [4], [13], and [5].


Here we will focus on obtaining solutions automatically satisfying all linear


equality constraints (in the form Ex = c). Any remaining constraints present in
the problem can be dealt with using constraint handling techniques available in
the literature.
The first approach to this problem seems to be the GENOCOP system app-
roach [14] where the linear equalities are used to eliminate some of the variables,
which are now written as a function of the remaining ones, thus reducing the
number of variables. As a result, any linear inequality constraint present in the
problem has to be adequately modified. Another idea is that of the use of a homo-
morphous mapping [10] that transforms a space constrained by Ex = c into a
space that is not only fully unconstrained, but also of lower dimensionality [15].
In [16] two modifications of the particle swarm optimization (PSO) technique
were proposed to tackle linear equality constraints: LPSO and CLPSO. LPSO
starts from a feasible initial population and then maintains feasibility by modi-
fying the standard PSO formulas for particle velocity. CLPSO tries to improve
on some observed shortcomings of LPSO and changes the equation for the best
particle in the swarm so that it explores the feasible domain using a random
velocity vector in the null space of E, that is, this velocity vector keeps the best
particle feasible with respect to the linear equality constraints.
In this paper a simple modification of the differential evolution technique
(denoted here by DELEqC) is proposed in order to exactly treat the linear
equality constraints present in continuous optimization problems that may also
include additional non-linear equality and inequality constraints, which can be
dealt with via existing constraint handling techniques. Starting from a popula-
tion which is feasible with respect to the linear equality constraints, feasibility
is maintained by avoiding the standard DE crossover and using an adequate
mutation scheme along the search process.

2 The Problem

The problem considered here is to find x ∈ Rn that minimizes the objective


function f(x) subject to m < n linear equality constraints, written as Ex = c,
in addition to p general inequality constraints g_i(x) ≤ 0 and q general equality
constraints h_j(x) = 0.
When using metaheuristics, general equality constraints are usually relaxed
to |h_j(x)| ≤ ε, where the parameter ε > 0 must be conveniently set by the
user. As a result, candidate solutions strictly feasible with respect to all equality
constraints are very hard to obtain. Whenever x violates the j-th constraint,
one defines the corresponding constraint violation as v_j(x) = |h_j(x)| − ε and
aggregates for the individual x as v(x) = Σ_j v_j(x).
Being able to exactly satisfy all linear equalities is a valuable improvement
in a general constrained optimization setting, and should thus be pursued.
It is assumed here that the constraints are linearly independent so that E
has full rank (rank(E) = m). A candidate solution x1 ∈ Rn is said to be feasible
if x1 ∈ E where E denotes the feasible set:

E = {x ∈ Rn : Ex = c}
A vector d ∈ Rn is said to be a feasible direction at the point x ∈ E if x + d
is feasible: E(x + d) = c. It follows that the feasible direction d must satisfy
Ed = 0 or, alternatively, that any feasible direction belongs to the null space of
the matrix E
N (E) = {x ∈ Rn : Ex = 0}
Now, given two feasible vectors x1 and x2 it is easy to see that d = x1 − x2 is
a feasible direction, as E(x1 − x2 ) = 0. As a result, one can see that the standard
mutation formulae adopted within DE (see Section 3) would always generate a
feasible vector whenever the vectors involved in the differences are themselves
feasible. If crossover is avoided, one could start from a feasible random initial
population and proceed, always generating feasible individuals.

3 Differential Evolution

Differential evolution (DE) [20] is a simple and effective algorithm for global
optimization, specially for continuous variables. The basic operation performed is
the addition to each design variable in a given candidate solution of a term which
is the scaled difference between the values of such variable in other candidate
solutions in the population. The number of differences applied, the way in which
the individuals are selected, and the type of recombination performed define the
DE variant (also called DE strategy). Although many DE variants can be found
in the literature [17], the simplest one (DE/rand/1/bin) is adopted here:

u_{i,j,G+1} = x_{r1,j,G} + F · (x_{r2,j,G} − x_{r3,j,G})

where r1 , r2 and r3 correspond to distinct randomly picked indexes, and G


denotes the iteration counter of the search technique.
In addition, in the general case, a crossover operation is performed, according
to the parameter CR. However, in order to maintain feasibility, crossover is not
performed in the proposed DE variant. Here, u_{i,j,G+1} replaces x_{i,j,G} when u is
better than x, for each i.
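A minimal NumPy sketch of this feasibility-preserving DE/rand/1 mutation (no crossover), assuming the population is an array of shape (NP, n) whose rows already satisfy Ex = c:

import numpy as np

def de_rand_1_mutation(pop, i, F=0.7, rng=None):
    """DE/rand/1 mutant for target index i, without crossover.
    Since rows r1, r2, r3 are feasible (E x = c), the scaled difference lies in
    the null space of E and the mutant remains feasible."""
    rng = rng or np.random.default_rng()
    candidates = [r for r in range(len(pop)) if r != i]
    r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])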

3.1 A Selection Scheme

In order to handle the constraints, a popular technique is Deb’s selection scheme


[6] (denoted here by DSS) which enforces the following criteria: (i) any feasible
solution is preferred to any infeasible solution, (ii) between two feasible solutions,
the one having better objective function value is preferred, and (iii) between
two infeasible solutions, the one having smaller constraint violation (v(x)) is
preferred.
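A compact sketch of this comparison (our own illustrative formulation; each candidate is assumed to carry an objective value f and an aggregated violation v):

def dss_prefers(f_a, v_a, f_b, v_b):
    """Deb's selection criteria: True if candidate A is preferred over candidate B."""
    if v_a == 0 and v_b == 0:       # (ii) both feasible: better objective wins
        return f_a <= f_b
    if v_a == 0 or v_b == 0:        # (i) exactly one feasible: the feasible one wins
        return v_a == 0
    return v_a <= v_b               # (iii) both infeasible: smaller violation wins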

3.2 An Adaptive Penalty Technique


A parameterless adaptive penalty method (APM) was developed in [2,3,11]
which does not require the knowledge of the explicit form of the constraints
as a function of the design variables, and is free of parameters to be set by
the user. An adaptive scheme automatically sizes the penalty parameter corre-
sponding to each constraint along the evolutionary process. The fitness function
proposed is written as:


F(x) = f(x),                                   if x is feasible,
F(x) = f̄(x) + Σ_{j=1}^{m} k_j v_j(x),          otherwise,

with

f̄(x) = f(x) if f(x) > ⟨f(x)⟩, and f̄(x) = ⟨f(x)⟩ otherwise;
k_j = |⟨f(x)⟩| · ⟨v_j(x)⟩ / Σ_{l=1}^{m} [⟨v_l(x)⟩]²

where ⟨f(x)⟩ is the average of the objective function values in the current population
and ⟨v_l(x)⟩ is the violation of the l-th constraint averaged over the current
population. The idea is that the values of the penalty coefficients should be dis-
tributed in a way that those constraints which are more difficult to be satisfied
should have a relatively higher penalty coefficient.
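A short NumPy sketch of this fitness assignment (illustrative only; pop_f and pop_v are assumed to hold the current population's objective values and per-constraint violations):

import numpy as np

def apm_fitness(f, v, pop_f, pop_v):
    """Adaptive penalty fitness for one candidate with objective f and
    violation vector v (length m), given the population's objectives pop_f
    (shape NP) and violations pop_v (shape NP x m)."""
    v = np.asarray(v, dtype=float)
    if np.all(v == 0):                      # feasible: fitness is the objective itself
        return f
    mean_f = np.mean(pop_f)                 # <f(x)>: population-average objective
    mean_v = np.mean(pop_v, axis=0)         # <v_j(x)>: average violation per constraint
    denom = np.sum(mean_v ** 2)
    k = np.abs(mean_f) * mean_v / denom if denom > 0 else np.zeros_like(mean_v)
    f_bar = max(f, mean_f)                  # worse of f(x) and the population average
    return f_bar + float(np.dot(k, v))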

4 The Proposal
Differently from penalty or selection schemes, and from special decoders, the
proposed DE algorithm that satisfies linear equality constraints is classified as a
feasibility preserving approach.
In order to generate a feasible initial population of size NP, one could think of
starting from a feasible vector x_0 and moving from x_0 along random feasible
directions d_i: x_i = x_0 + d_i, i = 1, 2, . . . , NP. A random feasible direction
can be obtained by projecting a random vector onto the null space of E. The
projection matrix is given by [12]

P_N(E) = I − E^T (E E^T)^{−1} E    (1)

where the superscript T denotes transposition. Random feasible candidate solutions
can then be generated as

x_i = x_0 + P_N(E) v_i ,    i = 1, 2, . . . , NP

where v_i ∈ R^n is randomly generated and x_0 is computed as x_0 = E^T (E E^T)^{−1} c.
It is clear that x_0 is a feasible vector, as E x_0 = E E^T (E E^T)^{−1} c = c.
It should be mentioned that the matrix inversion in eq. (1) is not actually
performed and, therefore, P_N(E) is never explicitly computed (see Algorithm 1).
The Differential Evolution for Linear Equality Constraints (DELEqC) algo-
rithm is defined as DE/rand/1/bin, equipped with the feasible initial population
generation procedure in Algorithm 1, and running without crossover.
Notice that any additional non-linear equality or inequality constraint can
be dealt with via existing constraint handling techniques, such as those in
sections 3.1 and 3.2.

Algorithm 1. Algorithm CreateInitialPopulation.


input : NP (population size)
1  M = E E^T ;
2  Perform LU decomposition: M = LU ;
3  Solve M y = c (L w = c and U y = w) ;
4  x_0 = E^T y ;
5  for i ← 1 to NP do
6      d ∈ R^n is randomly generated ;
7      z = E d ;
8      Solve M u = z (L w = z and U u = w) ;
9      v = E^T u ;
10     x_i = x_0 + d − v ;
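For reference, a NumPy sketch of Algorithm 1, solving with numpy.linalg.solve in place of an explicit LU factorisation (variable names mirror the pseudocode; this is a sketch, not the authors' implementation):

import numpy as np

def create_initial_population(E, c, NP, rng=None):
    """Generate NP random candidate solutions satisfying E x = c (Algorithm 1).
    E is an (m, n) full-rank matrix and c an (m,) right-hand-side vector."""
    rng = rng or np.random.default_rng()
    n = E.shape[1]
    M = E @ E.T                              # step 1: M = E E^T
    x0 = E.T @ np.linalg.solve(M, c)         # steps 2-4: particular feasible solution
    pop = np.empty((NP, n))
    for i in range(NP):
        d = rng.standard_normal(n)           # step 6: random direction
        v = E.T @ np.linalg.solve(M, E @ d)  # steps 7-9: component of d outside N(E)
        pop[i] = x0 + d - v                  # step 10: x0 plus the projection of d onto N(E)
    return pop

After construction, E @ pop[i] equals c (up to round-off) for every generated row.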

5 Computational Experiments
In order to test the proposal (DELEqC) and assess its performance, a set of test
problems with linear equality constraints was taken from the literature (their
descriptions are available in Appendix A). The results produced are then com-
pared with those from alternative procedures available in the metaheuristics
literature [15,16], as well as running them with established constraint handling
techniques (Deb’s selection scheme and an adaptive penalty method) to enforce
the linear equality constraints. One hundred independent runs were executed in
all experiments.
Initially, computational experiments were performed aiming at selecting the
values for population size (NP) and F. The tested values here are NP ∈ {5, 10, 20,
30, . . . , 90, 100} and F ∈ {0.1, 0.2, . . . , 0.9, 1}. Due to the large number of
combinations, performance profiles [7] were used to identify the parameters which
generate the best results. We adopted the maximum budget allowed in [15,16]
as a stop criterion and the final objective function value as the quality metric.
The area under the performance profiles curves [1] indicates that the best per-
forming parameters according to these rules are NP = 50 and F = 0.7, and these
values were then used for all DEs in the computational experiments.

5.1 Results
First, we analyze how fast the proposed technique is in order to find the best
known solution of each test-problem when compared with a DE using (i) Deb’s
selection scheme (DE+DSS) or (ii) an adaptive penalty method (DE+APM).
The objective in this test case is to verify if DELEqC is able to obtain the best
known solutions using a similar number of objective function evaluations. We
used CR = 0.9, NP = 50, and F = 0.7 for both DE+DSS and DE+APM. Statistical
information (best, median, mean, standard deviation, worst), obtained from 100
independent runs, is presented in Table 1. The number of successful runs (sr) is
also shown. A successful run is one in which the best known solution is found
(absolute error less than or equal to 10^−4).

Table 1. Statistical comparisons. Number of objective function evaluations required to


obtain the best known solution with an absolute error less than or equal to 10^−4. The bounds
[−1000; 1000] were adopted for all test-problems. For DE+APM and DE+DSS, the
tolerance for equality constraints is ε = 0.0001.

TP technique best median mean st. dev. worst sr


DELEqC 2800 3475 3458.00 2.51e + 02 4050 100
1 DE+APM 14750 16400 16456.50 6.68e + 02 18200 100
DE+DSS 16300 18350 18336.00 7.54e + 02 20250 100
DELEqC 2750 3300 3249.50 1.93e + 02 3600 100
2 DE+APM 13050 15375 15385.00 7.79e + 02 17350 100
DE+DSS 16300 18100 18121.50 6.92e + 02 20150 100
DELEqC 1500 2050 2029.00 1.74e + 02 2400 100
3 DE+APM 12950 14250 14256.50 5.49e + 02 15800 100
DE+DSS 18300 20225 20158.50 7.68e + 02 22000 100
DELEqC 1250 1900 1906.00 1.61e + 02 2250 100
4 DE+APM 12250 14350 14280.00 7.30e + 02 15650 100
DE+DSS 17200 19000 19060.50 7.60e + 02 20800 100
DELEqC 1700 2050 2055.00 1.46e + 02 2350 100
5 DE+APM 13950 15300 15382.83 6.17e + 02 16650 99
DE+DSS 17550 19150 19157.50 6.93e + 02 21400 100
DELEqC 1450 1950 1932.50 1.56e + 02 2250 100
6 DE+APM 13850 15600 15582.00 6.32e + 02 17150 100
DE+DSS 16850 18850 18916.50 7.45e + 02 20750 100
DELEqC 6550 7300 7326.00 3.24e + 02 8150 100
7 DE+APM 77050 83750 83938.00 3.55e + 03 95750 100
DE+DSS 104050 118550 118711.50 7.01e + 03 134100 100
DELEqC 6700 8050 8018.00 4.05e + 02 8950 100
8 DE+APM 80850 88675 88901.00 3.71e + 03 98600 100
DE+DSS 108250 122000 122199.50 7.26e + 03 139000 100
DELEqC 18150 26100 30482.76 1.59e + 04 91400 58
9 DE+APM 117150 140950 148259.78 2.19e + 04 208900 46
DE+DSS 145950 175875 182394.57 2.63e + 04 256150 46
DELEqC 6900 7900 7959.00 4.65e + 02 9200 100
10 DE+APM 68200 77000 77211.50 3.46e + 03 86150 100
DE+DSS 98000 119175 119015.00 7.54e + 03 138350 100
DELEqC 12150 25975 31196.50 1.51e + 04 73150 100
11 DE+APM 101850 109600 121478.57 2.76e + 04 210700 14
DE+DSS 131450 141350 143682.14 9.04e + 03 165550 14

Table 2. Results for the test-problems 7, 8, 9, 10, and 11 using the reference budget
(rb) and 2 × rb.

TP nofe technique best median mean st. dev. worst

Results using the reference budget (rb)


DELEqC 39.5143 115.8354 122.5069 5.03e + 01 260.6652
Genocop II [16] 38.322 - 739.438 8.40e + 02 1.63e + 3
7 1,250
LPSO [16] 37.420 - 7.03e + 03 8.01e + 03 4.63e + 4
CLPSO [16] 32.138 - 35.197 2.21e + 01 252.826
DELEqC 35.3784 35.3961 35.4051 2.78e − 02 35.5360
Genocop II 37.939 - 104.192 5.99e + 01 262.656
8 5,000
LPSO 240.101 - 8.46e + 3 1.05e + 04 7.79e + 4
CLPSO 35.377 - 82.077 6.10e + 01 197.389
DELEqC 40.5363 58.7789 58.1392 8.04e + 00 77.4060
Genocop II 49.581 - 56.694 8.93e + 00 75.906
9 5,000
LPSO 36.981 - 77.398 2.35e + 01 149.429
CLPSO 36.975 - 72.451 2.57e + 01 167.644
DELEqC 21485.2614 21485.2983 21485.2962 5.87e − 03 21485.3000
Genocop II 22334.971 - 58249.328 6.25e + 04 2.00e + 5
10 10,000
LPSO 1.95e + 5 - 1.38e + 9 4.48e + 09 3.55e + 10
CLPSO 21485.306 - 6.52e + 8 2.39e + 09 2.23e + 10
DELEqC 0.3091 0.5910 0.5821 9.56e − 02 0.8099
Genocop II 0.713 - 1.009 1.30e − 01 1.131
11 5,000
LPSO 0.529 - 6.853 6.20e + 00 36.861
CLPSO 0.632 - 7.470 7.27e + 00 44.071
Results using twice the reference budget (2 × rb)
DELEqC 32.5550 34.2880 34.7973 1.97e + 00 41.1522
Genocop II 37.612 - 304.884 3.88e + 02 1.17e + 3
LPSO 32.137 - 445.316 8.03e + 02 4.51e + 3
7 2,500 CLPSO 32.137 - 32.139 6.69e − 03 32.183
Constricted PSO [15] 32.137 - 32.137 2.00e − 10 32.137
BareBones PSO [15] 32.137 - 32.137 1.00e − 14 32.137
PSOGauss [15] 32.137 - 32.137 1.00e − 14 32.137
DELEqC 35.3769 35.3770 35.3770 2.57e − 05 35.3770
Genocop II 35.393 - 49.945 1.10e + 01 82.221
LPSO 35.400 - 758.525 1.50e + 03 1.12e + 4
8 10,000 CLPSO 35.377 - 68.570 5.39e + 01 196.067
Constricted PSO 35.377 - 36.165 3.12e + 00 55.538
BareBones PSO 35.377 - 40.019 9.61e + 00 75.147
PSOGauss 35.377 - 38.998 8.59e + 00 72.482
DELEqC 36.9755 44.9910 46.7872 8.30e + 00 67.1005
Genocop II 37.116 - 52.379 7.50e + 00 67.564
LPSO 36.975 - 76.487 3.07e + 01 232.979
9 10,000 CLPSO 36.975 - 69.039 2.16e + 01 154.379
Constricted PSO 36.975 - 50.431 1.23e + 01 85.728
BareBones PSO 36.975 - 55.921 1.61e + 01 119.556
PSOGauss 36.975 - 55.622 1.48e + 01 119.094
DELEqC 21485.2614 21485.2983 21485.2962 5.87e − 03 21485.3000
Genocop II 21490.840 - 21630.020 1.54e + 02 22030.988
LPSO 21554.158 - 4.44e + 6 2.28e + 07 2.18e + 8
10 20,000 CLPSO 21485.305 - 7.45e + 5 7.12e + 06 7.11e + 7
Constricted PSO 21485.3 - 21485.3 6.00e − 11 21485.3
BareBones PSO 21485.3 - 21485.3 6.00e − 11 21485.3
PSOGauss 21485.3 - 21485.3 6.00e − 11 21485.3
DELEqC 0.1509 0.4299 0.4163 1.07e − 01 0.6677
Genocop II 0.417 - 0.702 1.87e − 01 0.971
LPSO 0.387 - 2.997 2.94e + 00 15.805
11 10,000 CLPSO 0.236 - 3.049 3.10e + 00 16.427
Constricted PSO 0.151 - 0.488 1.68e − 01 0.83
BareBones PSO 0.203 - 0.523 1.81e − 01 0.912
PSOGauss 0.151 - 0.53 1.68e − 01 0.958

Table 3. Results for the test-problems 7, 8, 9, 10, and 11 using 3 × rb and 4 × rb.

TP nofe technique best median mean st. dev. worst

Results using three times the reference budget (3 × rb)


DELEqC 32.1447 32.1881 32.2008 5.53e − 02 32.5202
Genocop II 33.837 - 69.154 2.67e + 01 124.820
7 3,750
LPSO 32.137 - 35.071 2.15e + 01 244.077
CLPSO 32.137 - 32.137 1.83e − 04 32.138
DELEqC 35.3769 35.3770 35.3770 2.57e − 05 35.3770
Genocop II 35.772 - 42.393 6.86e + 00 60.110
8 15,000
LPSO 35.377 - 125.727 2.31e + 02 1.72e + 3
CLPSO 35.377 - 59.001 5.00e + 01 196.065
DELEqC 36.9751 37.1959 39.6822 4.52e + 00 54.5931
Genocop II 37.326 - 47.643 8.45e + 00 67.128
9 15,000
LPSO 36.975 - 74.338 2.83e + 01 234.968
CLPSO 37.970 - 77.409 3.09e + 01 224.024
DELEqC 21485.2614 21485.2983 21485.2962 5.87e − 03 21485.3000
Genocop II 21487.098 - 21546.332 8.53e + 01 21836.797
10 30,000
LPSO 21483.373 - 3.71e + 5 2.41e + 06 2.05e + 7
CLPSO 21485.305 - 21485.305 9.83e − 08 21485.305
DELEqC 0.1508 0.2361 0.2605 8.79e − 02 0.5190
Genocop II 0.351 - 0.702 1.72e − 01 0.962
11 15,000
LPSO 0.250 - 2.653 2.72e + 00 14.405
CLPSO 0.250 - 2.146 2.21e + 00 11.983
Results using four times the reference budget (4 × rb)
DELEqC 32.1371 32.1381 32.1386 1.41e − 03 32.1436
Genocop II 32.544 − 54.846 1.69e − 01 107.584
LPSO 32.137 − 32.137 7.18e − 12 32.137
7 5,000 CLPSO 32.137 − 32.137 3.02e − 06 32.137
Constricted PSO 32.137 - 32.137 1.00e − 14 32.137
BareBones PSO 32.137 - 32.137 1.00e − 14 32.137
PSOGauss 32.137 - 32.137 1.00e − 14 32.137
DELEqC 35.3769 35.3770 35.3770 2.57e − 05 35.3770
Genocop II 35.410 - 39.500 6.78e + 00 56.613
LPSO 35.377 - 59.762 3.98e + 01 246.905
8 20,000 CLPSO 35.377 - 39.832 1.09e + 01 71.380
Constricted PSO 35.377 - 35.783 2.39e + 00 55.538
BareBones PSO 35.377 - 37.079 5.33e + 00 55.538
PSOGauss 35.377 - 35.589 5.28e − 01 36.892
DELEqC 36.9748 36.9755 38.5722 3.27e + 00 51.4201
Genocop II 37.011 - 43.059 6.14e + 00 59.959
LPSO 38.965 - 75.011 2.77e + 01 184.226
9 20,000 CLPSO 36.975 - 76.896 2.73e + 01 151.394
Constricted PSO 36.975 - 46.199 7.48e + 00 76.736
BareBones PSO 36.975 - 49.238 1.02e + 01 76.774
PSOGauss 36.975 - 47.11 8.14e + 00 68.802
DELEqC 21485.2614 21485.2983 21485.2962 5.87e − 03 21485.3000
Genocop II 21485.363 - 21485.714 4.00e − 01 21486.646
LPSO 21485.925 - 1.260e + 05 1.04e + 06 1.04e + 07
10 40,000 CLPSO 21485.305 - 21485.305 9.40e − 08 21485.305
Constricted PSO 21485.3 - 21485.3 6.00e − 11 21485.3
BareBones PSO 21485.3 - 21485.3 6.00e − 11 21485.3
PSOGauss 21485.3 - 21485.3 6.00e − 11 21485.3
DELEqC 0.1482 0.2019 0.2241 6.11e − 02 0.3849
Genocop II 0.201 - 0.584 1.31e − 01 0.843
LPSO 0.338 - 1.695 1.92e + 00 14.401
11 20,000 CLPSO 0.236 - 1.900 2.38e + 00 17.259
Constricted PSO 0.151 - 0.413 1.45e − 01 0.792
BareBones PSO 0.151 - 0.444 1.58e − 01 0.83
PSOGauss 0.151 - 0.454 1.74e − 01 0.83

equal to 10^-4) using up to the maximum allowed number of objective function
evaluations (5,000,000). The best results are highlighted in boldface. It is easy
to see that DELEqC requires far fewer objective function evaluations to find
the best known solutions of the test-problems than DE+DSS and DE+APM.
Also, notice that DELEqC obtained more successful runs than both DE+DSS
and DE+APM. Finally, it is important to highlight that the differences between
the results obtained by the proposed technique and those of the other methods
are statistically significant (p-values < 0.05), according to pairwise Wilcoxon
rank-sum tests with Bonferroni-corrected p-values.
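As an illustration of this testing protocol (a sketch only, not the authors' scripts), assuming
the per-run evaluation counts of two methods on a given test-problem are available as arrays,
the pairwise rank-sum test with a Bonferroni correction could be written as:

    # Illustrative sketch of the comparison protocol described above.
    # `runs_a` and `runs_b` are assumed arrays of per-run evaluation counts.
    from scipy.stats import ranksums

    def compare(runs_a, runs_b, n_comparisons=2, alpha=0.05):
        """Wilcoxon rank-sum test with a Bonferroni-adjusted p-value."""
        _, p = ranksums(runs_a, runs_b)
        p_adj = min(1.0, p * n_comparisons)   # Bonferroni correction
        return p_adj, p_adj < alpha

Here n_comparisons = 2 corresponds to comparing DELEqC against the two alternative
techniques on the same test-problem.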
We also investigated if the proposed technique produces results better or
similar to those available in the literature using the same number of objective
function evaluations. To do so, test-problems 7-11 were considered as in [15,16].
Each test-problem has its allowed number of objective function evaluations (nofe)
grouped in 4 different computational budgets: a reference budget (rb), 2 × rb,
3 × rb, and 4 × rb. Statistical information of the results is presented in Tables 2
and 3, where the best results are highlighted in boldface.
Analyzing the results in Tables 2 and 3, it is important to highlight that
DELEqC obtains the best mean values in 15 of the 20 cases considered here (5
test-problems and 4 different budgets). Also, the best results with respect to the
best value found were attained in 15 situations. Notice that CLPSO, the best
performing technique in [16], obtained the best mean values in only 6 cases, and
the best results, concerning the best value found, in 14 cases.
When compared to Constricted PSO (the best performing technique in [15], which
has results available for 10 of the 20 cases considered here), one can notice that
DELEqC obtained the best mean values in 8 cases and the best results, concerning
the best value found, in 9 of the 10 cases, while Constricted PSO found the best
mean values in only 3 cases and the best results, concerning the best value found,
in 9 of the 10 cases.
It should be noted that sometimes more than one algorithm reached the best
result. In general, for test-problems 7-11, one can notice that despite the use of
the simplest DE variant in DELEqC: (i) it performed similarly to the techniques
from the literature with respect to best results, concerning the best value found;
(ii) it obtained the best mean values in more test-cases; and (iii) it performed
well independently of the number of objective function evaluations tested here.

6 Concluding Remarks

Existing metaheuristics usually only approximately satisfy equality constraints


(according to a user specified tolerance value), even when they are linear. Here,
a modified DE algorithm (DELEqC) is proposed to exactly satisfy linear equal-
ity constraints while allowing any available constraint handling technique to be
applied to the remaining constraints of the optimization problem. A procedure
for the generation of a random feasible initial population is proposed. By avoid-
ing the standard DE crossover and using mutation operator formulae contain-
ing only differences of feasible candidate solutions, a DE algorithm is proposed

which maintains feasibility with respect to the linear equality constraints along
the search process. Results from the computational experiments indicate that
DELEqC outperforms the few alternatives that could be found in the literature
and is a useful additional tool for the practitioner.
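The reason this works is linear-algebraic: if A x_r1 = A x_r2 = A x_r3 = b, then
A(x_r1 + F(x_r2 − x_r3)) = b + F(b − b) = b, so the mutant remains feasible. A minimal
sketch of such a feasibility-preserving initialization and DE/rand/1 mutation (the function
names and the null-space construction are illustrative assumptions, not the authors' code) is:

    import numpy as np

    def feasible_initial_population(A, b, n_pop, rng, span=1000.0):
        """One way to build a random population satisfying A x = b exactly:
        a particular solution plus random combinations of null-space vectors."""
        x_part, *_ = np.linalg.lstsq(A, b, rcond=None)   # particular solution
        _, _, vt = np.linalg.svd(A)
        null_basis = vt[np.linalg.matrix_rank(A):]       # basis of null(A)
        coeffs = rng.uniform(-span, span, size=(n_pop, null_basis.shape[0]))
        return x_part + coeffs @ null_basis

    def rand1_mutant(pop, i, f_scale, rng):
        """DE/rand/1 mutation without crossover: since A x_r2 = A x_r3 = b,
        the difference x_r2 - x_r3 lies in null(A) and the mutant stays feasible."""
        candidates = [k for k in range(len(pop)) if k != i]
        r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
        return pop[r1] + f_scale * (pop[r2] - pop[r3])

Avoiding the standard crossover (which would mix coordinates of feasible vectors and break
the equalities) is exactly what keeps every candidate on the constraint manifold.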
Further ongoing work concerns the extension of DELEqC so that linear
inequality constraints are also exactly satisfied, as well as the introduction of
a crossover operator that maintains feasibility with respect to the linear equality
constraints.

Acknowledgments. The authors would like to thank the reviewers for their com-
ments, which helped improve the paper, and the support provided by CNPq (grant
310778/2013-1), CAPES, and Pós-Graduação em Modelagem Computacional da Uni-
versidade Federal de Juiz de Fora (PGMC/UFJF).

References
1. Barbosa, H.J.C., Bernardino, H.S., Barreto, A.M.S.: Using performance profiles
to analyze the results of the 2006 CEC constrained optimization competition. In:
IEEE Congress on Evolutionary Computation, pp. 1–8 (2010)
2. Barbosa, H.J.C., Lemonge, A.C.C.: An adaptive penalty scheme in genetic algo-
rithms for constrained optimization problems. In: Langdon, W.B., et al. (ed.) Proc.
of the Genetic and Evolutionary Computation Conference. USA (2002)
3. Barbosa, H.J.C., Lemonge, A.C.C.: A new adaptive penalty scheme for genetic
algorithms. Information Sciences 156, 215–251 (2003)
4. Coello, C.A.C.: Theoretical and numerical constraint-handling techniques used
with evolutionary algorithms: a survey of the state of the art. Computer Methods
in Applied Mechanics and Engineering 191(11–12), 1245–1287 (2002)
5. Datta, R., Deb, K. (eds.): Evolutionary Constrained Optimization. Infosys Science
Foundation Series. Springer, India (2015)
6. Deb, K.: An efficient constraint handling method for genetic algorithms. Comput.
Methods Appl. Mech. Engrg 186, 311–338 (2000)
7. Dolan, E., Moré, J.J.: Benchmarking optimization software with performance pro-
files. Math. Programming 91(2), 201–213 (2002)
8. Hock, W., Schittkowski, K.: Test Examples for Nonlinear Programming Codes.
Springer-Verlag New York Inc., Secaucus (1981)
9. Koziel, S., Michalewicz, Z.: A decoder-based evolutionary algorithm for con-
strained parameter optimization problems. In: Eiben, A.E., Bäck, T., Schoenauer,
M., Schwefel, H.-P. (eds.) PPSN 1998. LNCS, vol. 1498, pp. 231–240. Springer,
Heidelberg (1998)
10. Koziel, S., Michalewicz, Z.: Evolutionary algorithms, homomorphous mappings,
and constrained parameter optimization. Evol. Comput. 7(1), 19–44 (1999)
11. Lemonge, A.C.C., Barbosa, H.J.C.: An adaptive penalty scheme for genetic
algorithms in structural optimization. Intl. Journal for Numerical Methods in
Engineering 59(5), 703–736 (2004)
12. Luenberger, D.G., Ye, Y.: Linear and Nonlinear Programming. Springer-Verlag
(2008)
13. Mezura-Montes, E., Coello, C.A.C.: Constraint-handling in nature-inspired numer-
ical optimization: Past, present and future. Swarm and Evolutionary Computation
1(4), 173–194 (2011)

14. Michalewicz, Z., Janikow, C.Z.: Genocop: A genetic algorithm for numerical opti-
mization problems with linear constraints. Commun. ACM 39(12es), 175–201
(1996)
15. Monson, C.K., Seppi, K.D.: Linear equality constraints and homomorphous map-
pings in PSO. IEEE Congress on Evolutionary Computation 1, 73–80 (2005)
16. Paquet, U., Engelbrecht, A.P.: Particle swarms for linearly constrained optimisa-
tion. Fundamenta Informaticae 76(1), 147–170 (2007)
17. Price, K.V.: An introduction to differential evolution. New Ideas in Optimization,
pp. 79–108 (1999)
18. Salcedo-Sanz, S.: A survey of repair methods used as constraint handling techniques
in evolutionary algorithms. Computer Science Review 3(3), 175–192 (2009)
19. Schoenauer, M., Michalewicz, Z.: Evolutionary computation at the edge of feasi-
bility. In: Ebeling, W., Rechenberg, I., Voigt, H.-M., Schwefel, H.-P. (eds.) PPSN
1996. LNCS, vol. 1141, pp. 245–254. Springer, Heidelberg (1996)
20. Storn, R., Price, K.V.: Differential evolution - a simple and efficient heuristic for
global optimization over continuous spaces. Journal of Global Optimization 11,
341–359 (1997)

Appendix A. Test-Problems
Test-problems 1 to 6 and 7 to 11 were taken from [8] and [15], respectively. Also
note that problems 7 to 11 are subject to the same set of linear constraints (2).
Problem 1 - The solution is x∗ = (1, 1, 1, 1, 1)^T with f(x∗) = 0.

min (x1 − 1)^2 + (x2 − x3)^2 + (x4 − x5)^2
s.t. x1 + x2 + x3 + x4 + x5 = 5
     x3 − 2x4 − 2x5 = −3

Problem 2 - The solution is x∗ = (1, 1, 1, 1, 1)^T with f(x∗) = 0.

min (x1 − x2)^2 + (x3 − 1)^2 + (x4 − 1)^4 + (x5 − 1)^6
s.t. x1 + x2 + x3 + 4x4 = 7
     x3 + 5x5 = 6

Problem 3 - The solution is x∗ = (1, 1, 1, 1, 1)^T with f(x∗) = 0.

min (x1 − x2)^2 + (x2 − x3)^2 + (x3 − x4)^4 + (x4 − x5)^2
s.t. x1 + 2x2 + 3x3 = 6
     x2 + 2x3 + 3x4 = 6
     x3 + 2x4 + 3x5 = 6

Problem 4 - The solution is x∗ = (1, 1, 1, 1, 1)^T with f(x∗) = 0.

min (x1 − x2)^2 + (x2 + x3 − 2)^2 + (x4 − 1)^2 + (x4 − 1)^2 + (x5 − 1)^2
s.t. x1 + 3x2 = 4
     x3 + x4 − 2x5 = 0
     x2 − x5 = 0

Problem 5 - The solution is x∗ = (−33/349, 11/349, 180/349, −158/349, 1/349)^T
with f(x∗) = 5.326647564.

min (4x1 − x2)^2 + (x2 + x3 − 2)^2 + (x4 − 1)^2 + (x5 − 1)^2
s.t. x1 + 3x2 = 0
     x3 + x4 − 2x5 = 0
     x2 − x5 = 0

Problem 6 - The solution is x∗ = (−33/43, 11/43, 27/43, −5/43, 11/43)^T with
f(x∗) = 4.093023256.

min (x1 − x2)^2 + (x2 + x3 − 2)^2 + (x4 − 1)^2 + (x5 − 1)^2
s.t. x1 + 3x2 = 0
     x3 + x4 − 2x5 = 0
     x2 − x5 = 0

Problem 7 (Sphere) - f(x∗) = 32.137

min_{x∈E} Σ_{i=1}^{10} x_i^2

The feasible set E is given by the linear equality constraints:

−3x2 − x3 + 2x6 − 6x7 − 4x9 − 2x10 = 3
−x1 − 3x2 − x3 − 5x7 − x8 − 7x9 − 2x10 = 0
x3 + x6 + 3x7 − 2x9 + 2x10 = 9                                          (2)
2x1 + 6x2 + 2x3 + 2x4 + 4x7 + 6x8 + 16x9 + 4x10 = −16
−x1 − 6x2 − x3 − 2x4 − 2x5 + 3x6 − 6x7 − 5x8 − 13x9 − 4x10 = 30

Problem 8 (Quadratic) - f(x∗) = 35.377

min_{x∈E} Σ_{i=1}^{10} Σ_{j=1}^{10} e^{−(x_i − x_j)^2} x_i x_j + Σ_{i=1}^{10} x_i

Problem 9 (Rastrigin) - f(x∗) = 36.975

min_{x∈E} Σ_{i=1}^{10} [ x_i^2 + 10 − 10 cos(2π x_i) ]

Problem 10 (Rosenbrock) - f(x∗) = 21485.3

min_{x∈E} Σ_{i=1}^{9} [ 100(x_{i+1} − x_i^2)^2 + (x_i − 1)^2 ]

Problem 11 (Griewank) - f(x∗) = 0.151

min_{x∈E} (1/4000) Σ_{i=1}^{10} x_i^2 − Π_{i=1}^{10} cos(x_i / √i) + 1
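For reference, the shared constraint system (2) and some of the objectives above can be
transcribed directly; the following NumPy sketch (names and tolerance are illustrative) simply
restates the definitions given in this appendix:

    import numpy as np

    # Linear equality constraints (2) shared by Problems 7-11: A @ x = b, x in R^10.
    A = np.array([
        [ 0, -3, -1,  0,  0,  2, -6,  0,  -4, -2],
        [-1, -3, -1,  0,  0,  0, -5, -1,  -7, -2],
        [ 0,  0,  1,  0,  0,  1,  3,  0,  -2,  2],
        [ 2,  6,  2,  2,  0,  0,  4,  6,  16,  4],
        [-1, -6, -1, -2, -2,  3, -6, -5, -13, -4]], dtype=float)
    b = np.array([3.0, 0.0, 9.0, -16.0, 30.0])

    def in_E(x, tol=1e-8):
        """Membership in the feasible set E, up to a numerical tolerance."""
        return bool(np.all(np.abs(A @ x - b) <= tol))

    def sphere(x):      # Problem 7
        return float(np.sum(x ** 2))

    def rastrigin(x):   # Problem 9
        return float(np.sum(x ** 2 + 10.0 - 10.0 * np.cos(2.0 * np.pi * x)))

    def rosenbrock(x):  # Problem 10
        return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2))

    def griewank(x):    # Problem 11
        i = np.arange(1, x.size + 1)
        return float(np.sum(x ** 2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i))) + 1.0)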
Multiobjective Firefly Algorithm for Variable
Selection in Multivariate Calibration

Lauro Cássio Martins de Paula(B) and Anderson da Silva Soares

Institute of Informatics, Federal University of Goiás, Goiânia, Goiás, Brazil


{laurocassio,anderson}@inf.ufg.br
http://www.inf.ufg.br

Abstract. The Firefly Algorithm is a recently proposed method with potential
application to several real-world problems, such as the variable selection problem.
This paper presents a Multiobjective Firefly Algorithm (MOFA) for variable
selection in multivariate calibration models. The main objective is to reduce the
prediction error of the property of interest while also reducing the number of
selected variables. Based on the results obtained, we demonstrate that our proposal
is a viable alternative for dealing with conflicting objective functions. Additionally,
we compare MOFA with traditional variable selection algorithms and show that it
provides a more relevant contribution to the variable selection problem.

Keywords: Firefly algorithm · Multiobjective optimization · Variable


selection · Multivariate calibration

1 Introduction
Multivariate calibration may be considered as a procedure for constructing a
mathematical model that establishes the relationship between the properties
measured by an instrument and the concentration of a sample to be deter-
mined [3]. However, the building of a model from a subset of explanatory vari-
ables usually involves some conflicting objectives, such as extracting information
from a measured data with many possible independent variables. Thus, a tech-
nique called variable selection may be used [3]. In this sense, the development of
efficient algorithms for variable selection becomes important in order to deal with
large and complex data. Furthermore, the application of Multiobjective Opti-
mization (MOO) may significantly contribute to efficiently construct an accurate
model [8].
Previous works on multivariate calibration have demonstrated that, while the
mono-objective formulation uses a larger number of variables, multiobjective
algorithms can use fewer variables with a lower prediction error [2][1]. On the
one hand, such works have used only genetic algorithms for exploiting MOO. On
the other hand, the application of MOO in bioinspired metaheuristics such as
the Firefly Algorithm may be a better alternative in order to obtain a model with


a more appropriate prediction capacity [8]. In this sense, some works have used
FA to solve many types of problems. Regarding the multiobjective characteristic,
Yang [8] was the first to present a multiobjective FA (MOFA) for solving
optimization problems and showed that MOFA has advantages in dealing with
multiobjective optimization.
As far as we know, the application of MOO-based Firefly Algorithms is still not
widespread. There is no work in the literature that uses a multiobjective
FA to select variables in multivariate calibration. Therefore, this paper presents
an implementation of a MOFA for variable selection in multivariate calibra-
tion models. Additionally, estimates from the proposed MOFA are compared
with predictions from the following traditional algorithms: Successive Projec-
tions Algorithm (SPA-MLR) [6], Genetic Algorithm (GA-MLR) [1] and Partial
Least Squares (PLS). Based on the results obtained, we concluded that our pro-
posed algorithm may be a more viable tool for variable selection in multivariate
calibration models.
Section 2 describes multivariate calibration and the original FA. The pro-
posed MOFA is presented in Section 3. Section 4 describes the material and
methods used in the experiments. Results are described in Section 5. Finally,
Section 6 shows the conclusions of the paper.

2 Background
2.1 Multivariate Calibration
The multivariate calibration model provides the value of a quantity y based on
values measured from a set of explanatory variables {x1 , x2 , . . . , xk }T [3]. The
model can be defined as:

y = β0 + β1 x1 + ... + βk xk + ε, (1)
where β0, β1, ..., βk are the coefficients to be determined, and ε is a random error
term. Equation (2) shows how the regression coefficients may be calculated using
the Moore-Penrose pseudoinverse [4]:

β = (X^T X)^(−1) X^T y,    (2)


where X is the matrix of samples and independent variables, y is the vector of
dependent variables, and β is the vector of regression coefficients.
As shown in Equations (3) and (4), the predictive ability of MLR models can be
assessed by comparing predictions with reference values for a test set, using the
RMSEP or the MAPE [3][5]:

RMSEP = sqrt( (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)^2 ),    (3)

where y_i is the reference value of the property of interest for observation i, ŷ_i is
the corresponding estimated value, and N is the number of observations.

MAPE = (100/N) Σ_{i=1}^{N} |(y_i − ŷ_i)/y_i| = (100/N) Σ_{i=1}^{N} |e_i/y_i|,    (4)

where y_i is the actual value at sample i, ŷ_i is the forecast at sample i, e_i = y_i − ŷ_i
is the forecast error at sample i, and N is the number of samples.
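These quantities can be computed directly; the sketch below (illustrative names, not the
authors' Matlab code) mirrors Equations (2)-(4):

    import numpy as np

    def mlr_coefficients(X, y):
        """Regression coefficients via the Moore-Penrose pseudoinverse, Eq. (2)."""
        # np.linalg.pinv(X) @ y equals (X^T X)^-1 X^T y when X^T X is invertible.
        return np.linalg.pinv(X) @ y

    def rmsep(y, y_hat):
        """Root mean squared error of prediction, Eq. (3)."""
        return float(np.sqrt(np.mean((y - y_hat) ** 2)))

    def mape(y, y_hat):
        """Mean absolute percentage error, Eq. (4)."""
        return float(100.0 * np.mean(np.abs((y - y_hat) / y)))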

2.2 Firefly Algorithm

Nature-inspired metaheuristics have been a powerful tool in solving various types


of problems [8]. FA is a recently developed optimization algorithm proposed by
Yang [8]. It is based on the behaviour of the flashing characteristics of fireflies.
A pseudocode for the original FA can be obtained in the work of Yang [8]. In the
original algorithm, there are two important issues to be treated: i) the variation
of light intensity; and ii) the attractiveness formulation. The attractiveness of a
firefly is determined by its brightness or light intensity, which is associated with
the encoded objective function [8].
As a firefly’s attractiveness is proportional to the light intensity seen by
adjacent fireflies, one can define the attractiveness ω of a firefly by:
ω = ω0 e^(−γ r^2),    (5)
where ω0 is the attractiveness at r = 0.
According to Yang [8], a firefly i is attracted to a brighter firefly j and its
movement is determined by:
x_i = x_j + ω0 e^(−γ r_{i,j}^2) (x_j − x_i) + α (rand − 1/2),    (6)
where rand is a random number generated in [0, 1].
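As an illustration of Equations (5) and (6), the movement of firefly i towards a brighter
firefly j can be sketched as follows; the default parameter values mirror those reported in
Section 4, and the function name is an assumption for illustration:

    import numpy as np

    def move_towards(x_i, x_j, omega0=0.97, gamma=1.0, alpha=0.2, rng=None):
        """Move firefly i towards the brighter firefly j, following Eqs. (5)-(6)."""
        rng = np.random.default_rng() if rng is None else rng
        r2 = np.sum((x_i - x_j) ** 2)                    # squared distance r_{i,j}^2
        attractiveness = omega0 * np.exp(-gamma * r2)    # Eq. (5)
        return x_j + attractiveness * (x_j - x_i) + alpha * (rng.random(x_i.shape) - 0.5)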

3 Proposal
Previous works have shown that multiobjective algorithms can use fewer variables
and obtain a lower prediction error [2]. Thus, this paper presents a Multiobjective
Firefly Algorithm (MOFA) for variable selection in multivariate calibration. In
the multiobjective formulation of FA, the choice of the current best solution is
based on two conditions: i) the prediction error; and ii) the number of variables
selected. Among the non-dominated solutions, a multiobjective decision-maker
method called Wilcoxon Signed-Rank1 is applied to choose the final best one [2].
Algorithm 1 shows a pseudocode for the proposed MOFA. In line 8 of Algorithm 1,
a firefly i dominates another firefly j when both its RMSEP/MAPE and its number
of selected variables are lower.

1 Wilcoxon Signed-Rank is a nonparametric hypothesis test used when comparing
two related samples to evaluate whether the population mean ranks differ [7].

Algorithm 1. Proposed Multiobjective Firefly Algorithm.


1. Parameters: Xn×m , yn×1
2. s ← number of fireflies
3. for n = 1 : M axGeneration
4. Generate randomly a population P ops×m of fireflies
5. Compute Equations (2), (3), and the number of variables selected for each
firefly
6. for i = 1: s
7. for j = 1: s
8. if firefly i-th dominates firefly j-th
9. Move firefly j towards firefly i using Equation (6)
10. end if
11. end for j
12. end for i
13. end for n
14. Calculate RMSEP and variables selected for all fireflies
15. Select the best firefly by a decision maker based in [2]
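The dominance test used in line 8 can be made concrete as in the following sketch, which
assumes each firefly is stored with its two evaluated objectives (the data structure and field
names are assumptions, not the paper's implementation); it follows the strict condition
described in the text, where both objectives must be lower:

    def dominates(firefly_i, firefly_j):
        """Firefly i dominates firefly j when both its prediction error and
        its number of selected variables are lower (the condition in the text)."""
        return (firefly_i["rmsep"] < firefly_j["rmsep"]
                and firefly_i["n_selected"] < firefly_j["n_selected"])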

4 Experimental Results
The proposed MOFA was implemented using α = 0.2, γ = 1 and ω0 = 0.97. The
number of fireflies and the number of generations were 200 and 100, respectively.
We have used for RMSEP comparison three traditional methods for variable
selection: SPA-MLR [6], GA-MLR [1] and the PLS. The number of iterations
was the same for all algorithms, and the multiobjective approach was not applied
to these three traditional methods.
The dataset employed in this work consists of 775 NIR spectra of whole-kernel
wheat, which were used as shoot-out data in the 2008 International Diffuse
Reflectance Conference (http://www.idrc-chambersburg.org/shootout.html).
Protein content (%) was used as the y-property in the regression calculations.
All calculations were carried out by using a desktop computer with an Intel
Core i7 2600 (3.40 GHz), 8 GB of RAM memory and Windows 7 Professional.
The Matlab 8.1.0.604 (R2013a) software platform was employed throughout.
Regarding the outcomes, it is important to note that all of them were obtained
by averaging fifty executions.

5 Results and Discussion


Figure 1(a) shows the first population of fireflies generated. Figure 1(b) illus-
trates the behaviour of fireflies when a monoobjective formulation is employed.
In the chart, the only goal was to reduce RMSEP.
The application of multiobjective optimization is presented in Figure 2. The
fireflies form a near-perfect Pareto front, tending towards a minimum error value
as well as a minimum number of selected variables. It is possible to note that the

Fig. 1. Behaviour of fireflies with: (a) randomly generated fireflies; (b) monoobjective
formulation. (Both panels plot the number of variables against RMSEP.)

Fig. 2. Behaviour of fireflies with multiobjective optimization (number of variables
against RMSEP).

Table 1. Results for the FA, MOFA, SPA-MLR, GA-MLR and PLS.

Number of variables RMSEP MAPE


GA-MLR 146 0.21 1.50%
FA 103 0.07 0.81%
MOFA 37 0.05 0.72%
PLS 15 0.21 1.50%
SPA-MLR 13 0.20 1.43%

application of multiobjective formulation can move fireflies to more appropriate


solutions using the non-dominance characteristic.
A comparison between MOFA and the traditional algorithms is shown in
Table 1. Although SPA-MLR was able to yield the lowest number of selected
variables, MOFA presented the lowest RMSEP and MAPE2. A comparison of the
computational time of these algorithms can be obtained in [4].

6 Conclusion

This paper proposed a Multiobjective Firefly Algorithm (MOFA) for variable


selection in multivariate calibration models. The objective was to reduce the
prediction error of the property of interest as well as the number of selected
variables. In terms of error reduction,
MOFA presented the lowest values when compared with traditional algorithms.
Therefore, through the results obtained we were able to demonstrate that MOFA
may be a better solution for obtaining a model with an adequate prediction
capacity.

Acknowledgments. Authors thank the research agencies CAPES and FAPEG for
the support provided to this work.

References
1. Soares, A.S., de Lima, T.W., Soares, F.A.A.M.N., Coelho, C.J., Federson, F.M.,
Delbem, A.C.B., Van Baalen, J.: Mutation-based compact genetic algorithm for
spectroscopy variable selection in determining protein concentration in wheat grain.
Electronics Letters 50, 932–934 (2014)
2. Lucena, D.V., Soares, A.S., Soares, T.W., Coelho, C.J.: Multi-Objective Evolu-
tionary Algorithm NSGA-II for Variables Selection in Multivariate Calibration
Problems. International Journal of Natural Computing Research 3, 43–58 (2012)
3. Martens, H.: Multivariate Calibration. John Wiley & Sons (1991)
4. Paula, L.C.M., Soares, A.S., Soares, T.W., Delbem, A.C.B., Coelho, C.J., Filho,
A.R.G.: Parallelization of a Modified Firefly Algorithm using GPU for Variable
Selection in a Multivariate Calibration Problem. International Journal of Natural
Computing Research 4, 31–42 (2014)
5. Hibon, M., Makridakis, S.: Evaluating Accuracy (or Error) Measures. INSEAD
(1995)
6. Araújo, M.C.U., Saldanha, T.C., Galvão, R.K., Yoneyama, T.: The successive pro-
jections algorithm for variable selection in spectroscopic multicomponent analysis.
Chemometrics and Intelligent Laboratory Systems 57, 65–73 (2001)
7. Ramsey, P.H.: Significance probabilities of the wilcoxon signed-rank test. Journal of
Nonparametric Statistics 2, 133–153 (1993)
8. Yang, X.S.: Multiobjective firefly algorithm for continuous optimization. Engineer-
ing with Computers 29, 175–184 (2013)

2 It is worth noting that the Successive Projections Algorithm is composed of three
phases, and its main objective is to select a subset of variables with low collinearity [6].
Semantic Learning Machine: A Feedforward
Neural Network Construction Algorithm
Inspired by Geometric Semantic Genetic
Programming

Ivo Gonçalves1,2(B) , Sara Silva1,2,3 , and Carlos M. Fonseca1


1
CISUC, Department of Informatics Engineering,
University of Coimbra, 3030-290 Coimbra, Portugal
[email protected]
2
BioISI - Biosystems & Integrative Sciences Institute, Faculty of Sciences,
University of Lisbon, 1749-016 Campo Grande, Lisbon, Portugal
[email protected]
3
NOVA IMS, Universidade Nova de Lisboa, 1070-312 Lisbon, Portugal
[email protected]

Abstract. Geometric Semantic Genetic Programming (GSGP) is a


recently proposed form of Genetic Programming in which the fitness
landscape seen by its variation operators is unimodal with a linear slope
by construction and, consequently, easy to search. This is valid across
all supervised learning problems. In this paper we propose a feedfor-
ward Neural Network construction algorithm derived from GSGP. This
algorithm shares the same fitness landscape as GSGP, which allows an
efficient search to be performed on the space of feedforward Neural Net-
works, without the need to use backpropagation. Experiments are con-
ducted on real-life multidimensional symbolic regression datasets and
results show that the proposed algorithm is able to surpass GSGP, with
statistical significance, in terms of learning the training data. In terms
of generalization, results are similar to GSGP.

1 Introduction
Moraglio et al. [6] recently proposed a new Genetic Programming formulation
called Geometric Semantic Genetic Programming (GSGP). GSGP derives its
name from the fact that it is formulated under a geometric framework [5] and
from the fact that it operates directly in the space of the underlying semantics
of the individuals. In this context, semantics is defined as the outputs of an
individual over a set of data instances. The most interesting property of GSGP
is that the fitness landscape seen by its variation operators is always unimodal
with a linear slope (cone landscape) by construction. This implies that there
are no local optima, and consequently, that this type of landscape is easy to
search. When applied to multidimensional real-life datasets, GSGP has shown
competitive results in learning and generalization [3,7]. In this paper, we adapt


the geometric semantic mutation to the realm of feedforward Neural Networks


by proposing the Semantic Learning Machine (SLM). Section 2 defines the SLM.
Section 3 describes the experimental setup. Section 4 presents and discusses the
results of the SLM and GSGP, and Section 5 concludes.

2 Semantic Learning Machine


Given that the geometric semantic operators are defined over the semantic space
(outputs), they can be extended for different representations. The Semantic
Learning Machine (SLM) proposed in this section is based on a derivation of
the GSGP mutation operator for real-value semantics. This implies that the
SLM shares the same semantic landscape properties as GSGP. Particularly, the
fitness landscape induced by its operator is always unimodal with a linear slope
(cone landscape) by construction, and consequently easy to search. This is valid
across all supervised learning problems.

2.1 A Geometric Semantic Mutation Operator for Feedforward


Neural Networks
The GSGP mutation for real-value semantics [6] is defined as follows:

Definition 1. (GSGP Mutation). Given a parent function T : Rn → R, the


geometric semantic mutation with mutation step ms returns the real function
TM = T + ms · (TR1 − TR2 ), where TR1 and TR2 are random real functions.

This mutation essentially performs a linear combination of two individuals:


the parent and a randomly generated tree (which results from subtracting the
two subtrees TR1 and TR2 ). The degree of semantic change is controlled by the
mutation step.
An equivalent geometric semantic mutation operator can be derived for feed-
forward Neural Networks (NN). The only three small restrictions for this NN
mutation operator are: the NN must have at least one hidden layer; the output
layer must have only one neuron; and the output neuron must have a linear acti-
vation function. Each application of the operator adds a new neuron to the last
hidden layer. The weight from the new neuron to the output neuron is defined
by the learning step (SLM parameter). This learning step is the equivalent of
the mutation step in the GSGP mutation. It defines the amount of semantic
change for each application of the operator. The weights from the last hidden
layer to the previous layer are randomly generated. This is the equivalent of
generating the two random subtrees in the GSGP mutation. In this work these
weights are generated with uniform probability between -1.0 and 1.0. If more
than one hidden layer is used, all other weights remain constant once initialized.
In this work all experiments are conducted with a single hidden layer. The acti-
vation function for the neurons in the last hidden layer can be freely chosen.
However, it has been recently shown, in the context of GSGP, that applying a
structural bound to the randomly generated tree (which results from subtracting

the two subtrees TR1 and TR2 ) results in significant improvements in terms of
generalization ability [3]. In fact, if an unbounded mutation (equivalent to using
a linear activation function) is used, there is a tendency for GSGP to greatly
overfit the training data [3]. For this reason, it is recommended that the activation
function for the neurons in the last hidden layer be a function with
a relatively small codomain. In this work a modified logistic function (trans-
forming the logistic function output to range in the interval [−1, 1]) is used for
this purpose. In terms of generalization ability, it is also essential to use a small
learning/mutation step [3]. If more than one hidden layer is used, the activation
functions for the remaining neurons may be freely chosen.
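A minimal sketch of this operator for a network with a single hidden layer is given below;
the weight layout, function names and the particular modified logistic used here are
assumptions for illustration rather than the authors' implementation:

    import numpy as np

    def modified_logistic(a):
        """Logistic activation rescaled to the interval [-1, 1]."""
        return 2.0 / (1.0 + np.exp(-a)) - 1.0

    def slm_mutation(hidden_w, output_w, learning_step, n_inputs, rng):
        """Add one hidden neuron: random input weights in [-1, 1] and an
        output weight equal to the learning step (the semantic step size)."""
        new_in = rng.uniform(-1.0, 1.0, size=n_inputs)
        hidden_w = np.vstack([hidden_w, new_in])      # one row per hidden neuron
        output_w = np.append(output_w, learning_step)
        return hidden_w, output_w

    def predict(X, hidden_w, output_w):
        """Linear output neuron over the bounded hidden activations."""
        return modified_logistic(X @ hidden_w.T) @ output_w

Each application adds learning_step times a bounded random function to the current
semantics, which is exactly the structure of the GSGP mutation of Definition 1.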

2.2 Algorithm

The SLM algorithm is essentially a geometric semantic hill climber for feedfor-
ward neural networks. The idea is to perform a semantic sampling with a given
size (SLM parameter) by applying the mutation operator defined in the previ-
ous subsection. As is common in hill climbers, only one solution (in this case a
neural network) is kept along the run. At each iteration, the mentioned semantic
sampling is performed to produce N neighbors. At the end of the iteration, the
best individual from the previous best and the newly generated individuals is
kept. The process is repeated until a given number of iterations (SLM param-
eter) has been reached. As mentioned in the previous subsection, the mutation
operator always adds a new neuron to the last hidden layer, so the number of
neurons in the last hidden layer is at most the same as the number of iterations.
This number of neurons can be smaller than the number of iterations if in some
iterations it was not possible to generate an individual superior to the current
best.
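Using the mutation sketched above, the SLM can be outlined as the following hill climber;
this is again only a sketch, reusing slm_mutation and predict from the previous block, and
the error measure and the initialization of the first network are assumptions:

    import numpy as np

    def rmse(y, y_hat):
        return float(np.sqrt(np.mean((y - y_hat) ** 2)))

    def semantic_learning_machine(X, y, n_iterations, sample_size, learning_step, rng):
        """Single-solution hill climber: each iteration samples `sample_size`
        mutated neighbours of the current network and keeps the best overall."""
        n_inputs = X.shape[1]
        hidden_w = rng.uniform(-1.0, 1.0, size=(1, n_inputs))  # initial one-neuron network
        output_w = np.array([learning_step])
        best_err = rmse(y, predict(X, hidden_w, output_w))
        for _ in range(n_iterations):
            candidates = [slm_mutation(hidden_w, output_w, learning_step, n_inputs, rng)
                          for _ in range(sample_size)]
            for h_try, o_try in candidates:
                err = rmse(y, predict(X, h_try, o_try))
                if err < best_err:
                    hidden_w, output_w, best_err = h_try, o_try, err
        return hidden_w, output_w, best_err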

3 Experimental Setup

The experimental setup is based on the setup of Vanneschi et al. [7] and
Gonçalves et al. [3], since these works recently provided results for GSGP. Exper-
iments are run for 500 iterations/generations because that is where the statistical
comparisons were made in the mentioned works. 30 runs are conducted. Popu-
lation/sample size is 100. Training and testing set division is 70% - 30%. Fitness
is computed as the root mean squared error. The initial tree initialization is
performed with the ramped half-and-half method, with a maximum depth of 6.
Besides GSGP, the Semantic Stochastic Hill Climber (SSHC) [6] is also used as
baseline for comparison. The variation operators used are the variants defined
for real-value semantics [6]: SGXM crossover for GSGP, and SGMR mutation
for GSGP and SSHC. For GSGP a probability of 0.5 is used for both opera-
tors. The function set contains the four binary arithmetic operators: +, -, *,
and / (protected). No constants are used in the terminal set. Parent selection in
GSGP is based on tournaments of size 4. Also for GSGP, survivor selection is
elitist as the best individual always survives to the next generation. All claims

of statistical significance are based on Mann-Whitney U tests, with Bonferroni


correction, and considering a significance level of α = 0.05. For each dataset 30
different random partitions are used. Each method uses the same 30 partitions.
Experiments are conducted on three multidimensional symbolic regression real-
life datasets. These datasets are the Bioavailability (hereafter Bio), the Plasma
Protein Binding (hereafter PPB), and the Toxicity (hereafter LD50). The first
two were also used by Vanneschi et al. [7] and Gonçalves et al. [3]. These datasets
have, respectively: 359 instances and 241 features; 131 instances and 626 features;
and 234 instances and 626 features. For a detailed description of these datasets
the reader is referred to Archetti et al. [1]. These datasets have also been used
in other Genetic Programming studies, e.g., [2,4].

4 Experimental Study
Figure 1 presents the training and testing error evolution plots for SLM, GSGP
and SSHC. These evolution plots are constructed by taking the median over 30
runs of the training and testing error of the best individuals in the training data.
The mutation/learning step used was 1 for the Bio and PPB datasets (as in
Vanneschi et al. [7] and Gonçalves et al. [3]), and 10 for the LD50 as it was found,
in preliminary testing, to be a suitable value (other values tested were: 0.1, 1, and
100). A consideration for the different initial values (at iteration/generation 0)
is in order. The SLM presents much higher errors than GSGP/SSHC after the
random initialization. This is explained by the fact that the weights for the SLM
are generated with uniform probability between -1.0 and 1.0, and consequently, the
amount of data fitting is clearly bounded. On the other hand, GSGP and SSHC
have no explicit bound on the random trees and therefore can provide a superior
initial explanation of the data. It is interesting to note that, despite this initial
disadvantage, the SLM compensates with a much higher learning rate. This higher
learning efficiency is confirmed by the statistically significant superiority found in
terms of training error across all datasets, against GSGP (p-values: Bio 2.872 ×
10−11 , PPB 2.872 × 10−11 and LD50 7.733 × 10−10 ), and against SSHC (p-values:
Bio 2.872 × 10−11 , PPB 2.872 × 10−11 and LD50 3.261 × 10−5 ).
This learning superiority is particularly interesting when considering that
the SLM and the SSHC use the exact same geometric semantic mutation oper-
ator. This raises the question: how can two methods with the same variation
operator, the same induced semantic landscape, and the same parametrizations
achieve such different outcomes? The answer lies in the different semantic dis-
tributions that result from the random initializations. Different representations
have different natural ways of being randomly initialized. This translates into
different semantic distributions and, consequently, to different offspring distri-
butions. From the results it is clear that the distribution induced by the random
initialization of a list of weights (used in SLM) is better behaved than the
initialization of a random tree (used in SSHC). In the original GSGP proposal,
Moraglio et al. [6] provided a discussion on whether syntax (representation)
matters in terms of search. They argued that, in abstract, the offspring distri-
butions may be affected by the different syntax initializations. In our work, we

80 80

60 60
Training error

Testing error
SLM
40 40 GSGP
SSHC

20 20

0 0
0 100 200 300 400 500 0 100 200 300 400 500
Iterations / generations Iterations / generations
80 80

60 60
Training error

Testing error SLM


40 40 GSGP
SSHC

20 20

0 0
0 100 200 300 400 500 0 100 200 300 400 500
Iterations / generations Iterations / generations

2600 2600
Training error

2400 2400
Testing error

SLM
2200 2200 GSGP
SSHC
2000 2000

1800 1800
0 100 200 300 400 500 0 100 200 300 400 500
Iterations / generations Iterations / generations

Fig. 1. Bio (top), PPB (center) and LD50 (bottom) training and testing error evolution
plots

can empirically see how different representations induce different offspring dis-
tributions and consequently reach considerably different outcomes. A possible
research venue lies in analyzing the semantic distributions induced by different
tree initialization methods, and to possibly propose new tree initializations that
are more well-behaved.
In terms of generalization, results show that all methods achieve similar
results. The only statistically significant difference shows that the SLM is supe-
rior to GSGP in the Bio dataset (p-value: 1.948 × 10−4 ). However, it seems
that in this case GSGP is still evolving and that in a few more generations it
may reach a generalization similar to that of the SLM. On a final note, the evolution
plots also show that SSHC consistently learns the training data faster and better
than GSGP. This should be expected as the semantic space has no local optima
and consequently the search can be focused around the best individual in the
population. These differences are confirmed as statistically significant (p-values:
Bio 2.872 × 10−11 , PPB 2.872 × 10−11 and LD50 1.732 × 10−4 ). There are no
statistically significant differences in terms of generalization.

5 Conclusions
This work presented a novel feedforward Neural Network (NN) construction algo-
rithm, derived from Geometric Semantic Genetic Programming (GSGP). The
proposed algorithm shares the same fitness landscape as GSGP, which enables
an efficient search for any supervised learning problem. Results in regression
datasets show that the proposed NN construction algorithm is able to surpass
GSGP, with statistical significance, in terms of learning the training data. Gen-
eralization results are similar to those of GSGP. Future work involves extending
the experimental analysis to other regression datasets and to provide results for
classification tasks. Comparisons with other NN algorithms and other commonly
used supervised learning algorithms (e.g. Support Vector Machines) are also in
order.

Acknowledgments. This work was partially supported by national funds through


FCT under contract UID/Multi/04046/2013 and projects PTDC/EEI-CTP/2975/2012
(MaSSGP), PTDC/DTP-FTO/1747/2012 (InteleGen) and EXPL/EMS-SIS/1954/
2013 (CancerSys). The first author work is supported by FCT, Portugal, under the
grant SFRH/BD/79964/2011.

References
1. Archetti, F., Lanzeni, S., Messina, E., Vanneschi, L.: Genetic programming for
computational pharmacokinetics in drug discovery and development. Genetic
Programming and Evolvable Machines 8(4), 413–432 (2007)
2. Gonçalves, I., Silva, S.: Balancing learning and overfitting in genetic program-
ming with interleaved sampling of training data. In: Krawiec, K., Moraglio, A.,
Hu, T., Etaner-Uyar, A.Ş., Hu, B. (eds.) EuroGP 2013. LNCS, vol. 7831, pp. 73–84.
Springer, Heidelberg (2013)
3. Gonçalves, I., Silva, S., Fonseca, C.M.: On the generalization ability of geometric
semantic genetic programming. In: Machado, P., Heywood, M.I., McDermott, J.,
Castelli, M., Garcı́a-Sánchez, P., Burelli, P., Risi, S., Sim, K. (eds.) Genetic Pro-
gramming. LNCS, vol. 9025, pp. 41–52. Springer, Heidelberg (2015)
4. Gonçalves, I., Silva, S., Melo, J.B., Carreiras, J.M.B.: Random sampling tech-
nique for overfitting control in genetic programming. In: Moraglio, A., Silva, S.,
Krawiec, K., Machado, P., Cotta, C. (eds.) EuroGP 2012. LNCS, vol. 7244,
pp. 218–229. Springer, Heidelberg (2012)
5. Moraglio, A.: Towards a Geometric Unification of Evolutionary Algorithms. Ph.D.
thesis, Department of Computer Science, University of Essex, UK, November 2007
6. Moraglio, A., Krawiec, K., Johnson, C.G.: Geometric semantic genetic program-
ming. In: Coello, C.A.C., Cutello, V., Deb, K., Forrest, S., Nicosia, G., Pavone, M.
(eds.) PPSN 2012, Part I. LNCS, vol. 7491, pp. 21–31. Springer, Heidelberg (2012)
7. Vanneschi, L., Castelli, M., Manzoni, L., Silva, S.: A new implementation of geo-
metric semantic GP and its application to problems in pharmacokinetics. In:
Krawiec, K., Moraglio, A., Hu, T., Etaner-Uyar, A.Ş., Hu, B. (eds.) EuroGP 2013.
LNCS, vol. 7831, pp. 205–216. Springer, Heidelberg (2013)
Eager Random Search for Differential Evolution
in Continuous Optimization

Miguel Leon(B) and Ning Xiong

Mälardalen University, Västerås, Sweden


{miguel.leonortiz,ning.xiong}@mdh.se

Abstract. This paper proposes a memetic computing algorithm by


incorporating Eager Random Search (ERS) into differential evolution
(DE) to enhance its search ability. ERS is a local search method that
is eager to move to a position that is identified as better than the cur-
rent one without considering other opportunities. Forsaking optimality of
moves in ERS is advantageous to increase the randomness and diversity
of search for avoiding premature convergence. Three concrete local search
strategies within ERS are introduced and discussed, leading to variants
of the proposed memetic DE algorithm. The results of evaluations on
a set of benchmark problems have demonstrated that the integration of
DE with Eager Random Search can improve the performance of pure DE
algorithms while not incurring extra computing expenses.

Keywords: Evolutionary algorithm · Differential evolution · Eager


Random Search · Memetic algorithm · Optimization

1 Introduction
Differential evolution [1] presents a class of metaheuristics [2] to solve real param-
eter optimization tasks with nonlinear and multimodal objective functions. DE
has been used as a very competitive alternative in many practical applications due
to its simple and compact structure, easy use with fewer control parameters, as
well as high convergence in large problem spaces. However, the performance of
DE is not always sufficient to ensure fast convergence to the global optimum. It
can easily stagnate, resulting in low precision of the acquired results or even
failure.
Hybridization of EAs with local search (LS) techniques can greatly improve
the efficiency of the search. EAs that are augmented with LS for self-refinement
are called Memetic Algorithms (MAs) [3]. Memetic computing has been used
with DE to refine individuals in their neighborhood. Noman and Iba [4] proposed
a crossover-based adaptive method to generate offspring in the vicinity of parents.
Many other works apply local search mechanisms to certain individuals of every
generation to obtain possibly even better solutions; see examples in [5], [6], [7].
Other researchers investigate the adaptation of the control parameters of DE to
improve performance ([8], [9]).


This paper proposes a new memetic DE algorithm by incorporating Eager


Random Search (ERS) to enhance the performance of a conventional DE algo-
rithm. ERS is a local search method that is eager to move to a position that is
identified as better than the current one without considering other opportuni-
ties in the neighborhood. This is different from common local search methods
such as gradient descent or hill climbing which seek local optimal actions during
the search. Forsaking optimality of moves in ERS is advantageous to increase
randomness and diversity of search for avoiding premature convergence. Three
concrete local search strategies within ERS are introduced and discussed, lead-
ing to variants of the proposed memetic DE algorithm. In addition, only a small
subset of randomly selected variables is used in every step of the local search for
randomly deciding the next trial point. The results of tests on a set of benchmark
problems have demonstrated that the hybridization of DE with Eager Random
Search can bring improvement of performance compared to pure DE algorithms
while not incurring extra computing expenses.

2 Basic DE
DE is a stochastic and population based algorithm with Np individuals in the
population. Every individual in the population stands for a possible solution to
the problem. One of the Np individuals is represented by Xi,g with i = 1, 2, . . . , Np
and g is the index of the generation. DE has three consecutive steps in every
iteration: mutation, recombination and selection. The explanation of these steps
is given below:
MUTATION. Np mutated individuals are generated using some individuals
of the population. The vector for the mutated solution is called mutant vec-
tor and it is represented by Vi,g . There are some ways to mutate the current
population, but the most common one is called random mutation strategy. This
mutation strategy will be explained below. The other mutation strategies and
their performance are given in [10].

Vi,g = Xr1 ,g + F × (Xr2 ,g − Xr3 ,g ) (1)


where Vi,g represents the mutant vector, i stands for the index of the vector, g
stands for the generation, r1 , r2 , r3 ∈ { 1,2,. . . ,Np } are random integers and F
is the scaling factor in the interval [0, 2].
CROSSOVER. In step two we recombine the set of mutated solutions created
in step 1 (mutation) with the original population members to produce trial
solutions. A new trial vector is denoted by Ti,g where i is the index and g is the
generation. Every parameter in the trial vector is calculated with equation 2.

Vi,g [j] if rand[0, 1] < CR or j = jrand
Ti,g [j] = (2)
Xi,g [j] otherwise
where j stands for the index of every parameter in a vector, CR is the recombination
probability, and jrand is a randomly selected parameter index which ensures that at
least one parameter from the mutant vector is selected.

SELECTION. In this last step we compare the fitness of a trial vector with
the fitness of its parent in the population with the same index i, selecting the
individual with the better fitness to enter the next generation. That is, we compare
Ti,g with Xi,g and the better of the two survives into generation g + 1.
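One generation combining these three steps can be sketched as follows (DE/rand/1 with
binomial crossover; the default control parameters mirror those used in Section 4.1, and the
function name is an assumption):

    import numpy as np

    def de_generation(pop, fitness, objective, f_scale=0.9, cr=0.85, rng=None):
        """One generation of DE/rand/1/bin: mutation (Eq. 1), crossover (Eq. 2)
        and greedy selection between each trial vector and its parent."""
        rng = np.random.default_rng() if rng is None else rng
        n_pop, dim = pop.shape
        new_pop, new_fit = pop.copy(), fitness.copy()
        for i in range(n_pop):
            r1, r2, r3 = rng.choice([k for k in range(n_pop) if k != i], 3, replace=False)
            mutant = pop[r1] + f_scale * (pop[r2] - pop[r3])
            j_rand = rng.integers(dim)
            cross = rng.random(dim) < cr
            cross[j_rand] = True                       # at least one mutant parameter
            trial = np.where(cross, mutant, pop[i])
            f_trial = objective(trial)
            if f_trial <= fitness[i]:                  # selection
                new_pop[i], new_fit[i] = trial, f_trial
        return new_pop, new_fit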

3 DE Integrated with ERS


This section is devoted to the proposal of the memetic DE algorithm with inte-
grated ERS for local search. We will first introduce ERS as a general local search
method together with its three concrete (search) strategies, and then we shall
outline how ERS can be incorporated into DE to enable self-refinement of indi-
viduals inside a DE process.

3.1 Eager Random Local Search (ERS)


The main idea of ERS is to immediately move to a randomly created new posi-
tion in the neighborhood without considering other opportunities as long as this
new position receives a better fitness score than the current position. This is dif-
ferent from some other conventional local search methods such as Hill Climbing
in which the next move is always to the best position in the surroundings. For-
saking optimality of moves in ERS is beneficial to achieve more randomness and
diversity of search for avoiding local optima. Further, in exploiting the neighbor-
hood, only a small subset of randomly selected variables undergoes changes to
randomly create a trial solution. If this trial solution is better, it simply replaces
the current one. Otherwise a new trial solution is generated with other randomly
selected variables. This procedure is terminated when a given number of trial
solutions have been created without finding improved ones.
The next more detailed issue with ERS is how to change a selected variable
in making a trial solution in the neighborhood. Our idea is to solve this issue
using a suitable probability function. We consider three probability distributions
(uniform, normal, and Cauchy) as alternatives for usage when generating a new
value for a selected parameter/variable. The use of different probability distribu-
tions leads to different local search strategies within the ERS family, which will
be explained in the sequel.

Random Local Search (RLS). In Random Local Search (RLS), we simply


use a uniform probability distribution when new trial solutions are created given
a current solution. To be more specific, when dimension k is selected for change,
the trial solution X′ will get the following value on this dimension, regardless of
its initial value in the current solution:

X′[k] = rand(ak, bk)    (3)

where rand(ak, bk) is a uniform random number between ak and bk, which are the
minimum and maximum values, respectively, on dimension k.

Normal Local Search (NLS). In Normal Local Search (NLS), we create a new
trial solution by disturbing the current solution in terms of a normal probability
distribution. This means that, if dimension k is selected for change, the value on
this dimension for trial solution X′ will be given by

X′[k] = X[k] + N(0, δ)    (4)


where N (0, δ) represents a random number generated according to a normal
density function with its mean being zero.

Cauchy Local Search (CLS). In this third local search strategy, we apply
the Cauchy density function in creating trial solutions in the neighborhood. It is
called Cauchy Local Search (CLS). A nice property of the Cauchy function is that
it is centered around its mean value while exhibiting a wider distribution than
the normal probability function. The value of trial solution X′ will be generated
as follows:

X′[k] = X[k] + t × tan(π × (rand(0, 1) − 0.5))    (5)


where rand(0, 1) is a random uniform number between 0 and 1.
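The three strategies differ only in how the selected dimension k is perturbed; a compact
sketch is given below. The standard deviation δ of the normal perturbation is an assumed
parameter (it is not specified here), while t = 0.2 follows Section 4.1:

    import numpy as np

    def perturb(x, k, strategy, bounds, delta=0.1, t=0.2, rng=None):
        """Return a trial copy of x with dimension k changed according to the
        chosen ERS strategy: 'RLS' (uniform), 'NLS' (normal) or 'CLS' (Cauchy)."""
        rng = np.random.default_rng() if rng is None else rng
        a_k, b_k = bounds[k]
        x_try = x.copy()
        if strategy == "RLS":
            x_try[k] = rng.uniform(a_k, b_k)                            # Eq. (3)
        elif strategy == "NLS":
            x_try[k] = x[k] + rng.normal(0.0, delta)                    # Eq. (4)
        else:  # "CLS"
            x_try[k] = x[k] + t * np.tan(np.pi * (rng.random() - 0.5))  # Eq. (5)
        return x_try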

3.2 The Proposed Memetic DE Algorithm

Here with we propose a new memetic DE algorithm by combining basic DE with


Eager Random Search (ERS). ERS is applied in each generation after complet-
ing the mutation, crossover and selection operators. The best individual in the
population is used as the starting point when ERS is executed. If ERS termi-
nates with a better solution, it is inserted into the population and the current
best member in the population is discarded.
We use DERLS, DENLS, and DECLS to refer to the variants of the proposed
memetic DE algorithm that adopt RLS, NLS, and CLS respectively as local
search strategies.
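Combining the pieces, the proposed memetic loop can be sketched as follows. It reuses
de_generation and perturb from the earlier sketches, perturbs a single randomly chosen
dimension per trial for simplicity (the paper perturbs a small random subset), and the
stopping parameter max_trials is an assumption:

    import numpy as np

    def eager_random_search(x_best, f_best, objective, strategy, bounds, max_trials, rng):
        """ERS around the best individual: accept any improving trial at once;
        stop after max_trials consecutive trials without improvement."""
        fails = 0
        while fails < max_trials:
            k = rng.integers(len(x_best))            # random dimension to change
            x_try = perturb(x_best, k, strategy, bounds, rng=rng)
            f_try = objective(x_try)
            if f_try < f_best:
                x_best, f_best, fails = x_try, f_try, 0
            else:
                fails += 1
        return x_best, f_best

    def memetic_de(pop, fitness, objective, strategy, bounds, generations, rng):
        for _ in range(generations):
            pop, fitness = de_generation(pop, fitness, objective, rng=rng)
            i_best = int(np.argmin(fitness))
            pop[i_best], fitness[i_best] = eager_random_search(
                pop[i_best], fitness[i_best], objective, strategy, bounds,
                max_trials=30, rng=rng)
        return pop, fitness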

4 Experiments and Results

To examine the merit our proposed memetic DE algorithm compared to basic


DE, we tested the algorithms in thirteen benchmark functions [10]. Functions 1
to 7 are unimodal and functions 8 to 13 are multimodal functions that contain
many local optima.

4.1 Experimental Settings

DE has three main control parameters: population size (Np ), crossover rate (CR)
and the scaling factor (F ) for mutation. The following specification of these
parameters was used in the experiments: N p = 60, CR = 0.85 and F = 0.9. All

the algorithms were applied to the benchmark problems with the aim to find the
best solution for each of them. Every algorithm was executed 30 times on every
function to acquire a fair result for the comparison. The condition to finish the
execution of DE programs is that the error of the best result found is below 10e-8
with respect to the true minimum or the number of evaluations has exceeded
300,000. In DECLS, t = 0.2.

4.2 Performance of the Memetic DE with Random Mutation


Strategy

First, random mutation strategy (DE/rand/1) was used in all DE approaches to


study the effect of the ERS local search strategies in the memetic DE algorithm.
The results can be observed in Table 1 and the values in boldface represent the
lowest average error found by the approaches.

Table 1. Average error of the found solutions on the test problems with random
mutation strategy

F. DE DERLS DENLS DECLS


f1 0,00E+00 (4,56E-14) 0,00E+00 (6,80E-13) 0,00E+00 (1,21E-13) 0,00E+00 (1,33E-14)
f2 1,82E-08 (1,13E-08) 5,30E-08 (2,39E-08) 2,26E-08 (1,32E-08) 1,42E-08 (1,07E-08)
f3 6,55E+01 (3,92E+01) 8,01E+01 (4,88E+01) 1,11E+00 (1,10E+00) 6,54E-01 (1,47E+01)
f4 6,22E+00 (5,07E+00) 2,37E+00 (1,87E+00) 1,80E-02 (7,28E-01) 5,66E-01 (3,68E-01)
f5 2,31E+01 (2,00E+01) 2,27E+01 (1,81E+01) 2,65E+01 (2,41E+01) 2,03E+01 (2,62E+01)
f6 0,00E+00 (0,00E+00) 0,00E+00 (0,00E+00) 0,00E+00 (0,00E+00) 0,00E+00 (0,00E+00)
f7 1,20E-01 (3,79E-03) 1,15E-02 (3,16E-03) 1,23E-02 (3,29E-03) 1,05E-02 (3,61E-03)
f8 2,72E+03 (8,15E+02) 2,31E+02 (1,50E+02) 1,86E+03 (5,46E+02) 1,58E+03 (5,16E+02)
f9 1,30E+01 (3,70E+00) 1,28E+01 (3,72E+00) 6,17E+00 (2,06E+00) 7,72E+00 (2,43E+00)
f10 1,88E+01 (4,28E+00) 1,87E+00 (4,76E+00) 4,94E+00 (8,11E+00) 5,50E+00 (7,53E+00)
f11 8,22E-04 (2,49E-03) 8,22E-04 (2,49E-03) 1,49E-02 (2,44E-02) 1,44E-02 (2,70E-02)
f12 3,46E-03 (1,86E-02) 3,46E-03 (1,86E-02) 1,04E-02 (3,11E-02) 0,00E+00 (8,49E-15)
f13 3,66E-04 (1,97E-03) 0,00E+00 (2,14E-13) 0,00E+00 (1,31E-14) 0,00E+00 (3,37E-15)

We can see in Table 1 that DECLS is the best on all the unimodal functions
except Function 4, where it is the second best. On the multimodal functions,
DERLS is the best on Functions 8, 10 and 11. DECLS found the exact optimum
every time on Functions 12 and 13. The basic DE performed the worst on the
multimodal functions. According to the above analysis, we can say that DECLS
considerably improves the performance of basic DE with the random mutation
strategy, and we also found that DERLS is particularly strong on multimodal
functions, especially Function 8, which is the most difficult one. Considering all
the functions, the best algorithm is DECLS and the weakest one is the basic DE.

5 Conclusions

In this paper we propose a memetic DE algorithm by incorporating Eager Ran-


dom Search (ERS) as a local search method to enhance the search ability of a
pure DE algorithm. Three concrete local search strategies (RLS, NLS, and CLS)

are introduced and explained as instances of the general ERS method. The use
of different local search strategies from the ERS family leads to variants of the
proposed memetic DE algorithm, which are abbreviated as DERLS, DENLS and
DECLS respectively. The results of the experiments have demonstrated that the
overall ranking of DECLS is superior to the ranking of basic DE and other
memetic DE variants considering all the test functions. In addition, we found
out that DERLS is much better than the other counterparts in very difficult
multimodal functions.

Acknowledgment. The work is funded by the Swedish Knowledge Foundation (KKS)
grant (project no 16317). The authors are also grateful to ABB FACTS, Prevas and
VOITH for their co-financing of the project.

References
1. Storn, R., Price, K.: Differential evolution - a simple and efficient heuristic for
global optimization over continuous spaces. Journal of Global Optimization 11(4),
341–359 (1997)
2. Xiong, N., Molina, D., Leon, M., Herrera, F.: A walk into metaheuristics for engi-
neering optimization: Principles, methods, and recent trends. International Journal
of Computational Intelligence Systems 8(4), 606–636 (2015)
3. Krasnogor, N., Smith, J.: A tutorial for competent memetic algorithms: Model, tax-
onomy, and design issue. IEEE Transactions on Evolutionary Computation 9(5),
474–488 (2005)
4. Noman, N., Iba, H.: Accelerating differential evolution using an adaptive local
search. IEEE Transactions on Evolutionary Computation 12, 107–125 (2008)
5. Ali, M., Pant, M., Nagar, A.: Two local search strategies for differential evolution.
In: Proc. 2010 IEEE Fifth International Conference on Bio-Inspired Computing:
Theories and Applications (BIC-TA), Changsha, China, pp. 1429–1435 (2010)
6. Dai, Z., Zhou, A.: A differential evolution with an orthogonal local search. In:
Proc. 2013 IEEE Congress on Evolutionary Computation (CEC), Cancun, Mexico,
pp. 2329–2336 (2013)
7. Leon, M., Xiong, N.: Using random local search helps in avoiding local optimum in
differential evolution. In: Proc. Artificial Intelligence and Applications, AIA2014,
Innsbruck, Austria, pp. 413–420 (2014)
8. Qin, A., Suganthan, P.: Self-adaptive differential evolution algorithm for numerical
optimization. In: The 2005 IEEE Congress on Evolutionary Computation, vol. 2,
pp. 1785–1791 (2005)
9. Leon, M., Xiong, N.: Greedy adaptation of control parameters in differential evo-
lution for global optimization problems. In: IEEE Conference on Evolutionary
Computation, CEC2015, Japan, pp. 385–392 (2015)
10. Leon, M., Xiong, N.: Investigation of mutation strategies in differential evolution for
solving global optimization problems. In: Rutkowski, L., Korytkowski, M., Scherer,
R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2014, Part I. LNCS,
vol. 8467, pp. 372–383. Springer, Heidelberg (2014)
Learning from Play: Facilitating Character
Design Through Genetic Programming
and Human Mimicry

Swen E. Gaudl1(B) , Joseph Carter Osborn2 , and Joanna J. Bryson1


1
Department of Computer Science, University of Bath, Bath, UK
[email protected], [email protected]
2
Baskin School of Engineering, University of California SC, Santa Cruz, USA
[email protected]

Abstract. Mimicry and play are fundamental learning processes by


which individuals can acquire behaviours, skills and norms. In this paper
we utilise these two processes to create new game characters by mim-
icking and learning from actual human players. We present our app-
roach towards aiding the design process of game characters through
the use of genetic programming. The current state of the art in game
character design relies heavily on human designers to manually create
and edit scripts and rules for game characters. Computational creativ-
ity approaches this issue with fully autonomous character generators,
replacing most of the design process using black box solutions such as
neural networks. Our GP approach to this problem not only mimics
actual human play but creates character controllers which can be further
authored and developed by a designer. This keeps the designer in the loop
while reducing repetitive labour. Our system also provides insights into
how players express themselves in games and into deriving appropriate
models for representing those insights. We present our framework and
preliminary results supporting our claim.

Keywords: Agent design · Machine learning · Genetic programming · Games

1 Introduction
Designing intelligence is a sufficiently complex task that it can itself be aided by
the proper application of AI techniques. Here we present a system that mines
human behaviour to create better Game AI. We utilise genetic programming
(GP) to generalise from and improve upon human game play. More importantly,
the resulting representations are amenable to further authoring and develop-
ment. We introduce a GP system for evolving game characters by utilising
recorded human play. The system uses the platformerAI toolkit, detailed in
section 3, and the Java genetic algorithm and genetic programming package
(JGAP) [6]. JGAP provides a system to evolve agents when given a set of com-
mand genes, a fitness function, a genetic selector and an interface to the target


application. Thereafter, our system generates players by creating and evolving


Java program code which is fed into the platformerAI toolkit and evaluated
using our player-based fitness function.
The rest of this paper is organised as follows. In section 2 we describe how our
system derives from and improves upon the state of the art. Section 3 describes
our system and its core components, including details of our fitness function. We
conclude our work by describing our initial results and possible future work.

2 Background and Related Work

In practice, making a good game is achieved by a good concept and long iterative
cycles in refining mechanics and visuals, a process which is resource consuming.
It requires a large number of human testers to evaluate the qualities of a game.
Thus, analysing tester feedback and incrementally adapting games to achieve
better play experience is tedious and time consuming. This is where our approach
comes into play by trying to minimise development, manual adaptation and
testing time, yet allow the developer to remain in full control.
Agent Design was initially no more than creating 2D shapes on the screen, e.g.
the aliens in SpaceInvaders. Due to early hardware limitations, more complex
approaches were not feasible. With more powerful computers it became feasible
to integrate more complex approaches from science. In 2002 Isla introduced the
BehaviourTree (BT) for the game Halo, later elaborated by Champandard
[2]. The BT has become the dominant approach in the industry. BTs are a combination
of a decision tree (DT) with a pre-defined set of node types. A related
academic predecessor of the BT was the POSH dynamic plans of BOD [1,3].
Generative Approaches [4,7] build models to create better and more appeal-
ing agents. In turn, a generative agent uses machine learning techniques to
increase its capabilities. Using data derived from human interaction with a
game—referred to as human play traces—can allow the game to act on or re-
act to input created by the player. By training on such data it is possible to
derive models able to mimic certain characteristics of players. One obvious dis-
advantage of this approach is that the generated model only learns from the
behaviour exhibited in the data provided to it. Thus, interesting behaviours are
not accessible because they were never exhibited by a player.
In contrast to other generative agent approaches [7,9,15] our system combines
features which allow the generation and development of truly novel agents. The
first is the use of un-authored recorded player input as direct input into our
fitness function. This allows the specification of agents only by playing. The
second feature is that our agents are actual programs in the form of java code
which can be altered and modified after evolving into a desired state, creating
a white box solution. While Stanley and Miikkulainen[13] use neural networks
(NN) to create better agents and enhance games using Neuroevolution, we utilise
genetic programming [10] for the creation and evolution of artificial players in
human readable and modifiable form. The most comparable approach is that
of Perez et al.[9] which uses grammar based evolution to derive BTs given an

initial set and structure of subtrees. In contrast, we start with a clean slate to
evolve novel agents as directly executable programs.

3 Setting and Environment

Evolutionary algorithms have the potential to solve problems in vast search


spaces, especially if the problems require multi-parameter optimisation [11, p.2].
For those problems humans are generally outperformed by programs [12]. Our
GP approach uses a pool of program chromosomes P and evolves those in the
form of decision trees (DTs) exploring the possible solution space. For our exper-
iments the platformerAI toolkit (http://www.platformersai.com) was used.
It consists of a 2D platformer game, similar to existing commercial products and
contains modules for recording a player, controlling agents and modifying the
environment and rules of the game.
The Problem Space is defined by all actions an agent can perform. Within
the game, agent A has to solve the complex task of selecting the appropriate
action each given frame. The game consists of A traversing a level which is not
fully observable. A level is 256 spatial units long and A should traverse it left
to right. Each level contains objects which act in a deterministic way. Some
of those objects can alter the player’s score, e.g. coins. Those bonus objects
present a secondary objective. The goal of the game, move from start to finish,
is augmented with the objective of gaining points. A can get points by collecting
objects or jumping onto enemies. To make it comparable to the experience of
similar commercial products we use a realistic time frame in which a human
would need to solve a level, 200 time units. The level observability is limited to
a 6 × 6 grid centred around the player, cf. Perez et al.[9].
Agent Control is handled through a 6-bit vector C: left, right, up, down,
jump and shoot|run. The vector is required each frame, simulating an input
device. However, some actions span more than one frame. This is a simple task
for a human but quite complex to learn for an agent. One such example, the high
jump, requires the player to press the jump button for multiple frames. Our sys-
tem has a gene for each element of C plus 14 additional genes formed of five gene
types: sensory information about the level or agent, executable actions, logical
operators, numbers and structural genes. All those are combined on creation
time into a chromosome represented as a DT using the grammar underlying
the Java language. Structural genes allow the execution of n genes in a fixed
sequence, reducing the combinatorial freedom provided by Java.
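As an illustration only, the Python sketch below shows the shape of the per-frame
decision an evolved controller has to make: it returns the 6-bit action vector
C = (left, right, up, down, jump, shoot|run). The sensor predicates used here
(enemy_ahead, gap_ahead, can_jump) are hypothetical stand-ins for the 6 x 6
observation grid and do not correspond to the actual platformerAI or JGAP gene types.

def act(enemy_ahead: bool, gap_ahead: bool, can_jump: bool) -> list:
    # One frame of control: every element corresponds to one simulated button.
    left = right = up = down = jump = shoot_run = 0
    right = 1                                    # keep moving towards the goal
    if (enemy_ahead or gap_ahead) and can_jump:
        jump = 1                                 # a high jump would keep this bit
                                                 # set over several consecutive frames
    if enemy_ahead:
        shoot_run = 1
    return [left, right, up, down, jump, shoot_run]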
Evaluation of Fitness in our system is done using the Gamalyzer-based play
trace metric which determines the fitness of individual chromosomes based on
human traces as an evaluation criterion. For finding optimal solutions to a prob-
lem statistical fitness functions offer near-optimal results when optimality can
be defined. We are interested in understanding and modelling human-like or
human-believable behaviour in games. There is no known algorithm for measur-
ing how human-like behaviour is; identifying this may even be computationally
intractable. A near-best solution for the problem space of finding the optimal

way through a level was given by Baumgarten [14] using the A∗ algorithm. This
approach produces agents which are extremely good at winning the level within
a minimum amount of time but at the same time are clearly distinguishable
from actual human players. For games and game designers a less distinguishable
approach is normally more appealing—based on our initial assumptions.

4 Fitness Function

Based on the biological concept of selection, all evolutionary systems require


some form of judgement about the quality of a specific individual—the fitness
value of the entity. Our Player Based Fitness (PBF) uses multiple traces of
human, th , and agent, ta , players to derive a fitness value by judging their simi-
larity. For that purpose we integrate the Gamalyzer Metric—a game independent
measurement of the difference between two play traces. It is based on the syn-
tactic edit distance ddis between pairs of sequences of player inputs [8]. It takes
pairs of sequences of events gathered during a game play along with designer-
provided rules for comparing individual events and yields a numerical value in
[0, 1]. Identical traces have distance ddis = 0 and incomparably different traces
ddis = 1. Gamalyzer finds the least expensive way to turn one play trace into
another by repeatedly deleting an event from the first trace, inserting an event
of the second trace into the first trace, or changing an event of the first trace
into an event of the second trace. The game designer or analyst must also pro-
vide a comparison function which describes the difficulty of changing one event
into another. The other important feature of Gamalyzer, warp window ω, is a
constraint that prevents early parts of the first trace from comparing against
late parts of the second. This is important for correctness (a running leap at the
beginning of the level has a very different connotation from a running leap at
the pole at the end of each stage). For our purpose, only the input commands
players use to control the agent are encoded—the six commands introduced ear-
lier. This allows us to compare against direct controller input for future studies
and to help designers sitting in front of the controls analysing the resulting char-
acter program. The PBF currently offers two parameters: the chunk size, cpf ,
and the warp window size, ω. The main advantage over a pure statistical fitness
function is that a designer can feed our system specific play traces of human
players without having to modify implicit values of a fitness score.
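The Python sketch below illustrates the idea behind such a warp-window-constrained
edit distance: it mirrors the insert/delete/change operations and the window ω
described above and normalises the result to [0, 1]. It is only a simplified
stand-in for Gamalyzer; the chunking parameter cpf is ignored and the
designer-provided event comparison is reduced to a plain equality test by default.

def trace_distance(trace_a, trace_b, omega,
                   change_cost=lambda x, y: 0.0 if x == y else 1.0):
    # Banded edit distance: event i of the first trace may only be aligned with
    # events j of the second trace for which |i - j| <= omega.
    n, m = len(trace_a), len(trace_b)
    if max(n, m) == 0:
        return 0.0
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        d[i][0] = float(i)                       # i deletions
    for j in range(1, m + 1):
        d[0][j] = float(j)                       # j insertions
    for i in range(1, n + 1):
        for j in range(max(1, i - omega), min(m, i + omega) + 1):
            d[i][j] = min(d[i - 1][j] + 1.0,                     # delete event
                          d[i][j - 1] + 1.0,                     # insert event
                          d[i - 1][j - 1]
                          + change_cost(trace_a[i - 1], trace_b[j - 1]))
    return min(d[n][m] / max(n, m), 1.0)         # 0 = identical, 1 = incomparable

Two identical traces yield 0 and completely dissimilar traces of equal length yield 1,
matching the convention stated above.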
To place a stronger emphasis on playing the game well, we create a multi-objective
problem using an aggregation function g that takes into account Δd (the moved
distance) and the fitness f(a) of an agent under the player-based metric PBF, see
formula (1). Using g we were able to put equal focus on the trace metric,
fptm ∈ [0, 1] ⊂ R, and the advancement along the game, Δd ∈ [0, 256] ⊂ N.

f(a) = g(fptm(ta, th), Δd)                          (1)
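The paper fixes the two objectives of g but not its closed form; the sketch below
is therefore only one plausible equal-weight choice, with the trace metric turned
into a similarity and both terms scaled to the 256-unit level length so that their
contributions are comparable.

def pbf_fitness(f_ptm, delta_d):
    # f_ptm in [0, 1]: 0 means the agent's trace is identical to the human one.
    # delta_d in [0, 256]: distance advanced along the level.
    similarity = 1.0 - f_ptm
    return 0.5 * similarity * 256.0 + 0.5 * delta_d   # higher is fitter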



5 Preliminary Results and Future Work


Using our experimental configuration and the PBF fitness function we are now
able to execute, evaluate and compare platformerAI agents against human traces.
We are using the settings supplied in table 1. As a selection mechanism, the
weighted roulette wheel is used and we additionally preserve the fittest individual
of a generation. We use single point tree branch crossover on two selected parent
chromosomes and expose the resulting child to a single point mutation before it is
put into the new generation. Figure 1 illustrates the convergence of the program
pool against the global optimum. Good solutions are on average reached after
700 generations, when an agent finishes the given level. Our first experiments
show that our approach is able to train on and converge against raw human play
traces without stopping at local optima, visible in the two dents of the averaged
fitness (black) diverging from the fittest individual (red). A next step would be to
investigate the generated modifiable programs further and analyse their benefit
in understanding players better. However, our current solution already offers a
way to design agents for a game by simply playing it and creating learning agents
from those traces. Other possible directions could be expansion of the model
underlying Gamalyzer to model specific events within the game rather than
pure input actions. This should provide interesting feedback and offer a better
matching of expressed player behaviour and model generation. Our current agent
model consists of an unweighted tree representation containing program genes.
Currently subtrees are not taken into consideration when calculating the fitness
of an individual. By including those weights it would be possible to narrow
down the search space of good solutions for game characters dramatically, also
potentially reducing the bloat of the DT. So, to enhance the quality of our

Table 1. GP parameters used in our system.

Parameter                        Value
Initial Population Size          100
Selection                        Weighted Roulette Wheel
Genetic Operators                Branch Typing CrossOver and Single Point Mutation
Initial Operator probabilities   0.6 crossover, 0.2 new chromosomes, 0.01 mutation, fixed
Survival                         Elitism
Function Set                     if else, not, &&, ||, sub, IsCoinAt, IsEnemyAt, IsBreakAbleAt, ...
Terminal Set                     Integers [-6,6], ←, →, ↓, IsTall, Jump, Shoot, Run, Wait, CanJump, CanShoot, ...

Fig. 1. The evolved agents' fitness using PBF (10000 generations), in red the fittest
individuals, in black the averaged fitness of all agents per generation (plot of fitness
against generation, titled "Player Based Evolution").

reproduction component, we believe it might be interesting to investigate the
applicability of behavior-programming for GP (BPGP) [5] in our system.

References
1. Bryson, J.J., Stein, L.A.: Modularity and design in reactive intelligence. In: Pro-
ceedings of the 17th International Joint Conference on Artificial Intelligence,
pp. 1115–1120. Morgan Kaufmann, Seattle, August 2001
2. Champandard, A.J.: AI Game Development. New Riders Publishing (2003)
3. Gaudl, S.E., Davies, S., Bryson, J.J.: Behaviour oriented design for real-time-
strategy games - an approach on iterative development for starcraft ai. In: Proceed-
ings of the Foundations of Digital Games, pp. 198–205. Society for the Advance-
ment of Science of Digital Games (2013)
4. Holmgard, C., Liapis, A., Togelius, J., Yannakakis, G.: Evolving personas for player
decision modeling. In: 2014 IEEE Conference on Computational Intelligence and
Games (CIG), pp. 1–8, August 2014
5. Krawiec, K., O’Reilly, U.M.: Behavioral programming: a broader and more detailed
take on semantic gp. In: Proceedings of the 2014 Conference on Genetic and Evo-
lutionary Computation, pp. 935–942. ACM (2014)
6. Meffert, K., Rotstan, N., Knowles, C., Sangiorgi, U.: Jgap-java genetic algorithms
and genetic programming package, September 2000. http://jgap.sf.net (last viewed:
January 2015)
7. Ortega, J., Shaker, N., Togelius, J., Yannakakis, G.N.: Imitating human playing
styles in super mario bros. Entertainment Computing 4(2), 93–104 (2013)
8. Osborn, J.C., Mateas, M.: A game-independent play trace dissimilarity metric. In:
Proceedings of the Foundations of Digital Games. Society for the Advancement of
Science of Digital Games (2014)
9. Perez, D., Nicolau, M., O’Neill, M., Brabazon, A.: Evolving behaviour trees for
the mario ai competition using grammatical evolution. In: Di Chio, C., et al. (eds.)
EvoApplications 2011, Part I. LNCS, vol. 6624, pp. 123–132. Springer, Heidelberg
(2011)
10. Poli, R., Langdon, W.B., McPhee, N.F., Koza, J.R.: A field guide to genetic pro-
gramming. Lulu. com (2008)
11. Schwefel, H.P.P.: Evolution and optimum seeking: the sixth generation. John Wiley
& Sons, Inc. (1993)
12. Smit, S.K., Eiben, A.E.: Comparing parameter tuning methods for evolution-
ary algorithms. In: IEEE Congress on Evolutionary Computation, CEC 2009,
pp. 399–406. IEEE (2009)
13. Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting
topologies. Evolutionary Computation 10, 99–127 (2002)
14. Togelius, J., Karakovskiy, S., Baumgarten, R.: The 2009 mario ai com-
petition. In: 2010 IEEE Congress on Evolutionary Computation (CEC),
pp. 1–8. IEEE (2010)
15. Togelius, J., Yannakakis, G., Karakovskiy, S., Shaker, N.: Assessing believability.
In: Hingston, P. (ed.) Believable Bots, pp. 215–230. Springer, Heidelberg (2012)
Memetic Algorithm for Solving the 0-1
Multidimensional Knapsack Problem

Abdellah Rezoug1(B) , Dalila Boughaci2 , and Mohamed Badr-El-Den3


1
Department of Computer Science, University Mhamed Bougarra of Boumerdes,
Boumerdes, Algeria
[email protected]
2
FEI, Department of Computer Science Beb-Ezzouar, USTHB, Algiers, Algeria
[email protected], dalila [email protected]
3
School of Computing, Faculty of Technology, University of Portsmouth,
Portsmouth, UK
[email protected]

Abstract. In this paper, we propose a memetic algorithm for the Mul-


tidimensional Knapsack Problem (MKP). First, we propose to combine
a genetic algorithm with a stochastic local search (GA-SLS), then with
a simulated annealing (GA-SA). The two proposed versions of our app-
roach (GA-SLS and GA-SA) are implemented and evaluated on bench-
marks to measure their performance. The experiments show that both
GA-SLS and GA-SA are able to find competitive results compared to
other well-known hybrid GA based approaches.

Keywords: Multidimensional knapsack problem · Stochastic local search ·
Genetic algorithm · Simulated annealing · Local search · Memetic algorithm

1 Introduction
The Multidimensional Knapsack Problem (MKP) is a strongly NP-hard combi-
natorial optimization problem [14]. The MKP has been extensively considered
because of its theoretical importance and wide range of applications. Many prac-
tical engineering design problems can be formulated as MKP such as: the capital
budgeting problem [17], the project selection [2] and so on.
The solutions for MKP can be classified into exact, approximate and hybrid.
The exact solutions are used for problems of small size. Branch and bound,
branch and cut, linear, dynamic and quadratic programming, etc. are the princi-
pal exact methods used for solving MKP [13,21]. The approximate solutions are
used when the problem size is large, but they are not guaranteed to find the optimal
results. They are mainly based on heuristics such as simulated annealing, tabu search,
genetic algorithms, ant colony optimization, particle swarm optimization and harmony
search [5,6,20]. The
hybrid solutions combine two or more exact or/and approximate solutions. These
solutions are the most used in the field of optimization and especially for MKP
such as [4,8–12,18] and so on.


In this paper, we propose a memetic algorithm for MKP. We developed two


versions of our method for MKP. The first denoted GA-SLS is a GA combined
with the stochastic local search (SLS) [3]. The second denoted GA-SA is a com-
bination of GA with the simulated annealing (SA) [16]. The two versions are
implemented and evaluated on some well-known benchmarks for MKP where
the sizes of benchmarks arrange from small to large. A comparative study is
done with a pure GA and some algorithms for MKP. The objective is to show
the impact of the local search in the performance of the memetic approach.
The rest of the paper is organized as follows. Section 2 gives the MKP model.
The proposed Memetic approaches are detailed in Section 3. Section 4 describes
the experiments. Finally, Section 5 concludes the paper.

2 The Multidimensional Knapsack Problem

The MKP is composed of n items and a knapsack with m different capacities bi,
where i ∈ {1, . . . , m}. Each item j, where j ∈ {1, . . . , n}, has a profit cj and
consumes aij units of capacity i of the knapsack. The goal is to pack the items into
the knapsack so as to maximize the total profit of the packed items without exceeding
the capacities of the knapsack. The MKP is modeled as the following integer program:

Maximize      Σ_{j=1}^{n} c_j x_j                                        (1)

Subject to:   Σ_{j=1}^{n} a_{ij} x_j ≤ b_i ,    i ∈ {1, . . . , m}        (2)

              x_j ∈ {0, 1},                     j ∈ {1, . . . , n}
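A minimal sketch of how a candidate solution x (a 0/1 vector over the n items) can be
evaluated against this formulation is given below; the Python list representation and
the convention of signalling infeasibility with None are our own illustrative choices,
not part of the proposed algorithms.

def mkp_value(x, profits, weights, capacities):
    # profits[j] = c_j, weights[i][j] = a_ij, capacities[i] = b_i
    for a_i, b_i in zip(weights, capacities):
        if sum(a_ij * x_j for a_ij, x_j in zip(a_i, x)) > b_i:
            return None                                   # constraint (2) violated
    return sum(c_j * x_j for c_j, x_j in zip(profits, x))  # objective (1)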

3 The Proposed Approaches for MKP


Two versions of memetic approach have been studied. The first one is the Genetic
Algorithm-Stochastic Local Search (GA-SLS) where GA is combined with SLS.
The second one is the Genetic Algorithm-Simulated Annealing (GA-SA) which is
GA combined with SA. The structures of both approaches are similar in the GA
part. The difference is in the local search. The GA-SLS applies SLS while GA-
SA applies SA. Their process consists of the following steps. Create the initial
population P using the Random Key method (RK) [1] and initialize Q = {}, NI and
T = T0 (T for GA-SA only). Select two parents X1 and X2 that are the two best
individuals in P with X1, X2 ∉ Q. Exchange NCB items between the parents X1 and X2
to produce two new, possibly infeasible offspring X1′ and X2′; if a conflict exists
in X1′ or X2′, repeatedly remove either the worst item or an item chosen randomly
according to a probability rp until feasibility is restored. Push the two parents X1
and X2 into Q. Apply the local search (SLS for GA-SLS or SA for GA-SA) to the
offspring X1′, X2′. Find the best individual Xbest in P and randomly replace a number
of items in X1′ and X2′ by items from Xbest. If the quality of X1′ and X2′ is better
than that of the two

Algorithm 1. GA-SLS Algorithm.

Require: An MKP instance, NI and Q = φ.
Ensure: The best solution found X∗.
1: Create the initial population P by the RK method.
2: for (Cpt = 1 to NI) do
3:   Selection of the two best individuals X1, X2 in P with X1, X2 ∉ Q.
4:   Crossover X1, X2 to produce offspring X1′, X2′
5:   Repair offspring X1′, X2′
6:   Apply the local search method on X1′, X2′
7:   Mutation on X1′, X2′ with Xbest of P
8:   Xworst ←− the worst individual in P
9:   if (f(X1′) > f(Xworst)) then
10:    P = P − {Xworst}
11:    P = P ∪ {X1′}
12:  end if
13:  Xworst ←− the worst individual in P
14:  if (f(X2′) > f(Xworst)) then
15:    P = P − {Xworst}
16:    P = P ∪ {X2′}
17:  end if
18:  Q = Q ∪ {X1, X2}
19: end for
20: Return the best individual found.

worst individuals in P, then they replace them. If the number of iterations NI has
not been reached, the procedure returns to the selection step; otherwise the best
individual in P is returned. The GA-SLS and GA-SA procedures are summarized in
Algorithm 1.
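As an illustration of the repair step invoked in Algorithm 1 (line 5), the Python
sketch below drops packed items until all capacity constraints hold, choosing with
probability rp between a random item and the "worst" one. The paper does not define
"worst items" precisely, so ranking by lowest profit here is only an assumption.

import random

def repair(x, profits, weights, capacities, rp=0.5):
    def violated():
        return any(sum(a_i[j] * x[j] for j in range(len(x))) > b_i
                   for a_i, b_i in zip(weights, capacities))
    while violated():
        packed = [j for j, bit in enumerate(x) if bit == 1]
        if random.random() < rp:
            j = random.choice(packed)                 # remove a random packed item
        else:
            j = min(packed, key=lambda k: profits[k]) # remove the "worst" item
                                                      # (assumed: lowest profit)
        x[j] = 0
    return x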

4 The Experiments

GA, GA-SA and GA-SLS were implemented in C++ and run on a 2 GHz Intel Core 2
Duo processor with 2 GB RAM. They were tested on the 54 SAC-94 benchmarks of the
OR-Library [22], with m = 2 to 30 and n = 6 to 105, and on the OR-Library
GK instances [22] with m = 15 to 50 and n = 100 to 1500. In all experiments the
parameters were chosen empirically: the number of iterations NI = 30000,
the population size PS = 100, the waiting time WT = 50, the number of crossing
bits NCB = 1/10, the initial temperature T0 = 50, the walk probability wp =
0.93, the number of local iterations N = 100; the number of runs is 30.

Results for the SAC-94 Standard Instances. The average fitness (Result),
the average gap (GAP), the best (Best) and the worst fitness (Worst), the
number of successful runs (NSR), the number of successfully solved instances (NSI)
and the rate of successful runs (RSR) were obtained by analyzing the recorded
fitness values. The average CPU runtime (Time) was also calculated. All the
results and statistics computed by GA, GA-SA and GA-SLS are reported
in Tables 1-2. GA solved to optimality one instance out of 54 with an
average gap of 4,454 %, GA-SA 35 instances with a global gap of 0,093 % and GA-
SLS 39 instances with a global gap of 0,0221 %. GA-SLS reached the optimum
at least once on 50 instances, followed by GA-SA on 49 instances and GA on 18
instances. The RSR shows that GA-SLS completely solved the instances of groups hp, pb

Table 1. Comparison of GA, GA-SA and GA-SLS on SAC-94 datasets.

GA GA-SA GA-SLS
Dataset Opt Result GAP Result GAP Result GAP
hp 3418 3381,07 1,080 3418 0 3418 0
3186 3120,63 2,052 3186 0 3186 0
Average 3302 3250,85 1,566 3302 0 3302 0
3090 3060,27 0,962 3090 0 3090 0
3186 3139,13 1,471 3186 0 3186 0
pb 95168 93093,5 2,180 95168 0 95168 0
2139 2079,93 2,762 2139 0 2139 0
776 583,767 24,772 776 0 776 0
1035 1018,13 1,630 1035 0 1035 0
Average 17565,666 17162,454 5,629 17565,666 0 17565,666 0
87061 86760,1 0,346 87061 0 87061 0
4015 4015 0 4015 0 4015 0
pet 6120 6091 0,474 6120 0 6120 0
12400 12380,3 0,159 12400 0 12400 0
10618 10560,9 0,538 10609,1 0,084 10608,6 0,089
16537 16373,9 0,986 16528,1 0,054 16528,3 0,053
Average 22791,833 22696,866 0,417 22788,866 0,023 22788,816 0,024
sento 7772 7606,03 2,135 7772 0 7772 0
8722 8569,7 1,746 8721,2 0,009 8722 0
Average 8247 8087,865 1,941 8246,6 0,005 8247 0
141278 141263 0,011 141278 0 141278 0
130883 130857 0,020 130883 0 130883 0
95677 94496,2 1,234 95677 0 95677 0
weing 119337 118752 0,490 119337 0 119337 0
98796 97525,3 1,286 98796 0 98796 0
130623 130590 0,025 130623 0 130623 0
1095445 1086484,2 0,818 1094579,6 0,079 1095432,7 0,0011
624319 581683 6,829 623727 0,095 624319 0
Average 304545,375 297707,062 1,339 304362,625 0,022 304543,22 0,0001
4554 4530,03 0,526 4554 0 4554 0
4536 4506,77 0,644 4536 0 4536 0
4115 4009,37 2,567 4115 0 4115 0
4561 4131,07 9,426 4561 0 4561 0
4514 4159,73 7,848 4514 0 4514 0
5557 5491,73 1,175 5557 0 5557 0
5567 5428,37 2,490 5567 0 5567 0
5605 5509,43 1,705 5605 0 5605 0
5246 5104,5 2,697 5246 0 5246 0
6339 6014,23 5,123 6339 0 6339 0
5643 5234,33 7,242 5643 0 5643 0
6339 5916 6,673 6339 0 6339 0
6159 5769,5 6,324 6159 0 6159 0
6954 6495,6 6,592 6954 0 6954 0
weish 7486 6684,6 10,705 7486 0 7486 0
7289 6878,4 5,633 7289 0 7289 0
8633 8314,73 3,687 8629,5 0,041 8633 0
9580 9146,5 4,525 9559,63 0,213 9568,63 0,119
7698 7223,17 6,168 7698 0 7698 0
9450 8632,1 8,655 9448,63 0,014 9449,37 0,007
9074 8114,4 10,575 9073,23 0,008 9073,33 0,007
8947 8321,17 6,995 8926,73 0,227 8938,83 0,091
8344 7603,77 8,871 8321,97 0,264 8318,93 0,3
10220 9685,77 5,227 10152,9 0,657 10164,2 0,546
9939 9077,9 8,664 9900,07 0,392 9910,73 0,284
9584 8728,87 8,922 9539,4 0,465 9560,53 0,245
9819 8873,7 9,627 9777,9 0,419 9802,03 0,173
9492 8653,57 8,833 9423,87 0,718 9442,17 0,525
9410 8466,67 10,025 9359,5 0,537 9369,5 0,430
11191 10250,1 8,408 11106,3 0,757 11128,7 0,557
Average 7394,833 6898,536 6,218 7379,386 0,157 7386,772 0,109

Table 2. Results of NSR, RSR and Time parameters obtained by GA, GA-SA and
GA-SLS.
GA GA-SA GA-SLS
NSR RSR Time NSR RSR Time NSR RSR Time
hp 1 1,67 1,798 2 100 6,077 2 100 7,101
pb 4 3,33 1,811 6 100 3,443 6 100 4,681
pet 4 32,78 1,395 6 81,67 11,296 6 80,55 12,179
sento 0 0,00 2,616 2 66,67 46,584 2 100 24,277
weing 6 39,58 1,669 8 76,25 10,259 8 99,58 10,586
weish 3 4,55 1,620 25 66,89 17,352 26 76,88 17,146
Average 18 13,65 1,818 49 81,91 15,835 50 92,83 12,662

Table 3. Results of the approaches test on the GK dataset.

Dataset GA GA-SA GA-SLS


Instance Optimal Result Gap Result Gap Result Gap
1 3766 3673,5 2,456 3704,3 1,638 3704,2 1,641
2 3958 3860,7 2,458 3894,8 1,596 3897,7 1,523
3 5656 5511,5 2,554 5538,8 2,072 5535,7 2,127
4 5767 5630,6 2,365 5655,2 1,938 5655,4 1,935
5 7560 7351,3 2,76 7395,1 2,181 7391,3 2,231
6 7677 7505,7 2,231 7528,4 1,935 7528,1 1,939
7 19220 18612,1 3,162 18691 2,752 18692,4 2,745
8 18806 18330,2 2,53 18393 2,196 18392,4 2,199
9 58091 56198,5 3,257 56371,1 2,96 56381,4 2,943
10 57295 55837,9 2,543 55959,3 2,331 55961,9 2,326
Average 18779,6 18251,2 2,632 18313,1 2,484 18314,05 2,479

and sento, followed by GA-SA. GA-SLS obtained a better total RSR than GA-SA
(92,83% and 81,91%, respectively). At the same time, GA-SA and GA-SLS
widely surpass GA (13,65%). The RSR shows that hybridizing GA with SLS
improved the success rate by 79,18% and hybridizing it with SA by 68,49%.
From Table 2, GA is the fastest, with a global average CPU time of 1.818 sec.

Results for the Ten Large Instances. From the results on the GK dataset shown in
Table 3, GA-SA has the best Result and GAP values for instances 1, 3, 5, 6 and 8.
GA-SLS has the best Result and GAP values for instances 2, 4, 7, 9 and 10.
Globally, GA-SLS has the best performance over all instances, with a total average
GAP of 2.479 %. GA-SA has almost the same performance, with an average GAP of
2.484 %. Also, GA is not very far from GA-SA and GA-SLS, with a total average
GAP of 2.632 %.

Comparison with Other GA Approaches. We compared the results of the
proposed GA-SA and GA-SLS to other approaches. The results of KHBA
[15], COTRO [7], TEVO [19], CHBE [6] and HGA [10] were obtained from [10].
From Table 4, GA-SA and GA-SLS gave better results than KHBA,
COTRO and TEVO for almost all instances. GA-SLS and GA-SA were able to
find the optimal solutions to 6 and 3 of the 7 problems, respectively. Furthermore,
GA-SLS produces results quite similar to those of CHBE and HGA.

Table 4. Comparison of GA-SA and GA-SLS with some GA-based approaches.

KHBA COTRO TEVO CHBE HGA GA-SA GA-SLS


problem Optimum Sol A. Sol A. Sol A. Sol A. Sol A. Sol A. Sol A.
sento1 7772 7626 7767,9 7754,2 7772 7772 7772 7772
sento2 8722 8685 8716,3 8719,5 8722 8722 8721,2 8722
weing7 1095445 1093897 1095296,1 1095398,1 1095445 1095445 1094579,6 1095432,7
weing8 624319 613383 622048,1 622021,3 624319 624319 623727 624319
weish23 8344 8165,1 8245,8 8286,7 8344 8344 8321,97 8344
hp1 3418 3385,1 3394,3 3401,6 3418 3418 3418 3418
pb2 3186 3091 3131,2 3112,5 3186 3186 3186 3186

5 Conclusion
In this paper we addressed the multidimensional knapsack problem (MKP). We
proposed, compared and tested two combinations: GA-SLS and GA-SA. GA-SLS
combines the genetic algorithm and the stochastic local search (SLS) while GA-
SA uses simulated annealing (SA) instead of SLS. The experiments have
demonstrated the performance of our methods on the MKP and show that hybridizing
GA with local search methods greatly improves its performance. As future work,
we plan to study the impact of local search methods when used with
other evolutionary approaches such as harmony search and particle swarm optimization.

References
1. Bean, J.C.: Genetics and random keys for sequencing and optimization. ORSA
Journal of Computing 6(2), 154–160 (1994)
2. Beaujon, G.J., Martin, S.P., McDonald, C.C.: Balancing and optimizing a portfolio
of R&D projects. Naval Research Logistics 48, 18–40 (2001)
3. Boughaci, D., Benhamou, B., Drias, H.: Local Search Methods for the Optimal
Winner Determination Problem in Combinatorial Auctions. Math. Model. Algor.
9(1), 165–180 (2010)
4. Chih, M., Lin, C.J., Chern, M.S., Ou, T.Y.: Particle swarm optimization with
time-varying acceleration coefficients for the multidimensional knapsack problem.
Applied Mathematical Modelling 38, 1338–1350 (2014)
5. Cho, J.H., Kim, Y.D.: A simulated annealing algorithm for resource-constrained
project scheduling problems. Operational Research Society 48, 736–744 (1997)
6. Chu, P., Beasley, J.: A Genetic Algorithm for the Multidimensional Knapsack
Problem. Heuristics 4, 63–86 (1998)
7. Cotta, C., Troya, J.: A Hybrid Genetic Algorithm for the 0–1 Multiple Knapsack
problem. Artificial Neural Nets and Genetic Algorithm 3, 250–254 (1994)
8. Deane, J., Agarwal, A.: Neural, Genetic, And Neurogenetic Approaches For Solv-
ing The 0–1 Multidimensional Knapsack Problem. Management & Information
Systems - First Quarter 2013 17(1) (2013)
9. Della Croce, F., Grosso, A.: Improved core problem based heuristics for the 0–1
multidimensional knapsack problem. Comp. & Oper. Res. 39, 27–31 (2012)
10. Djannaty, F., Doostdar, S.: A Hybrid Genetic Algorithm for the Multidimensional
Knapsack Problem. Contemp. Math. Sciences 3(9), 443–456 (2008)
11. Feng, L., Ke, Z., Ren, Z., Wei, X.: An ant colony optimization approach for the
multidimensional knapsack problem. Heuristics 16, 65–83 (2010)

12. Feng, Y., Jia, K., He, Y.: An Improved Hybrid Encoding Cuckoo Search Algo-
rithm for 0–1 Knapsack Problems. Computational Intelligence and Neuroscience,
ID 970456 (2014)
13. Fukunaga, A.S.: A branch-and-bound algorithm for hard multiple knapsack prob-
lems. Annals of Operations Research 184, 97–119 (2011)
14. Garey, M.R., Johnson, D.S.: Computers and intractability: A guide to the theory
of NP-completeness. W. H. Freeman & Co, New York (1979)
15. Khuri, S., Bäck, T., Heitkötter, J.: The zero-one multiple knapsack problem and
genetic algorithms. In: Proceedings of the ACM Symposium on Applied Comput-
ing, pp. 188–193 (1994)
16. Kirkpatrick, S., Gelatt, C.D., Vecchi, P.M.: Optimization By Simulated Annealing.
Science 220, 671–680 (1983)
17. Meier, H., Christofides, N., Salkin, G.: Capital budgeting under uncertainty-an
integrated approach using contingent claims analysis and integer programming.
Operations Research 49, 196–206 (2001)
18. Tuo, S., Yong, L., Deng, F.: A Novel Harmony Search Algorithm Based on
Teaching-Learning Strategies for 0–1 Knapsack Problems. The Scientific World
Journal Article ID 637412, 19 pages (2014)
19. Thiel, J., Voss, S.: Some Experiences on Solving Multiconstraint Zero-One Knap-
sack Problems with Genetic Algorithms. INFOR 32, 226–242 (1994)
20. Vasquez, M., Vimont, Y.: Improved results on the 0–1 multidimensional knapsack
problem. Eur. J. Oper. Res. 165, 70–81 (2005)
21. Yoon, Y., Kim, Y.H.: A Memetic Lagrangian Heuristic for the 0–1 Multidimen-
sional Knapsack Problem. Discrete Dynamics in Nature and Society, Article ID
474852, 10 pages (2013)
22. http://people.brunel.ac.uk/~mastjjb/jeb/orlib/mknapinfo.html
Synthesis of In-Place Iterative Sorting
Algorithms Using GP: A Comparison Between
STGP, SFGP, G3P and GE

David Pinheiro(B) , Alberto Cano, and Sebastián Ventura

Department of Computer Science and Numerical Analysis, University of Córdoba,


Córdoba, Spain
{dpinheiro,acano,sventura}@uco.es

Abstract. This work addresses the automatic synthesis of in-place,


iterative sorting algorithms of quadratic complexity. Four approaches
(Strongly Typed Genetic Programming, Strongly Formed Genetic Pro-
gramming, Grammar Guided Genetic Programming and Grammatical
Evolution) are analyzed and compared considering their performance
and scalability with relation to the size of the primitive set, and conse-
quently, of the search space. Performance gains, provided by protecting
composite data structure accesses and by another layer of knowledge
into strong typing, are presented. Constraints on index assignments to
grammar productions are shown to have a great performance impact.

Keywords: Automatic algorithm synthesis · Genetic programming · Sorting

1 Introduction
This work compares four approaches, namely Strongly Typed Genetic Pro-
gramming (STGP), Strongly Formed Genetic Programming (SFGP), Grammar
Guided Genetic Programming (G3P) and Grammatical Evolution (GE), at evolv-
ing sorting algorithms. Special emphasis is given on their ability to scale well in
spite of bigger primitive sets.
We restrict ourselves to iterative (non-recursive), in-place, comparison based
sorting algorithms, expecting quadratic running times (O(n2 )) and constant
(O(1)) additional memory. We make no assumptions about stability and adapt-
ability of the evolved algorithms. Bloat analysis and solution will be left for
future work.

2 Implementation and Experimental Context


The experiments were done on top of the EpochX [1] GP java library and were
designed to be as fair as possible. However our goal is to get a grasp of the



Table 1. GP parameters used in the experiments

STGP SFGP G3P GE


Population size 500
Initialization Grow initializer
Selection Tournament selection with a small size of 2 to prevent a lack of pop-
ulation diversity and to have a small selection pressure
Crossover operator Subtree Crossover (STGP, SFGP), Whigham Crossover (G3P), One Point Crossover (GE)
Crossover probability 90%
Mutation operator Subtree Mutation (STGP, SFGP), Whigham Mutation (G3P), Point Mutation (GE)
Mutation probability 10%
Elitism Only one individual, to keep the best genome seen so far but with
minimal impact on genetic diversity
Reproduction No reproduction. 90% of individuals are obtained by crossover and
the remaining 10% by mutation
Max Initial Depth* 10 (STGP, SFGP), 24 (G3P, GE)
Max Depth* 10 (STGP, SFGP), 32 (G3P, GE)
Number of generations 50
Number of Runs 500
Fitness Function Levenshtein Distance
* As there is no easy direct relation between tree sizes in strongly typed and grammar guided
GP (since the first uses expression trees in which all nodes contribute to the computation and
the second uses parse trees in which only leafs contribute) several maximum tree depths were
tested and the ones that showed the best results, for the bigger primitive sets, were chosen.

scalability related to the primitive set size, therefore we will not delve into fine
tuning the choices made for operators and values (shown in Table 1).
The nonterminals used are shown in Table 2. In the first three columns a
number is attributed to every syntactic element to help understand the results,
followed by the name and a description. The node data type and the needed child
data types, used by STGP and SFGP, and the node type and child node types,
used by SFGP, fulfill the last four columns. The approaches based on grammars
do not need to define data or node types, for the very grammar contains the
specification of the requirements and restrictions on the types of data and syn-
tactic form of the solutions. The strongly typed approaches were designed to
achieve side effects, to change global variables, for that reason the Statement
nonterminals return data type Void. SFGP nonterminal nodes belong to only
three supertypes, namely CodeBlock, Statement and Expression. The implemen-
tation uses polymorphism then whenever it is required to use an Expression, for
example, any Expression subtype can be used. Only two terminals were used,
the minimum for quadratic algorithms, which in practice work as indexes to
array elements (next section tests the use of more terminals). Using insight from
human-made algorithms, the loops were restricted to a small set of widely used
variants, like looping from a specified position of the array, ascending or descend-
ing. To prevent infinite loops, a limit of 100 cycles in each loop was set.

Table 2. Non-terminal syntactic elements

# Name Description Data Child Node Child node types


type data super-
types type
1 Swap Swap two given array elements Void Array, Int, Statement Variable, Expression,
Int Expression
2 For Each Loop for each element of the array Void Array, Int, Statement Variable, Variable,
Void CodeBlock
3 If Then Conditional statement Void Bool, Void Statement Expression, CodeBlock
4 Less Than Relational less than operator, < Bool Int, Int Expression Expression, Expression
5 For Each From Loop for each array element, start- Void Array, Int, Statement Variable, Variable,
ing from a given position Int, Void Variable, CodeBlock
6 Swap Next Swap with the next element Void Array, Int Statement Variable, Expression
7 And Logical AND operator Bool Bool, Bool Expression Expression, Expression
8 Decreasing For Decreasing loop, for each array ele- Void Array, Int, Statement Variable, Variable,
Each From ment starting from a given position Int, Void Variable, CodeBlock
9 Swap Previous Swap with the previous element Void Array, Int Statement Variable, Expression
10 If Then Else Conditional statement with the Void Bool, Statement Expression, Code-
’else’ clause Void, Void Block, CodeBlock
11 Greater Than Relational greater than operator, > Bool Int, Int Expression Expression, Expression
12 Lesser Than or Relational less than or equal opera- Bool Int, Int Expression Expression, Expression
Equal tor, <=
13 Greater Than Relational greater than or equal Bool Int, Int Expression Expression, Expression
or Equal operator, >=
14 Equal Relational equality, == Bool Int, Int Expression Expression, Expression
15 Unequal Relational inequality, != Bool Int, Int Expression Expression, Expression
16 Not Logical NOT operator Bool Bool Expression Expression
17 Or Logical OR operator Bool Bool, Bool Expression Expression, Expression
18 Decreasing For Decreasing loop, for each array ele- Void Array, Int, Statement Variable, Variable,
Each ment from the last to the first Void CodeBlock

The experiments use sets of 4, 5, 7, 10 and 18 nonterminal primitives. They


were chosen in order to give a perspective of the asymptotic behavior. Set sizes
correspond to the numbers on the first column of Table 2. For example the set
of size 7 uses the nonterminals numbered 1 to 7, inclusive. Starting from the set
of size 4 it is possible to obtain a simple Selection Sort algorithm and thereafter
Bubble Sort and Insertion Sort.
The fitness function is defined by the Levenshtein Distance (an error measure
that needs to be minimized). Every evolved program runs against the five arrays
presented in [2]. If the program correctly sorts all of them, it runs against 30
arrays of random sizes between 10 and 20, filled with random integers between
0 and 100.
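A sketch of this evaluation in Python is shown below: the error of an evolved sorter
is the Levenshtein distance between its output and the correctly sorted array,
accumulated over the test arrays. The sorter is treated here as an opaque callable;
the actual fitness is computed inside EpochX rather than with this code.

import random

def levenshtein(a, b):
    # Classic two-row dynamic programming edit distance.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def fitness(evolved_sort, arrays=None):
    if arrays is None:
        # 30 arrays of random size in [10, 20] filled with integers in [0, 100]
        arrays = [[random.randint(0, 100) for _ in range(random.randint(10, 20))]
                  for _ in range(30)]
    return sum(levenshtein(evolved_sort(list(arr)), sorted(arr)) for arr in arrays)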

3 Experiments, Results and Analysis

To analyze the experiments we used the minimal computational effort required


to find a solution with 99% confidence, presented in [3], but without the ceil-
ing operator, as suggested in [4]. Confidence intervals at 95% (shown between
parentheses in Table 3) were calculated using the Wilson score method [5].
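For reference, the two statistics can be computed as in the Python sketch below:
Koza's minimal computational effort for z = 99% (here without the ceiling operator,
following [4]) and the 95% Wilson score interval. The cumulative-success bookkeeping
is an assumption about how the run data are stored, not a description of the actual
experiment scripts.

import math

def min_computational_effort(cumulative_successes, pop_size, runs, z=0.99):
    # cumulative_successes[i] = number of runs that found a solution by generation i
    best = float("inf")
    for i, s in enumerate(cumulative_successes):
        p = s / runs
        if p == 0.0:
            continue
        r = 1.0 if p >= 1.0 else math.log(1.0 - z) / math.log(1.0 - p)
        best = min(best, pop_size * (i + 1) * r)     # no ceiling applied to r
    return best

def wilson_interval(successes, n, z=1.96):           # 95% confidence
    p = successes / n
    centre = (p + z * z / (2 * n)) / (1 + z * z / n)
    half = (z / (1 + z * z / n)) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half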
Performance and scalability. The performance and scalability of the
approaches with growing primitive set sizes are shown in Figure 1 and com-
piled in the lines titled UAA (Unprotected Array Accesses) in Table 3. In our
setup, SFGP and G3P provided the best performance. Both G3P and GE reveal

the particularity that the set of size 18 presents more useful constructs to evolve
sorting algorithms than the set of size 10.
Protection against out of bounds array accesses. A recurrent situation that
happens when using indexed data types, for example arrays, is that the indexes
can get out of bounds of the data type when used inside some of the loops, causing
run-time exceptions. In this experiment we protected against out of bounds array
accesses, using the % (mod) operator against the size of the array, to ensure that
the index is always in the correct bounds, and obtained the results presented in
Figure 2 and Table 3. The important positive impact that this tweak had on the
performance and scalability of the approaches can be ascertained by comparing
lines named PAA (Protected Array Accesses) and UAA.
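In sketch form, the guard amounts to wrapping every evolved index expression with a
modulo against the array length before any access or swap; the evolved programs
themselves are Java trees, so the Python fragment below only mirrors the idea.

def get(array, index):
    return array[index % len(array)]        # index always mapped into bounds

def swap(array, i, j):
    i, j = i % len(array), j % len(array)
    array[i], array[j] = array[j], array[i]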
Influence of grammar context insensitivity. Context-free Grammars (CFG),
used in G3P and GE, show a lack of expressiveness to describe semantic con-
straints1 [6]. Their context insensitivity can have an appreciable negative impact
on the size of the search space, especially in the presence of loops and swaps that
repeatedly require the same index. As our system doesn’t allow us to specify that
certain terminal (index) assignments should be repeated in a given nonterminal,
we obtained the same result changing the grammar so that the rules which
require more than one index are split into two, one specifically for the index i,
another specifically for the index j2 . This acts as a kind of context sensitivity,
forcing these constructs to always correctly match the indexes. The results, pre-
sented in Figure 3 and lines PAACS (Protected Array Accesses with Context
Sensitivity) of Table 3, attest that this has a huge positive impact on the per-
formance of grammar guided approaches, even to the point that almost all runs
produce a correct individual.
Performance and scalability with bigger terminal sets. The last experiment
tests the performance and scalability of SFGP in the presence of bigger terminal
sets, the same number of terminals as nonterminals, between 4 and 18 of each.
The terminals consist of integers of node type Variable. From Figure 4 and line
PAANT (Protected Array Accesses with N Terminals) of Table 3 one can see that
the number of terminals has an important negative impact in the performance
but nevertheless SFGP remains scalable, showing almost the same performance
for sets of size 10 and 18. This gives us confidence in the introduction of more
data types in the evolutionary process, such as trees, graphs, stacks, etc., with
the important goal of evolving not-in-place algorithms.
1 For example, to define a loop, the grammar can state
<for>::= for(<index> = 0; <index> < array.length; <index>++);
<index>::= i | j | k;
which can be evaluated as
for(i = 0; j < array.length; k++){}
or any other combination of indexes that give an infinite loop. This situation could be
overcome by the evolutionary system, indicating that the second and third indexes
should be the same as the first.
2 For example, the for loop was subdivided into one loop for each index:
<loop i>::= for(i = 0; i < array.length; i++){}
<loop j>::= for(j = 0; j < array.length; j++){}

Table 3. Minimal computational effort (99% confidence) for each approach and nonterminal set size; 95% confidence intervals are shown between parentheses

Nonterminal Set Size


4 5 7 10 18
53,506 51,172 157,263 334,924 469,402
UAA (47874-60134) (33872-77608) (139132-178635) (284776-395599) (387056-571624)

SFGP 10,800 9,364 38,537 142,002 90,467


PAA (8862-13218) (7777-11322) (26893-55438) (122458-165396) (80221-102535)

52,907 79,542 340,904 1,224,842 1,353,136


PAANT (54206-67532) (70970-89627) (288677-404302) (914425-1647128) (995924-1845724)

349,348 274,308 797,004 1,509,921 1,957,807


UAA (296183-413822) (235619-320754) (627793-1015894) (1093099-2093891) (1357643-2834324)
STGP
39,955 46,810 258,519 291,612 227,370
PAA (27707-57843) (31529-69768) (217208-308988) (239267-356876) (191936-270492)

8,109 11,380 75,596 342,219 270,186


UAA (7170-9304) (10212-12801) (45923-124924) (258684-454526) (210357-348421)

G3P 9,079 8,988 66,565 290,221 235,359


PAA (7560-10950) (10212-12801) (45923-124924) (220009-384361) (184705-301109)

741 741 1,642 13,576 25,606


PAACS (579-1026) (579-1026) (1477-1837) (10901-16978) (19052-34550)

16,265 20,147 66,565 1,209,531 1,015,323


UAA (12812-20733) (15475-26335) (41660-106773) (734764-1998785) (731689-1414445)

GE 7,317 10,436 33,723 163,316 229,105


PAA (6192-8682) (8588-12733) (24067-47439) (79315-337577) (98122-537000)

741 741 1,308 11,747 15,751


PAACS (579-1026) (579-1026) (1177-1464) (9565-14486) (12451-20007)

Fig. 1. Unprotected Array Accesses
Fig. 2. Protected Array Accesses
Fig. 3. Context Free vs Context Sensitive index assignment
Fig. 4. Use of n terminals in SFGP with Protected Array Accesses

4 Conclusions and Future Work


In general all the approaches showed good promise for Automatic Algorithm
Synthesis, always converging in their scalability, and responding positively in
terms of performance to the techniques introduced to reduce the search space.
SFGP revealed effective consistent performance and scalability improvements
over STGP in all experiments. We argue that the introduction of types of nodes,
in addition to data types, functions as another layer of restrictions on the search
space, and causes a substantial and structural reduction of its size.
Protected accesses to the array, in order to prevent index out of bounds
exceptions, revealed overall performance gains. The use of constraints on index
assignment in the grammar rules of Context Free Grammars had a great positive
impact on the performance and scalability of G3P and GE. The introduction
of the same number of terminals as nonterminals revealed that, although the
performance worsens, SFGP maintains its scalability. This result requires further
analysis, but increases our confidence in the possibility of introducing high level
abstract data structures to support the synthesis of not-in-place algorithms.
Future work will include: implementing the divide and conquer paradigm (recursion),
along with a technique to assess algorithm performance, in order to enable
and encourage the evolution of faster (O(n log n)) sorting algorithms; introducing
composite data structures as terminals, to enable the evolution of not-in-place
algorithms; adding nonterminals obtained from algorithms developed by humans,
to allow the application of GP to areas beyond sorting; and analyzing and reducing
bloat, applying strategies to obtain compact, unbloated algorithms.

Acknowledgments. This research was supported by the Spanish Ministry of Science


and Technology, project TIN2014-55252-P, and by FEDER funds. This research was
also supported by the Spanish Ministry of Education under FPU grant AP2010-0042.
We thank Tom Castle for kindly giving access to his SFGP code.

References
1. Otero, F., Castle, T., Johnson, C.: EpochX: genetic programming in Java with
statistics and event monitoring. In: Proceedings of the 14th Annual Conference Compan-
ion on Genetic and Evolutionary Computation, pp. 93–100 (2012)
2. O'Neill, M., Nicolau, M., Agapitos, A.: Experiments in program synthesis with
grammatical evolution: a focus on Integer Sorting. In: IEEE Congress on Evolutionary
Computation (CEC), pp. 1504–1511 (2014)
3. Koza, J.R.: Genetic programming: on the programming of computers by means of
natural selection, vol. 1. MIT press (1992)
4. Christensen, S., Oppacher, F.: An analysis of koza’s computational effort statis-
tic for genetic programming. In: Foster, J.A., Lutton, E., Miller, J., Ryan, C.,
Tettamanzi, A.G.B. (eds.) EuroGP 2002. LNCS, vol. 2278, pp. 182–191. Springer,
Heidelberg (2002)
5. Walker, M., Edwards, H., Messom, C.H.: Confidence intervals for computational
effort comparisons. In: Ebner, M., O’Neill, M., Ekárt, A., Vanneschi, L., Esparcia-
Alcázar, A.I. (eds.) EuroGP 2007. LNCS, vol. 4445, pp. 23–32. Springer, Heidelberg
(2007)
6. Orlov, M., Sipper, M.: FINCH: a system for evolving Java (bytecode). In: Genetic
Programming Theory and Practice VIII, Springer, pp. 1–16 (2011)
Computational Methods in
Bioinformatics and Systems Biology
Variable Elimination Approaches
for Data-Noise Reduction in 3D QSAR Calculations

Rafael Dolezal1,2(), Agata Bodnarova3, Richard Cimler3, Martina Husakova3,


Lukas Najman1, Veronika Racakova1, Jiri Krenek1, Jan Korabecny1,2,
Kamil Kuca1,2, and Ondrej Krejcar1
1
Center for Basic and Applied Research, Faculty of Informatics and Management,
University of Hradec Kralove, Rokitanskeho 62, 50003 Hradec Kralove, Czech Republic
{rafael.dolezal,lukas.najman,veronika.racakova,jiri.krenek,
jan.korabecny,kamil.kuca,ondrej.krejcar}@uhk.cz
2
Biomedical Research Center, University Hospital Hradec Kralove,
Hradec Kralove, Czech Republic
3
Department of Information Technologies, Faculty of Informatics and Management,
University of Hradec Kralove, Hradec Kralove, Czech Republic
{agata.bodnarova,richard.cimler,martina.husakova.2}@uhk.cz

Abstract. In the last several decades, drug research has moved to involve
various IT technologies in order to rationalize the design of novel bioactive
chemical compounds. An important role among these computer-aided drug design
(CADD) methods is played by a technique known as quantitative structure-activity
relationship (QSAR). The approach is utilized to find a statistically significant
model correlating the biological activity with more or less extensive data derived
from the chemical structures. The present article deals with approaches for
discriminating unimportant information in the input data within the three-dimensional
variant of QSAR, 3D QSAR. Special attention is paid to uninformative and iterative
variable elimination (UVE/IVE) methods applicable in connection with partial least
squares regression (PLS). Herein, we briefly introduce the 3D QSAR approach by
analyzing 30 antituberculotics. The analysis is examined with four UVE/IVE-PLS
based data-noise reduction methods.

Keywords: UVE/IVE · Data-noise reduction · 3D QSAR · PLS · CADD

1 Introduction

The principle behind quantitative structure-activity relationships (QSAR) has been
known for more than 150 years. It was a logical inference resulting from the discovery
of the molecular structure of matter. The first mathematical formalization of QSAR,
often highlighted in historical reviews on rational drug design methods, is the
equation introduced by Crum-Brown and Fraser (Eq. 1).

φ = f(C)                                              (1)


Here, φ means the biological effect of a substance which is characterized by a set
of structural features C [1]. The equation (Eq. 1) was published in 1869 as a proposi-
tion of a correlation between the biological activity of different atropine derivatives
and their molecular structure. Although the validity of the Crum-Brown and Fraser
equation was confirmed only after a 25 years’ lag, it has undoubtedly become a
cornerstone of rational approaches in drug design and discovery.
In simplest terms, QSAR refers to a strategy aimed at building a statistically
significant correlation model between the biological activity and various molecular
descriptors by chemometric tools. The biological activity can be expressed as minimal
inhibition concentration (MIC), concentration causing 50% enzyme inhibition (IC50),
binding affinity (Ki), lethal dose (LD50), etc. Regarding the description of the molecu-
lar structure, great progress has been achieved since the genesis of the classical models
by Hansch or Free-Wilson in the sixties of the twentieth century [2, 3]. So far, thou-
sands of various molecular descriptors have been developed for utilization in QSAR
analyses. In order to build a statistically significant QSAR model from known biolog-
ical activities and molecular descriptors, linear (e.g. multiple linear regression MLR,
partial least squares PLS, principal component regression PCR) or non-linear (e.g.
artificial neural networks ANN, k-nearest neighbors kNN, Bayesian nets) data-mining
methods are commonly employed in QSAR analyses.
In the present paper, the three-dimensional version of QSAR (3D QSAR) methodology
is particularly studied. It is a method based on statistical processing of molecular inte-
raction fields (MIFs). The MIF matrix is regularly data-mined by PLS to build a linear
predictive 3D QSAR model. Unfortunately, PLS itself is not a sufficient tool for find-
ing a stable model utilizing the original MIFs since it considerably suffers from abun-
dant and noisy information in the input. The objective of our study is to evaluate
several statistical methods applicable in building robust 3D QSAR models. Chapter 2
introduces in simple terms what the principles of 3D QSAR analysis are. Data-
processing and noise reduction approaches based on uninformative/iterative variable
elimination (UVE/IVE) are depicted in Chapter 3. Finally, the merits of these me-
thods are demonstrated by 3D QSAR analysis of 30 antituberculotics in Chapter 4.

2 3D QSAR – Principles and Methodology

The core of the 3D QSAR method, originally named comparative molecular field analysis
(CoMFA), was designed by Cramer et al. in 1988 as a four-step procedure: 1) supe-
rimposition of ligand molecules on a selected template structure, 2) representation of
ligand molecules by molecular interaction fields (MIFs), 3) data analysis of MIFs and
biological activities by PLS, utilizing cross-validation to select the most robust 3D
QSAR model, 4) graphical explanation of the results through three-dimensional pseudo
βPLS coefficient contour plots.
Within the 3D QSAR analysis, a starting set of molecular models can be prepared
with any chemical software capable of creating and geometrically optimizing
chemical structures (e.g. HyperChem, Spartan, ChemBio3D Ultra, etc.). Usually, a
molecular dynamics method (e.g. simulated annealing, quenched molecular dynamics,

Langevin dynamics, Monte Carlo) is employed to obtain the most thermodynamically


representative conformers. Once the chemical structures are in a suitable geometry,
the most biologically active compound is preferably chosen as the template and the others
are denoted as candidate ligands. In the alignment step, all candidates are superimposed
on the template structure using a distance-based scoring function implemented in an
optimizing algorithm to find “the tightest” molecular alignment set. Once an optimum
molecular alignment set is gained, molecular interaction fields (MIFs) may be calcu-
lated for all candidates as well as for the template compound. The MIF calculations may
be outlined as calculating the steric and electrostatic potentials in a gridbox
surrounding the ligand molecule (Fig. 1).

Fig. 1. Calculation of MIFs.

The potential energies (e.g. Lennard-Jones, van der Waals (VDW) and Coulomb
electrostatic potential energies (ESP)) experienced by a unit charge or an atom probe at
various points (xi, yj, zk) around the studied molecules are usually regressed by PLS on
the biological activities to reveal significant correlations. Generally, any supervised
learning method is applicable in 3D QSAR analysis instead of PLS, provided it is able
to process many thousands of “independent” x variables within a reasonable time [4].
A common MIF related to a 1.0Å spaced gridbox of the size 30.0 x 30.0 x 30.0Å
represents a chemical compound by 27 000 real numbers. Because not all points in
MIFs are typically related to the observed biological activity, the redundant infor-
mation in the data input may bring about overlearning which mostly causes unreliable
prediction for compounds outside the training and test sets. As a rule, the raw input
data for 3D QSAR analysis must be pre-processed by data-noise reduction methods
prior to building the final model by PLS. The data-noise reduction techniques like
fractional factor design (FFD), uninformative variable elimination (UVE) or iterative
variable elimination (IVE) frequently utilize cross-validated coefficient of determina-
tion (Q2) as a cost-function for selecting the most robust 3D QSAR model [5]. The
final step in deciding whether the derived 3D QSAR model is trustworthy is statistical
validation. Such methods as progressive Y-scrambling, randomization, leave-two-out
(LTO) or multiple leave-many-out cross-validation (LMO), and, above all, external
validation are employed for these ends [6].
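To make the notion of a MIF more tangible, the following minimal sketch (our own illustration with made-up atomic coordinates, charges and probe parameters, not the actual Open3DQSAR implementation) evaluates Lennard-Jones and Coulomb probe energies on a 1.0 Å spaced grid around a toy molecule and unfolds them into the long row vector that represents one compound:

% Illustrative MIF sketch: steric (Lennard-Jones) and electrostatic (Coulomb) probe
% energies on a regular grid around a toy molecule (hypothetical coordinates and charges).
atoms   = [0 0 0; 1.5 0 0];          % atomic coordinates in Angstrom (made up)
charges = [-0.4; 0.4];               % partial charges (made up)
epsilon = 0.2; sigma = 3.0;          % Lennard-Jones parameters for the probe-atom pair
qProbe  = 1.0;                       % unit positive charge probe
[gx, gy, gz] = ndgrid(-5:1:5);       % 1.0 Angstrom spaced gridbox
vdwMIF = zeros(size(gx)); espMIF = zeros(size(gx));
for a = 1:size(atoms,1)
    r = sqrt((gx-atoms(a,1)).^2 + (gy-atoms(a,2)).^2 + (gz-atoms(a,3)).^2) + 1e-6;
    vdwMIF = vdwMIF + 4*epsilon*((sigma./r).^12 - (sigma./r).^6);  % Lennard-Jones term
    espMIF = espMIF + 332.06*qProbe*charges(a)./r;                 % Coulomb term (kcal/mol)
end
mifRow = [vdwMIF(:); espMIF(:)]';    % unfolded MIFs: one long row vector per compound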

Besides quantitative predictions of unknown biological activities, 3D QSAR analysis
enables disclosing influential features in the studied chemical structures through
the spatial visualization of pseudo βPLS-coefficients. Usually, the pseudo βPLS-coefficients
are additionally multiplied by the standard deviation of the MIF column vectors (noted as
SD) to underline more varied regions in the chemical structures. Since the pseudo
βPLS-coefficients quantitatively indicate how much each point of the MIFs contributes to
the biological activity, the pharmacophore within the studied compound set may be
revealed. An example of 3D contour maps outlining pseudo βPLS x SD coefficients is
given in Fig. 2. Data for the illustration were taken from the literature [5].

Fig. 2. 3D contour map of pseudo βPLS x SD coefficients.

By 3D contour maps of pseudo βPLS x SD coefficients one can disclose which molecular
features are crucial for the biological activity observed. Accordingly, medicinal
chemists can utilize the information to design novel drugs through their chemical
intuition, or they can employ the found 3D QSAR model in ligand-based virtual
screening of convenient drug databases (e.g. zinc.docking.org).

3 Variable Elimination as Data-Noise Reduction Method for 3D QSAR Analysis

Currently, 3D QSAR analysis has been involved in a variety of drug research
branches. Examples of such successful projects are the discovery of biphenyl-based
cytostatics, the design of mitochondrial cytochrome P450 enzyme inhibitors, the
investigation of sirtuin 2 inhibitors as potential therapeutics for neurodegenerative
diseases, or the development of improved acetylcholinesterase reactivators [5, 7].
However, many recent 3D QSAR studies are justified only by an internal coefficient
of determination R2 > 0.8 and by its leave-one-out (LOO) cross-validated counterpart
Q2LOO > 0.6. In scarcer cases, external validation of the 3D QSAR models is reported.
On the other hand, it is very well known among QSAR experts that mostly unstable 3D QSAR

models result via PLS regression when a proper selection of independent variables
from the original MIFs is neglected. This drawback, which challenges not only 3D
QSAR models, manifests especially in external prediction or exhaustive LMO cross-
validation. Lately, several variable selection algorithms have been developed to ad-
dress the overlearning in PLS-based models. In the present study, special attention is
turned to variable elimination methods applicable in 3D QSAR analysis.

3.1 PLS Regression Analysis of Molecular Interaction Fields


The MIFs obtained by consecutively probing the molecules in a 3D lattice box with
different probes are regularly reordered into long row vectors and stored as a descriptor
matrix X. Each row represents an individual chemical compound and each column contains
the values of interaction energy at a given lattice intersection. Since the X matrix
usually consists of thousands of columns and tens of rows, common MLR is useless
for data-mining these data due to singularities arising during the inversion of the (XᵀX)
matrix. The method of choice for processing MIFs in 3D QSAR analysis has become
partial least squares regression (PLS). It is a supervised learning method which
combines principal component analysis (PCA) and MLR. By extracting orthogonal
factors (i.e. latent variables, LVs) from the original X matrix, PLS aims to predict
one or more dependent variables (a y column vector or a Y matrix).
PLS is a convenient method for deriving a linear regression model correlating a set
of dependent variables (i.e. biological activities) with an extensive set of predictors (i.e.
MIFs or molecular descriptors). A significant strength of PLS is the possibility to
process multidimensional data with a high degree of intercorrelation. Although PLS
was originally developed in the social sciences, it has become one of the most favored
chemometric tools utilized in 3D QSAR. The benefits of PLS are appreciated especially
when the number of rows is decreased after splitting the input data into training,
test and external sets. The nature of PLS may be briefly characterized as a simultaneous
decomposition of the X and Y matrices. Formally, PLS works with several coupled
matrix equations (Eq. 2):

X = TPᵀ + E;   Y = UQᵀ + F;   T = XW;   B = W(PᵀW)⁻¹Qᵀ    (2)

where B means the matrix of pseudo βPLS regression coefficients; W and Q denote the
weight matrices; P means the loading matrix; T and U are the score matrices; E and F
are residual matrices. The columns of T are orthogonal and called the latent variables (LVs).
operation within PLS analysis consists in simplifying the complexity of the system by
selecting only a few LVs to build the model. The number of involved LVs is often de-
termined by cross-validation. When Q2LOO starts dropping or the standard error of
prediction (SDEP) increases, the optimum number of latent variables has been ex-
ceeded (Fig. 3). Considerably more robust algorithms for latent variable selection
implement leave-many-out cross-validated Q2LMO or coefficient of determination for
external prediction R2ext as a control function.

Fig. 3. Determining the optimal number of LVs through cross-validation.

Since PLS is a PCA based method, it is highly sensitive to the variance of the val-
ues included in the input data. This problematic feature of PLS can lead to masking
significant information by data assuming greater values. For instance, the information
on weak hydrophobic interactions is suppressed by Coulombic interactions and hy-
drogen bonding that are stronger. However, it has become evident that hydrophobic
interactions play such an important role in drug binding to receptors that they cannot
be neglected in 3D QSAR calculations. To prevent discriminating variables with rela-
tively low values, the MIFs as well as the biological activities have to be column cen-
tered and normalized prior to PLS. In 3D QSAR analysis, different MIFs may be also
scaled as separated blocks by block unscaled weighting (BUW) to give each probe the
same significance in PLS [8]. PLS regression can be performed by a number of subtly
differing algorithms (e.g. NIPALS, SIMPLS, Lanczos bidiagonalization) [9]. In case Y
consists of only one column, the NIPALS algorithm may be simply illustrated as
follows:
% Assumes X (n x m) and Y (n x 1) are already loaded; LV is the number of latent variables
X = (X - mean(X)) ./ std(X);     % Autoscaling: column centering and normalization
Y = (Y - mean(Y)) ./ std(Y);
X0 = X;                          % Keep a copy of X for the internal prediction
T=[]; P=[]; W=[]; Q=[]; B=[];    % Initialization
for a=1:LV                       % Calculate the entered number of LVs
    w=(Y'*X)';                   % Calculate weighting vector
    w=w/sqrt(w'*w);              % Normalization to unit length
    t=X*w;                       % Calculate X scores
    if a>1, t=t - T*(inv(T'*T)*(t'*T)'); end  % Orthogonalize against previous scores
    u=t/(t'*t);
    p=(u'*X)';                   % Calculate X loadings
    X=X - t*p';                  % Calculate X residuals (deflation)
    T=[T t];                     % Store X scores
    P=[P p];                     % Store X loadings
    W=[W w];                     % Store X weights
    Q=[Q; Y'*u];                 % Calculate Y loadings
    B=[B W*inv(P'*W)*Q];         % Calculate PLS coefficients (one column per LV count)
end
Y_pred=X0*B;                     % Internal prediction (column a uses the first a LVs)

The above PLS algorithm derives the entered number of LVs and utilizes them in
the internal prediction of the autoscaled dependent variable Y. The robustness of the
resulting PLS model can be easily controlled by incorporating a cross-validation into
the algorithm. Nonetheless, the prediction instability of the 3D QSAR models has to
be solved mainly through preselecting the input data.
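As a minimal sketch of such a cross-validation (ours, not part of the original workflow), the leave-one-out Q2LOO curve used to pick the number of LVs (cf. Fig. 3) can be traced as follows; the snippet assumes that a preprocessed MIF matrix X and an activity vector y exist and that MATLAB's plsregress (Statistics and Machine Learning Toolbox) is available:

% LOO cross-validation sketch for selecting the number of LVs (illustrative only).
% Assumes X (n x m MIF matrix) and y (n x 1 activities); requires plsregress.
maxLV = 10;
n = size(X,1);
Q2loo = zeros(maxLV,1);
for nlv = 1:maxLV
    ypred = zeros(n,1);
    for i = 1:n                          % leave compound i out
        train = setdiff(1:n, i);
        [~,~,~,~,BETA] = plsregress(X(train,:), y(train), nlv);
        ypred(i) = [1, X(i,:)] * BETA;   % first row of BETA is the intercept
    end
    Q2loo(nlv) = 1 - sum((y - ypred).^2) / sum((y - mean(y)).^2);
end
[bestQ2, bestLV] = max(Q2loo);           % optimum number of latent variables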

3.2 Unsupervised Preselection in the MIFs before PLS Analysis


Calculation of the physically based interactions between the probe and the molecules
placed into a gridbox necessarily leads to a dramatic increase of the energy when the
probe gets close to atomic nuclei. Conversely, the energy calculated in gridpoints rela-
tively far from a molecule may assume negligible magnitude from the QSAR point
of view. These MIF properties make it possible to perform, to a certain extent, a
preliminary clearance of the input data. It is a common practice in 3D QSAR analysis
to remove from the MIFs such points (i.e. whole columns in the X matrix) where the
maximum energy exceeds a chosen threshold (e.g. > 50 kcal/mol for a steric van der
Waals MIF, > 30 kcal/mol for an electrostatic MIF). Other possible unsupervised
techniques to eliminate unimportant data in the MIFs are: zeroing all gridpoints having
an absolute value below 0.05 kcal/mol, removing all MIF vectors with a standard
deviation lower than 0.1 kcal/mol, and removing all MIF vectors assuming only a few
levels with a skewed distribution. By this type of preselection it is possible to remove
from 1% to 80% of the uninformative MIF vectors, depending on the cutoff levels used.
Commonly, several hundred or thousand MIF columns are left after unsupervised
preselection for the other variable reduction approaches.
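The cutoff-based clearance described above reduces to simple logical indexing; the following rough sketch (ours, with the thresholds quoted in the text and X assumed to hold one unfolded MIF per compound) illustrates the idea:

% Unsupervised preselection sketch: drop uninformative MIF columns by simple cutoffs.
% X is assumed to be the n x m matrix of unfolded MIF values (one compound per row).
maxEnergyCutoff = 50;                      % e.g. 50 kcal/mol for a steric VDW MIF
keep = max(X,[],1) <= maxEnergyCutoff;     % remove columns where the energy explodes
X = X(:, keep);
X(abs(X) < 0.05) = 0;                      % zero negligible interaction energies
keep = std(X,0,1) >= 0.1;                  % remove nearly constant (low-SD) columns
X = X(:, keep);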

3.3 Uninformative Variable Elimination

The main goal of 3D QSAR analysis is to develop a stable mathematical model which
can be used for prediction of unseen biological activities. From this somewhat nar-
rowed point of view, any model that is not able to prove itself in external prediction
must be rejected from further consideration. However, such a refusal does not provide
any suggestion as to why different models succeed or fail in prediction and how to
boost the predictive ability.
The method introduced by Centner and Massart focuses on what is noisy and/or ir-
relevant information and how to discriminate it [10]. Compared to other variable
selection techniques like forward selection, stepwise selection, genetic and evolution-
al algorithms, uninformative variable elimination (UVE) does not attempt to find the
best subset of variables to build a statistically significant model but to remove such
variables that contain no useful information. Centner and Massart took their inspira-
tion from previously published studies which tried to eliminate those variables having
small loadings or pseudo βPLS coefficients in a model derived by PLS.
The UVE-PLS method resembles stepwise variable selection used in MLR. The j-th va-
riable (i.e. a MIF vector) is eliminated from the original vector pool if its cj value is
lower than a certain cutoff level (Eq. 3)

cj = mean(bj) / s(bj)    (3)

In order to obtain the mean (mean(bj)) and the standard deviation (s(bj)) of the pseudo
βPLS coefficient of each variable, a cross-validation has to be carried out. A criti-
cal point in this method is how to determine the cutoff level for the elimination. For
this purpose, Centner and Massart proposed to add several artificial random variables
into the original X matrix and to calculate their cjrandom values. The original variables
that exhibit a lower cj than the maximum of the cjrandom values determined for the
artificial variables are removed. Although the UVE method is capable of discriminating
uninformative variables, the artificial variables used have to be of low magnitude
compared to the original variables so that they do not significantly disturb the model.
The artificial random matrix is proposed to be of the same dimension as the X matrix.
According to Centner and Massart, the UVE-PLS procedure can be summarized in
the following steps: 1) determination of the optimum number of LVs as the minimum
of the root-mean-square error of prediction function RMSEP (Eq. 4); 2) generation of
the random matrix and its scaling by a small factor (e.g. 10^-10); 3) PLS regression
of the conjugate (augmented) matrix and leave-one-out cross-validation; 4) calculation of cj
values for all variables; 5) determination of max(abs(cjrandom)); 6) elimination of all
original variables for which abs(cj) < max(abs(cjrandom)); 7) evaluation of the new
model by leave-one-out cross-validation.

RMSEP = √( Σi (yi − ŷi)² / n )    (4)

Here, yi is the observed biological activity of the i-th compound, ŷi stands for the pre-
dicted biological activity of the i-th compound, and n is the number of compounds in the set.
The above-mentioned UVE algorithm can be transformed into a more robust variant by
expressing cj as median(bj)/interquartile range (IQR(bj)). The criterion for vari-
able elimination may then be substituted by a 90-95% quantile of abs(cjrandom).
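The UVE-PLS steps above can be condensed into the following rough sketch (ours, with hypothetical variable names; nlv is the previously chosen number of LVs and MATLAB's plsregress is assumed): random noise columns are appended, pseudo βPLS coefficients are collected over a leave-one-out loop, and every original column whose |cj| falls below the largest |cj| of the random columns is discarded.

% UVE-PLS sketch (illustrative): X is the n x m preselected MIF matrix, y the activities,
% nlv the chosen number of LVs; requires plsregress.
[n, m] = size(X);
R = rand(n, m) * 1e-10;                 % artificial random variables of low magnitude
XR = [X, R];                            % conjugate (augmented) matrix
Bcv = zeros(2*m, n);                    % pseudo beta coefficients from each LOO model
for i = 1:n
    train = setdiff(1:n, i);
    [~,~,~,~,BETA] = plsregress(XR(train,:), y(train), nlv);
    Bcv(:, i) = BETA(2:end);            % drop the intercept term
end
c = mean(Bcv, 2) ./ std(Bcv, 0, 2);     % cj = mean(bj)/s(bj), Eq. 3
cutoff = max(abs(c(m+1:end)));          % largest |cj| among the random variables
keep = abs(c(1:m)) >= cutoff;           % retain only the informative original variables
Xuve = X(:, keep);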

3.4 Iterative Variable Elimination

Uninformative variable elimination was designed to improve the predictive power and
the interpretability of 3D QSAR models by removing those parts of the MIFs which do not
contain useful information in comparison to the random noise introduced to the input
data. The UVE method is based on the calculation of cj values indicating the ratio of the
size and the standard deviation of the pseudo βPLS coefficient related to the j-th vector of
the MIFs. All MIF vectors with cj smaller than a cutoff derived from the c values of the
random variables are removed from the X matrix in a single-step procedure.
A modification of the UVE algorithm suggested by Polanski and Gieleciak revises
this very one-step elimination of MIF vectors with low cj values [11, 12]. Their im-
proved algorithm, named iterative variable elimination (IVE), does not remove the
selected vectors in a single step but in a sequential manner. In the first version of IVE,
the MIF vector having the lowest pseudo βPLS coefficient is eliminated and the

remaining X matrix is regressed by PLS to evaluate the benefit. The iterative IVE
procedure can be described by the following protocol: 1) carry out PLS analysis with
a fixed number of LVs and estimate the performance by leave-one-out cross-
validation; 2) eliminate the X matrix column with the lowest absolute value of the pseu-
do βPLS coefficient; 3) carry out PLS analysis of the reduced X matrix and
estimate the performance by leave-one-out cross-validation; 4) go to step 1 and repeat
until the maximal leave-one-out cross-validated coefficient of determination Q2LOO is
reached (Eq. 5).


Q2LOO = 1 − Σi (yi − ŷi/i)² / Σi (yi − ȳ)²    (5)

Here, yi means the observed biological activity of the i-th compound, ŷi/i denotes
the biological activity of the i-th compound predicted by the model derived without the i-
th compound, and n is the number of compounds in the set. In the first version, the IVE
procedure was based on iterative elimination of MIF vectors with the lowest absolute
values of pseudo βPLS coefficients. In the next IVE variants, the criterion for MIF
vector elimination was substituted by cj values obtained by leave-one-out cross-
validation. The most robust form of IVE was proposed to involve optimization of the
LV number and cj values defined as median(bj)/interquartile range (IQR(bj)). It was
proved by Polanski and Gieleciak that the robust IVE form surpassed the other vari-
ants and gave the most reliable 3D QSAR models in terms of the highest Q2LOO and
sufficient resolution of the pseudo βPLS coefficient contour maps.
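In the same illustrative spirit (ours; y, nlv and plsregress assumed as before), the basic IVE-PLS loop can be sketched as follows: the column with the smallest absolute pseudo βPLS coefficient is dropped, the model is re-validated by leave-one-out cross-validation, and the subset of columns with the highest Q2LOO is retained.

% Basic IVE-PLS sketch (illustrative): iteratively drop the MIF column with the lowest
% absolute pseudo beta coefficient and keep the subset with the best Q2LOO.
cols = 1:size(X,2);                      % indices of the surviving MIF columns
bestQ2 = -Inf; bestCols = cols;
while numel(cols) > nlv + 1
    Xc = X(:, cols);
    n = size(Xc,1); ypred = zeros(n,1);
    for i = 1:n                          % leave-one-out cross-validation
        train = setdiff(1:n, i);
        [~,~,~,~,BETA] = plsregress(Xc(train,:), y(train), nlv);
        ypred(i) = [1, Xc(i,:)] * BETA;
    end
    Q2 = 1 - sum((y - ypred).^2) / sum((y - mean(y)).^2);
    if Q2 > bestQ2, bestQ2 = Q2; bestCols = cols; end
    [~,~,~,~,BETA] = plsregress(Xc, y, nlv);   % full-data model for the elimination step
    [~, worst] = min(abs(BETA(2:end)));        % column with the lowest |pseudo beta|
    cols(worst) = [];
end
Xive = X(:, bestCols);                   % reduced matrix giving the highest Q2LOO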

3.5 Hybrid Variable Elimination


The issue concerning the selection or elimination of the right variables in 3D QSAR
studies can be addressed in several ways [13]. A logical candidate for improving the
predictive ability of 3D QSAR models seems to be a genetic algorithm (GA) that
enables evaluating different sets of MIF vectors. However, owing to the many variables
in MIFs, there is no guarantee that a GA-PLS procedure can straightforwardly converge
to the best solution. Other promising alternatives of variable selec-
tion/elimination techniques relevant in 3D QSAR are sub-window permutation analy-
sis coupled with PLS (SwPA-PLS), iterative predictor weighting PLS (IPW-PLS),
regularized elimination procedure (REP-PLS), interactive variable selection (IVS-
PLS), soft-thresholding PLS (ST-PLS) or powered PLS (PPLS) (see [13]).
Notably, the MIFs generated for chemical compounds are not independent va-
riables but spatially inter-correlated. In the original 3D MIF matrix, the energy values
are ordered according to a molecular structure probed in a gridbox. This internal in-
formation is neglected and stays hidden when transforming the 3D MIF matrices into
row vectors of the X matrix. Application of the UVE and IVE methods to such unfolded
data therefore does not reflect the chemical information borne by the MIFs but treats it as
different molecular descriptors. To utilize also the spatial contiguity of the original 3D
MIFs, a methodology named smart region definition (SRD) has found its important

position in 3D QSAR analysis. SRD procedure aims to rearrange the unfolded MIFs
into group variables related to the same chemical regions (e.g. points around the same
atoms). These groups of neighboring variables are explicitly associated with chemical
structures and when treated as logical units, the resulting 3D QSAR models are less
prone to chance correlations and easier to interpret [14]. Since the SRD procedure
clusters similar MIF vectors into groups, the time consumed by UVE or IVE analyses
is shorter than in the standard processing of all individual MIF vectors. The SRD
algorithm involves three major operations: 1) selecting the most important MIF vec-
tors (seeds) having the highest PLS weights; 2) building 3D Voronoi polyhedra
around the seeds; 3) collapsing the Voronoi polyhedra into larger regions.
The starting point of the SRD procedure is a PLS analysis of the whole X matrix,
which reveals significant MIF vectors through the magnitude of their weights. Depending
on the user’s setting, a selection of important MIF vectors is denoted as seeds. In the next
step, the remaining MIF vectors are assigned to the nearest seed according to a preset
Euclidean distance. In case a MIF vector is too far from every seed, it is assigned to
a “zero” region and eliminated from the X matrix. After distributing the MIF vectors
into Voronoi polyhedra, further variable absorption is performed. Neighboring Vo-
ronoi polyhedra are statistically analysed and, if found significantly correlated, they
are merged into one larger Voronoi polyhedron. The cutoff distances for the initial building
of the Voronoi polyhedra as well as for the subsequent collapsing are critical points which
decide on the merit of SRD and, thus, have to be cautiously optimized.

4 Performance Comparison of UVE and IVE Based Methods

In order to practically evaluate the performance of the UVE and IVE based noise-
reduction methods, a 3D QSAR analysis has been carried out. We selected a group of
30 compounds, which are currently considered as potential antituberculotics [15, 16],
and analyzed them in Open3DAlign and Open3DQSAR programs [17, 18]. Since we
cannot provide a detailed description of all undertaken steps of the analysis in this
article, we will confine the present study only to the performance of the UVE and IVE
algorithms and their SRD hybridized variants. In the 3D QSAR analyses, the most
common or default setting was used.
First, the set of 30 compounds published in the literature was modeled in Hyper-
Chem 7.0 to prepare the initial molecular models. Then, the molecular ensemble was
submitted to quenched molecular dynamics and the resulting conformers were
processed by an aligning algorithm in the Open3DAlign program to determine the optim-
al molecular superimposition. In the Open3DQSAR program two MIFs were generated
(i.e. a van der Waals MIF and a Coulombic MIF) and processed by four 3D QSAR me-
thods: 1) UVE-PLS, 2) IVE-PLS, 3) UVE-SRD-PLS and 4) IVE-SRD-PLS (Fig. 4).
As a dependent variable, we used the published logMICs against Mycobacterium
tuberculosis. To evaluate the 3D QSAR model, the original set of compounds was
randomly divided into training and test sets in a ratio of 25 : 5.

Fig. 4. Comparison of statistical performance of PLS 3D QSAR models after application of
various data-noise reduction methods. The results are related to 30 antituberculotics. The
vertical y-axes in all graphs represent R2, R2ext, Q2LOO, Q2LTO and Q2LMO values.

From the above plots it follows that an efficient data-noise reduction is crucial to
achieve a stable PLS model in cross-validations. The original dataset provided all
Q2LOO/LTO/LMO with negative values, which indicates an overtrained PLS model. Signifi-
cant improvement was reached even by unsupervised preselection of the X matrix.
The UVE and SRD-UVE PLS models showed considerably better performance and exhi-
bited nearly the same robustness in cross-validation. The top scoring model resulted when
applying SRD/UVE-PLS with 6-7 LVs (Q2LMO = 0.7352 – 0.7361). The best 3D QSAR

model was obtained by IVE-PLS (5 LVs; Q2LMO = 0.7652). It is interesting that applica-
tion of SRD-IVE-PLS caused significant deterioration of the IVE-PLS model stability
in cross-validation (max(Q2LMO) = 0.4617; 6 LVs). It is likely a consequence of stepwise
elimination of larger groups of MIF vectors grouped by SRD. Without SRD, the IVE
algorithm iteratively investigates all MIF vectors and is better able to find the critical
portion of information to eliminate from the input data. The numbers of MIF vectors
remaining after the applied data-noise reduction methods are given in Table 1.

Table 1. Remaining MIF vectors after data-noise reductions.

Number of MIF vectors remaining after data-noise reduction

MIF    Original dataset    Unsupervised preselection    UVE     IVE    SRD-UVE    SRD-IVE
VDW    11500               1261                         237     128    344        536
ESP    11500               11479                        4752    275    6026       576

5 Conclusion

In medicinal chemistry and bioinformatics various computerized technologies are in-


creasingly utilized to facilitate drug discovery. One of them is 3D QSAR analysis of
molecular interaction fields. By prediction of unseen biological activities, the time and
costs needed to develop a novel drug can be substantially decreased. However, the ben-
efit of such computational methods depends on reliability of the derived models. We
demonstrated in this work that a data-noise reduction algorithm to select only significant
MIF vectors from the original matrix is essential to build a stable 3D QSAR model.
A promising method for elimination of useless information in 3D QSAR analysis seems
to be iterative variable elimination (IVE) coupled with PLS. It was shown that IVE-PLS
method surpassed other techniques for data-noise reduction and provided the most sta-
tistically significant 3D QSAR model for 30 selected antituberculotics.

Acknowledgements. The paper is supported by the project of specific science “Application of


Artificial Intelligence in Bioinformatics” at the Faculty of Informatics and Management, Uni-
versity of Hradec Kralove, Czech Republic and by the long term development plan FNHK.

References
1. Brown, A.C., Fraser, T.R.: XX.—On the Connection between Chemical Constitution and
Physiological Action. Part II.—On the Physiological Action of the Ammonium Bases de-
rived from Atropia and Conia. Trans. Roy. Soc. Edinburgh 25, 693–739 (1869)
2. Hansch, C., Fujita, T.: ρ-σ-π Analysis. A method for the correlation of biological activity
and chemical structure. J. Am. Chem. Soc. 86, 1616–1626 (1964)
3. Free, S.M., Wilson, J.W.: A mathematical contribution to structure-activity studies. J.
Med. Chem. 7, 395–399 (1964)
4. Cramer III, R.D.: Partial least squares (PLS): its strengths and limitations. Perspect. Drug.
Discov. 1, 269–278 (1993)

5. Dolezal, R., Korabecny, J., Malinak, D., Honegr, J., Musilek, K., Kuca, K.: Ligand-based
3D QSAR analysis of reactivation potency of mono- and bis-pyridinium aldoximes toward
VX-inhibited rat acetylcholinesterase. J. Mol. Graph. Model. 56c, 113–129 (2014)
6. Tropsha, A., Gramatica, P., Gombar, V.K.: The importance of being earnest: validation is
the absolute essential for successful application and interpretation of QSPR models. QSAR
Comb. Sci. 22, 69–77 (2003)
7. Chuang, Y.C., Chang, C.H., Lin, J.T., Yang, C.N.: Molecular modelling studies of sirtuin
2 inhibitors using three-dimensional structure-activity relationship analysis and molecular
dynamics simulations. Mol. Biosyst. 11, 723–733 (2015)
8. Kastenholz, M.A., Pastor, M., Cruciani, G., Haaksma, E.E., Fox, T.: GRID/CPCA: a new
computational tool to design selective ligands. J. Med. Chem. 43, 3033–3044 (2000)
9. Bro, R., Elden, L.: PLS works. J. Chemometr. 23, 69–71 (2009)
10. Centner, V., Massart, D.L., de Noord, O.E., de Jong, S., Vandeginste, B.M., Sterna, C.: Elimi-
nation of uninformative variables for multivariate calibration. Anal. Chem. 68, 3851–3858
(1996)
11. Polanski, J., Gieleciak, R.: The comparative molecular surface analysis (CoMSA) with
modified uninformative variable elimination-PLS (UVE-PLS) method: application to the
steroids binding the aromatase enzyme. J. Chem. Inf. Comp. Sci. 43, 656–666 (2003)
12. Gieleciak, R., Polanski, J.: Modeling robust QSAR. 2. iterative variable elimination
schemes for CoMSA: application for modeling benzoic acid pKa values. J. Chem. Inf.
Model. 47, 547–556 (2007)
13. Mehmood, T., Liland, K.H., Snipen, L., Sæbø, S.: A review of variable selection methods
in partial least squares regression. Chemometr. Intel. Lab. 118, 62–69 (2012)
14. Pastor, M., Cruciani, G., Clementi, S.: Smart region definition: a new way to improve the
predictive ability and interpretability of three-dimensional quantitative structure-activity
relationships. J. Med. Chem. 40, 1455–1464 (1997)
15. Dolezal, R., Waisser, K., Petrlikova, E., Kunes, J., Kubicova, L., Machacek, M., Kaustova, J.,
Dahse, H.M.: N-Benzylsalicylthioamides: Highly Active Potential Antituberculotics. Arch.
Pharm. 342, 113–119 (2009)
16. Waisser, K., Matyk, J., Kunes, J., Dolezal, R., Kaustova, J., Dahse, H.M.: Highly Active
Potential Antituberculotics: 3-(4-Alkylphenyl)-4-thioxo-2H-1,3-benzoxazine-2(3H)-ones
and 3-(4-Alkylphenyl)-2H-1,3-benzoxazine-2,4(3H)-dithiones Substituted in Ring-B by
Halogen. Arch. Pharm. 341, 800–803 (2008)
17. Tosco, P., Balle, T., Shiri, F.: Open3DALIGN: an open-source software aimed at unsuper-
vised ligand alignment. J. Comput. Aid. Mol. Des. 25, 777–783 (2011)
18. Tosco, P., Balle, T.: Open3DQSAR: a new open-source software aimed at high-throughput
chemometric analysis of molecular interaction fields. J. Mol. Model. 17, 201–208 (2011)
Pattern-Based Biclustering with Constraints
for Gene Expression Data Analysis

Rui Henriques(B) and Sara C. Madeira

Inesc-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal


{rmch,sara.madeira}@tecnico.ulisboa.pt

Abstract. Biclustering has been largely applied for gene expression


data analysis. In recent years, a clearer understanding of the syner-
gies between pattern mining and biclustering gave rise to a new class
of biclustering algorithms, referred to as pattern-based biclustering. These
algorithms are able to discover exhaustive structures of biclusters with
flexible coherency and quality. Background knowledge has also been
increasingly applied for biological data analysis to guarantee relevant
results. In this context, despite numerous contributions from domain-
driven pattern mining, there is not yet a solid view on whether and how
background knowledge can be applied to guide pattern-based bicluster-
ing tasks.
In this work, we extend pattern-based biclustering algorithms to effec-
tively seize efficiency gains in the presence of constraints. Furthermore,
we illustrate how constraints with succinct, (anti-)monotone and con-
vertible properties can be derived from knowledge repositories and user
expectations. Experimental results show the importance of incorporat-
ing background knowledge within pattern-based biclustering to foster
efficiency and guarantee non-trivial yet biologically relevant solutions.

1 Introduction
Biclustering, the task of finding subsets of rows with a coherent pattern across
subsets of columns in real-valued matrices, has been largely used for expression
data analysis [9,11]. Biclustering algorithms based on pattern mining methods
[9,11,12,18,22,25], referred to in this work as pattern-based biclustering, are able
to perform flexible and exhaustive searches. Initial attempts to use background
knowledge for biclustering based on user expectations [5,7,15] and knowledge-
based repositories [18,20,26] show its key role to guide the task and guaran-
tee relevant solutions. In this context, two valuable synergies can be identified
based on these observations. First, the optimality and flexibility of pattern-based
biclustering provide an adequate basis upon which knowledge-driven constraints
can be incorporated. Contrasting with pattern-based biclustering, alternative
biclustering algorithms place restrictions on the structure (number, size and
positioning), coherency and quality of biclusters, which may prevent the incor-
poration of certain constraints [11,16]. Second, the effective use of background
knowledge to guide pattern mining searches has been largely researched in the
context of domain-driven pattern mining [4,23].

c Springer International Publishing Switzerland 2015
F. Pereira et al. (Eds.) EPIA 2015, LNAI 9273, pp. 326–339, 2015.
DOI: 10.1007/978-3-319-23485-4 34

Despite these synergies, there is a lack of literature on the feasibility and


impact of integrating domain-driven pattern mining and biclustering. In particu-
lar, there is a lack of research on how to map the commonly available background
knowledge in the form of parameters or constraints to guide the biclustering task.
Additionally, the majority of existing pattern-based biclustering algorithms rely
on searches dependent on bitset vectors [18,22,25], which may make their per-
formance impracticable for large and dense biological datasets. Although new
searches became recently available for biclustering large and dense data [13],
there are not yet contributions on how these searches can be adapted to seize
the benefits from the available background knowledge.
In this work, we address these problems. First, we list an extensive set of
key constraints with biological relevance and show how they can be specified for
pattern-based biclustering. Second, we extend F2G [13], a recent pattern-growth
search that tackles the efficiency bottlenecks of peer searches, to be able to effec-
tively use constraints with succinct, (anti-)monotone and convertible properties.
To achieve these goals, we propose BiC2PAM (BiClustering with Constraints
using PAttern Mining), an algorithm that integrates recent breakthroughs on
pattern-based biclustering [9,11,12] and extends them to effectively incorporate
constraints. Experimental results confirm the role of BiC2PAM to foster the bio-
logical relevance of pattern-based biclustering solutions and to seize large effi-
ciency gains by adequately pruning the search space.
The paper is structured as follows. Section 2 provides background on pattern-
based biclustering and domain-driven pattern mining. Section 3 surveys key
contributions and limitations from related work. Section 4 lists biologically
meaningful constraints and proposes BiC2PAM for their effective incorporation.
Section 5 provides initial empirical evidence of BiC2PAM’s efficiency and ability
to unravel non-trivial yet biologically significant biclusters from gene expression
data. Finally, concluding remarks are synthesized.

2 Background
Definition 1. Given a matrix, A=(X, Y ), with a set of rows X={x1 , .., xn }, a
set of columns Y ={y1 , .., ym }, and elements aij ∈R relating row i and column j:
the biclustering task aims to identify a set of biclusters B={B1 , .., Bm }, where
each bicluster Bk = (Ik , Jk ) is a submatrix of A (Ik ⊆ X and Jk ⊆ Y ) satisfying
specific criteria of homogeneity and significance [11].
A real-valued matrix can thus be described by a (multivariate) distribution
of background values and a structure of biclusters, where each bicluster satis-
fies specific criteria of homogeneity and significance. The structure is defined by
the number, size and positioning of biclusters. Flexible structures are character-
ized by an arbitrary-high set of (possibly overlapping) biclusters. The coherency
(homogeneity) of a bicluster is defined by the observed correlation of values (see
Definition 2). The quality of a bicluster is defined by the type and amount of
accommodated noise. The statistical significance of a bicluster determines the
deviation of its probability of occurrence from expectations.

Definition 2. Let the elements in a bicluster aij ∈ (I, J) have coherency across
rows given by aij =kj +γi +ηij , where kj is the expected value for column j, γi is
the adjustment for row i, and ηij is the noise factor [16]. For a given real-valued
matrix A and coherency strength δ: aij =kj +γi +ηij where ηij ∈ [−δ/2, δ/2].
As motivated, the discovery of exhaustive and flexible structures of biclusters
satisfying certain homogeneity criteria (Definition 2) is a desirable condition to
effectively incorporate knowledge-driven constraints. However, due to the com-
plexity of such a biclustering task, most of the existing algorithms are either based
on greedy or stochastic approaches, producing sub-optimal solutions and plac-
ing restrictions (e.g. fixed number of biclusters, non-overlapping structures, and
simplistic coherencies) that prevent the flexibility of the biclustering task [16].
Pattern-based biclustering appeared in recent years as one of various attempts
to address these limitations. As follows, we provide background on this class of
biclustering algorithms, as well as on constraint-based searches.
Pattern-Based Biclustering. Patterns are itemsets, rules, sequences or other
structures that appear in symbolic datasets with frequency above a specified
threshold. Patterns can be mapped as a bicluster with constant values across
rows (aij =cj ), and specific coherency strength determined by the number of
symbols in the dataset, δ=1/|L| where L is the alphabet of symbols. The rel-
evance of a pattern is primarily defined by its support (number of rows) and
length (number of columns). To allow this mapping, the pattern mining task
needs to output not only the patterns but also their supporting transactions
(full-patterns). Definitions 3 and 4 illustrate the paradigmatic mapping between
full-pattern mining and biclustering.

Definition 3. Let L be a finite set of items, and P an itemset P ⊆ L. A sym-


bolic matrix D is a finite set of transactions in L, {P1 , .., Pn }. Let the cover-
age ΦP of an itemset P be the set of transactions in D in which P occurs,
{Pi ∈ D | P ⊆ Pi }, and its support supP be the coverage size, |ΦP |.
A full-pattern is a pair (P, ΦP ), where P is an itemset and ΦP the set of all
transactions that contain P . A closed full-pattern (P, ΦP ) is a full-pattern
where P is not a subset of another itemset with the same support, ∀P′⊃P : |ΦP′ | < |ΦP |.
Given D and a minimum support threshold θ, the full-pattern mining task
[13] consists of computing: {(P, ΦP ) | P ⊆ L, supP ≥ θ, ∀P′⊃P : |ΦP′ | < |ΦP |}.

Given an illustrative symbolic matrix D={(t1 , {a, c, e}), (t2 , {a, b, d}),
(t3 , {a, c, e})}, we have Φ{a,c} ={t1 , t3 }, sup{a,c} =2. For a minimum support θ=2,
the full-pattern mining task over D returns the set of closed full-patterns,
{({a}, {t1 , t2 , t3 }), ({a, c, e}, {t1 , t3 })} (note that |Φ{a,c} |≤|Φ{a,c,e} |). Fig.1 illus-
trates how full-pattern mining can be used to derive constant biclusters1 .

Definition 4. Given a symbolic matrix D in L, let a matrix A be the concate-


nation of D elements with their column indexes. Let ΨP be the column indexes
1 Association rule mining, sequential pattern mining and graph mining can also be
used to respectively mine biclusters with noisy, order-preserving and differential
coherencies [9, 12].

Fig. 1. Discovery of biclusters with constant coherency on rows from full-patterns.

of an itemset P , and ΥP be the original items of P in L. The set of maximal


biclusters ∪k Bk = (Ik , Jk ) can be derived from the set of closed full-patterns
∪k Pk from A, by mapping Ik =ΦPk and Jk =ΨPk , to compose constant biclusters
with coherency across rows with pattern ΥP [11].
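As a toy illustration of Definitions 3 and 4 (our own brute-force sketch, not the F2G or BicPAM implementation), the closed full-patterns of the three-transaction example above can be enumerated exhaustively and each of them read as a constant bicluster:

% Brute-force enumeration of closed full-patterns for the toy dataset of Definition 3
% (illustrative sketch only; pattern-based biclustering relies on dedicated miners such as F2G).
items = {'a','b','c','d','e'};
D = {{'a','c','e'}, {'a','b','d'}, {'a','c','e'}};     % transactions t1, t2, t3
theta = 2;                                             % minimum support
patterns = {}; covers = {};
for mask = 1:2^numel(items)-1                          % every non-empty itemset
    P = items(logical(bitget(mask, 1:numel(items))));
    cover = find(cellfun(@(t) all(ismember(P, t)), D));
    if numel(cover) >= theta
        patterns{end+1} = P; covers{end+1} = cover;
    end
end
closed = true(1, numel(patterns));                     % keep closed full-patterns only
for i = 1:numel(patterns)
    for j = 1:numel(patterns)
        if i ~= j && numel(patterns{j}) > numel(patterns{i}) && ...
           all(ismember(patterns{i}, patterns{j})) && isequal(covers{i}, covers{j})
            closed(i) = false; break;
        end
    end
end
patterns = patterns(closed); covers = covers(closed);
% Result: ({a},{t1,t2,t3}) and ({a,c,e},{t1,t3}); by Definition 4, each pair yields a
% constant bicluster whose rows are the covering transactions and whose columns are
% the column indexes of the items in the pattern.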
The inherent simplicity, efficiency and flexibility of pattern-based biclus-
tering explains the increasing attention [11,12,18,22,25]. The major contribu-
tions of pattern-based approaches for biclustering include: 1) efficient analysis of
large matrices due to the monotone search principles (and the support for dis-
tributed/partitioned data settings and approximate patterns [8]). 2) biclusters
with parameterizable coherency strength (beyond differential assumption) and
type (possibility to accommodate additive, multiplicative, order-preserving and
plaid models) [9,11,12]; 3) flexible structures of biclusters (arbitrary position-
ing of biclusters) and searches (no need to fix the number of biclusters apriori)
[22,25]; and 4) robustness to noise, missings and discretization problems [11].
Constraint-Based Pattern Mining. A constraint is a predicate on the pow-
erset of items C : 2^L → {true, false}. A full-pattern (P, ΦP ) satisfies C if C(P ) is
true. Minimum support is the default constraint in full-pattern mining,
Cfreq(P ) = |ΦP | ≥ θ. Typical constraints with interesting properties include: regular expres-
sions on the items in the pattern, and inequalities based on aggregate functions,
such as length, maximum, minimum, range, sum, average and variance [24].
Definition 5. Let each item have a correspondence with a real value, L → R,
when numeric operators are considered. C is monotone if for any P satisfying
C, P supersets satisfy C (e.g. range(P ) ≥ v}). C is anti-monotone if for any
P not satisfying C, P supersets do not satisfy C (e.g. max(P ) ≤ v). Let P1
satisfy C, C is succinct if for any P2 satisfying C, P1 ⊆ P2 (e.g. min(P2 ) ≤ v).
C is convertible w.r.t. an ordering of items RΣ if for any P satisfying C, P
suffixes satisfy C or/and itemsets with P as suffix satisfy C (e.g. avg(P ) ≥ v).
To illustrate these constraints, consider {(t1 , {a, b, c}), (t2 , {a, b, c, d}), (t3 ,
{a, d})}, θ=1 and {a:0,b:1,c:2,d:3} value correspondence. The set of closed full-
patterns under the monotone range(P ) ≥ 2 is {({a, b, c}, {t1 , t2 }), ({a, d}, {t1 , t3 })};
the anti-monotone sum(P ) ≤ 1 is {({a, b}, {t1 , t2 })}; the succinct P ⊇ {c, d} is
{({a, b, c, d}, {t2 })}; and the convertible avg(P ) ≥ 2 is {({b, c, d}, {t2 })}.
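To make Definition 5 concrete, the sketch below (ours; only the {a:0, b:1, c:2, d:3} value map and the constraint thresholds come from the running example, the rest is illustrative MATLAB) evaluates the four constraint types as plain predicates over a candidate pattern:

% Constraint predicates from Definition 5 evaluated over a toy pattern (illustrative).
% Item values follow the {a:0, b:1, c:2, d:3} correspondence of the running example.
val = containers.Map({'a','b','c','d'}, [0 1 2 3]);
v = @(P) cellfun(@(x) val(x), P);                   % map items to their numeric values
monotoneC     = @(P) max(v(P)) - min(v(P)) >= 2;    % monotone: range(P) >= 2
antiMonotoneC = @(P) sum(v(P)) <= 1;                % anti-monotone: sum(P) <= 1
succinctC     = @(P) all(ismember({'c','d'}, P));   % succinct: P contains both c and d
convertibleC  = @(P) mean(v(P)) >= 2;               % convertible: avg(P) >= 2
P = {'b','c','d'};                                  % candidate pattern
[monotoneC(P), antiMonotoneC(P), succinctC(P), convertibleC(P)]
% returns [1 0 1 1]: range 2, sum 6 (> 1), contains {c,d}, average 2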

3 Related Work
Knowledge-Driven Biclustering. The use of background knowledge to guide
biclustering has been increasingly motivated since solutions with good homo-

geneity and statistical significance may not necessarily be biologically relevant.


However, only few biclustering algorithms are able to incorporate background
knowledge. AI-ISA [26], GenMiner [18] and scatter biclustering [20] are able to
annotate data with functional terms retrieved from repositories with ontologies,
and use these annotations to guide the search. COBIC [19] is able to adjust
its behavior (maximum-flow/minimum-cut parameters) in the presence of back-
ground knowledge. Similarly, the priors and architectures of generative biclus-
tering algorithms can also incorporate background knowledge [10]. However,
COBIC and generative peers are not able to deliver flexible biclustering solutions
and only consider simplistic constraints. Fang et al. [5] propose a constraint-
based algorithm that turns possible the discovery of dense biclusters associated
with high-order combinations of single-nucleotide polymorphisms (SNPs). Data-
Peeler [7], as well as algorithms from formal concept analysis [15] and bi-sets
mining [1], are able to efficiently discover dense biclusters in binary matrices in
the presence of (anti-)monotone constraints. However, these last sets of algo-
rithms impose a very restrictive form of homogeneity in the delivered biclusters.
Full-Pattern Mining for Biclustering. The majority of existing full-pattern
miners rely on frequent itemset mining with implementations based on bitset
vectors to represent transaction-sets. There are two major classes of searches
with this behavior. First, Apriori-based searches [8], generally suffering from
costs of candidate generation for low support thresholds (commonly required for
biological tasks [22]). Efficient implementations include LCM and CLOSE, used
respectively by BiModule [22] and GenMiner [18] biclustering algorithms. Sec-
ond, vertical-based searches, such as Eclat and Carpenter [8]. These searches rely
on intersection operations over transaction-sets to generate candidates, requiring
structures such as bitset vectors or diffsets. However, for datasets with a high
number of transactions the bitset cardinality becomes large, these structures
consume a significant amount of memory and operations become costly. MAFIA
is an implementation used by DeBi [25]. Only in recent years, a third class of
searches without the bottlenecks associated with bitset vectors were made avail-
able by extending pattern-growth searches for the discovery of full-patterns using
frequent-pattern trees (FP-Trees) annotated with transactions. F2G [13] used by
default in BicPAM [11] implements this third type of searches.
Constraint-Based Pattern Mining. A large number of studies explore how
constraints can be used with pattern mining. Two major paradigms are avail-
able: constraint-programming (CP) and dedicated searches. First, CP allows the
pattern mining task to be declaratively defined according to sets of constraints
[4,14]. These declarative models are expressive as they can allow mathematical
expressions over itemsets and transaction-sets. Nevertheless, due to the poor
scalability of CP methods, they have been only used in highly constrained set-
tings, small-to-medium data, or to mine approximative patterns [4,14].
Second, pattern mining methods have been adapted to optimally seize effi-
ciency gains from different types of constraints. Such efforts replace naı̈ve solu-
tions: post-filtering patterns that satisfy constraints. Instead, the constraints

are pushed as deeply as possible within the mining step for an optimal prun-
ing of the search space. The nice properties exhibited by constraints, such as
anti-monotone and succinct properties, have been initially seized by Apriori
methods [21] to affect the generation of candidates. Convertible constraints,
can hardly be pushed in Apriori but can be handled by FP-Growth approaches
[23]. FICA, FICM, and more recently MCFPTree, are FP-Growth extensions to
seize the properties of anti-monotone, succinct and convertible constraints [23].
The inclusion of monotone constraints is more complex. Filtering methods, such
as ExAnte, are able to combine anti-monotone and monotone pruning based on
reduction procedures [2]. Reductions are optimally handled in FP-Trees [3].

4 Pattern-Based Biclustering with Constraints

BicPAM [11], BicSPAM [12] and BiP [9] are the state-of-the-art algorithms
for pattern-based biclustering. They integrate the dispersed contributions of
previous pattern-based algorithms and extend them to discover non-constant
coherencies and to guarantee their robustness to discretization (by assigning
multi-items to a single element [11]), noise and missings. In this section, we pro-
pose BiC2PAM (BiClustering with Constraints using PAttern Mining) to inte-
grate their contributions and adapt them to effectively incorporate constraints.
BiC2PAM is a composition of three major steps: 1) preprocessing to itemize real-
valued data; 2) mining step, corresponding to the application of full-pattern min-
ers; and 3) postprocessing to merge, reduce, extend and filter similar biclusters.
As follows, Section 4.1 lists native constraints supported by parameterizations
along these steps. Section 4.2 lists biologically meaningful constraints with prop-
erties of interest. Finally, we extend a pattern-growth search to seize efficiency
gains from succinct, (anti-)monotone and convertible constraints (Section 4.3).

4.1 Native Constraints

Below we list a set of structural constraints that can be incorporated by adapt-


ing the parameters that control the behavior of pattern-based biclustering algo-
rithms along their three major steps.
Relevant constraints provided in the pre-processing step:

– combined inclusion of annotations (such as functional terms) with succinct con-


straints. A functional term is associated with an interrelated group of genes,
and thus it can be appended as a new dedicated symbol to the respec-
tive transactions/genes, possibly leading to a set of transactions with varying
length. Illustrating, consider T1 and T2 terms to be respectively associated with
genes {g1 , g3 , g4 } and {g3 , g5 }, an illustrative dataset for this scenario would
be {(g1 , {a11 , .., a1m , T1 }), (g2 , {a21 , .., a2m }), (g3 {a31 , .., a3m , T1 , T2 }), ...}. Pattern
mining can then be applied on top of these annotated transactions with succinct
constraints to guarantee the inclusion of certain terms (such as P ∩ {T1 , T2 } ≠ ∅).
This is useful to discover, for instance, biclusters with genes participating in specific
functions of interest.

– ranges of values (or symbols) to ignore from the input matrix, remove(S) where
S ⊆ R+ (or S ⊆ L). In gene expression, elements with default/non-differential
expression are generally less relevant and thus can be removed. This is achieved
by removing these elements from the transactions. Despite the simplicity of this
constraint, this option is not easily supported by peer biclustering algorithms [16].
– minimum coherency strength (or number of symbols) of the target biclusters:
δ=1/|L|. Decreasing the coherency strength (increasing the number of symbols)
reduces the noise-tolerance of the resulting set of biclusters and it is often associ-
ated with solutions composed by a larger number of biclusters with smaller areas.
– level of relaxation to handle noise by increasing the ηij noise range (Definition 2).
This constraint is used to adjust the behavior of BiC2PAM in the presence of noise
or discretization problems (values near a boundary of discretization). By default,
one symbol is associated with an element. Yet, this constraint gives the possibility
to assign an additional symbol to an element when its value is near a boundary
of discretization, or even a parameterizable number of symbols per element for a
high tolerance to noise (proof in [11]).

Relevant constraints provided in the mining step:

– minimum pattern length (minimum number of columns in the bicluster).


– stopping criteria: either the anti-monotone minimum support length (minimum
number of rows in the bicluster), or iteratively decreasing support until minimum
number of biclusters is discovered or minimum area of the input matrix is covered
by the discovered biclusters.
– type of coherency and orientation. Currently, BiC2PAM supports the selection of
constant, additive, multiplicative, symmetric, order-preserving and plaid models
with coherency on rows or columns (according to [9, 11]).
– pattern representation: simple (all coherent biclusters), closed (all maximal biclus-
ters), and maximal (solutions with a compact number of biclusters with a prefer-
ence towards a high number of columns).

Understandably, constraints addressed at the postprocessing stage are not


desirable since they are not able to seize major efficiency gains. Nevertheless,
BiC2PAM supports two key types of constraints that could imply additional
computational costs, but are addressed with heightened efficiency: 1) maximum
percentage of noisy and missing elements per bicluster (based on merging proce-
dures [11]), and 2) minimum homogeneity of the target biclusters (using exten-
sion and reduction procedures with a parameterizable merit function [11]).

4.2 Biologically Meaningful Constraints

Different types of constraints were introduced in Definition 5. In order to illus-


trate how such constraints can be specified and instantiated, a symbolic gene
expression matrix (and associated “price table”) is provided in Fig.2, where the
rows correspond to different genes and the values correspond to observed levels of
expression for a specific condition (column). The {-3,-2}, {-1,0,1} and {2,3} sets
of symbols are respectively associated with repressed (down-regulated), default
(preserved) and activated (up-regulated) levels of expression.

Fig. 2. Illustrative symbolic dataset and “price table” for expression data analysis.

First, succinct constraints in gene expression analysis allow the discovery


of genes with specific constrained levels of expression across a subset of condi-
tions. Illustrating, min(P )=-3 implies an interest in biclusters (biological pro-
cesses) where genes are at least highly repressed in one condition. Alternatively,
succinct constraints can be used to discover non-trivial biclusters by focusing
on non-highly differential expression (e.g. patterns with symbols {-2,2}). Such
option contrasts with the large focus on dense biclusters [16]. Finally, succinct
constraints can also be used to guarantee that a specific condition of interest
appears in the resulting set (e.g. P ∩ {y2 -3, y2 -2, y2 2, y2 3} ≠ ∅ to include y2 ), or
a specific annotation (P ∩ {N1 , N2 } ≠ ∅).
Second, (anti-)monotone constraints are key to capture background knowl-
edge and guide biclustering. Illustrating, the non-succinct monotonic constraint
countVal (P ) ≥ 2 implies that at least two different levels of expression must
be present within a bicluster (biological process). In gene expression analysis,
biclusters should be able to accommodate genes with different degrees of up-
regulation and/or down-regulation. Yet, the majority of existing biclustering
approaches are only able to model constant values across conditions [11,16].
When constraints, such as the value-counting inequality, are available, the prun-
ing of the search space allows an efficient handling of very low support thresholds
for these non-trivial biclusters to be discovered.
Finally, convertible constraints also play an important role in biological
settings to guarantee, for instance, that the observed patterns have an average
of values within a specific range. Illustrating, the anti-monotonic convertible
constraint avg(P ) ≤ 0 indicates a preference for patterns with repression mech-
anisms without a strict exclusion of activation mechanisms. These constraints
are useful to focus the discovery on specific expression levels, while still allowing
for noise deviations. Understandably, they are a robust alternative to the use of
strict bounds from succinct constraints with maximum-minimum inequalities.
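As a small illustration (ours, with a hypothetical pattern of expression levels), the constraints discussed in this section reduce to simple predicates over the levels a pattern assigns:

% Illustrative evaluation of the constraints discussed above, with a pattern represented
% simply by the expression levels it assigns (hypothetical values).
P = [-3 -2 0 -1];                           % expression levels of one candidate pattern
succinctOK    = min(P) == -3;               % succinct: at least one highly repressed condition
monotoneOK    = numel(unique(P)) >= 2;      % monotone: countVal(P) >= 2 distinct levels
convertibleOK = mean(P) <= 0;               % convertible: avg(P) <= 0 (repression preference)
[succinctOK, monotoneOK, convertibleOK]     % returns [1 1 1] for this pattern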

4.3 Effective Use of Constraints in Pattern-based Biclustering


Although native constraints are supported through adequate parameterizations
of pattern-based biclustering algorithms, the previous (non-native) constraints
are not directly supported. Nevertheless, as surveyed, pattern mining searches
have been extended to seize efficiency gains when succinct, (anti-)monotone or
convertible constraints are considered. Although there is large consensus that
pattern-growth searches are better positioned to seize efficiency gains from con-
straints than peer methods based on bitset vectors, there is not yet proof whether
this observation remains valid in the context of full-pattern mining. As such, we
extend the recently proposed F2G algorithm to guarantee an optimal pruning of
the search space in the presence of constraints and integrate F2G in BiC2PAM.

Fig. 3. Illustrative behavior of F2G [13].
F2G implements a pattern-growth search that does not suffer from efficiency
bottlenecks since it relies on tree structures where transaction-IDs are stored
without duplicates2. F2G behavior is illustrated in Fig. 3. In this section, we first
show the compliance of F2G with principles to handle succinct and convert-
ible constraints [23]. Second, we show the compliance of F2G with principles to
handle difficult combinations of monotone and anti-monotone constraints [3].
2 The FP-tree is recursively mined to enumerate all full-patterns. Unlike peer pattern-
growth searches, transaction-IDs are not lost at the first scan. Full-patterns are
generated by concatenating the pattern suffixes with the full-patterns discovered
from conditional FP-trees where suffixes are removed. F2G is applicable on top of
FP-Close trees to mine closed full-patterns [13].
Compliance with Different Types of Constraints. Unlike candidate gen-
eration methods, pattern growth methods (such as FP-Growth) provide further
pruning opportunities. Pruning principles can be standardly applied on both
the original database (full FP-Tree) and on each projected database (condi-
tional FP-Tree). CFG extensions to FP-Growth [23] seize the properties of such
constraints under three simple principles. First, supersets of itemsets violating
anti-monotone constraints are removed for each (conditional) FP-Tree (e.g. for
y1 2 conditional database, remove conflicting items ∪_{i=1}^{m} {yi 2, yi 3} as their sum
violates sum(P ) ≤ 3). For an effective pruning, it is recommended to order the
symbols in the header table according to their value and support [23,24]. F2G
is compliant with these removals, since it allows the rising of transaction-IDs in
the FP-Tree according to the order of candidate items for removal in the header
table (property explained in [13]).
For the particular case of an anti-monotone convertible constraint, itemsets
that satisfy the constraint are efficiently generated under a pattern-growth search
[24] (e.g. {y1 -3, y2 2, y4 2} itemset is not included in the generated pattern set
respecting avg(P ) ≤ 0), and provide a simple criterion to either stop FP-tree
projections or prune items in a (conditional) FP-Tree.
Finally, the removal of conflicting transactions (e.g. t1 and t4 do not
satisfy the illustrated succinct constraint) and of individual items (e.g.
∪_{i=1}^{m} {yi -1, yi 0, yi 1}) does not cause changes in the FP-Tree construction
methods. Additionally, constraint checks can be avoided for subsets of itemsets satisfying a
monotone constraint (e.g. no further checks of countVal (P ) ≥ 2 constraint when
the range of values in the suffix is ≥2 under the {y1 0, y1 1}-conditional FP-Tree).
Combination of Constraints. The previous extensions of pattern-growth
searches are not able to effectively comply with monotone constraints when
anti-monotone constraints (such as minimum support) are also considered. In
FP-Bonsai [3], principles to further explore the monotone properties for pruning
the search space are considered without reducing anti-monotone pruning oppor-
tunities. This method is based on the ExAnte synergy of two data-reduction
operations that seize the properties of monotone constraints: μ-reduction, which
deletes transactions not satisfying C; and α-reduction, which deletes from trans-
actions single items not satisfying C. Thanks to the recursive projecting approach
of FP-growth, the ExAnte data-reduction methods can be applied on each con-
ditional FP-tree to obtain a compact number of smaller FP-Trees (FP-Bonsais).
The FP-Bonsai method can be combined with the previously introduced prin-
ciples, which are particularly suited to handling succinct and convertible anti-
monotone constraints. Since F2G can be extended to support the pruning of
FP-Trees, it complies with the FP-Bonsai extension.
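As a rough illustration of the ExAnte interplay exploited by FP-Bonsai, the sketch below (a simplified Python fragment, not the FP-Bonsai implementation) alternates the two reductions until a fixed point; it assumes the monotone constraint is given as a predicate over a transaction and, as in the original ExAnte proposal [2], uses minimum support as the anti-monotone constraint driving the α-reduction.

    from collections import Counter

    def exante_reduce(transactions, min_support, satisfies_monotone):
        """Illustrative ExAnte-style reduction (simplified): alternate
        mu-reduction (drop transactions violating the monotone constraint) and
        alpha-reduction (drop items that became infrequent) until stable."""
        db = [set(t) for t in transactions]
        changed = True
        while changed:
            changed = False
            # mu-reduction: keep only transactions satisfying the monotone constraint
            kept = [t for t in db if satisfies_monotone(t)]
            if len(kept) != len(db):
                changed = True
            db = kept
            # alpha-reduction: remove items whose support dropped below min_support
            support = Counter(item for t in db for item in t)
            frequent = {i for i, s in support.items() if s >= min_support}
            reduced = [t & frequent for t in db]
            if reduced != db:
                changed = True
            db = [t for t in reduced if t]   # drop emptied transactions
        return db

    # Toy usage: the monotone constraint is "transaction contains at least 2 items"
    db = [{"a", "b", "c"}, {"a"}, {"b", "c"}, {"a", "c", "d"}]
    print(exante_reduce(db, min_support=2, satisfies_monotone=lambda t: len(t) >= 2))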

5 Results and Discussion


In this section, we assess the performance of BiC2PAM on synthetic and real
datasets with different types of constraints and three distinct full-pattern miners:
AprioriTID3, Eclat3 and F2G. BiC2PAM is implemented in Java (JVM v1.6.0-
24). The experiments were computed using an IC i5 2.30GHz with 6GB of RAM.
3 http://www.philippe-fournier-viger.com/spmf/
Results on Synthetic Data. The generated data settings are described in
Table 1. Biclusters with different shapes and coherency strength (|L|∈{4,7,10})
were planted by varying the number of rows and columns using Uniform dis-
tributions with ranges in Table 1. For each setting we instantiated 20 matrices
with background values generated with Uniform and Gaussian distributions.

Table 1. Properties of the generated dataset settings.

Matrix size (rows × columns)                500×50   1000×100   2000×200   4000×400
Nr. of hidden patterns                           5         10         15         25
Nr. transactions for the hidden patterns   [10,14]    [14,30]    [30,50]   [50,100]
Nr. items for the hidden patterns            [5,7]      [6,8]      [7,9]     [8,10]

BiC2PAM was applied with a default merging option (70% of overlapping)
and a decreasing support until a minimum number of 50 (maximal) biclusters
was found. Fig.4 provides the results of parameterizing BiC2PAM with different
pattern miners and two simple constraints defining the target coherency strength
and symbols to remove. We observe that the proposed F2G miner is the most
efficient option for denser data settings (looser coherency). Also, in contrast
with existing biclustering algorithms, BiC2PAM seizes large efficiency gains from
neglecting specific ranges of values (symbols) from the input matrix.

Fig. 4. BiC2PAM performance in the presence of simplistic native constraints.

In order to test the ability of BiC2PAM to seize further efficiency gains in the
presence of non-trivial constraints, we fixed the 2000×200 setting with 6 sym-
bols/values {-3,-2,-1,1,2,3}. In the baseline performance, constraints were satis-
fied using post-filtering procedures. Fig.5 illustrates this analysis. As observed,
the use of constraints can significantly reduce the search complexity when they
are properly incorporated within the full-pattern mining method. In particular,
CFG principles [23] are used to seize efficiency gains from convertible constraints
and FP-Bonsai [3] to seize efficiency gains from monotonic constraints.

Fig. 5. Efficiency gains of considering constraints in F2G using different principles.

Results on Real Data. Fig.6 shows the (time and memory) efficiency of apply-
ing BiC2PAM in the yeast4 expression dataset with different pattern miners and
varying support thresholds for a desirable coherency strength of 10% (|L|=10).
The proposed F2G is the most efficient option in terms of time and, along with
Apriori, a competitive choice for efficient memory usage.
Finally, Figs. 7 and 8 show the impact of biologically meaningful constraints
on the efficiency and effectiveness of BiC2PAM. For this purpose, we used the
complete gasch dataset (6152×176) [6] with six levels of expression (|L|=6). The
effect of constraints on the efficiency is shown in Fig. 7. This analysis supports
their key role in providing opportunities to solve hard biomedical tasks.
4 http://www.upo.es/eps/bigs/datasets.html

Fig. 6. Computational time and memory of full-pattern miners for yeast (2884×17).

Fig. 7. Efficiency gains from using biological constraints for gasch (6152×176).

The impact of these constraints on the relevance of pattern-based biclustering
solutions is illustrated in Fig. 8. The biological relevance of each bicluster was
derived from the functionally enriched terms using a hypergeometric test of
Gene Ontology (GO) annotations [17]. As a measure of significance, we counted
the number of terms with Bonferroni-corrected p-values below 0.01 [17]. Two
major observations can be made. First, when focusing on properties of inter-
est (e.g. differential expression), the average significance of biclusters increases
as their genes have higher propensity to be functionally co-regulated. This trend
is observed despite the smaller size of the constrained biclusters. Second, when
focusing on rare expression profiles (≥3 distinct levels of expression), the aver-
age relevance of biclusters slightly decreases as their co-regulation is less obvious.
Yet, such non-trivial biclusters hold unique properties with potential interest.

Fig. 8. Biological relevance of F2G for multiple constraint-based profiles of expression.
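For illustration, the significance computation described above can be sketched with SciPy's hypergeometric distribution and a Bonferroni correction (the annotation sets and counts below are made up; the paper relied on GOToolBox [17] for the actual analysis).

    from scipy.stats import hypergeom

    def enriched_terms(bicluster_genes, annotations, background_size, alpha=0.01):
        """Count GO terms enriched in a gene set using a hypergeometric test with
        Bonferroni correction (illustrative sketch; the paper used GOToolBox).
        `annotations` maps each GO term to the background genes annotated with it."""
        n_tests = len(annotations)
        n_drawn = len(bicluster_genes)
        significant = []
        for term, term_genes in annotations.items():
            overlap = len(bicluster_genes & term_genes)
            # P(X >= overlap) for X ~ Hypergeom(M=background, n=|term|, N=|bicluster|)
            p = hypergeom.sf(overlap - 1, background_size, len(term_genes), n_drawn)
            if p * n_tests < alpha:          # Bonferroni-corrected threshold
                significant.append(term)
        return significant

    # Toy example with made-up annotations
    annotations = {"GO:0006096": {"g1", "g2", "g3", "g4"},
                   "GO:0008152": {"g5", "g6"}}
    print(enriched_terms({"g1", "g2", "g3"}, annotations, background_size=6000))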

6 Conclusions
This work motivates the task of biclustering biological data in the presence of
constraints. To answer this task, we explore the synergies between pattern-based
biclustering and domain-driven pattern mining. As a result, the BiC2PAM algorithm
is proposed to effectively incorporate constraints derived from user expectations
and available background knowledge.
Two major sets of constraints were proposed for the discovery of biclus-
ters with specific interestingness criteria. First, native constraints to guarantee
the discovery of biclusters with parameterizable coherency, noise-tolerance and
shape, and to consider annotations from knowledge-based repositories. Second,
constraints with succinct, monotone, anti-monotone and convertible properties
to focus the search space on non-trivial yet biologically meaningful patterns.
In this context, we extended a recent pattern-growth search to optimally
explore efficiency gains in the presence of different types of constraints.
Results from synthetic and real data show that biclustering benefits from
large efficiency gains in the presence of constraints derived from background
knowledge. We further provide evidence of the relevance of the supported types
of constraints to discover non-trivial yet meaningful biclusters in expression data.
Acknowledgments. This work was supported by FCT under the project UID/CEC/
50021/2013 and the PhD grant SFRH/BD/75924/2011 to RH.

References
1. Besson, J., Robardet, C., De Raedt, L., Boulicaut, J.-F.: Mining Bi-sets in numeri-
cal data. In: Džeroski, S., Struyf, J. (eds.) KDID 2006. LNCS, vol. 4747, pp. 11–23.
Springer, Heidelberg (2007)
2. Bonchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D.: Exante: a preprocessing
method for frequent-pattern mining. IEEE Intel. Systems 20(3), 25–31 (2005)
3. Bonchi, F., Goethals, B.: FP-Bonsai: the art of growing and pruning small FP-
trees. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS (LNAI),
vol. 3056, pp. 155–160. Springer, Heidelberg (2004)
4. Bonchi, F., Lucchese, C.: Extending the state-of-the-art of constraint-based pattern
discovery. Data Knowl. Eng. 60(2), 377–399 (2007)
5. Fang, G., Haznadar, M., Wang, W., Yu, H., Steinbach, M., Church, T.R.,
Oetting, W.S., Van Ness, B., Kumar, V.: High-Order SNP Combinations Associ-
ated with Complex Diseases: Efficient Discovery, Statistical Power and Functional
Interactions. Plos One 7 (2012)
6. Gasch, A.P., Werner-Washburne, M.: The genomics of yeast responses to environ-
mental stress and starvation. Functional & integrative genomics 2(4–5), 181–192
(2002)
7. Guerra, I., Cerf, L., Foscarini, J., Boaventura, M., Meira, W.: Constraint-based
search of straddling biclusters and discriminative patterns. JIDM 4(2), 114–123
(2013)
8. Han, J., Cheng, H., Xin, D., Yan, X.: Frequent pattern mining: current status and
future directions. Data Min. Knowl. Discov. 15(1), 55–86 (2007)
9. Henriques, R., Madeira, S.: Biclustering with flexible plaid models to unravel inter-
actions between biological processes. IEEE/ACM Trans. Computational Biology
and Bioinfo (2015). doi:10.1109/TCBB.2014.2388206
10. Henriques, R., Antunes, C., Madeira, S.C.: Generative modeling of repositories
of health records for predictive tasks. Data Mining and Knowledge Discovery,
pp. 1–34 (2014)

11. Henriques, R., Madeira, S.: Bicpam: Pattern-based biclustering for biomedical data
analysis. Algorithms for Molecular Biology 9(1), 27 (2014)
12. Henriques, R., Madeira, S.: Bicspam: Flexible biclustering using sequential pat-
terns. BMC Bioinformatics 15, 130 (2014)
13. Henriques, R., Madeira, S.C., Antunes, C.: F2g: Efficient discovery of full-patterns.
In: ECML /PKDD IW on New Frontiers to Mine Complex Patterns. Springer-
Verlag, Prague, CR (2013)
14. Khiari, M., Boizumault, P., Crémilleux, B.: Constraint programming for mining
n-ary patterns. In: Cohen, D. (ed.) CP 2010. LNCS, vol. 6308, pp. 552–567.
Springer, Heidelberg (2010)
15. Kuznetsov, S.O., Poelmans, J.: Knowledge representation and processing with for-
mal concept analysis. Wiley Interdisc. Reviews: Data Mining and Knowledge Dis-
covery 3(3), 200–215 (2013)
16. Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis:
A survey. IEEE/ACM Trans. Comput. Biol. Bioinformatics 1(1), 24–45 (2004)
17. Martin, D., Brun, C., Remy, E., Mouren, P., Thieffry, D., Jacq, B.: Gotoolbox:
functional analysis of gene datasets based on gene ontology. Genome Biology (12),
101 (2004)
18. Martinez, R., Pasquier, C., Pasquier, N.: Genminer: Mining informative association
rules from genomic data. In: BIBM, pp. 15–22. IEEE CS (2007)
19. Mouhoubi, K., Létocart, L., Rouveirol, C.: A knowledge-driven bi-clustering
method for mining noisy datasets. In: Huang, T., Zeng, Z., Li, C., Leung, C.S.
(eds.) ICONIP 2012, Part III. LNCS, vol. 7665, pp. 585–593. Springer, Heidelberg
(2012)
20. Nepomuceno, J.A., Troncoso, A., Nepomuceno-Chamorro, I.A., Aguilar-Ruiz, J.S.:
Integrating biological knowledge based on functional annotations for biclustering
of gene expression data. Computer Methods and Programs in Biomedicine (2015)
21. Ng, R.T., Lakshmanan, L.V.S., Han, J., Pang, A.: Exploratory mining and pruning
optimizations of constrained associations rules. SIGMOD R. 27(2), 13–24 (1998)
22. Okada, Y., Fujibuchi, W., Horton, P.: A biclustering method for gene expression
module discovery using closed itemset enumeration algorithm. IPSJ T. on Bioinfo.
48(SIG5), 39–48 (2007)
23. Pei, J., Han, J.: Can we push more constraints into frequent pattern mining? In:
KDD. pp. 350–354. ACM, New York (2000)
24. Pei, J., Han, J.: Constrained frequent pattern mining: a pattern-growth view.
SIGKDD Explor. Newsl. 4(1), 31–39 (2002)
25. Serin, A., Vingron, M.: Debi: Discovering differentially expressed biclusters using
a frequent itemset approach. Algorithms for Molecular Biology 6, 1–12 (2011)
26. Visconti, A., Cordero, F., Pensa, R.G.: Leveraging additional knowledge to sup-
port coherent bicluster discovery in gene expression data. Intell. Data Anal. 18(5),
837–855 (2014)
A Critical Evaluation of Methods for the
Reconstruction of Tissue-Specific Models

Sara Correia and Miguel Rocha(B)

Centre of Biological Engineering, University of Minho, Braga, Portugal


[email protected]

Abstract. Under the framework of constraint-based modeling, genome-
scale metabolic models (GSMMs) have been used for several tasks,
such as metabolic engineering and phenotype prediction. More recently,
their application in health related research has spanned drug discovery,
biomarker identification and host-pathogen interactions, targeting dis-
eases such as cancer, Alzheimer, obesity or diabetes. In the last years, the
development of novel techniques for genome sequencing and other high-
throughput methods, together with advances in Bioinformatics, allowed
the reconstruction of GSMMs for human cells. Considering the diversity
of cell types and tissues present in the human body, it is imperative
to develop tissue-specific metabolic models. Methods to automatically
generate these models, based on generic human metabolic models and a
plethora of omics data, have been proposed. However, their results have
not yet been adequately and critically evaluated and compared.
This work presents a survey of the most important tissue or cell
type specific metabolic model reconstruction methods, which use litera-
ture, transcriptomics, proteomics and metabolomics data, together with
a global template model. As a case study, we analyzed the consistency
between several omics data sources and reconstructed distinct metabolic
models of hepatocytes using different methods and data sources as
inputs. The results show that omics data sources have a poor overlap
and, in some cases, are even contradictory. Additionally, the hepa-
tocyte metabolic models generated are in many cases not able to perform
metabolic functions known to be present in the liver tissue. We conclude
that reliable methods for a priori omics data integration are required to
support the reconstruction of complex models of human cells.

1 Introduction
Over the last years, genome-scale metabolic models (GSMMs) for several organ-
isms have been developed, mainly for microbes with an interest in Biotechnology
[6,20]. These models have been used to predict cellular metabolism and promote
biological discovery [17], under constraint-based approaches such as Flux Bal-
ance Analysis (FBA) [18]. FBA finds a flux distribution that maximizes biomass
production, considering the knowledge of stoichiometry and reversibility of reac-
tions, and taking some simplifying assumptions, namely assuming quasi steady-
state conditions.
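As a minimal illustration of the FBA formulation (maximize a target flux subject to the steady-state mass balance Sv = 0 and flux bounds), the toy linear program below uses scipy.optimize.linprog on a made-up two-metabolite, three-reaction network; it is only a sketch and is unrelated to the models discussed in this paper.

    import numpy as np
    from scipy.optimize import linprog

    # Toy FBA: maximize the biomass flux v[2] subject to S v = 0 and bounds.
    # Reactions: R0 uptake (-> A), R1 conversion (A -> B), R2 biomass (B ->).
    S = np.array([[ 1, -1,  0],    # metabolite A
                  [ 0,  1, -1]])   # metabolite B
    bounds = [(0, 10), (0, 10), (0, 10)]     # irreversible fluxes
    c = np.array([0, 0, -1])                 # linprog minimizes, so negate biomass
    res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
    print("optimal biomass flux:", res.x[2]) # 10.0, limited by the uptake bound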

Recently, efforts on model reconstruction have also addressed more complex
multicellular organisms, including humans [5,9,25]. In biomedical research, they
have been used, for instance, to elucidate the role of proliferative adaptation
causing the Warburg effect in cancer [23], to predict metabolic markers for inborn
errors of metabolism [24] and to identify drug targets for specific diseases [11].
However, the human organism is quite complex, with a large number of cell
types/tissues and huge diversity in their metabolic functions. This led to the
need to develop tissue/cell type specific metabolic models, which could allow
studying in more depth specific cell phenotypes. Towards this end, it was imper-
ative to better characterize specific cell types, gathering relevant data. Indeed,
an important set of technological advances in the last decades greatly increased
available biological data through high-throughput studies that allow the iden-
tification and quantification of cell components (gene expression, proteins and
metabolites). These are collectively known as ‘omics’ data and have generated
new fields of study, such as transcriptomics, proteomics and metabolomics.
The most widely available omics data are transcriptomics, the quantification
of gene expression levels in a cell, using DNA microarrays or sequencing (e.g.
RNA-seq). The most significant databases for gene expression data are the Gene
Expression Omnibus (GEO) [3] and the ArrayExpress [19]. Other resources use
those databases as references to synthesize their information, such as the Gene
Expression Barcode [16], which provides absolute measures of expression for the
most annotated genes, organized by tissue, cell-types and diseases.
In the cells, mRNA is not always translated into protein, and the amounts
of protein depend on gene expression but also on other factors. Thus, knowledge
about the amounts of proteins in the cell, provided by proteomics data, is of
foremost relevance. These data can confirm the presence of proteins and provide
measurements of their quantities for each protein within a cell. For human cells, a
database is available with millions of high-resolution images showing the spatial
distribution of protein expression profiles in normal tissues, cancer and cell lines
- the Human Protein Atlas (HPA) portal [26].
Another source of information is provided by metabolomics data that involve
the quantification of the small molecules present in cells, tissues, organs and
biological fluids using techniques such as Nuclear Magnetic Resonance spec-
troscopy or Gas Chromatography combined with Mass Spectrometry [13]. The
Human Metabolome Database (HMDB) [28] contains spectroscopic, quantita-
tive, analytic and molecular-scale information about human metabolites, asso-
ciated enzymes or transporters, their abundance and disease-related properties.
Resources for omics data, together with generic human metabolic models,
have been used to generate context-specific models. This has been achieved
through the development of methods, such as the Model Building Algorithm
(MBA) [12], the Metabolic Context-specificity Assessed by Deterministic Reac-
tion Evaluation (mCADRE)[27] and the Task-driven Integrative Network Infer-
ence for Tissues (tINIT)[2].
The reconstructed models have allowed, for instance, to find metabolic tar-
gets to inhibit the proliferation of cancer cells [29], to study the interaction
between distinct brain cells [14], and to find potential therapeutic targets for
the treatment of non-alcoholic steatohepatitis [15].
However, the aforementioned methods have not yet been critically and sys-
tematically evaluated on standardized case studies. Indeed, each of the methods
is proposed and validated with distinct cases and taking distinct omics data
sources as inputs. Thus, the impact of using different omics datasets on the final
results of those algorithms is a question that remains to be answered. Here, we
present a critical evaluation of the most important methods for the reconstruc-
tion of tissue-specific metabolic models published until now.
We have developed a framework where we implemented different methods
for the reconstruction of tissue-specific metabolic models. In this scenario, the
algorithms use sets of metabolites and/or maps of scores for each reaction as
input. So, in our framework the algorithms are independent of the omics data
source, and the separation of these two layers allows the use of different data sources
in each algorithm for the generation of tissue-specific metabolic models. As a
case study, to compare the three different approaches implemented, metabolic
models were reconstructed for hepatocytes, using the same set of data sources as
inputs for each algorithm. Moreover, distinct combinations of data sources are
evaluated to check their influence on the final results.

2 Materials and Methods

2.1 Human Metabolic Models

Modeling metabolic systems requires the analysis and prediction of metabolic
flux distributions under diverse physiological and genetic conditions. The human
organism is one of the most complex organisms for which to build a metabolic
model, since the number of genes, types of cells and their diversity are huge. In the
last years, a few human metabolic models were proposed [5,9,15,25]. In this
work, we will use the Recon 2 human metabolic model that accounts for 7440
reactions, 5063 metabolites and 1789 enzyme-encoding genes. This model is a
community-driven expansion of the previous human reconstruction, Recon 1 [5],
with additional information from different resources: EHMN [9], Hepatonet1[8],
Ac-FAO module[21] and the small intestinal enterocyte reconstruction [22].

2.2 Algorithms for Tissue-Specific Metabolic Models Reconstruction

Although there are several applications of the human GSMMs, the specificity of
cell types requires the reconstruction of tissue-specific metabolic models. Some
approaches have been proposed based on existing generic human models. Here,
we present three of the most well-known approaches for this task that will be
used in the remaining of this work.
MBA. The Model-Building Algorithm (MBA) [12] reconstructs a tissue-specific
metabolic model from a generic model and two sets of reactions, denoted as core
reactions (CH ) and reactions with a moderate probability to be carried out in
the specific tissue (CM ). These sets were previously built according to evidence
levels based on omics data, literature and experimental knowledge. In general,
the CH set includes human-curated tissue-specific reactions and the CM set
includes reactions certified by omics data. The algorithm iteratively removes
one reaction from the generic model, in a random order, and validates if the
model remains consistent. The process ends when the removal of all reactions,
except the ones in CH , is tried. As a result, this algorithm reconstructs a model
containing all the CH reactions, as many CM reactions as possible, and a minimal
set of other reactions that are required for obtaining overall model consistency
(for each reaction there is a flux distribution in which it is active).
Since reactions are scanned in a random order, the authors recommend to
run the algorithm a large number of times to generate intermediate models.
After this step, a score per each reaction is calculated, according to the number
of times it appears in these models. The final model is built starting from CH
and iteratively adding reactions ordered by their scores, until a final consistent
model is achieved.

INIT/ tINIT. The Integrative Network Inference for Tissues (INIT) [1] uses
the Human Protein Atlas (HPA) as its main source of evidence. Expression data
can be used when proteomic evidence is missing. It also allows the integration
of metabolomics data by imposing a positive net production of metabolites for
which there is experimental support, for instance in HMDB. The algorithm is
formulated using mixed integer-linear programming (MILP), so that the final
model contains reactions with high scores from HPA data. This algorithm does
not impose strict steady-state conditions for all internal metabolites, allowing a
small net accumulation rate. A couple of years later, a new version of this algo-
rithm was proposed, the Task-driven Integrative Network Inference for Tissues
(tINIT) [2], which reconstructs tissue-specific metabolic models based on protein
evidence from HPA and a set of metabolic tasks that the final context-specific
model must perform. These tasks are used to test the production or uptake of
external metabolites, but also the activation of pathways that occur in a spe-
cific tissue. Another improvement from the previous version is the addition of
constraints to guarantee that irreversible reactions operate in one direction only.

mCADRE. The Metabolic Context specificity Assessed by Deterministic Reac-
tion Evaluation (mCADRE) [27] method is able to infer a tissue-specific network
based on gene expression data, network topology and reaction confidence lev-
els. Based on the expression score, the reactions of the global model, used as
template, are ranked and separated in two sets - core and non-core. All reac-
tions with expression-based scores higher than a threshold value are included
in the core set, while the remaining reactions make the non-core set. In this
method, the expression score does not represent the level of expression, but
rather the frequency of expressed states over several transcript profiles. So, it
is necessary to previously binarize the expression data. Thus, it is possible to
use data retrieved from the Gene Expression Barcode project that already con-
tains binary information on which genes are present or not in a specific tissue/
cell type. Reactions from the non-core set are ranked according to the expres-
sion scores, connectivity-based scores and confidence level-based scores. Then,
sequentially, each reaction is removed and the consistency of the model is tested.
The elimination only occurs if the reaction does not prevent the production of
a key-metabolite and the core consistency is preserved. Comparing with the
MBA algorithm, mCADRE presents two improvements: it allows the definition
of key metabolites, i.e. metabolites that have evidence to be produced in the
context-specific model reconstruction, and relaxes the condition of including all
core reactions in the final model.
Table 1 shows the mathematical formulation and pseudocode for all algo-
rithms described above.

Table 1. Formulation and description of the MBA, tINIT and mCADRE algorithms.
In the table, RG represents the list of reactions from the global model, RC the set of
core reactions in the mCADRE algorithm, CH and CM the core and moderate probability
sets used in the MBA algorithm, r a reaction, and for(i) and rev(i) represent the
i-th reaction direction (forward and reverse).

MBA:
  generateModel(RG, CH, CM)
    RP ← RG
    RS ← RP \ (CH ∪ CM)
    P ← randomPermutation(RS)
    for (r ∈ P)
      inactiveR ← CheckModel(RP, r)
      eH ← inactiveR ∩ CH
      eM ← inactiveR ∩ CM
      eX ← inactiveR \ (CH ∪ CM)
      if (|eH| == 0 AND |eM| < δ * |eX|)
        RP ← RP \ (eM ∪ eX)
      endif
    endfor
    return RP
  endfunction

tINIT:
  min Σ_{i∈R} wi * yi
  s.t.  S v = b
        |vi| ≤ vmax
        0 < vi + (vmax * yi) ≤ vmax
        bj ≥ δ, j ∈ Metabolomics
        bj = 0, j ∉ Metabolomics
        y_for(i) + y_rev(i) ≤ 1
        vi ≥ δ, i ∈ RequiredReac
        yi ∈ {0, 1}
        wi : score for i ∈ R

mCADRE:
  generateModel(RG, threshold)
    RP ← RG
    RC ← score(RP) > threshold
    coreActiveG ← flux(r) != 0, r ∈ RC
    RNC ← RP \ RC
    for (r ∈ order(RNC))
      inactiveR ← CheckModel(RP, r)
      s1 = |inactiveR ∩ RC|
      s2 = |inactiveR ∩ RNC|
      if (r ∈ withExpressionValues AND s1/s2 <= RATIO AND
          checkModelFunction(RP \ inactiveR))
        RP ← RP \ inactiveR
      elseif (|s1| == 0 AND checkModelFunction(RP \ inactiveR))
        RP ← RP \ inactiveR
      endif
    endfor
    return RP
  endfunction

2.3 Omics Data


Proteomics data used in this work were retrieved from the Human Protein Atlas
(HPA) [26], which contains the profiles of human proteins in all major human
healthy and cancer cells. We collected information for the liver tissue (hepato-
cytes) from HPA version 12 and Ensembl [7] version 73.37. After a conversion
from Ensembl gene identifiers to gene symbols, duplicated genes with different
evidence levels were removed (Table S1 from supplementary data)1 .
1 All supplementary files are provided in http://darwin.di.uminho.pt/epia2015

Transcriptomics data were collected from Gene Expression Barcode (GEB)
[16] (HGU133plus2 (Human) cells v3). The conversion to gene expression
levels was done considering the average level of probes for each gene. The
mapping between probes and gene symbols was performed using the library
“hgu133plus2.db” [4] from Bioconductor. The gene expression is classified as
High, Moderate and Low if the gene expression evidence on that tissue is greater
than 0.9, between 0.5 and 0.9, and between 0.1 and 0.5, respectively. The genes
with expression evidence below 0.1 were considered not expressed in hepatocytes.
The reaction scores were obtained through the Gene-Protein-Rules present
in the Recon2 model, based on the scores associated with each gene in the data.
The reaction scores were calculated by taking the maximum (minimum) value of
expression scores for genes connected by an “OR” (“AND”). If one of the gene
scores is unknown, the other gene score is assumed in the conversion rule.
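The two mapping steps just described can be sketched as follows (an illustrative Python fragment; the nested-tuple representation of the gene-protein-reaction rule is an assumption made for brevity, whereas Recon2 encodes these rules as Boolean strings).

    def expression_level(evidence):
        """Bin a Gene Expression Barcode evidence value into the levels used here."""
        if evidence >= 0.9:  return "High"
        if evidence >= 0.5:  return "Moderate"
        if evidence >= 0.1:  return "Low"
        return "NotExpressed"

    def reaction_score(gpr, gene_scores):
        """Score a gene-protein-reaction rule given per-gene scores.
        `gpr` is a nested tuple ('OR'|'AND', operand, ...) or a gene symbol.
        OR takes the maximum, AND the minimum; unknown genes are ignored."""
        if isinstance(gpr, str):                       # a single gene
            return gene_scores.get(gpr)                # None if unknown
        op, *args = gpr
        scores = [s for s in (reaction_score(a, gene_scores) for a in args)
                  if s is not None]
        if not scores:
            return None
        return max(scores) if op == "OR" else min(scores)

    # Toy example: (g1 AND g2) OR g3, with g2 unknown
    rule = ("OR", ("AND", "g1", "g2"), "g3")
    print(expression_level(0.72))                          # Moderate
    print(reaction_score(rule, {"g1": 0.95, "g3": 0.4}))   # 0.95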

3 Results

To compare the metabolic models generated by the different algorithms and the
effects of distinct omics data sources, we chose the reconstruction of hepatocytes
metabolic models as our case study. Hepatocytes are the principal site of the
metabolic conversions underlying the diverse physiological functions of the liver
[10]. The hepatocytes metabolic models were generated using Recon2 as a tem-
plate model and the GEB, HPA and the sets CH and CM from [12] as input
data, for the three methods described in the previous section.
In the experiments, we seek to answer two main questions: Are omics data
consistent across different data sources? What is the overlap of the resulting
metabolic models obtained using different methods and different data sources? In
2010, a manually curated genome-scale metabolic network of human hepatocytes
was presented, the HepatoNet1 [8], used as a reference in the validation process.

3.1 Omics Data Consistency


The HPA has evidence information related to 16324 genes in hepatocytes. The
reliability of the data is also scored as “supportive” or “uncertain”, depending on
similarity in immunostaining patterns and consistency with protein/gene char-
acterization data. On the other hand, the GEB transcriptome has information
for 20149 genes, of which 5772 have evidence of being expressed in hepatocytes.
Together, these two data sources have information for 21921 genes, but only
14552 are present in both (Figure 1A). Moreover, the number of genes with
evidence of being expressed in the tissue in both sources is only 3549, around
24% of all shared genes (Figure 1B). These numbers decrease significantly if
using only HPA information marked as “supportive”. In this scenario, only 3868
genes are also present in GEB and only 1294 of them have expression evidence.
Next, evidence levels frequencies (High, Moderate, Low ) were calculated
across the GEB and HPA, as shown in Figure 2. Only a small number of genes
have similar evidence levels in both data sources. Furthermore, a significant
number of genes have contradictory levels of evidence - genes with expression
evidence in one data source and not expressed in the other. If we focus only
on the genes present in the model Recon2 with information in GEB and HPA
(supportive), there are 15% of genes with "High" or "Moderate" evidence in one
of the sources and not expressed in the other. This number increases to 22% if
we also consider the "Low" evidence level (Supplementary Figure S1).

Fig. 1. A) Number of genes present in Gene Expression Barcode and Human Protein
Atlas. In HPA, the number of genes with reliability "supportive" and "uncertain" are
shown. B) Number of genes with evidence level "Low", "Moderate" or "High" in HPA
and gene expression evidence higher than 0 in Gene Expression Barcode.

Fig. 2. A) Distribution of genes from the Gene Expression Barcode project and Human
Protein Atlas across the evidence levels - "High", "Moderate" and "Low". The ranges
[0.9, 1], [0.5, 0.9[ and [0.1, 0.5[ were used to classify the data into the "High", "Moderate"
and "Low" levels, respectively. B) Genes with no evidence of being present in hepatocytes
in GEB, but with evidence in HPA. C) Genes with no evidence of being present in
hepatocytes in HPA, but with evidence in GEB.
The methods to reconstruct tissue-specific metabolic models use reaction
scores calculated based on omics data to determine their inclusion in the final
models. So, we analyzed the impact of these omics discrepancies in the values of
reaction scores and compared those with the manually curated set CH from Jerby
et al. [12]. In Figure 3A, the poor overlap of the reaction scores calculated based
on different sources can be observed. Considering all data sources, 3243 reactions
show some evidence that supports their inclusion in the hepatocytes metabolic
model, but only 388 are supported by all sources. The numbers are further
dramatically reduced if we consider only moderate or high levels of evidence
(Figure 3 B-C).

Fig. 3. A) Reactions with evidence that supports their inclusion in the hepatocytes
metabolic model. B) Number of reactions that have a high level of evidence of expres-
sion for each data source. C) Number of reactions that have a moderate level of evidence
of expression for each data source.

3.2 Metabolic Models


We applied each of the three algorithms to each omics data source, resulting
in nine metabolic models for hepatocytes. In the application of mCADRE, we
consider the list of key metabolites as published in the original article and a
threshold of 0.5 to calculate the core set. A set of core metabolic tasks, that
should occur in all cell types, was retrieved from [2] and used in the tINIT
algorithm. The final MBA models were constructed based on 50 intermediate
metabolic models. According to [12], a larger number would be desirable, but
the time needed to generate each model prevented larger numbers of replicates.
The detailed lists of reactions that compose each metabolic model are available
in the supplementary material.
In Figure 4 A-C, we observe the consistency of the intermediate models gen-
erated by MBA, as well as the number of occurrences of reactions present in the
final model. Moreover, Figure 4D shows the relations between the nine metabolic
models generated through hierarchical clustering. The models obtained using the
CH and CM sets as input data group together. Regarding the remaining, the
mCADRE and MBA resulting models group according to their data (HPA and
GEB), while the models created by tINIT cluster together. Overall, the data
used as input seems to be the most relevant factor in the final result.
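The comparison in Fig. 4D can be reproduced in spirit with a few lines of SciPy (an illustrative sketch with made-up reaction sets rather than the actual nine models; the Jaccard distance between reaction sets is one reasonable choice of dissimilarity, not necessarily the one used by the authors).

    from itertools import combinations
    from scipy.cluster.hierarchy import linkage, leaves_list

    def jaccard_distance(a, b):
        """1 - |A ∩ B| / |A ∪ B| between two reaction sets."""
        return 1.0 - len(a & b) / len(a | b)

    # Made-up reaction sets standing in for the reconstructed models
    models = {"MBA-HPA":    {"R1", "R2", "R3", "R5"},
              "mCADRE-HPA": {"R1", "R3", "R5"},
              "tINIT-GEB":  {"R2", "R4", "R6"}}
    names = list(models)
    condensed = [jaccard_distance(models[a], models[b])
                 for a, b in combinations(names, 2)]
    Z = linkage(condensed, method="average")   # accepts a condensed distance vector
    print([names[i] for i in leaves_list(Z)])  # leaf order groups the most similar models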
A more detailed comparison between the models reconstructed using the
same algorithm or the same data source is available in Figure 5, A and B respec-
tively. Considering the models generated by the same algorithm, it is observed
that mCADRE has a smaller overlap (only 812 reactions) compared to the other
methods. This could be explained by the possibility of removing core reactions
during the mCADRE reconstruction process. Note that both reactions with
"High" and "Moderate" evidence levels, and from the CH and CM sets, are all
considered as belonging to the core. Furthermore, the mean percentage of reactions
that belong to all models of the same algorithm is around 45%. When the compari-
son is made by grouping models with the same input data, the variance between
models is lower than when grouping by algorithm. Here, the mean percentage of
reactions common to all models with the same data source is around 67% (Supple-
mentary Table S3). Again, the variability of the final results seems to be dominated
by the data source factor.

Fig. 4. A-C) Distribution of reactions across the 50 models for each data type. Grey
bars show a histogram with the number of reactions present in a certain number of
models. Green bars show the reactions that are present in the final model. D) Results
from hierarchical clustering of the resulting nine models.
The quality of the metabolic models was further validated using the metabolic
functions that are known to occur in hepatocytes [8]. The generic Recon2 human
metabolic model, used as template in the reconstruction process, is able to satisfy
337 of the 408 metabolic functions available. Metabolic functions related with
disease or involving metabolites not present in Recon2 were removed from the
original list.

Fig. 5. Metabolic models reaction intersection considering: (A) the same algorithm;
(B) the same omics data source.

Table 2. Number of reactions and the percentage of liver metabolic functions that
each metabolic model performs when compared with the template model - Recon 2.

Method        Sets                HPA                 GEB
              N. Reac.   Tasks    N. Reac.   Tasks    N. Reac.   Tasks
MBA             2044      18%       2633      24%       2909       6%
mCADRE          1728       2%       2387       3%       2327       4%
tINIT           2005       4%       2665       5%       3255       6%

The results of this functional validation, also showing the number of reactions
in each metabolic model, are given in Table 2. These show that the number of
satisfied metabolic tasks is very low compared with the manually curated metabolic
model HepatoNet1. The metabolic model which performs the highest number of
metabolic tasks was obtained using the MBA algorithm with the HPA evidence.
Nevertheless, the success percentage is less than 25% when compared with the
performance of the template metabolic model, Recon2.

4 Conclusions
In this work, we present a survey of the most important methods for the recon-
struction of tissue-specific metabolic models. Each method was proposed to use
different data sources as input. Here, we analyze the consistency of information
across important omics data sources used in this context and verify the impact
of such differences in the final metabolic models generated by the methods.
The results show that metabolic models obtained depend more on the data
sources used as inputs, than on the algorithm used for the reconstruction. To
validate the accuracy of the obtained metabolic models, a set of metabolic func-
tions that should be performed in hepatocytes was tested for each metabolic
model. We found that the number of satisfied liver metabolic functions was sur-
prisingly low. This shows that methods for the reconstruction of tissue-specific
metabolic models based on a single omics data source are not enough to gen-
erate high quality metabolic models. Methods to combine several omics data
sources to rank the reactions for the reconstruction process could be a solution
to improve the results of these methods. Indeed, this study emphasizes the need
for the development of reliable methods for omics data integration, which seem
to be required to support the reconstruction of complex models of human cells,
but also reinforces the need to be able to incorporate known phenotypical data
available from the literature or from human experts.

Acknowledgments. S.C. thanks the FCT for the Ph.D. Grant SFRH/BD/
80925/2011. The authors thank the FCT Strategic Project of UID/BIO/04469/2013
unit, the project RECI/BBB-EBI/0179/2012 (FCOMP-01-0124-FEDER-027462) and
the project “BioInd - Biotechnology and Bioengineering for improved Industrial and
Agro-Food processes”, REF. NORTE-07-0124-FEDER-000028 Co-funded by the Pro-
grama Operacional Regional do Norte (ON.2 - O Novo Norte), QREN, FEDER.

References
1. Agren, R., Bordel, S., Mardinoglu, A., Pornputtapong, N., Nookaew, I., Nielsen, J.:
Reconstruction of Genome-Scale Active Metabolic Networks for 69 Human Cell
Types and 16 Cancer Types Using INIT. PLoS Computational Biology 8(5),
e1002518 (2012)
2. Agren, R., Mardinoglu, A., Asplund, A., Kampf, C., Uhlen, M., Nielsen, J.: Iden-
tification of anticancer drugs for hepatocellular carcinoma through personalized
genome-scale metabolic modeling. Molecular Systems Biology 10, 721 (2014)
3. Barrett, T., Troup, D.B., Wilhite, S.E., Ledoux, P., et al.: NCBI GEO: archive for
functional genomics data sets - 10 years on. Nucleic Acids Research 39(suppl 1),
D1005–D1010 (2011)
4. Carlson, M.: hgu133plus2.db: Affymetrix Human Genome U133 Plus 2.0 Array
annotation data (chip hgu133plus2) (2014). r package version 3.0.0
5. Duarte, N.C., Becker, S.A., Jamshidi, N., Thiele, I., Mo, M.L., Vo, T.D., Srivas, R.,
Palsson, B.O.: Global reconstruction of the human metabolic network based on
genomic and bibliomic data. Proceedings of the National Academy of Sciences of the
United States of America 104(6), 1777–1782 (2007)
6. Duarte, N.C., Herrgård, M.J., Palsson, B.O.: Reconstruction and validation of Sac-
charomyces cerevisiae iND750, a fully compartmentalized genome-scale metabolic
model. Genome Research 14(7), 1298–1309 (2004)

7. Flicek, P., Amode, M.R., Barrell, D., et al.: Ensembl 2014. Nucleic Acids Research
42(D1), D749–D755 (2014)
8. Gille, C., Bölling, C., Hoppe, A., et al.: HepatoNet1: a comprehensive metabolic
reconstruction of the human hepatocyte for the analysis of liver physiology. Molec-
ular Systems Biology 6(411), 411 (2010)
9. Hao, T., Ma, H.W., Zhao, X.M., Goryanin, I.: Compartmentalization of the Edin-
burgh Human Metabolic Network. BMC Bioinformatics 11, 393 (2010)
10. Ishibashi, H., Nakamura, M., Komori, A., Migita, K., Shimoda, S.: Liver architec-
ture, cell function, and disease. Seminars in Immunopathology 31(3) (2009)
11. Jerby, L., Ruppin, E.: Predicting Drug Targets and Biomarkers of Cancer via
Genome-Scale Metabolic Modeling. Clinical Cancer Research : An Official Journal
of the American Association for Cancer Research 18(20), 5572–5584 (2012)
12. Jerby, L., Shlomi, T., Ruppin, E.: Computational reconstruction of tissue-specific
metabolic models: application to human liver metabolism. Molecular Systems Biol-
ogy 6(401), 401 (2010)
13. Kaddurah-Daouk, R., Kristal, B., Weinshilboum, R.: Metabolomics: a global bio-
chemical approach to drug response and disease. Annu. Rev. Pharmacol. Toxicol.
48, 653–683 (2008)
14. Lewis, N.E., Schramm, G., Bordbar, A., Schellenberger, J., Andersen, M.P., Cheng,
J.K., Patel, N., Yee, A., Lewis, R.A., Eils, R., König, R., Palsson, B.O.: Large-scale
in silico modeling of metabolic interactions between cell types in the human brain.
Nature Biotechnology 28(12), 1279–1285 (2010)
15. Mardinoglu, A., Agren, R., Kampf, C., Asplund, A., Uhlen, M., Nielsen, J.:
Genome-scale metabolic modelling of hepatocytes reveals serine deficiency in
patients with non-alcoholic fatty liver disease. Nature Communications 5, Jan 2014
16. McCall, M.N., Jaffee, H.A., Zelisko, S.J., Sinha, N., et al.: The Gene Expression
Barcode 3.0: improved data processing and mining tools. Nucleic Acids Research
42(D1), D938–D943 (2014)
17. Oberhardt, M.A., Palsson, B.O., Papin, J.A.: Applications of genome-scale
metabolic reconstructions. Molecular Systems Biology 5(320), 320 (2009)
18. Orth, J.D., Thiele, I., Palsson, B.O.: What is flux balance analysis? Nature Biotech-
nology 28(3), 245–248 (2010)
19. Parkinson, H., Sarkans, U., Shojatalab, M., Abeygunawardena, N., et al.:
ArrayExpress-a public repository for microarray gene expression data at the EBI.
Nucleic Acids Research 33(Database issue), Jan 2005
20. Reed, J.L., Vo, T.D., Schilling, C.H., Palsson, B.O.: An expanded genome-scale
model of Escherichia coli K-12 (iJR904 GSM/GPR) 4(9), 1–12 (2003)
21. Sahoo, S., Franzson, L., Jonsson, J.J., Thiele, I.: A compendium of inborn errors
of metabolism mapped onto the human metabolic network. Mol. BioSyst. 8(10),
2545–2558 (2012)
22. Sahoo, S., Thiele, I.: Predicting the impact of diet and enzymopathies on human
small intestinal epithelial cells. Human Molecular Genetics 22(13), 2705–2722
(2013)
23. Shlomi, T., Benyamini, T., Gottlieb, E., Sharan, R., Ruppin, E.: Genome-scale
metabolic modeling elucidates the role of proliferative adaptation in causing the
Warburg effect. PLoS Computational Biology 7(3), e1002018 (2011)
24. Shlomi, T., Cabili, M.N., Ruppin, E.: Predicting metabolic biomarkers of human
inborn errors of metabolism. Molecular Systems Biology 5(263), 263 (2009)
25. Thiele, I., Swainston, N., Fleming, R.M.T., et al.: A community-driven global
reconstruction of human metabolism. Nature Biotechnology 31(5), May 2013

26. Uhlen, M., Oksvold, P., Fagerberg, L., Lundberg, E., et al.: Towards a knowledge-
based Human Protein Atlas. Nat Biotech 28(12), 1248–1250 (2010)
27. Wang, Y., Eddy, J.A., Price, N.D.: Reconstruction of genome-scale metabolic mod-
els for 126 human tissues using mCADRE. BMC Systems Biology 6(1), 153 (2012)
28. Wishart, D.S., Knox, C., Guo, A.C., Eisner, R., et al.: HMDB: a knowledgebase
for the human metabolome. Nucleic Acids Research 37(suppl 1), Jan 2009
29. Yizhak, K., Le Dévédec, S.E., Rogkoti, V.M.M., et al.: A computational study of
the Warburg effect identifies metabolic targets inhibiting cancer migration. Molec-
ular Systems Biology 10(8) (2014)
Fuzzy Clustering for Incomplete Short Time
Series Data

Lúcia P. Cruz, Susana M. Vieira, and Susana Vinga(B)

IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal


{lucia.cruz,susana.vieira,susanavinga}@tecnico.ulisboa.pt

Abstract. The analysis of clinical time series is currently a key topic in
biostatistics and machine learning applications to medical research. The
extraction of relevant features from longitudinal patients data brings
several problems for which novel algorithms are warranted. It is usually
impossible to measure many data points due to practical and also ethical
restrictions, which leads to short time series (STS) data. The sampling
might also be at unequally spaced time-points and many of the predicted
measurements are often missing. These problems constitute the rationale
of the present work, where we present two methods to deal with miss-
ing data in STS using fuzzy clustering analysis. The methods are tested
and compared using data with equal and varying time sampling inter-
val lengths, with and without missing data. The results illustrate the
potential of these methods in clinical studies for patient classification
and feature selection using biomarker time series data.

Keywords: Short time series · Missing data · Fuzzy c-means (FCM) clustering

1 Introduction

In medical care, the efficient acquisition of information is subject to many obsta-
cles related to ethical and experimental restrictions, which usually leads to
longitudinal data with unequal and long sampling periods. Furthermore, these
time series are usually sparse and incomplete, which further hampers their anal-
ysis. Along with this, the high costs associated with medical analysis lead to
the necessity of performing less frequent tests. For Intensive Care Unit (ICU)
cases, for example, these issues have been studied. In [1], the issue of missing data
in medical datasets was addressed. In oncological studies, survival analysis is
usually applied to identify probability distributions. Regression methods, such
as Cox proportional hazards models, are further used to identify the statistically
significant features associated with survival [2,3]. In this type of clinical studies
it is common to have biomarkers time series, which may be used to diagnose and
predict the outcome of the disease.
The main motivation of this work is to develop and test clustering algo-
rithms for biomarker time series, a problem arising for example in the analysis of
bone metastatic patients. Due to the problems referred to before, these biomarker
measurements are commonly short time series with missing data. Many methods
are not able to deal simultaneously with missing data and short time series,
let alone with unevenly sampled time series. The proposed clustering algorithm
is able to take into account both missing data and short time series, evenly or
unevenly sampled. The approaches are unsupervised, since the marker study
objective is to relate the outcome of the unsupervised clustering with outcomes
of the patient’s health.

2 Methods for Short Incomplete Time Series Data


In this section we present the definitions and clustering algorithms to deal with a
collection of n short time series with missing data. All the presented approaches
are based on fuzzy c-means (FCM) algorithm [4].
A time series k of length s can be represented as xk = (xk1 , . . . , xkj , . . . , xks )
with xkj ≡ xk (tj ), with 1 ≤ j ≤ s and 1 ≤ k ≤ n.
Furthermore, with the creation of cluster prototypes, vij , for the FCM algorithm,
i represents the cluster index, subject to 1 ≤ i ≤ c, where c is the selected number
of clusters.

2.1 Incomplete Data


In this section several approaches to deal with missing data are presented. These
might include some sort of imputation and deletion [1], further compared for
pattern recognition problems in [5] and explored in a fuzzy clustering setting [4].
The two methods here presented are the Partial Distance Strategy (PDS) [4] and
Optimal Completion Strategy (OCS) [4], both applied to fuzzy clustering. Both
use the Euclidean distance, but any other metric can also be used.
The PDS method computes the distance between two vectors, scaling it by
the proportion of non-missing values relative to the complete vector size. The entries
with missing values simply contribute zero to the distance, as expressed in Eq. 1:
    D_{ik} = \frac{s}{I_k} \sum_{j=1}^{s} (x_{kj} - v_{ij})^2 I_{kj},    (1)

where I_k = \sum_{j=1}^{s} I_{kj} and I_{kj} = 0 if x_{kj} \in X_M, I_{kj} = 1 if x_{kj} \in X_P,
for 1 ≤ j ≤ s, 1 ≤ k ≤ n.
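A direct NumPy transcription of Eq. 1 could look as follows (a sketch assuming missing entries are encoded as NaN; not necessarily the implementation used in this work).

    import numpy as np

    def partial_distance(x, v):
        """Partial Distance Strategy (Eq. 1): squared Euclidean distance restricted
        to the observed entries of x, rescaled by s / I_k. Missing values are NaN."""
        present = ~np.isnan(x)                     # indicator I_kj
        i_k = present.sum()
        if i_k == 0:
            raise ValueError("time series has no observed values")
        d2 = np.sum((x[present] - v[present]) ** 2)
        return (len(x) / i_k) * d2

    x = np.array([1.0, np.nan, 3.0, np.nan])
    v = np.array([0.5, 2.0, 2.0, 4.0])
    print(partial_distance(x, v))                  # (4/2) * (0.25 + 1.0) = 2.5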
The OCS approach is an imputation method where the entries with missing
data, xkj , are substituted with a given value. For fuzzy clustering these values are
initialized at random, and their update results from the computation of Eq. 2,
using the partition matrix Uik and cluster prototypes vij . This equation derives
from the calculation of the cluster prototypes [4].
    x_{kj} = \frac{\sum_{i=1}^{c} (U_{ik})^m v_{ij}}{\sum_{i=1}^{c} (U_{ik})^m}    (2)
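Eq. 2 can likewise be sketched in NumPy (missing entries, encoded as NaN, are overwritten with the membership-weighted average of the current cluster prototypes; in the full algorithm this update is repeated at every FCM iteration).

    import numpy as np

    def ocs_impute(X, U, V, m=2.0):
        """Optimal Completion Strategy update (Eq. 2): overwrite the missing entries
        of X (NaN) with the fuzzy-membership-weighted mean of the cluster prototypes.
        X: n x s data, U: c x n partition matrix, V: c x s prototypes."""
        weights = U ** m                                        # (U_ik)^m, shape c x n
        estimates = (weights.T @ V) / weights.sum(axis=0, keepdims=True).T
        return np.where(np.isnan(X), estimates, X)

    X = np.array([[1.0, np.nan], [2.0, 3.0]])
    U = np.array([[0.8, 0.3], [0.2, 0.7]])        # memberships: 2 clusters x 2 series
    V = np.array([[1.0, 2.0], [2.0, 4.0]])
    print(ocs_impute(X, U, V))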

2.2 Short Time Series


This section presents the Short Time Series (STS) approach, where both evenly
and unevenly sampled time series can be dealt with.
According to survey [6], the only method capable of handling short time series is the
one presented in [7], which also deals with unevenly sampled time series. The proposed
STS distance, d_{STS}(x, v) = \sum_{j=0}^{s} \left( \frac{v_{(j+1)} - v_j}{t_{(j+1)} - t_j} - \frac{x_{(j+1)} - x_j}{t_{(j+1)} - t_j} \right)^2, is computed
between the cluster prototypes, vj, and the time series data points, xj, including
the corresponding times, tj. Based on the STS approach, two new methods to
deal with STS were developed.
1st order derivative calculation with normalization (Slopes): It uses the
approximate derivatives between each pair of consecutive data points as input. First,
the z-score normalization of the series is calculated [7], z_k = \frac{x_k - \bar{x}}{s_x}, where \bar{x} is
the mean value of x_k and s_x is the standard deviation of x_k. The derivatives of
each pair of consecutive points are then computed, dv_{kj} = \frac{z_{(j+1)} - z_j}{t_{(j+1)} - t_j}.
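The Slopes transformation can be sketched as follows (an illustrative NumPy fragment that assumes a complete series; in the combined methods described later the missing entries are handled by PDS or OCS).

    import numpy as np

    def slopes(x, t):
        """Slopes transformation: z-score the series, then take first-order
        finite differences between consecutive (possibly unevenly spaced) samples."""
        z = (x - x.mean()) / x.std(ddof=1)       # z-score normalization
        return np.diff(z) / np.diff(t)           # dv_kj = (z_{j+1} - z_j) / (t_{j+1} - t_j)

    x = np.array([2.0, 4.0, 3.0, 6.0])
    t = np.array([0.0, 1.0, 3.0, 4.0])
    print(slopes(x, t))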
Combination of 1st order derivatives of time points (Slopes Comb): This
method's objective is to generate a combination of time series data point deriva-
tives computed at different, increasing lags. A vector is created with the derivatives
of consecutive data points (i.e. with an interval of one point). It is then extended
with the derivatives of data points separated by two sample points, and so on,
until no larger point separation is possible (the maximum possible lag is attained
between xs and x1 ).
As an example, let us consider the time series S = (x1, x2, x3, x4) and the equivalent
time points T = (t1, t2, t3, t4). The vector creation, V = (V1, V2, V3), would be
V1 = (\frac{x_2 - x_1}{t_2 - t_1}, \frac{x_3 - x_2}{t_3 - t_2}, \frac{x_4 - x_3}{t_4 - t_3}), V2 = (\frac{x_3 - x_1}{t_3 - t_1}, \frac{x_4 - x_2}{t_4 - t_2}) and V3 = (\frac{x_4 - x_1}{t_4 - t_1}), where
V1 contains the derivatives of data points with an interval of one point, V2 the
derivatives of data points with an interval of two points, and V3 the derivative of
data points with an interval of three points. The vector V will be the input used
in this method.

2.3 Short Incomplete Time Series Fuzzy Clustering Algorithms


The algorithm construction is described as follows, making use of FCM [4].
For the FCM algorithm, the objective function for any of the methods is to
minimize J(x, v, U) = \sum_{i=1}^{c} \sum_{k=1}^{n} U_{i,k}^{m} d^2(x_k, v_i).
Having the methods to deal with missing data (PDS and OCS) and the
method to deal with short time series (STS), we propose 5 combinations: PDS-
Slopes, OCS-Slopes, PDS-Slopes Comb, OCS-Slopes Comb and STS-OCS. For
the combinations with Slopes and Slopes Comb, the FCM algorithm is the same
as in [4]. The alterations occur prior to the algorithm calculations: either Slopes
or Slopes Comb is applied to generate the new dataset to be used.
The STS - OCS case mainly uses the algorithm described in [7], with the
STS distance, yet the dataset to be used needs to be initialized by imputing the
missing values. This imputation is done after the normalization step of the
algorithm [7] and is based on Eq. 2; however, the cluster prototype calculation
follows the procedure described in [7].

3 Results and Discussion


These methods were tested on 4 datasets generated according to [7]. Each original
dataset contains 20 time series classified into four classes (five time series per
class) but with different characteristics, depending on the number of time points
t and on whether they are equally or unequally spaced. In Fig. 1 these datasets are
represented, where each cluster is defined with the same line type.
Every method was tested with the complete information (0% missing val-
ues) and with an increasing percentage of missing entries. These were generated
randomly, uniformly distributed throughout the dataset. The restrictions imposed
were that no row (time series sample) nor column (time point) could be entirely
composed of missing data. The results presented were obtained for 500 runs with
500 different random seeds. The 500 seeds are kept the same for every method
tested to guarantee comparison exactness. In Fig. 2 the results for the 4 datasets
tested are presented in terms of accuracy of the final clustering obtained. We
compared the Mean number of misclassifications with the Percentage of missing
data (%), given that these results are the mean values of the 500 runs results.
The percentages of missing data tested were 0% (complete dataset), 10%, 20%,
30% and 40%. Since all datasets contain 20 time series samples, for e.g. a mean
number of misclassification of 10 time series corresponds to 50% of misclassifi-
cations. By observing Fig. 2 it is clear that the more missing values the dataset
has, the more misclassifications will be obtained. PDS and OCS roughly main-
tain their performance throughout increasing percentages of missings. STS-OCS
is the method which deteriorated the fastest, most likely due to the bias caused
by the imputation and the inadequate missing values update. When comparing
Slopes with Slopes and Slopes Comb with Slopes Comb, it is noteworthy that the

Fig. 1. Time Series classification (panels A–D, one per dataset; horizontal axis: Time). For each dataset the 20 time series are divided by the 4 clusters, specified by the different contours.

Fig. 2. Misclassifications of clustering methods. Average number of misclassifications (panels A–D, one per dataset) as a function of the percentage of missing data (%), for 500 runs of time series with 20 samples, four classes, using different sampling time points T and intervals. Legend: PDS, PDS - Slopes, PDS - Slopes Comb, OCS, OCS - Slopes, OCS - Slopes Comb, STS - OCS.
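The random missing-entry generation described above (random positions, with no fully missing series or time point, and one fixed seed per run shared by all methods) can be sketched as follows; the function name and the rejection strategy are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def random_missing_mask(n_series, n_points, fraction, rng):
    """Boolean mask (True = missing), redrawn until no row or column is fully missing."""
    n_missing = int(round(fraction * n_series * n_points))
    while True:
        flat = np.zeros(n_series * n_points, dtype=bool)
        flat[rng.choice(flat.size, n_missing, replace=False)] = True
        mask = flat.reshape(n_series, n_points)
        if not mask.all(axis=1).any() and not mask.all(axis=0).any():
            return mask

rng = np.random.default_rng(1)                   # one of the 500 seeds, shared by all methods
mask = random_missing_mask(20, 6, 0.40, rng)     # e.g. a T = 6 dataset with 40% missing data
```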

Table 1. Results of FCM variations for 40% missing data.

Dataset      Methods            Iteration Count  Misclassifications  Objective Function Error  NID
T = 20       PDS                70.52 (22.69)    5.75 (1.51)         5.26E-03 (6.06E-02)       0
equal        PDS - Slopes       25.57 (13.85)    1.04 (1.67)         9.13E-05 (1.38E-03)       22
             PDS - Slopes Comb  16.32 (4.98)     0.03 (0.42)         5.95E-06 (2.25E-06)       0
             OCS                88.21 (16.00)    6.88 (1.61)         5.97E-03 (3.49E-02)       0
             OCS - Slopes       52.10 (31.12)    6.24 (3.82)         4.92E-03 (3.50E-02)       0
             OCS - Slopes Comb  51.67 (24.58)    7.64 (3.93)         6.19E-03 (5.66E-02)       6
             STS - OCS          49.45 (30.62)    11.12 (1.56)        8.27E-04 (4.93E-03)       0
T = 10       PDS                63.62 (23.89)    7.94 (1.48)         1.47E-03 (1.51E-02)       0
unequal      PDS - Slopes       34.89 (18.52)    5.62 (2.34)         1.39E-05 (7.45E-05)       358
             PDS - Slopes Comb  24.69 (12.77)    0.71 (1.31)         9.87E-06 (5.61E-05)       11
             OCS                75.65 (21.06)    8.40 (1.54)         1.24E-03 (1.41E-02)       0
             OCS - Slopes       89.17 (18.78)    6.24 (2.02)         9.65E-04 (5.49E-03)       0
             OCS - Slopes Comb  71.19 (28.97)    4.99 (3.55)         1.36E-02 (9.08E-02)       1
             STS - OCS          80.67 (20.11)    9.27 (1.78)         2.14E-04 (9.26E-04)       0
T = 6        PDS                54.80 (22.06)    6.60 (1.44)         1.64E-03 (2.39E-02)       0
equal        PDS - Slopes       27.67 (5.51)     8.33 (3.21)         7.10E-06 (2.52E-06)       497
             PDS - Slopes Comb  28.13 (15.10)    3.52 (2.30)         2.22E-04 (3.14E-03)       285
             OCS                63.84 (24.28)    7.07 (1.50)         1.03E-03 (2.15E-02)       0
             OCS - Slopes       85.38 (19.71)    7.36 (1.81)         7.49E-04 (4.69E-03)       0
             OCS - Slopes Comb  87.09 (19.73)    5.26 (2.06)         4.23E-03 (2.90E-02)       1
             STS - OCS          70.46 (23.40)    8.92 (1.72)         1.83E-04 (1.53E-03)       0
T = 6        PDS                54.80 (22.06)    6.60 (1.44)         1.64E-03 (2.39E-02)       0
unequal      PDS - Slopes       15.67 (5.13)     8.33 (2.31)         4.22E-06 (3.96E-06)       497
             PDS - Slopes Comb  28.13 (15.10)    3.52 (2.30)         2.22E-04 (3.14E-03)       285
             OCS                63.84 (24.28)    7.07 (1.50)         1.03E-03 (2.15E-02)       0
             OCS - Slopes       74.89 (23.76)    8.08 (1.78)         3.72E-04 (3.10E-03)       0
             OCS - Slopes Comb  87.09 (19.73)    5.27 (2.05)         4.23E-03 (2.90E-02)       1
             STS - OCS          45.31 (20.40)    8.70 (1.69)         1.36E-05 (7.26E-05)       0

PDS performs better than the OCS, probably due to the bias arising from the
imputation. In all datasets, the PDS-Slopes Comb method performed the best.
In Table 1 the results for 40% of missing data are summarised. For each
dataset every method was computed with 500 runs, and the mean and standard

deviation of Iteration Count, Misclassifications, Objective Function Error and


Number of Ignored Datasets (NID) are shown. Highlighted in the table are the best results for each dataset. The NID has values different from zero only for Slopes and Slopes Comb; this occurs whenever a time series sample has only one non-missing value. It is noteworthy that the OCS approach avoids this problem
through imputation while the PDS cannot overcome it.

4 Conclusions
This work describes and compares several algorithms for the clustering of short
and incomplete time series data. As expected, all methods perform very well with
complete data and the performance decreases when the percentage of missing
data increases. Nevertheless it is possible to maintain a reasonable accuracy in
the final classification even for values as large as 40%. When the original time
series have some sort of underlying patterns or evident trends, the methods using the combination of different slopes (Slopes Comb) with all possible lags are preferable. Overall, PDS with combined slopes achieves an excellent performance
in the datasets tested, with average values of zero misclassified series when 10% of
the values are missing. Interestingly, even for higher percentage of missing values,
the performance of this method with combined derivatives or slopes is very high,
with average misclassification much lower than what would be expected.
These algorithms can be applied directly in several areas of clinical stud-
ies, namely in oncology. One possible future application is the stratification of
patients based on biomarker evolution, which is expected to have a direct impact
on feature selection methods and survival analysis of oncological patients.

Acknowledgments. This work was supported by FCT, through IDMEC, under


LAETA, project UID/EMS/50022/2013, CancerSys (EXPL/EMS-SIS/1954/2013) and
Program Investigador FCT (IF/00653/2012 and IF/00833/2014), co-funded by the
European Social Fund (ESF) through the Operational Program Human Poten-
tial (POPH).

References
1. Cismondi, F., et al.: Missing data in medical databases: Impute, delete or classify?
Artificial Intelligence in Medicine 58(1), 63–72 (2013)
2. Westhoff, P.G., et al.: An Easy Tool to Predict Survival in Patients Receiving
Radiation Therapy for Painful Bone Metastases. International Journal of Radiation
Oncology*Biology*Physics 90(4), 739–747 (2014)
3. Harries, M., et al.: Incidence of bone metastases and survival after a diagnosis of
bone metastases in breast cancer patients. Cancer Epidemiology 38(4), 427–434
(2014)
4. Hathaway, R.J., Bezdek, J.C.: Fuzzy c-means clustering of incomplete data. IEEE
Transactions on Systems, Man, and Cybernetics. Part B, Cybernetics : a Publication
of the IEEE Systems, Man, and Cybernetics Society 31(5), 735–744 (2001)

5. Dixon, J.K.: Pattern recognition with partly missing data. IEEE Transactions on
Systems, Man and Cybernetics 9(10), 617–621 (1979)
6. Warren Liao, T.: Clustering of time series data - A survey. Pattern Recognition
38(11), 1857–1874 (2005)
7. Möller-Levet, C.S., Klawonn, F., Cho, K.-H., Wolkenhauer, O.: Fuzzy clustering
of short time-series and unevenly distributed sampling points. In: Berthold, M.,
Lenz, H.-J., Bradley, E., Kruse, R., Borgelt, C. (eds.) IDA 2003. LNCS, vol. 2810,
pp. 330–340. Springer, Heidelberg (2003)
General Artificial Intelligence
Allowing Cyclic Dependencies in Modular Logic
Programming

João Moura(B) and Carlos Viegas Damásio

CENTRIA / NOVA Laboratory for Computer Science and Informatics


(NOVA-LINCS), Universidade NOVA de Lisboa, Lisbon, Portugal
[email protected], [email protected]

Abstract. Even though modularity has been studied extensively in con-


ventional logic programming, there are few approaches on how to incor-
porate modularity into Answer Set Programming, a prominent rule-based
declarative programming paradigm. A major approach is Oikarinen
and Janhunen’s Gaifman-Shapiro-style architecture of program modules,
which provides the composition of program modules. Their module the-
orem properly strengthens Lifschitz and Turner’s splitting set theorem
for normal logic programs. However, this approach is limited by module
conditions that are imposed in order to ensure the compatibility of their
module system with the stable model semantics, namely forcing output
signatures of composing modules to be disjoint and disallowing positive
cyclic dependencies between different modules. These conditions turn out
to be too restrictive in practice and after recently discussing alternative
ways of lifting the first restriction [17], we now show how one can allow
positive cyclic dependencies between modules, thus widening the appli-
cability of this framework and the scope of the module theorem.

1 Introduction
Over the last few years, answer set programming (ASP) [2,6,12,15,18] emerged
as one of the most important methods for declarative knowledge representation
and reasoning. Despite its declarative nature, developing ASP programs resem-
bles conventional programming: one often writes a series of gradually improving
programs for solving a particular problem, e.g., optimizing execution time and
space. Until recently, ASP programs were considered as integral entities, which
becomes problematic as programs become more complex, and their instances
grow. Even though modularity is extensively studied in logic programming, there
are only a few approaches on how to incorporate it into ASP [1,5,8,19] or other
module-based constraint modeling frameworks [11,22]. The research on modular
systems of logic programs has followed two main streams [3]: one is program-
ming in-the-large where compositional operators are defined in order to combine
different modules [8,14,20]. These operators allow combining programs alge-
braically, which does not require an extension of the theory of logic programs.
The other direction is programming-in-the-small [10,16], aiming at enhancing
logic programming with scoping and abstraction mechanisms available in other


programming paradigms. This approach requires the introduction of new logical


connectives in an extended logical language. The two mainstreams are thus quite
divergent.
The approach of [19] defines modules as structures specified by a program
(knowledge rules) and by an interface defined by input and output atoms which
for a single module are, naturally, disjoint. The authors also provide a module
theorem capturing the compositionality of their module composition operator.
However, two conditions are imposed: there cannot be positive cyclic dependen-
cies between modules and there cannot be common output atoms in the modules
being combined. Both introduce serious limitations, particularly in applications
requiring integration of knowledge from different sources. The techniques used
in [5] for handling positive cycles among modules are shown not to be adaptable
for the setting of [19].
In this paper we discuss two alternative solutions to the cyclic dependencies
problem, generalizing the module theorem by allowing positive loops between
atoms in the interfaces of the modules being composed. A use case for this
requirement can be found in the following example.
Example 1. Alice wants to buy a safe and inexpensive car; she preselected 3 cars,
namely c1 , c2 and c3 . Her friend Bob says that car c2 is expensive and Charlie says
that car c3 is expensive. Meanwhile, she consulted two car magazines reviewing
all three cars. The first considered c1 safe and the second considered c1 to be
safe while saying that c3 may be safe if it has an optional airbag. Furthermore,
if a friend declares that a car is expensive, then she will consider it safe. Alice
is very picky regarding safety, and so she seeks some kind of agreement between
the reviews.
The described situation can be captured with five modules, one for Alice,
other three for her friends, and another for each magazine. Alice should conclude
that c1 is safe since both magazines agree on this. Therefore, one would expect
Alice to opt for car c1 since it is not expensive, and it is safe. 
In summary, the fundamental results of [19] require a syntactic operation to
combine modules – basically corresponding to the union of programs –, and a
compositional semantic operation joining the models of the modules. The module
theorem states that the models of the combined modules can be obtained by
applying the semantics of the natural join operation to the original models of
the modules – which is compositional.
This paper proceeds in Section 2 with an overview of the modular logic pro-
gramming paradigm, identifying some of its shortcomings. In Section 3 we dis-
cuss alternative methods for lifting the restriction that disallows positive cyclic
dependencies. We finish with conclusions and a general discussion.

2 Modularity in Answer Set Programming

Modular aspects of ASP have been clarified in recent years, with authors describ-
ing how and when two program parts (modules) can be composed [5,11,19] under

the stable model semantics. In this paper, we will make use of Oikarinen and
Janhunen’s logic program modules defined in analogy to [8] which we review
after presenting the syntax of ASP.

Answer Set Programming Logic programs in the ASP paradigm are formed by
finite sets of rules r having the following syntax:

L1 ← L2, . . . , Lm, not Lm+1, . . . , not Ln.    (n ≥ m ≥ 0)    (1)

where each Li is a logical atom without the occurrence of function symbols –


arguments are either variables or constants of the logical alphabet.
Considering a rule of the form (1), let HeadP (r) = L1 be the literal in
the head, BodyP+ (r) = {L2 , . . . , Lm } be the set with all positive literals in the
body, BodyP− (r) = {Lm+1 , . . . , Ln } be the set containing all negative literals in
the body, and BodyP (r) = {L2 , . . . , Ln } be the set containing all literals in the
body. If a program is positive we will omit the superscript in BodyP+ (r). Also,
if the context is clear we will omit the subscript mentioning the program and
write simply Head(r) and Body(r) as well as the argument mentioning the rule.
The semantics of stable models is defined via the reduct operation [9]. Given an
interpretation M (a set of ground atoms), the reduct P M of a program P with
respect to M is program P M = {Head(r) ← Body + (r) | r ∈ P, Body − (r) ∩ M =
∅}. An interpretation M is a stable model (SM) of P iff M = LM (P M ), where
LM (P M ) is the least model of program P M .
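As an illustration of the reduct-based definition, the following brute-force sketch enumerates the stable models of a tiny ground normal program. It is only meant to mirror the definition, not to be an efficient ASP solver, and all names are illustrative:

```python
from itertools import chain, combinations

# Ground rules as (head, positive_body, negative_body) triples.
rules = [("a", {"b"}, set()),      # a <- b.
         ("b", set(), {"c"}),      # b <- not c.
         ("c", set(), {"b"})]      # c <- not b.

def least_model(definite_rules):
    """Least model of a negation-free program by fixpoint iteration."""
    m, changed = set(), True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if pos <= m and head not in m:
                m.add(head)
                changed = True
    return m

def reduct(rules, m):
    """Gelfond-Lifschitz reduct P^M: drop rules whose negative body intersects M."""
    return [(h, pos) for h, pos, neg in rules if not (neg & m)]

def stable_models(rules):
    atoms = sorted(set().union(*[{h} | p | n for h, p, n in rules]))
    candidates = chain.from_iterable(combinations(atoms, k) for k in range(len(atoms) + 1))
    return [set(c) for c in candidates if set(c) == least_model(reduct(rules, set(c)))]

print(stable_models(rules))   # the two stable models: {'c'} and {'a', 'b'}
```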
The syntax of logic programs has been extended with other constructs,
namely weighted and choice rules [18]. In particular, choice rules have the fol-
lowing form:

{A1, . . . , An} ← B1, . . . , Bk, not C1, . . . , not Cm.    (n ≥ 1)    (2)

As observed by [19], the heads of choice rules possessing multiple atoms can be
freely split without affecting their semantics. When splitting such rules into n
different rules

{ai } ← B1 , . . . Bk , not C1 , . . . , not Cm where 1 ≤ i ≤ n,

the only concern is the creation of n copies of the rule body

B1 , . . . Bk , not C1 , . . . , not Cm .

However, new atoms can be introduced to circumvent this. There is a translation


of these choice rules to normal logic programs [7], which we assume is performed
throughout this paper but that is omitted for readability. We deal only with
ground programs and use variables as syntactic place-holders.

2.1 Modular Logic Programming (MLP)


Modules in the sense of [19] are essentially sets of rules with Input/Output
interfaces:

Definition 1 (Program Module). A logic program module P is a tuple


R, I, O, H where:
1. R is a finite set of rules;
2. I, O, and H are pairwise disjoint sets of input, output, and hidden atoms;
3. At(R) ⊆ At(P) defined by At(P) = I ∪ O ∪ H; and
4. Head(R) ∩ I = ∅.
The set of atoms in Atv (P) = I ∪ O are considered to be visible and hence
accessible to other modules composed with P either to produce input for P
or to make use of the output of P. We use Ati (P) = I and Ato (P) = O to
represent the input and output signatures of P, respectively. The hidden atoms
in Ath (P) = At(P)\Atv (P) = H are used to formalize some auxiliary concepts of
P which may not be sensible for other modules but may save space substantially.
The condition Head(R) ∩ I = ∅ ensures that a module may not interfere with its own
input by defining input atoms of I in terms of its rules. Thus, input atoms are
only allowed to appear as conditions in rule bodies.
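A possible data-structure sketch of Definition 1, with the three conditions checked at construction time (the class and field names are illustrative assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class Module:
    """A program module R, I, O, H; rules are (head, positive_body, negative_body) triples."""
    rules: list
    inputs: set
    outputs: set
    hidden: set = field(default_factory=set)

    def __post_init__(self):
        heads = {h for h, _, _ in self.rules}
        atoms = heads | set().union(*([p | n for _, p, n in self.rules] or [set()]))
        assert not (self.inputs & self.outputs | self.inputs & self.hidden |
                    self.outputs & self.hidden)                   # I, O, H pairwise disjoint
        assert atoms <= self.inputs | self.outputs | self.hidden  # At(R) subset of At(P)
        assert not (heads & self.inputs)                          # Head(R) does not meet I

    @property
    def visible(self):
        return self.inputs | self.outputs

# For instance, the one-rule module {airbag <- safe.}, {safe}, {airbag}, {} used further below:
p1 = Module([("airbag", {"safe"}, set())], inputs={"safe"}, outputs={"airbag"})
```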
Example 2. The use case in Example 1 is encoded into the five modules shown
here:
PA =< { buy(X) ← car(X), saf e(X), not exp(X).
car(c1 ). car(c2 ). car(c3 ).},
{ saf e(c1 ), saf e(c2 ), saf e(c3 ), exp(c1 ), exp(c2 ), exp(c3 )},
{ buy(c1 ), buy(c2 ), buy(c3 )},
{ car(c1 ), car(c2 ), car(c3 )} >
PB =< { exp(c2 ).}, {}, {exp(c2 ), exp(c3 )}, {} >
PC =< { exp(c3 ).}, {}, {exp(c1 ), exp(c2 ), exp(c3 )}, {} >
Pmg1 =< { ← not saf e(c1 ). airbag(C) ← saf e(C).},
{ saf e(C)},
{ airbag(C)},
{}>
Pmg2 =< { saf e(X) ← car(X), airbag(X).
car(c1 ). car(c2 ). car(c3 ). ← not airbag(c1 ). {← not airbag(c3 )}. },
{ airbag(C)},
{ saf e(c1 ), saf e(c2 ), saf e(c3 )},
{ airbag(c1 ), airbag(c2 ), airbag(c3 ), car(c1 ), car(c2 ), car(c3 )} > 
In Example 2, module PA encodes the rule used by Alice to decide if a car should
be bought. The safe and expensive atoms are its inputs, and the buy atoms
its outputs; it uses hidden atoms car/1 to represent the domain of variables.
Modules PB , PC and Pmg1 capture the factual information in Example 1 and depend on input literal safe to determine if their output states that a car has an airbag or not. They have no input and no hidden atoms, but Bob has only
analyzed the price of cars c2 and c3 . The ASP program module for the second
magazine is more interesting1 , and expresses the rule used to determine if a car
1 car belongs to both hidden signatures of PA and Pmg2, which is not allowed when composing these modules, but for clarity we omit a renaming of the car/1 predicate.

is safe, namely that a car is safe if it has an airbag; it is known that car c1 has
an airbag, c2 does not, and the choice rule states that car c3 may or may not
have an airbag.
Next, the SM semantics is generalized to cover modules by introducing a
generalization of the Gelfond-Lifschitz’s fixpoint definition. In addition to weakly
negated literals (i.e., not ), also literals involving input atoms are used in the
stability condition. In [19], the SMs of a module are defined as follows:

Definition 2 (Stable Models of Modules). An interpretation M ⊆ At(P) is a SM of an ASP program module P = R, I, O, H, iff M = LM(R^M ∪ {a. | a ∈ M ∩ I}). The SMs of P are denoted by AS(P).

Intuitively, the SMs of a module are obtained from the SMs of the rules part,
for each possible combination of the input atoms.
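Definition 2 can be read operationally: add every subset of the input atoms as facts and keep the stable models of the extended program. A brute-force sketch of this reading (illustrative names, only practical for tiny modules):

```python
from itertools import chain, combinations

def _least_model(definite):
    m, changed = set(), True
    while changed:
        changed = False
        for h, pos in definite:
            if pos <= m and h not in m:
                m.add(h)
                changed = True
    return m

def _stable_models(rules):
    atoms = sorted(set().union(*[{h} | p | n for h, p, n in rules])) if rules else []
    cands = chain.from_iterable(combinations(atoms, k) for k in range(len(atoms) + 1))
    return [set(c) for c in cands
            if set(c) == _least_model([(h, p) for h, p, n in rules if not (n & set(c))])]

def module_stable_models(rules, inputs):
    """AS(P): stable models of R extended with {a. | a in S}, for every subset S of the inputs."""
    out = []
    for k in range(len(inputs) + 1):
        for ins in combinations(sorted(inputs), k):
            out += _stable_models(rules + [(a, set(), set()) for a in ins])
    return out

# The one-rule module {airbag <- safe.}, {safe}, {airbag}, {} (it reappears in Example 6):
print(module_stable_models([("airbag", {"safe"}, set())], {"safe"}))
# the two stable models: set() and {'airbag', 'safe'}
```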

Example 3. Program modules PB , PC , and Pmg1 have each a single answer set:
AS(PB ) = {{exp(c2 )}}, AS(PC ) = {{exp(c3 )}}, and AS(Pmg1 ) = {{saf e(c1 ),
airbag(c1 )}}.
Module Pmg2 has two SMs, namely:
{saf e(c1 ), car(c1 ), car(c2 ), car(c3 ), airbag(c1 )}, and
{saf e(c1 ), saf e(c3 ), car(c1 ), car(c2 ), car(c3 ), airbag(c1 ), airbag(c3 )}.
Alice’s ASP program module has 2^6 = 64 models corresponding each to an input combination of safe and expensive atoms. Some of these models are:

{ buy(c1 ), car(c1 ), car(c2 ), car(c3 ), saf e(c1 ) }


{ buy(c1 ), buy(c3 ), car(c1 ), car(c2 ), car(c3 ), saf e(c1 ), saf e(c3 ) }
{ buy(c1 ), car(c1 ), car(c2 ), car(c3 ), exp(c3 ), saf e(c1 ), saf e(c3 ) }

2.2 Composing Programs from Models


The composition of models is obtained from the union of program rules and by
constructing the composed output set as the union of modules’ output sets, thus
removing from the input all the specified output atoms. [19] define their first
composition operator as follows: Given two modules P1 = R1 , I1 , O1 , H1 and
P2 = R2 , I2 , O2 , H2 , their composition P1 ⊕ P2 is defined when their output
signatures are disjoint, that is, O1 ∩ O2 = ∅, and they respect each others hidden
atoms, i.e., H1 ∩ At(P2 ) = ∅ and H2 ∩ At(P1 ) = ∅. Then their composition is

P1 ⊕ P2 = R1 ∪ R2 , (I1 \O2 ) ∪ (I2 \O1 ), O1 ∪ O2 , H1 ∪ H2

However, the conditions given for ⊕ are not enough to guarantee composi-
tionality in the case of answer sets and as such they define a restricted form:

Definition 3 (Module Union Operator ⊔). Given modules P1, P2, their union is P1 ⊔ P2 = P1 ⊕ P2 whenever (i) P1 ⊕ P2 is defined and (ii) P1 and P2 are mutually independent, meaning that there are no positive cyclic dependencies among rules in different modules, defined as loops through input and output signatures.

Natural join (⋈) on visible atoms is used in [19] to combine the stable models of modules as follows:
Definition 4 (Join). Given modules P1 and P2 and sets of interpretations A1 ⊆ 2^At(P1) and A2 ⊆ 2^At(P2), the natural join of A1 and A2 is:

A1 ⋈ A2 = {M1 ∪ M2 | M1 ∈ A1, M2 ∈ A2 and M1 ∩ Atv(P2) = M2 ∩ Atv(P1)}
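A direct sketch of Definition 4 (names are illustrative; models are plain sets of atoms):

```python
def natural_join(models1, visible1, models2, visible2):
    """A1 join A2: unions of pairs of models that agree on the other module's visible atoms."""
    return [m1 | m2
            for m1 in models1 for m2 in models2
            if m1 & visible2 == m2 & visible1]

# With the two mutually dependent modules of Example 6 below, whose models are {} and
# {airbag, safe}, the join returns both of them, while the union of the modules only has
# the empty model; this mismatch motivates Section 3.
a = [set(), {"airbag", "safe"}]
print(natural_join(a, {"safe", "airbag"}, a, {"airbag", "safe"}))
```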

This leads to their main result, stating that:

Theorem 1 (Module Theorem). If P1, P2 are modules such that P1 ⊔ P2 is defined, then:

AS(P1 ⊔ P2) = AS(P1) ⋈ AS(P2)

Still according to [19], their module theorem also straightforwardly generalizes for a collection of modules because the module union operator ⊔ is commutative, associative, and has the identity element < ∅, ∅, ∅, ∅ >.

Example 4. Consider the composition Q = (PA ⊔ Pmg1) ⊔ PB. First, we have

PA ⊔ Pmg1 = {buy(X) ← car(X), safe(X), not exp(X).
                car(c1). car(c2). car(c3). safe(c1).},
              {exp(c1), exp(c2), exp(c3)},
              {buy(c1), buy(c2), buy(c3), safe(c1), safe(c2), safe(c3)},
              {car(c1), car(c2), car(c3)}

It is immediate to see that the module theorem holds in this case. The visible
atoms of PA are saf e/1, exp/1 and buy/1, and the visible atoms for Pmg1 are
{saf e(c1 ), saf e(c2 )}. The only model for Pmg1 = {saf e(c1 )} when naturally
joined with the models of PA , results in eight possible models where saf e(c1 ),
not saf e(c2 ), and not saf e(c3 ) hold, and exp/1 vary. The final ASP program
module Q is
{buy(X) ← car(X), safe(X), not exp(X).
   car(c1). car(c2). car(c3). exp(c2). safe(c1).},
{exp(c1)},
{buy(c1), buy(c2), buy(c3), exp(c2), safe(c1), safe(c2), safe(c3)},
{car(c1), car(c2), car(c3)}

The SMs of Q are thus:


{saf e(c1 ), exp(c1 ), exp(c2 ), car(c1 ), car(c2 ), car(c3 )} and
{buy(c1 ), saf e(c1 ), exp(c2 ), car(c1 ), car(c2 ), car(c3 )}

2.3 Shortcomings
The conditions imposed in these definitions bring about some shortcomings such
as the fact that the output signatures of two modules must be disjoint which dis-
allows many practical applications e.g., we are not able to combine the results of
program module Q with any of PC or Pmg2 , and thus it is impossible to obtain

the combination of the five modules. Also because of this, the module union
operator ⊔ is not reflexive. By trivially waiving this condition, we immediately
get problems with conflicting modules. The compatibility criterion for the oper-
ator ⊔ also rules out the compositionality of mutually dependent modules, but
allows positive loops inside modules or negative loops in general. We illustrate
this in Example 5, which has been solved recently in [17] and the issue with
positive loops between modules in Example 6 .

Example 5 (Common Outputs). Given PB and PC, which respectively have AS(PB) = {{exp(c2)}} and AS(PC) = {{exp(c3)}}, the single SM of their union AS(PB ⊔ PC) is {exp(c2), exp(c3)}. However, the join of their SMs is AS(PB) ⋈ AS(PC) = ∅, invalidating the module theorem.

Example 6 (Cyclic Dependencies). Take the following two program modules (a simplification of the magazine modules in Example 2):

P1 = {airbag ← safe.}, {safe}, {airbag}, ∅
P2 = {safe ← airbag.}, {airbag}, {safe}, ∅

Their SMs are AS(P1) = AS(P2) = {{}, {airbag, safe}}, while the single SM of the union AS(P1 ⊔ P2) is the empty model {}. Therefore AS(P1 ⊔ P2) ≠ AS(P1) ⋈ AS(P2) = {{}, {airbag, safe}}, also invalidating the module theorem.


3 Positive Cyclic Dependencies Between Modules


To attain a generalized form of compositionality we need to be able to deal
with both restrictions identified previously and in particular cyclic dependencies
between modules. In the literature, [5] presents a solution based on a model
minimality property. It forces one to check for minimality on every comparable
models of all program modules being composed. It is not applicable to our setting
though, which can be seen in Example 7 where logical constant ⊥ represents value
false. Example 7 shows that [5] is not compositional in the sense of Oikarinen
and Janhunen.

Example 7. Given modules P1 = {a ← b. ⊥ ← not b.}, {b}, {a}, {} with one
SM {a, b}, and P2 = {b ← a.}, {a}, {b}, {} with SMs {} and {a, b}, their
composition has no inputs and no intended SMs while their minimal join contains
{a, b}. 

Another possible solution requires the introduction of extra information in


the models to allow detecting mutual positive dependencies. This route has been
identified before [21] and is left for future work.

3.1 Model Minimization

We present a model join operation that requires one to look at every model of
both modules being composed in order to check for minimality on models com-
parable on account of their inputs. However, this operation is able to distinguish
between atoms that are self supported through positive loops and atoms with
proper support, allowing one to lift the condition in Definition 3 disallowing
positive dependencies between modules.

Definition 5 (Minimal Join). Given modules P1 and P2, let their composition be PC = P1 ⊕ P2. Define AS(P1) ⋈min AS(P2) = {M | M ∈ AS(P1) ⋈ AS(P2) such that ∄M′ ∈ AS(P1) ⋈ AS(P2) : M′ ⊂ M and M ∩ Ati(PC) = M′ ∩ Ati(PC)}

Example 8 (Minimal Join). A car is safe if it has an airbag and it has an airbag
if it is safe and the airbag is an available option. This is captured by two modules,
namely: P1 = {airbag ← saf e, available option.}, {saf e, available option},
{airbag}, ∅ and P2 = {saf e ← airbag.}, {airbag}, {saf e}, ∅ which respec-
tively have AS(P1 ) = {{}, {saf e}, {available option}, {airbag, saf e, available
option}} and AS(P2 ) = {{}, {airbag, saf e}}. The composition has
as its input signature {available option} and therefore its answer set
{airbag,safe,available option} is not minimal regarding the input signature of
the composition because {available option} is also a SM (and the only intended
model among these two). Thus AS(P1 ⊕ P2) = AS(P1) ⋈min AS(P2) = {{}, {available option}}.
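A sketch of the minimal join on top of the plain join, replaying Example 8 (names are illustrative, and the atom available_option stands for the paper's "available option"):

```python
def minimal_join(models1, visible1, models2, visible2, comp_inputs):
    """Keep joined models that are subset-minimal among those with the same input projection."""
    joined = [m1 | m2
              for m1 in models1 for m2 in models2
              if m1 & visible2 == m2 & visible1]
    return [m for m in joined
            if not any(n < m and n & comp_inputs == m & comp_inputs for n in joined)]

# Example 8: AS(P1), AS(P2) and the composed input signature {available_option}
as_p1 = [set(), {"safe"}, {"available_option"}, {"airbag", "safe", "available_option"}]
as_p2 = [set(), {"airbag", "safe"}]
print(minimal_join(as_p1, {"safe", "available_option", "airbag"},
                   as_p2, {"airbag", "safe"},
                   comp_inputs={"available_option"}))
# -> the two intended models: set() and {'available_option'}, matching Example 8
```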

This join operator allows us to lift the prohibition of composing mutually depen-
dent modules under certain situations. Integrity constraints containing only
input atoms in their body are still a problem with this approach as these would
exclude models that would otherwise be minimal in the presence of unsupported
loops.

Theorem 2 (Minimal Module Theorem). If P1, P2 are modules such that P1 ⊕ P2 is defined (allowing cyclic dependencies between modules), and only normal rules are used in the modules, then:

AS(P1 ⊕ P2) = AS(P1) ⋈min AS(P2)

3.2 Annotated Models for Composing Mutually Dependent Modules

Because the former operator forces us to compare each model with every other model for minimality, it is not local; moreover, it is not general. We present next an alternative that requires adding annotations to models. We start by looking at
positive cyclic dependencies (loops) that are formed by composition. It is known
from the literature (e.g. [21]) that in order to do without looking at the rules of
the program modules being composed, which in the setting of MLP we assume
not having access to, we need to have extra information incorporated into the
models.

Definition 6 (Dependency Transformation). Let P be an MLP. Its depen-


dency transformation is defined as the set of rules (RP )A obtained from RP by
replacing each clause L1 ← L2 , . . . , Lm , not Lm+1 , . . . , not Ln .(n = m) in RP
with the following clause, where n = m and D = D2 ∪ . . . ∪ Dm is a set of
dependency sets:

(1) L1 : D ← L2 : D2 , . . . , Lm : Dm , not Lm+1 : Dm+1 , . . . , not Ln : Dn .

Definition 7 (Annotated Model). Given a module P = RA , I, O, H , its


set of annotated models is constructed as before: An interpretation M ⊆ AtA (P)
is an annotated answer set of an ASP program module P = R, I, O, H , if and
only if:  
M = LM((RA)M ∪ {a : {{a}}. | a : D ∈ M ∩ I}),
where (RA )MI is the version of the Gelfond-Lifschitz reduct allowing weighted
and choice rules, of the dependency transformation of R, and LM is the operator
returning the least model of the positive program argument. The set of annotated
stable models of P is denoted by AS A (P).

Semantics of Annotated Programs. An annotated interpretation maps every


atom into a set of subsets of input atoms, tracking the dependencies of the atom
in combinations of input atoms. The semantics of annotated programs is obtained
by iterating an immediate consequences monotonic operator applied to a definite
program, defined as follows:
 
TP(I)(L1) = {TP(I)(L2) ∪ . . . ∪ TP(I)(Lm) | L1 ← L2, . . . , Lm ∈ RA}

starting from the interpretation mapping every atom into the empty set. In order
to consider input atoms in modules we set I(a) = {{a}} for every a ∈ M ∩ I,
and {} otherwise.

Collapsed Annotated Models. Previous Definition 7 generates equivalent


models for each alternative rule where atoms from the model belong to the head
of the rule. We need to merge them into a collapsed annotated model where
the alternatives are listed as sets of annotations, in order to retain a one to
one correspondence between these and the SMs of the original program. As
we are only interested in this collapsed form, we will henceforth take collapsed
annotated models as annotated models.

Definition 8 (Collapsed Annotated Model). Let M  and M  be two anno-


tated models such that for every atom a ∈ M  , it is also the case that a ∈ M 
and vice-versa. A collapsed annotated model M of M  and M  is constructed as
follows:
M = {a : {D , D } | a : D ∈ M  and a : D ∈ M  }

Given a module P, a program P (M ) can conversely be constructed from


one of the module’s annotated models M simply by adding rules of the form
a ← D1 , . . . , Dm . for each annotated atom a{D1 ,...,Dm } ∈ M . Such constructed
program P (M ) will be equivalent (but not strongly equivalent [13]) to taking
the original program and adding facts that belong to the annotated model M ,
intersected with the input signature of P, correspondingly.
Example 9 (Annotated Model). Let P = {a ← b, c. b ← d, not e, not f.}, {d, f },
{a, b}, ∅ be a module. P has one annotated model as per Definition 7: {b{{d}} , d}.

In the previous example, the first rule a ← b, c. can never be activated because
c is not an input atom (c ∉ I) and it is not satisfied by the rules of the module
(RP ⊭ c). Thus, the only potential positive loop is identified by {b{{d}}, d}. If
we compose P with a module containing e.g., rule d ← b. and thus with an
annotated model {d{b} , b}, then it is easy to identify this as being a loop and if
any atoms in the loop are satisfied by the module composition then there will
be a stable model reflecting that. Also notice that since e is not a visible atom,
it does not interfere with other modules, as long as it is respected, and thus it
does not need to be in the annotation.

Cyclic Compatibility. We define next the compatibility of mutually dependent


models. We assume that the outputs are disjoint as per the original definitions.
The compatibility is defined as a two step criterion. The first is similar to the
original compatibility criterion, only adapted to dealing with annotated models
by disregarding the annotations. This first step makes annotations of negative
dependencies unnecessary. The second step takes models that are compatible
according to the first step and, after reconstructing two possible programs from
the compatible annotated models, implies computing the minimal model of the
union of these reconstructed programs and see if the union of the compatible
models is a model of the union of the reconstructed programs.
Definition 9 (Basic Model Compatibility). Let P1 and P2 be two mod-
ules. Let AS A (P1 ), respectively AS A (P2 ) be their annotated models. Let now
M1 ∈ AS A (P1 ) and M2 ∈ AS A (P2 ) be two models of the modules, they will be
compatible if:
M1 ∩ Atv (P2 ) = M2 ∩ Atv (P1 )
Now, for the second step of the cyclic compatibility criterion one takes mod-
els that passed the basic compatibility criterion and reconstruct their respective
possible programs as defined previously. Then one computes the minimal model
of the union of these reconstructed programs and see if the union of the origi-
nating models is a model of the union of their reconstructed programs.
Definition 10 (Annotation Compatibility). Let P1 and P2 be two modules.
Let AS A (P1 ), respectively AS A (P2 ) be their annotated models. Let now M1 ∈
AS A (P1 ) and M2 ∈ AS A (P2 ) be two compatible models according to Definition
9. They will be compatible annotated models if AS A (P1 ∪ P2 ) = M1 ∪ M2 .

3.3 Attaining Cyclic Compositionality


After setting the way by which one can deal with positive loops by using annota-
tions in models, the join operator needs to be redefined. The original composition
operators are applicable to annotated modules after applying Definition 6. This
way, their atoms positive dependencies are added to their respective models.

Definition 11 (Modified Join). Given two compatible annotated (in the sense of Definition 10) modules P1, P2, their composition is P1 ⊗ P2 = P1 ⊕ P2 provided that (i) P1 ⊕ P2 is defined. This way, given modules P1 and P2 and sets of annotated interpretations A1^A ⊆ 2^At(P1) and A2^A ⊆ 2^At(P2), the natural join of A1^A and A2^A, denoted by A1^A ⋈^A A2^A, is defined as follows for intersecting output atoms:

{M1 ∪ M2 | M1 ∈ A1, M2 ∈ A2, s.t. M1 and M2 are compatible.}

Theorem 3 (Cyclic Module Theorem). If P1, P2 are modules with annotated models such that P1 ⊗ P2 is defined, then:

AS^A(P1 ⊗ P2) = AS^A(P1) ⋈^A AS^A(P2)

3.4 Shortcomings Revisited


By adding the facts contained in stable models of one composing module to
the other composing module, through a program transformation, one is able to
counter the fact that the inputs of the composed module are removed if they
are met by the outputs of either composing modules [17]. As for positive loops,
going back to Example 6, the new composition operator also produces desired
results:

Example 10 (Cyclic Dependencies Revisited). Take again the two program modules in Example 6:

P1 = {airbag ← safe.}, {safe}, {airbag}, ∅
P2 = {safe ← airbag.}, {airbag}, {safe}, ∅

which respectively have annotated models AS^A(P1) = {{}, {airbag{safe}, safe}} and AS^A(P2) = {{}, {airbag, safe{airbag}}}, while AS^A(P1 ⊗ P2) = {{}, {airbag{safe}, safe{airbag}}}. Because of this, AS^A(P1 ⊗ P2) = AS^A(P1) ⋈^A AS^A(P2). Now, take P3 = {airbag.}, {}, {airbag}, ∅ and compose it with P1 ⊗ P2. We get AS^A(P1 ⊗ P2 ⊗ P3) = {{airbag, safe}}.

4 Conclusions and Future Work


We lift the restriction that disallows composing modules with cyclic dependen-
cies in the framework of Modular Logic Programming [19]. We present a model
join operation that requires one to look at every model of two modules being

composed in order to check for minimality of models that are comparable on


account of their inputs. This operation is able to distinguish between atoms that
are self supported through positive loops and atoms with proper support, allow-
ing one to lift the condition disallowing positive dependencies between modules.
However, this approach is not local, as it requires comparing every model with all the others, and it is not general, because it does not allow combining modules with integrity constraints; it is therefore of limited applicability.
Because of this lack of generality of the former approach, we present an
alternative solution requiring the introduction of extra information in the models
for one to be able to detect dependencies. We use models annotated with the way
they depend on the atoms in their module’s input signature. We then define their
semantics in terms of a fixed point operator. After setting the way by which one
deals with positive loops by using annotations in models, the join operator needs
to be redefined. The original composition operators are applicable to annotated
modules after applying Definition 7. This way, their positive dependencies are
added to their respective models. This approach turns out to be local, in the
sense that we need only look at two models being joined and unlike the first
alternative we presented, it works well with integrity constraints.
As future work we can straightforwardly extend these results to probabilistic
reasoning with ASP by applying the new module theorem to [4], as well as to
DLP functions and general stable models. An implementation of the framework
is also foreseen in order to assess the overhead when compared with the original
benchmarks in [19].

Acknowledgments. The work of João Moura was supported by grant SFRH/BD/


69006/2010 from Fundação para a Ciência e Tecnologia (FCT) from the Portuguese
Ministério do Ensino e da Ciência.

References
1. Babb, J., Lee, J.: Module theorem for the general theory of stable models. TPLP
12(4–5), 719–735 (2012)
2. Baral, C.: Knowledge Representation, Reasoning, and Declarative Problem Solv-
ing. Cambridge University Press (2003)
3. Bugliesi, M., Lamma, E., Mello, P.: Modularity in logic programming. J. Log.
Program. 19(20), 443–502 (1994)
4. Viegas Damásio, C., Moura, J.: Modularity of P-log programs. In: Delgrande, J.P.,
Faber, W. (eds.) LPNMR 2011. LNCS, vol. 6645, pp. 13–25. Springer, Heidelberg
(2011)
5. Dao-Tran, M., Eiter, T., Fink, M., Krennwallner, T.: Modular nonmonotonic logic
programming revisited. In: Hill, P.M., Warren, D.S. (eds.) ICLP 2009. LNCS,
vol. 5649, pp. 145–159. Springer, Heidelberg (2009)
6. Eiter, T., Faber, W., Leone, N., Pfeifer, G.: Computing preferred and weakly pre-
ferred answer sets bymeta-interpretation in answer set programming. In: Proceed-
ings AAAI 2001 Spring Symposium on Answer Set Programming, pp. 45–52. AAAI
Press (2001)

7. Ferraris, P., Lifschitz, V.: Weight constraints as nested expressions. TPLP 5(1–2),
45–74 (2005)
8. Gaifman, H., Shapiro, E.: Fully abstract compositional semantics for logic
programs. In: Symposium on Principles of Programming Languages, POPL,
pp. 134–142. ACM, New York (1989)
9. Gelfond, M., Lifschitz, V.: The stable model semantics for logic programming. In:
Proceedings of the 5th International Conference on Logic Program. MIT Press
(1988)
10. Giordano, L., Martelli, A.: Structuring logic programs: a modal approach. The
Journal of Logic Programming 21(2), 59–94 (1994)
11. Järvisalo, M., Oikarinen, E., Janhunen, T., Niemelä, I.: A module-based framework
for multi-language constraint modeling. In: Erdem, E., Lin, F., Schaub, T. (eds.)
LPNMR 2009. LNCS, vol. 5753, pp. 155–168. Springer, Heidelberg (2009)
12. Lifschitz, V.: Answer set programming and plan generation. Artificial Intelligence
138(1–2), 39–54 (2002)
13. Lifschitz, V., Pearce, D., Valverde, A.: Strongly equivalent logic programs. ACM
Transactions on Computational Logic 2, 2001 (2000)
14. Mancarella, P., Pedreschi, D.: An algebra of logic programs. In: ICLP/SLP,
pp. 1006–1023 (1988)
15. Marek, V.W., Truszczynski, M.: Stable models and an alternative logic program-
ming paradigm. In: The Logic Programming Paradigm: A 25-Year Perspective
(1999)
16. Miller, D.: A theory of modules for logic programming. In: Symp. Logic Pro-
gramming, pp. 106–114 (1986)
17. Moura, J., Damásio, C.V.: Generalising modular logic programs. In: 15th Interna-
tional Workshop on Non-Monotonic Reasoning (NMR 2014) (2014)
18. Niemelä, I.: Logic programs with stable model semantics as a constraint program-
ming paradigm. Annals of Mathematics and Artificial Intelligence 25, 72–79 (1998)
19. Oikarinen, E., Janhunen, T.: Achieving compositionality of the stable model
semantics for smodels programs. Theory Pract. Log. Program. 8(5–6), 717–761
(2008)
20. O’Keefe, R.A.: Towards an algebra for constructing logic programs. In: SLP,
pp. 152–160 (1985)
21. Slota, M., Leite, J.: Robust equivalence models for semantic updates of answer-set
programs. In: Brewka, G., Eiter, T., McIlraith, S.A. (eds.) Proc. of KR 2012. AAAI
Press (2012)
22. Tasharrofi, S., Ternovska, E.: A semantic account for modularity in multi-language
modelling of search problems. In: Tinelli, C., Sofronie-Stokkermans, V. (eds.) Fro-
CoS 2011. LNCS, vol. 6989, pp. 259–274. Springer, Heidelberg (2011)
Probabilistic Constraint Programming
for Parameters Optimisation
of Generative Models

Massimiliano Zanin(B) , Marco Correia, Pedro A.C. Sousa, and Jorge Cruz

NOVA Laboratory for Computer Science and Informatics,


FCT/UNL, Caparica, Portugal
[email protected], {mvc,pas,jcrc}@fct.unl.pt

Abstract. Complex networks theory has commonly been used for mod-
elling and understanding the interactions taking place between the ele-
ments composing complex systems. More recently, the use of generative
models has gained momentum, as they allow identifying which forces and
mechanisms are responsible for the appearance of given structural prop-
erties. In spite of this interest, several problems remain open, one of the
most important being the design of robust mechanisms for finding the
optimal parameters of a generative model, given a set of real networks. In
this contribution, we address this problem by means of Probabilistic Con-
straint Programming. By using as an example the reconstruction of net-
works representing brain dynamics, we show how this approach is superior
to other solutions, in that it allows a better characterisation of the param-
eters space, while requiring a significantly lower computational cost.

Keywords: Probabilistic Constraint Programming · Complex net-


works · Generative models · Brain dynamics

1 Introduction
The last decades have witnessed a revolution in science, thanks to the appearance
of the concept of complex systems: systems that are composed of a large number
of interacting elements, and whose interactions are as important as the elements
themselves [1]. In order to study the structures created by such relationships,
several tools have been developed, among which complex networks theory [2,3],
a statistical mechanics understanding of graph theory, stands out.
Complex networks have been used to characterise a large number of different
systems, from social [4] to transportation ones [5]. They have also been valu-
able in the study of brain dynamics, as one of the greatest challenges in modern
science is the characterisation of how the brain organises its activity to carry
out complex computations and tasks. Constructing a complete picture of the
computation performed by the brain requires specific mathematical, statistical
and computational techniques. As brain activity is usually complex, with differ-
ent regions coordinating and creating temporally multi-scale, spatially extended


networks, complex networks theory appears as the natural framework for its
characterisation.
When complex networks are applied to brain dynamics, nodes are associated
to sensors (e.g. measuring the electric and magnetic activity of neurons), thus
to specific brain locations, and links to some specific conditions. For instance,
brain functional networks are constructed such that pairs of nodes are connected
if some kind of synchronisation, or correlated activity, is detected in those nodes
- the rationale being that a coordinated dynamics is the result of some kind
of information sharing [6]. Once these networks are reconstructed, graph the-
ory allows endowing them with a great number of quantitative properties, thus
vastly enriching the set of objective descriptors of brain structure and function
at neuroscientists’ disposal. This has especially been fruitful in the characterisa-
tion of the differences between healthy (control) subjects and patients suffering
from neurologic pathologies [7].
Once the topology (or structure) of a network has been described, a further
question may be posed: can such topology be explained by a set of simple gen-
erative rules, like a higher connectivity of neighbouring regions, or the influence
of nodes physical position? When a set of rules (a generative model) has been
defined, it has to be optimised and validated: one ought to obtain the best set
of parameters, such that the networks yielded by the model are topologically
equivalent to the real ones. This usually requires maximising a function of the
p-values representing the differences between the characteristics of the synthetic
and real networks. In spite of being accepted as a standard strategy, this method
presents several drawbacks. First, its high computational complexity: large sets
of networks have to be created and analysed for every possible combination of
parameters; and second, its unfitness for assessing the presence of multiple local
minima.
In this contribution, we propose the use of probabilistic constraint program-
ming (PCP) for characterising the space created by the parameters of a gen-
erative model, i.e. a space representing the distance between the topological
characteristics of real and synthetic networks. We show how this approach allows
recovering a larger quantity of information about the relationship between model
parameters and network topology, with a fraction of the computational cost
required by other methods. Additionally, PCP can be applied to single subjects
(networks), thus avoiding the constraints associated with working with a large
and homogeneous population. We further validate the PCP approach by study-
ing a simple generative model, and by applying it to a data set of brain activity
of healthy people.
The remainder of the text is organised as follows. Besides this introduction,
Sections 2 and 2.1 respectively review the state of the art in constraint program-
ming and its probabilistic version. Afterwards, the application of PCP is pre-
sented in Section 3 for a data set of brain magneto-encephalographic recordings,
and the advantages of PCP are discussed in Section 4. Finally, some conclusions
are drawn in Section 5.

2 Constraint Programming
A constraint satisfaction problem [8] is a classical artificial intelligence paradigm
characterised by a set of variables and a set of constraints, the latter specifying
relations among subsets of these variables. Solutions are assignments of values
to all variables that satisfy all the constraints.
Constraint programming is a form of declarative programming, in the sense
that instead of specifying a sequence of steps to be executed, it relies on prop-
erties of the solutions to be found that are explicitly defined by the constraints.
A constraint programming framework must provide a set of constraint reasoning
algorithms that take advantage of constraints to reduce the search space, avoid-
ing regions inconsistent with the constraints. These algorithms are supported by
specialised techniques that explore the specificity of the constraint model, such
as the domain of its variables and the structure of its constraints.
Continuous constraint programming [9,10] has been widely used to model
safe reasoning in applications where uncertainty on the values of the variables is
modelled by intervals including all their possibilities. A Continuous Constraint
Satisfaction Problem (CCSP) is a triple X, D, C, where X is a tuple of n real
variables x1 , · · · , xn , D is a Cartesian product of intervals D(x1 ) × · · · × D(xn )
(a box), each D(xi ) being the domain of variable xi , and C is a set of numerical
constraints (equations or inequalities) on subsets of the variables in X. A solution
of the CCSP is a value assignment to all variables satisfying all the constraints
in C. The feasible space F is the set of all CCSP solutions within D.
Continuous constraint reasoning relies on branch-and-prune algorithms [11]
to obtain sets of boxes that cover exact solutions for the constraints (the feasible
space F ). These algorithms begin with an initial crude cover of the feasible space
(the initial search space, D) which is recursively refined by interleaving pruning
and branching steps until a stopping criterion is satisfied. The branching step
splits a box from the covering into sub-boxes (usually two). The pruning step
either eliminates a box from the covering or reduces it into a smaller (or equal)
box maintaining all the exact solutions. Pruning is achieved through an algo-
rithm [12] that combines constraint propagation and consistency techniques [13]:
each box is reduced through the consecutive application of narrowing operators
associated with the constraints, until a fixed-point is attained. These opera-
tors must be correct (do not eliminate solutions) and contracting (the obtained
box is contained in the original). To guarantee such properties, interval analysis
methods are used.
Interval analysis [14] is an extension of real analysis that allows computations
with intervals of reals instead of reals, where arithmetic operations and unary
functions are extended for interval operands. For instance, [1, 3] + [3, 7] results
in the interval [4, 10], which encloses all the results from a point-wise evaluation
of the real arithmetic operator on all the values of the operands. In practice these
extensions simply consider the bounds of the operands to compute the bounds
of the result, since the involved operations are monotonic. As such, the narrow-
ing operator Z ← Z ∩ (X + Y ) may be associated with constraint x + y = z
to prune the domain of variable z based on the domains of variables x and y.

Similarly, in solving the equation with respect to x and y, two additional narrow-
ing operators can be associated with the constraint, to safely narrow the domains
of these variables. With this technique, based on interval arithmetic, the obtained
narrowing operators are able to reduce a box X × Y × Z = [1, 3] × [3, 7] × [0, 5]
into [1, 2] × [3, 4] × [4, 5], with the guarantee that no possible solution is lost.
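The interval operations and the narrowing operators associated with the constraint x + y = z can be sketched as follows (intervals as (lo, hi) pairs; the function names are illustrative):

```python
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])          # [a,b] + [c,d] = [a+c, b+d]

def sub(x, y):
    return (x[0] - y[1], x[1] - y[0])          # [a,b] - [c,d] = [a-d, b-c]

def meet(x, y):
    return (max(x[0], y[0]), min(x[1], y[1]))  # interval intersection (assumed non-empty)

def narrow_sum(x, y, z):
    """Narrowing operators associated with x + y = z, applied once each."""
    z = meet(z, add(x, y))   # Z <- Z intersected with (X + Y)
    x = meet(x, sub(z, y))   # X <- X intersected with (Z - Y)
    y = meet(y, sub(z, x))   # Y <- Y intersected with (Z - X)
    return x, y, z

print(add((1, 3), (3, 7)))                 # (4, 10), as in the text
print(narrow_sum((1, 3), (3, 7), (0, 5)))  # ((1, 2), (3, 4), (4, 5)), as in the text
```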

2.1 Probabilistic Constraint Programming

In classical CCSPs, uncertainty is modelled by intervals that represent the


domains of the variables. Constraint reasoning reduces uncertainty, providing
a safe method for computing a set of boxes enclosing the feasible space. Nev-
ertheless this paradigm cannot distinguish between different scenarios, and all
combination of values within such enclosure are considered equally plausible. In
this work we use probabilistic constraint programming [15], which extends the
continuous constraint framework with probabilistic reasoning, allowing to fur-
ther characterise uncertainty with probability distributions over the domains of
the variables.
In the continuous case, the usual method for specifying a probabilistic
model [16] assumes, either explicitly or implicitly, a joint probability density
function (p.d.f.) over the considered random variables, which assigns a proba-
bility measure to each point of the sample space Ω. The probability of an event
H, given a p.d.f. f, is its multidimensional integral on the region defined by the event:

P(H) = ∫_H f(x) dx    (1)
The idea of probabilistic constraint programming is to associate a proba-
bilistic space to the classical CCSP by defining an appropriate density function.
A probabilistic constraint space is a pair X, D, C, f, where X, D, C is a CCSP and f is a p.d.f. defined in Ω ⊇ D such that ∫_Ω f(x) dx = 1.
A constraint (or a conjunction of constraints) can be viewed as an event H
whose probability can be computed by integrating the density function f over its
feasible space as in equation (1). The probabilistic constraint framework relies
on continuous constraint reasoning to get a tight box cover of the region of inte-
gration H, and computes the overall integral by summing up the contributions
of each box in the cover. Generic quadrature methods may be used to evaluate
the integral at each box.
In this work, Monte Carlo methods [17] are used to estimate the value of the
integrals at each box. The integral can be estimated by randomly selecting N
points in the multidimensional space and averaging the function values at these
points. This method displays 1/√N convergence, i.e. by quadrupling the number
of sampled points the error is halved, regardless of the number of dimensions.
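A schematic sketch of this estimation step: given a box cover of the feasible region H (obtained by constraint pruning), each box contributes its volume times the average of f over sampled points that satisfy the constraints. The toy event and all names below are illustrative assumptions:

```python
import random

def estimate_probability(boxes, density, event, n_samples=100000, seed=0):
    """Monte Carlo estimate of P(H) = integral of f over H, summed over a box cover of H."""
    rng = random.Random(seed)
    total = 0.0
    for box in boxes:                              # each box: list of (lo, hi) per variable
        volume = 1.0
        for lo, hi in box:
            volume *= hi - lo
        acc = 0.0
        for _ in range(n_samples):
            point = [rng.uniform(lo, hi) for lo, hi in box]
            if event(point):                       # only points inside H contribute
                acc += density(point)
        total += volume * acc / n_samples
    return total

# Toy check: uniform density on [0,1]^2 and the event x + y <= 1 (exact probability 0.5)
print(estimate_probability([[(0.0, 1.0), (0.0, 1.0)]],
                           lambda p: 1.0, lambda p: p[0] + p[1] <= 1.0))
```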
The advantages obtainable from this close collaboration between constraint
pruning and random sampling were previously illustrated in ocean colour remote
sensing studies [18], where this approach achieved quite accurate results even
with small sampling rates. The success of this technique relies on the reduction

Fig. 1. Schematic representation of the use of generative models for analysing func-
tional networks. f and fˆ respectively represent real and synthetic topological features,
as the ones described in Sec. 3.2. Refer to Sec. 3 for a description of all steps of the
analysis.

of the sampling space, where a pure non-naı̈ve Monte Carlo (adaptive) method
is not only hard to tune, but also impractical in small error settings.

3 From Brain Activity to Network Models

In order to validate the use of PCP for analysing the parameters space of a
generative models, here we consider a set of magneto-encephalographic (MEG)
recordings. A series of preliminary steps are required, as shown in Fig. 1. First,
starting from the left, real brain data (or data representing any other real com-
plex system) have to be recorded and encoded in networks, then transformed
into a set of topological (structural) features. In parallel, as depicted in the right
part, a generative model has to be defined: this allows to generate networks
as a function of the model parameters, and extract their topological features.
Finally, both features should be matched, i.e. the model parameters should be
optimised to minimise the distance between the vectors of topological features
of the synthetic and real networks.

3.1 MEG Data Recording

Magneto-encephalographic (MEG) scans were obtained for 19 right handed


elderly and healthy participants, recruited from the Geriatric Unit of the Hospi-
tal Universitario San Carlos Madrid and the Centro de Prevención del Deterioro
Cognitivo, Ayuntamiento de Madrid, Spain. Before the task execution, all partic-
ipants or legal representatives gave informed consent to participate in the study.
The study was approved by the local ethics committee.
Brain activity scans correspond to a modified version of Sternberg's letter-probe task [19], a standard task used to evaluate elders' memory proficiency. MEG signals were recorded with a 254 Hz sampling rate, using a 148-channel whole-head magnetometer confined in a magnetically shielded room (MSR). 35 artefact-free epochs were randomly chosen from those corresponding to correct answers for each participant.

3.2 Networks Reconstruction and Evaluation

Following the diagram of Fig. 1, MEG recordings are converted into functional


networks. Nodes, corresponding to MEG sensors and therefore to different brain
regions, are pairwise connected when some kind of common dynamics is detected
between the corresponding time series. Such relationship is assessed through
Synchronization Likelihood (SL) [20], a metric able to detect generalised syn-
chronisation, i.e. situations in which two time series react to a given input in
different, yet consistent ways [21]. It thus goes beyond simple linear correlations,
as it is able to detect non-linear and potentially chaotic relations. Applying SL
yields a correlation matrix C{wij } of size 148×148 (the number of sensors in the
MEG machine) for each epoch available. In order to filter any kind of transient
or noise specific to one epoch, the 35 matrices corresponding to each subject
have been averaged: the final result is then a single weight matrix C̃{wij } for
each subject.
While a correlation matrix can readily be interpreted as a weighted fully-
connected network, few metrics are available to describe the structure of such
objects. It is then customary to apply a threshold, i.e. discard all links whose
weight is not significant, and thus obtain an unweighted network. This presents
several advantages. First of all, brain networks are expected to be naturally
sparse, as increasing the connectivity implies a higher physiological cost. Fur-
thermore, low synchronisation values may be the result of statistical fluctuations,
e.g. of correlated noise; in such cases, deleting spurious links can only improve the
understanding of the system. Lastly, pruning can also help delete indirect,
second order correlations, which do not represent direct dynamical relationships.
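A minimal sketch of this pruning step is given below (our own illustration; fixing the threshold via a target link density is an assumption consistent with Fig. 2): the averaged weight matrix is binarised by keeping only the strongest links.

```python
import numpy as np

def prune_network(weights, link_density):
    """Binarise a symmetric weight matrix, keeping the strongest links so that the
    resulting unweighted network has (approximately) the requested link density."""
    n = weights.shape[0]
    iu = np.triu_indices(n, k=1)                 # upper triangle, no self-loops
    w = np.sort(weights[iu])[::-1]               # link weights, strongest first
    n_links = max(1, int(round(link_density * w.size)))
    tau = w[n_links - 1]                         # weight of the weakest kept link
    adjacency = (weights >= tau).astype(int)
    np.fill_diagonal(adjacency, 0)
    return adjacency, tau
```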
The final step involves the calculation of the topological metrics associated with
each pruned network, i.e. the f's of Fig. 1. Two metrics have been considered here,
representing two complementary aspects of brain information processing; their
selection has been motivated by the generative model used afterwards (see Section
3.3):

Clustering Coefficient. The clustering coefficient, also known as transitivity,


measures the presence of triangles in the network [22]. Mathematically, it is
defined as the relationship between the number of triangles and the number
of connected triples in the network: C = 3NΔ /N3 . Here, a triangle is a set of
three nodes with links between each pair of them, while a connected triple is
a set of three nodes where each one can be reached from each other (directly
or indirectly). From a biological point of view, the clustering coefficient rep-
resents how brain regions are locally connected, creating dense communities
computing some information in a collaborative way.
Efficiency. It is defined as the inverse of the harmonic mean of the length of
the shortest paths connecting pairs of nodes [23]:

    E = 1/(N(N−1)) Σ_{i≠j} 1/d_ij ,    (2)
d_ij being the distance between nodes i and j, i.e. the number of jumps
required to travel between them. A high value of E implies that all brain
regions are connected by short paths.

Fig. 2. Evolution of clustering coefficient (Left) and efficiency (Right) as a function of the link density for the 19 functional brain networks reconstructed in Sec. 3.2.

It should be noted that these two measures are complementary, the clustering
coefficient and efficiency respectively representing the segregation and integration
of information [24,25]. Additionally, both C and E are here defined as
a function of the threshold τ applied to prune the networks - their evolution is
represented in Fig. 2.
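For concreteness, both metrics can be computed from a pruned adjacency matrix, for instance with networkx (a sketch of ours, assuming the binarised network produced in the previous step):

```python
import networkx as nx

def topological_features(adjacency):
    """Return (clustering coefficient, efficiency) of an unweighted network."""
    g = nx.from_numpy_array(adjacency)
    c = nx.transitivity(g)          # C = 3 * (number of triangles) / (connected triples)
    e = nx.global_efficiency(g)     # E = average of 1/d_ij over all node pairs
    return c, e
```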

3.3 Generative Model Definition


Jumping to the right side of Fig. 1, it is now necessary to define a generative
model. As an example, we have here implemented an Economical Clustering
Model as defined in [26,27]. Given two nodes i and j, the probability of creating
a connection between them is given by:
    P_{i,j} ∝ k_{i,j}^γ · d_{i,j}^{−η} .    (3)

ki,j is the number of neighbours common to i and j, and di,j is the physical
distance between the two nodes. This model thus includes two different forces
that compete to create links. On one side, γ controls the appearance of trian-
gles in the network, by positively biasing the connectivity between nodes having
nearest neighbours in common; it thus defines the clustering coefficient and the
appearance of computational communities. On the other side, η accounts for the
distance in the connection, such that long-range connections, which are biologi-
cally costly, are penalised.
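A possible implementation of this generative rule is sketched below (ours; growing the network link by link until a target number of links is reached, and adding 1 to the common-neighbour count to avoid zero probabilities, are assumptions and may differ from the exact formulation in [26,27]):

```python
import numpy as np

def economical_clustering_network(distances, gamma, eta, n_links, rng=None):
    """Grow an undirected network by adding links drawn with probability
    proportional to (common neighbours)^gamma * distance^(-eta)."""
    rng = np.random.default_rng() if rng is None else rng
    n = distances.shape[0]
    adj = np.zeros((n, n), dtype=int)
    iu, ju = np.triu_indices(n, k=1)                  # candidate node pairs
    for _ in range(n_links):
        k = (adj @ adj)[iu, ju]                       # common neighbours per pair
        p = (k + 1.0) ** gamma * distances[iu, ju] ** (-eta)
        p[adj[iu, ju] == 1] = 0.0                     # never duplicate a link
        p /= p.sum()
        idx = rng.choice(p.size, p=p)
        i, j = iu[idx], ju[idx]
        adj[i, j] = adj[j, i] = 1
    return adj
```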

3.4 Parameters Estimation Through P-values


The problem is now identifying the best values of γ and η that permit recovering
the topological properties obtained in Sec. 3.2 for the experimental brain networks.
Fig. 3. (Left) Contour plot of the energy E (see Eq. 4) in the parameters space, for
a link density of 0.3. (Right) Energy contour plots for ten link densities, from 0.05
(bottom) to 0.5 (top); for the sake of clarity, only region outlines are visible.

As an example of a standard p-value based mechanism, we here use a simplified


version of the energy function proposed in Refs. [26,27]:

    E = 1 / ∏_i P_i .    (4)

P_i represents the p-value of the Kolmogorov-Smirnov (K-S) test between the
distributions estimated from the model and experimental networks, and i runs
over all topological metrics. As just two topological properties are here studied,
the previous formula simplifies to E = 1/(P_E · P_C).
For each considered value of γ and η, a set of networks has been generated
according to the model of Eq. 3; their topological features extracted; and the
resulting probability distribution compared with the distribution corresponding
to the real networks, through a K-S test.
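This scoring step can be sketched as follows (our illustration, using scipy's two-sample Kolmogorov-Smirnov test; the small constant added to the denominator is an assumption to guard against zero p-values):

```python
from scipy.stats import ks_2samp

def energy(model_clustering, real_clustering, model_efficiency, real_efficiency):
    """E = 1 / (P_C * P_E), with P_x the K-S p-value comparing the distribution of a
    topological metric over the synthetic networks with that of the real ones."""
    p_c = ks_2samp(model_clustering, real_clustering).pvalue
    p_e = ks_2samp(model_efficiency, real_efficiency).pvalue
    return 1.0 / (p_c * p_e + 1e-300)
```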
Fig. 3 presents the result of plotting the energy evolution in the parameters
space. Specifically, Fig. 3 Left reports the evolution of the energy for a link
density of 0.3. It can be noticed that a large portion of the space, constructed
around the values of γ and η suggested in [26], maximises the energy. Fig. 3
Right represents the same information for ten different link densities, from 0.05
(bottom part) to 0.5 (upper part).

3.5 Parameters Estimation Through Probabilistic Constraint Programming

As an alternative solution, the previously described PCP method is here used
to recover the shape of the parameters space. Two preliminary steps have to
be completed: first, reconstruct a set of synthetic networks using the generative
model of Eq. 3, for different γ and η values, and extract their topological charac-
teristics; and second, obtain approximated functions describing the evolution of
the topological metrics as a function of the model parameters, i.e. C = f̃_C(γ, η)
and E = f̃_E(γ, η). Afterwards, each observed feature o_i is modelled as a function
f_i of the model parameters plus an associated error term ε_i ∼ N(μ = 0, σ²):

    o_i = f_i(γ, η) + ε_i

For n observations, a probabilistic constraint space is considered with random
variables γ and η, a set of constraints C,

    C = {−3σ ≤ o_i − f_i(γ, η) ≤ 3σ | 1 ≤ i ≤ n} ,

3σ being chosen to keep the error within reasonable bounds, and the joint p.d.f. f,

    f(γ, η) = ∏_{i=1}^{n} g(o_i − f_i(γ, η)) ,    (5)

where g is the normal distribution with 0 mean and standard deviation σ.


To compute the probability distribution of the random variables γ and η,
a grid is constructed over their domains and a branch-and-prune algorithm is
initially used to obtain a grid box cover of the feasible space (where each box
belongs to a single grid cell). Then, for each box in the cover, a Monte Carlo
method is used to compute its contribution to equation (1) with the p.d.f. defined
in equation (5). The probability of the respective cell is updated accordingly and
normalised at the end of the process.
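The grid-based computation just described can be sketched as follows (a simplification of ours: the branch-and-prune cover is replaced by explicitly zeroing samples that violate the 3σ constraints, and f_C, f_E stand for the approximated functions f̃_C, f̃_E):

```python
import numpy as np
from scipy.stats import norm

def cell_probabilities(observations, features, grid_gamma, grid_eta,
                       sigma, n_samples=2000, rng=None):
    """Monte Carlo estimate, for each grid cell, of the integral of the joint p.d.f.
    prod_i g(o_i - f_i(gamma, eta)) restricted to the 3*sigma constraint box."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.zeros((len(grid_gamma) - 1, len(grid_eta) - 1))
    for a in range(probs.shape[0]):
        for b in range(probs.shape[1]):
            g = rng.uniform(grid_gamma[a], grid_gamma[a + 1], n_samples)
            e = rng.uniform(grid_eta[b], grid_eta[b + 1], n_samples)
            dens = np.ones(n_samples)
            for o_i, f_i in zip(observations, features):
                err = o_i - f_i(g, e)
                dens *= norm.pdf(err, scale=sigma)
                dens[np.abs(err) > 3 * sigma] = 0.0      # constraint pruning
            area = ((grid_gamma[a + 1] - grid_gamma[a]) *
                    (grid_eta[b + 1] - grid_eta[b]))
            probs[a, b] = area * dens.mean()
    return probs / probs.sum()                            # normalise at the end
```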
Fig. 4 reports the results obtained, i.e. the probability of obtaining networks
with the generative model which are compatible with the real ones, as a function
of the two parameters γ and η, and as a function of the link density. In the next
Section, both approaches and their results are compared.

4 Comparing P-value and Probabilistic Constraint Programming


Results presented in Sec. 3.4 and 3.5 allow comparing the p-value and PCP
methods, and highlight the advantages that the latter presents over the former.
The extremely high computational cost of analysing the parameters space
by means of K-S tests seldom allows a full characterisation of such space. This
is due to the fact that, for any set of parameters, a large number of networks
have to be created and characterised. Increasing the resolution of the analysis, or
enlarging the region of the space considered, increases the computational cost in
a linear way. This problem is far from being trivial, as, for instance, the networks
required to create Fig. 3 represent approximately 3 GB of information and
several days of computation on a standard computer. Such computational cost
implies that it is easy to miss some important information. Let us consider, for
instance, the result presented in Fig. 3 Left. The shape of the iso-lines suggests
that the maximum is included in the region under analysis, and that no further
explorations are required - while Figs. 4 and 5 prove otherwise.
Fig. 4. Contour plot of the parameters space, as obtained by the PCP method, for
the whole population of subjects and as a function of the link density. The colour of
each point represents the normalised probability of generating topologically equivalent
networks.


Fig. 5. (Left) Parameters space, as obtained with the PCP method, for a link density
of 0.3 and for the whole studied population. (Right) Parameters space for six subjects.
The scale of the right graphs is the same as the left one; the colour scale is the same
as that of Fig. 4.

On the other hand, estimating the functions f̃_C and f̃_E requires the creation
and analysis of a constant number of networks, independently of the size of
the parameters space. The total computational cost drops below one hour on
a standard computer, implying a reduction of three orders of magnitude. This has
important consequences on the kind of information one can obtain. Fig. 5 Left
presents the same information as Fig. 3 Left, but calculated by means of PCP
over a larger region. It is then clear that the maximum identified in Fig. 3 is just
one of the two maxima present in the system.
The second important advantage is that, while the PCP can yield results for
just one network or subject, a p-value analysis requires a probability distribu-
tion. It is therefore not possible to characterise the parameters space for just
one subject, but only for a large population. Fig. 5 Right explores this issue, by
showing the probability evolution in the parameters space for six different sub-
jects. It is interesting to notice how subjects are characterised by different shapes
in the space. This allows a better description of subjects, aimed for instance at
detecting differences among them.

5 Conclusions
In this contribution, we have presented the use of Probabilistic Constraint Pro-
gramming for optimising the parameters of a generative model, aimed at describ-
ing the mechanisms responsible for the appearance of some given topological
structures in real complex networks. As a validation case, we have here presented
the results corresponding to functional networks of brain activity, as obtained
through MEG recordings of healthy people.
The advantages of this method against other customary solutions, e.g. the
use of p-values obtained from Kolmogorov-Smirnoff tests, have been discussed.
First, the lower computational cost, and especially its independence of the size
of the parameters space and of the resolution of the analysis. This allows a bet-
ter characterisation of such space, reducing the risk of missing relevant results
when multiple local minima are present. Second, the possibility of characteris-
ing the parameters space for single subjects, thus avoiding the need of having
data for a full population. This will in turn open new doors for understand-
ing the differences between individuals, for instance, the identification of
characteristics associated with specific diseases in diagnosis and prognosis tasks.

References
1. Anderson, P.W.: More is different. Science 177, 393–396 (1972)
2. Albert, R., Barabási, A.L.: Statistical mechanics of complex networks. Reviews of
Modern Physics 74, 47 (2002)
3. Newman, M.E.: The structure and function of complex networks. SIAM Review
45, 167–256 (2003)
4. Costa, L.D.F., Oliveira Jr, O.N., Travieso, G., Rodrigues, F.A., Villas Boas, P.R.,
Antiqueira, L., Viana, M.P., Correa Rocha, L.E.: Analyzing and modeling real-
world phenomena with complex networks: a survey of applications. Advances in
Physics 60, 329–412 (2011)
5. Zanin, M., Lillo, F.: Modelling the air transport with complex networks: A short
review. The European Physical Journal Special Topics 215, 5–21 (2013)
6. Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis
of structural and functional systems. Nature Reviews Neuroscience 10, 186–198
(2009)

7. Papo, D., Zanin, M., Pineda-Pardo, J.A., Boccaletti, S., Buldú, J.M.: Functional
brain networks: great expectations, hard times and the big leap forward. Philo-
sophical Transactions of the Royal Society of London B: Biological Sciences 369,
20130525 (2014)
8. Mackworth, A.K.: Consistency in networks of relations. Artificial Intelligence 8,
99–118 (1977)
9. Lhomme, O.: Consistency techniques for numeric CSPs. In: Proc. of the 13th
IJCAI, pp. 232–238 (1993)
10. Benhamou, F., McAllester, D., van Hentenryck, P.: CLP(intervals) revisited. In:
ISLP, pp. 124–138 (1994)
11. Van Hentenryck, P., McAllester, D., Kapur, D.: Solving polynomial systems using
a branch and prune approach. SIAM Journal on Numerical Analysis 34, 797–827
(1997)
12. Granvilliers, L., Benhamou, F.: Algorithm 852: realpaver: an interval solver using
constraint satisfaction techniques. ACM Transactions on Mathematical Software
32, 138–156 (2006)
13. Benhamou, F., Goualard, F., Granvilliers, L., Puget, J.-F.: Revising hull and box
consistency. In: Procs. of ICLP, pp. 230–244 (1999)
14. Moore, R.: Interval analysis. Prentice-Hall, Englewood Cliffs (1966)
15. Carvalho, E.: Probabilistic constraint reasoning. PhD Thesis (2012)
16. Halpern, J.Y.: Reasoning about uncertainty. MIT, Cambridge (2003)
17. Hammersley, J.M., Handscomb, D.C.: Monte Carlo methods. Methuen, London
(1964)
18. Carvalho, E., Cruz, J., Barahona, P.: Probabilistic constraints for nonlinear inverse
problems. Constraints 18, 344–376 (2013)
19. Maestú, F., Fernández, A., Simos, P.G., Gil-Gregorio, P., Amo, C., Rodriguez, R.,
Arrazola, J., Ortiz, T.: Spatio-temporal patterns of brain magnetic activity during
a memory task in Alzheimer’s disease. Neuroreport 12, 3917–3922 (2001)
20. Stam, C.J., Van Dijk, B.W.: Synchronization likelihood: an unbiased measure
of generalized synchronization in multivariate data sets. Physica D: Nonlinear
Phenomena 163, 236–251 (2002)
21. Yang, S., Duan, C.: Generalized synchronization in chaotic systems. Chaos, Solitons
& Fractals 9, 1703–1707 (1998)
22. Newman, M.E.: Scientific collaboration networks. I. Network construction and fun-
damental results. Physical Review E 64, 016131 (2001)
23. Latora, V., Marchiori, M.: Efficient behavior of small-world networks. Physical
Review Letters 87, 198701 (2001)
24. Tononi, G., Sporns, O., Edelman, G.M.: A measure for brain complexity: relating
functional segregation and integration in the nervous system. Proceedings of the
National Academy of Sciences 91, 5033–5037 (1994)
25. Rad, A.A., Sendiña-Nadal, I., Papo, D., Zanin, M., Buldu, J.M., del Pozo, F.,
Boccaletti, S.: Topological measure locating the effective crossover between segre-
gation and integration in a modular network. Physical Review Letters 108, 228701
(2012)
26. Vértes, P.E., Alexander-Bloch, A.F., Gogtay, N., Giedd, J.N., Rapoport, J.L.,
Bullmore, E.T.: Simple models of human brain functional networks. Proceedings
of the National Academy of Sciences 109, 5868–5873 (2012)
27. Vértes, P.E., Alexander-Bloch, A., Bullmore, E.T.: Generative models of rich clubs
in Hebbian neuronal networks and large-scale human brain networks. Philosophical
Transactions of the Royal Society B: Biological Sciences 369, 20130531 (2014)
Reasoning over Ontologies
and Non-monotonic Rules

Vadim Ivanov, Matthias Knorr(B) , and João Leite

NOVA LINCS, Departamento de Informática, Faculdade Ciências e Tecnologia,


Universidade Nova de Lisboa, Caparica, Portugal
[email protected]

Abstract. Ontology languages and non-monotonic rule languages are


both well-known formalisms in knowledge representation and reasoning,
each with its own distinct benefits and features which are quite orthog-
onal to each other. Both appear in the Semantic Web stack in distinct
standards – OWL and RIF – and over the last decade a considerable
research effort has been put into trying to provide a framework that
combines the two. Yet, the considerable number of theoretical approaches
resulted, so far, in very few practical reasoners, while realistic use-cases
are scarce. In fact, there is little evidence that developing applications
with combinations of ontologies and rules is actually viable. In this paper,
we present a tool called NoHR that allows one to reason over ontologies
and non-monotonic rules, illustrate its use in a realistic application, and
provide tests of scalability of the tool, thereby showing that this research
effort can be turned into practice.

1 Introduction
Ontology languages in the form of Description Logics (DLs) [4] and non-
monotonic rule languages as known from Logic Programming (LP) [6] are both
well-known formalisms in knowledge representation and reasoning (KRR) each
with its own distinct benefits and features. This is also witnessed by the emer-
gence of the Web Ontology Language (OWL) [18] and the Rule Interchange
Format (RIF) [7] in the ongoing standardization of the Semantic Web driven by
the W3C1 .
On the one hand, ontology languages have become widely used to represent
and reason over taxonomic knowledge and, since DLs are (usually) decidable
fragments of first-order logic, are monotonic by nature which means that once
drawn conclusions persist when adopting new additional information. They also
allow reasoning on abstract information, such as relations between classes of
objects even without knowing any concrete instances and a main theme inherited
from DLs is the balance between expressiveness and complexity of reasoning. In
fact, the very expressive general language OWL 2 with its high worst-case com-
plexity includes three tractable (polynomial) profiles [27] each with a different
application purpose in mind.
1
http://www.w3.org


On the other hand, non-monotonic rules are focused on reasoning over


instances and commonly apply the Closed World Assumption (CWA), i.e., the
absence of a piece of information suffices to derive that it is false, until new
information to the contrary is provided, hence the term non-monotonic. This
permits to declaratively model defaults and exceptions, in the sense that the
absence of an exceptional feature can be used to derive that the (more) common
case applies, and also integrity constraints, which can be used to ensure that the
considered data is conform with desired specifications.
Combining both formalisms has been frequently requested by applications [1].
For example, in clinical health care, large ontologies such as SNOMED CT2 , that
are captured by the OWL 2 profile OWL 2 EL and its underlying description logic
(DL) EL++ [5], are used for electronic health record systems, clinical decision
support systems, or remote intensive care monitoring, to name only a few. Yet,
expressing conditions such as dextrocardia, i.e., that the heart is exceptionally
on the right side of the body, is not possible and requires non-monotonic rules.
Finding such a combination is a non-trivial problem due to the considerable
differences as to how decidability is ensured in each of the two formalisms and
a naive combination is easily undecidable. In recent years, there has been a
considerable amount of effort devoted to combining DLs with non-monotonic
rules as known from Logic Programming – see, e.g., related work in [12,28]) – but
this has not been accompanied by similar variety of reasoners and applications.
In fact, only very few reasoners for combining ontologies and non-monotonic rules
exist and realistic use-cases are scarce. In other words, there is little evidence
so far that developing applications in combinations of ontologies and rules is
actually viable.
In this paper, we want to contribute to showing that this paradigm is viable
by describing a tool called NoHR and show how it can be used to handle a real
use-case efficiently as well as its scalability. NoHR is theoretically founded in
the formalism of Hybrid MKNF under the well-founded semantics [22] which
comes with two main arguments in its favor. First, the overall approach, which
was introduced in [28] and is based on the logic of minimal knowledge and
negation as failure (MKNF) [26], provides a very general and flexible framework
for combining DL ontologies and non-monotonic rules (see [28]). Second, [22],
which is a variant of [28] based on the well-founded semantics [13] for logic
programs, has a lower data complexity than the former – it is polynomial for
polynomial DLs – and is amenable for applying top-down query procedures, such
as SLG(O) [2], to answer queries based only on the information relevant for the
query, and without computing the entire model – no doubt a crucial feature
when dealing with large ontologies and huge amounts of data.
NoHR is realized as a plug-in for the ontology editor Protégé 4.X3 , that
allows the user to query combinations of EL+ ⊥ ontologies and non-monotonic
rules in a top-down manner. To the best of our knowledge, it is the first Protégé
plug-in to integrate non-monotonic rules and top-down queries. We describe its

2
http://www.ihtsdo.org/snomed-ct/
3
http://protege.stanford.edu
Table 1. Syntax and semantics of EL+⊥.

                             Syntax                   Semantics
  atomic concept             A ∈ NC                   A^I ⊆ Δ^I
  atomic role                R ∈ NR                   R^I ⊆ Δ^I × Δ^I
  individual                 a ∈ NI                   a^I ∈ Δ^I
  top                        ⊤                        Δ^I
  bottom                     ⊥                        ∅
  conjunction                C ⊓ D                    C^I ∩ D^I
  existential restriction    ∃R.C                     {x ∈ Δ^I | ∃y ∈ Δ^I : (x, y) ∈ R^I ∧ y ∈ C^I}
  concept inclusion          C ⊑ D                    C^I ⊆ D^I
  role inclusion             R ⊑ S                    R^I ⊆ S^I
  role composition           R1 ◦ · · · ◦ Rk ⊑ S      (x1, x2) ∈ R1^I ∧ · · · ∧ (xk, y) ∈ Rk^I → (x1, y) ∈ S^I
  concept assertion          C(a)                     a^I ∈ C^I
  role assertion             R(a, b)                  (a^I, b^I) ∈ R^I

features, including the possibility to load and edit rule bases and to define
predicates with arbitrary arity; guaranteed termination of query answering, with a
choice between one/many answers; and robustness w.r.t. inconsistencies between the
ontology and the rule part. We also demonstrate its effective usage on the
application use-case combining EL+⊥ ontologies and non-monotonic rules outlined in
the following and adapted from [29], as well as an evaluation for the real ontology
SNOMED CT with over 300,000 concepts.

Example 1. The customs service for any developed country assesses imported
cargo for a variety of risk factors including terrorism, narcotics, food and con-
sumer safety, pest infestation, tariff violations, and intellectual property rights.
Assessing this risk, even at a preliminary level, involves extensive knowledge
about commodities, business entities, trade patterns, government policies and
trade agreements. Parts of this knowledge is ontological information and taxo-
nomic, such as the classification of commodities, while other parts require the
CWA and thus non-monotonic rules, such as the policies involving, e.g., already
known suspects. The overall task then is to access all the information and assess
whether some shipment should be inspected in full detail, under certain condi-
tions randomly, or not at all.

The remainder of the paper is structured as follows. In Sect. 2, we briefly


recall the DL EL+ ⊥ and MKNF knowledge bases as a tight combination of the
former DL and non-monotonic rules. Then, in Sect. 3, we present the Protégé
plug-in NoHR, and, in Sect. 4, we discuss the cargo shipment use case and its
realization using NoHR. We present some evaluation data in Sect. 5, before we
conclude in Sect. 6.
4
Details on the translation of EL ontologies into rules used in NoHR can be found in
[19].

2 Preliminaries
2.1 Description Logic EL+⊥

We start by recalling the syntax and semantics of EL+ ⊥ , a large fragment of


EL++ [5], the DL underlying the tractable profile OWL 2 EL [27], following the
presentation in [21]. For a more general and thorough introduction to DLs we
refer to [4].
The language of EL+ ⊥ is defined over countably infinite sets of concept names
NC , role names NR , and individual names NI as shown in the upper part of
Table 1. Building on these, complex concepts are introduced in the middle part
of Table 1, which, together with atomic concepts, form the set of concepts. We
conveniently denote individuals by a and b, (atomic) roles by R and S, atomic
concepts by A and B, and concepts by C and D. All expressions in the lower
part of Table 1 are axioms. A concept equivalence C ≡ D is an abbreviation for
C ⊑ D and D ⊑ C. Concept and role assertions are ABox axioms, all other
axioms are TBox axioms, and an ontology is a finite set of axioms.
The semantics of EL+⊥ is defined in terms of an interpretation I = (Δ^I, ·^I),
consisting of a non-empty domain Δ^I and an interpretation function ·^I. The
latter is defined for (arbitrary) concepts, roles, and individuals as in Table 1.
Moreover, an interpretation I satisfies an axiom α, written I |= α, if the cor-
responding condition in Table 1 holds. If I satisfies all axioms occurring in an
ontology O, then I is a model of O, written I |= O. If O has at least one model,
then it is called consistent, otherwise inconsistent. Also, O entails axiom α, writ-
ten O |= α, if every model of O satisfies α. Classification requires computing
all concept inclusions between atomic concepts entailed by O.

2.2 MKNF Knowledge Bases


MKNF knowledge bases (KBs) build on the logic of minimal knowledge and
negation as failure (MKNF) [26]. Two main different semantics have been defined
[22,28], and we focus on the well-founded version [22], due to its lower compu-
tational complexity and amenability to top-down querying without computing
the entire model. Here, we only point out important notions, and refer to [22]
and [2] for the details.
We start by recalling MKNF knowledge bases as presented in [2] to combine
an (EL+ ⊥ ) ontology and a set of non-monotonic rules (similar to a normal logic
program).

Definition 2. Let O be an ontology. A function-free first-order atom


P(t1, . . . , tn) s.t. P occurs in O is called a DL-atom; otherwise a non-DL-atom.
A rule r is of the form

H ← A1 , . . . , An , notB1 , . . . , notBm (1)

where the head of r, H, and all Ai with 1 ≤ i ≤ n and Bj with 1 ≤ j ≤ m in


the body of r are atoms. A program P is a finite set of rules, and an MKNF

knowledge base K is a pair (O, P). A rule r is DL-safe if all its variables occur
in at least one non-DL-atom Ai with 1 ≤ i ≤ n, and K is DL-safe if all its rules
are DL-safe. The ground instantiation of K is the KB KG = (O, PG ) where PG
is obtained from P by replacing each rule r of P with a set of rules substituting
each variable in r with constants from K in all possible ways.
DL-safety ensures decidability of reasoning with MKNF knowledge bases and can
be achieved by introducing a new predicate o, adding o(i) to P for all constants
i appearing in K and, for each rule r ∈ P, adding o(X) for each variable X
appearing in r to the body of r. Therefore, we only consider DL-safe MKNF
knowledge bases.
The semantics of K is based on a transformation of K into an MKNF formula
to which the MKNF semantics can be applied (see [22,26,28] for details). Instead
of spelling out the technical details of the original MKNF semantics [28] or its
three-valued counterpart [22], we focus on a compact representation of models
for which the computation of the well-founded MKNF model is defined5 . This
representation is based on a set of K-atoms and π(O), the translation of O into
first-order logic.
Definition 3. Let KG = (O, PG ) be a ground hybrid MKNF knowledge base.
The set of K-atoms of KG , written KA(KG ), is the smallest set that contains (i)
all ground atoms occurring in PG , and (ii) an atom ξ for each ground not-atom
notξ occurring in PG . For a subset S of KA(KG ), the objective knowledge of S
w.r.t. KG is the set of first-order formulas OBO,S = {π(O)} ∪ S.
The set KA(KG ) contains all atoms occurring in KG , only with not-atoms substi-
tuted by corresponding atoms, while OBO,S provides a first-order representation
of O together with a set of known/derived facts. In the three-valued MKNF
semantics, this set of K-atoms can be divided into true, undefined and false
atoms. Next, we recall operators from [22] that derive consequences based on
KG and a set of K-atoms that is considered to hold.
Definition 4. Let KG = (O, PG ) be a positive, ground hybrid MKNF knowledge
base. The operators RKG , DKG , and TKG are defined on subsets of KA(KG ):

RKG (S) ={H | PG contains a rule of the form H ← A1 , . . . An


such that, for all i, 1 ≤ i ≤ n, Ai ∈ S}
DKG (S) ={ξ | ξ ∈ KA(KG ) and OBO,S |= ξ}
TKG (S) =RKG (S) ∪ DKG (S)

The operator TKG is monotonic, and thus has a least fixpoint TKG ↑ ω. Trans-
formations can be defined that turn an arbitrary hybrid MKNF KB KG into a
positive one (respecting the given set S) to which TKG can be applied. To ensure
coherence, i.e., that classical negation in the DL enforces default negation in the
rules, two slightly different transformations are defined (see [22] for details).
5
Strictly speaking, this computation yields the so-called well-founded partition from
which the well-founded MKNF model is defined (see [22] for details).
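To make the T_KG operator concrete, the following sketch (ours; the DL component D_KG is abstracted as a caller-supplied oracle over the objective knowledge) computes the least fixpoint T_KG ↑ ω of a ground positive program.

```python
def least_fixpoint(rules, dl_oracle, k_atoms):
    """Compute T ↑ ω for a ground positive program.

    rules:     list of (head, body_atoms) pairs, all atoms ground.
    dl_oracle: function mapping a set S to the K-atoms entailed by OB_{O,S}.
    k_atoms:   the set KA(K_G) of all K-atoms under consideration."""
    s = set()
    while True:
        derived = {h for (h, body) in rules if all(a in s for a in body)}  # R_KG(S)
        derived |= dl_oracle(s) & set(k_atoms)                             # D_KG(S)
        if derived <= s:          # no new consequences: fixpoint reached
            return s
        s |= derived
```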

Fig. 1. System Architecture of NoHR

Definition 5. Let KG = (O, PG ) be a ground hybrid MKNF knowledge base and


S ⊆ KA(KG ). The MKNF transform KG /S is defined as KG /S = (O, PG /S),
where PG /S contains all rules H ← A1 , . . . , An for which there exists a rule
of the form (1) in PG with Bj ∉ S for all 1 ≤ j ≤ m. The MKNF-coherent
transform KG //S is defined as KG //S = (O, PG //S), where PG //S contains
all rules H ← A1 , . . . , An for which there exists a rule of the form (1) with
Bj ∉ S for all 1 ≤ j ≤ m and OBO,S ⊭ ¬H. We define ΓKG(S) = TKG/S ↑ ω
and Γ′KG(S) = TKG//S ↑ ω.

Based on these two antitonic operators [22], two sequences Pi and Ni are
defined, which correspond to the true and non-false derivations.

    P0 = ∅                    N0 = KA(KG)
    Pn+1 = ΓKG(Nn)            Nn+1 = Γ′KG(Pn)
    Pω = ∪n Pn                Nω = ∩n Nn

The fixpoints yield the well-founded MKNF model [22] (in polynomial time).
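Building on the previous sketch, the alternating computation can be written as follows (again ours; gamma and gamma_prime stand for Γ_KG and Γ′_KG, i.e. least fixpoints of the operators applied to the respective transforms):

```python
def well_founded(gamma, gamma_prime, k_atoms):
    """Iterate P_{n+1} = Gamma(N_n) and N_{n+1} = Gamma'(P_n); since P grows and N
    shrinks over a finite set of K-atoms, both sequences reach their fixpoints."""
    p, n = set(), set(k_atoms)
    while True:
        p_next, n_next = gamma(n), gamma_prime(p)
        if p_next == p and n_next == n:
            return p, set(k_atoms) - n      # (true atoms, false atoms)
        p, n = p_next, n_next
```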

Definition 6. The well-founded MKNF model of an MKNF-consistent ground


hybrid MKNF knowledge base KG = (O, PG ) is defined as (Pω , KA(KG ) \ Nω ).

If K is MKNF-consistent, then this partition does correspond to the unique


model of K [22], and, like in [2], we call the partition the well-founded MKNF
model Mwf (K). Here, K may indeed not be MKNF-consistent if the EL+ ⊥ ontology
alone is inconsistent, which is possible if ⊥ occurs, or by the combination of
appropriate axioms in O and P, e.g., A ⊑ ⊥ and A(a) ←. In the former case, we
argue that the ontology alone should be consistent and be repaired if necessary
before combining it with non-monotonic rules. Thus, we assume in the following
that O occurring in K is consistent.

Fig. 2. NoHR Query tab with a query for TariffCharge(x, y) (see Sect. 4)

3 System Description

In this section, we briefly describe the architecture of the plug-in for Protégé
as shown in Fig. 1 and discuss some features of the implementation and how
querying is realized.
The input for the plug-in consists of an OWL file in the DL EL+ ⊥ as described
in Sect. 2.1, which can be manipulated as usual in Protégé, and a rule file. For
the latter, we provide a tab called NoHR Rules that allows us to load, save and
edit rule files in a text panel following standard Prolog conventions.
The NoHR Query tab (see Fig. 2) also allows for the visualization of the
rules, but its main purpose is to provide an interface for querying the combined
KB. Whenever the first query is posed by pushing “Execute”, a translator is
started, initiating the ontology reasoner ELK [21] tailored for EL+ ⊥ and con-
siderably faster than other reasoners when comparing classification time [21].
ELK is used to classify the ontology O and then return the inferred axioms to
the translator. It is also verified whether DisjointWith axioms appear in O,
i.e., in EL+⊥ notation, axioms of the form C ⊓ D ⊑ ⊥ for arbitrary classes C
and D, which determines whether inconsistencies may occur in the combined
hybrid knowledge base. Then the result of the classification is translated into
rules and joined with the already given non-monotonic rules in P, and the result
is conditionally further transformed if inconsistency detection is required.
The result is used as input for the top-down query engine XSB Prolog6
which realizes the well-founded semantics for logic programs [13]. To guarantee

6
http://xsb.sourceforge.net

full compatibility with XSB Prolog’s more restrictive admitted input syntax,
the joint resulting rule set is further transformed such that all predicates and
constants are encoded using MD5. The result is transferred to XSB via Inter-
Prolog [9]7 , which is an open-source Java front-end allowing the communication
between Java and a Prolog engine.
Next, the query is sent via InterProlog to XSB, and answers are returned to
the query processor, which collects them and sets up a table showing for which
variable substitutions we obtain true, undefined, or inconsistent valuations (or
just shows the truth value for a ground query). The table itself is shown in the
Result tab (see Fig. 2) of the Output panel, while the Log tab shows measured
times of pre-processing the knowledge base and answering the query. XSB itself
not only answers queries very efficiently in a top-down manner; with tabling, it
also avoids infinite loops.
Once the query has been answered, the user may pose other queries, and the
system will simply send them directly without any repeated preprocessing. If
the user changes data in the ontology or in the rules, then the system offers the
option to recompile, but always restricted to the part that actually changed.

4 Cargo Shipment Use Case


The customs service for any developed country assesses imported cargo for a
variety of risk factors including terrorism, narcotics, food and consumer safety,
pest infestation, tariff violations, and intellectual property rights8 . Assessing this
risk, even at a preliminary level, involves extensive knowledge about commodi-
ties, business entities, trade patterns, government policies and trade agreements.
Some of this knowledge may be external to a given customs agency: for instance
the broad classification of commodities according to the international Harmo-
nized Tariff System (HTS), or international trade agreements. Other knowledge
may be internal to a customs agency, such as lists of suspected violators or of
importers who have a history of good compliance with regulations.
Figure 3 shows a simplified fragment K = (O, P) of such a knowledge base. In
this fragment, a shipment has several attributes: the country of its origination,
the commodity it contains, its importer and producer. The ontology contains
a geographic classification, along with information about producers who are
located in various countries. It also contains (partial) information about three
shipments: s1 , s2 and s3 . There is also a set of rules indicating information
about importers, and about whether to inspect a shipment either to check for
compliance of tariff information or for food safety issues. For that purpose, the set
of rules also includes a classification of commodities based on their harmonized
tariff information (HTS chapters, headings and codes, cf. http://www.usitc.gov/
tata/hts), and tariff information, based on the classification of commodities as
given by the ontology.
7
http://www.declarativa.com/interprolog/
8
The system described here is not intended to reflect the policies of any country or
agency.

* * * O * * *

Commodity ≡ ∃HTSCode.⊤                      Tomato ⊑ EdibleVegetable
CherryTomato ⊑ Tomato                       GrapeTomato ⊑ Tomato
CherryTomato ⊓ GrapeTomato ⊑ ⊥              Bulk ⊓ Prepackaged ⊑ ⊥
EURegisteredProducer ≡ ∃RegisteredProducer.EUCountry
LowRiskEUCommodity ≡ (∃ExpeditableImporter.⊤) ⊓ (∃CommodCountry.EUCountry)

ShpmtCommod(s1 , c1 ) ShpmtDeclHTSCode(s1 , h7022)


ShpmtImporter(s1 , i1 ) CherryTomato(c1 ) Bulk(c1 )
ShpmtCommod(s2 , c2 ) ShpmtDeclHTSCode(s2 , h7022)
ShpmtImporter(s2 , i2 ) GrapeTomato(c2 ) Prepackaged(c2 )
ShpmtCountry(s2 , portugal )
ShpmtCommod(s3 , c3 ) ShpmtDeclHTSCode(s3 , h7021)
ShpmtImporter(s3 , i3 ) GrapeTomato(c3 ) Bulk(c3 )
ShpmtCountry(s3 , portugal ) ShpmtProducer(s3 , p1 )
RegisteredProducer(p1 , portugal ) EUCountry(portugal )
RegisteredProducer(p2 , slovakia) EUCountry(slovakia)

* * * P * * *

AdmissibleImporter(x) ← ShpmtImporter(y, x), notSuspectedBadGuy(x).


SuspectedBadGuy(i1 ).
ApprovedImporterOf(i2 , x) ← EdibleVegetable(x).
ApprovedImporterOf(i3 , x) ← GrapeTomato(x).
CommodCountry(x, y) ← ShpmtCommod(z, x), ShpmtCountry(z, y).
ExpeditableImporter(x, y) ← ShpmtCommod(z, x), ShpmtImporter(z, y),
AdmissibleImporter(y), ApprovedImporterOf(y, x).
CompliantShpmt(x) ← ShpmtCommod(x, y), HTSCode(y, z), ShpmtDeclHTSCode(x, z).
Random(x) ← ShpmtCommod(x, y), notRandom(x).
NoInspection(x) ← ShpmtCommod(x, y), CommodCountry(y, z), EUCountry(z).
Inspection(x) ← ShpmtCommod(x, y), notNoInspection(x), Random(x).
Inspection(x) ← ShpmtCommod(x, y), notCompliantShpmt(x).
Inspection(x) ← ShpmtCommod(x, y), Tomato(y), ShpmtCountry(x, slovakia).
HTSChapter(x, 7) ← EdibleVegetable(x).
HTSHeading(x, 702) ← Tomato(x).
HTSCode(x, h7022) ← CherryTomato(x).
HTSCode(x, h7021) ← GrapeTomato(x).
TariffCharge(x, 0) ← CherryTomato(x), Bulk(x).
TariffCharge(x, 40) ← GrapeTomato(x), Bulk(x).
TariffCharge(x, 50) ← CherryTomato(x), Prepackaged(x).
TariffCharge(x, 100) ← GrapeTomato(x), Prepackaged(x).

Fig. 3. MKNF knowledge base for Cargo Imports



The overall task then is to access all the information and assess whether some
shipment should be inspected in full detail, under certain conditions randomly,
or not at all. In fact, an inspection is considered if either a random inspection is
indicated, or some shipment is not compliant, i.e., there is a mismatch between
the filed cargo codes and the actually carried commodities, or some suspicious
cargo is observed, in this case tomatoes from slovakia. In the first case, a potential
random inspection is indicated whenever certain exclusion conditions do not
hold. To ensure that one can distinguish between strictly required and random
inspections, a random inspection is assigned the truth value undefined based on
the rule Random(x) ← ShpmtCommod(x, y), notRandom(x).
The result of querying this knowledge base for Inspection(x) reveals that of
the three shipments, s2 requires an inspection (due to mislabeling) while s1 may
be subject to a random inspection as it does not knowingly originate from the
EU. It can also be verified using the tool that preprocessing the knowledge base
can be handled within 300 ms and the query only takes 12 ms, which certainly
suffices for an interactive response. Please also note that the example indeed utilizes
the features of rules and ontologies: for example exceptions to the potential
random inspections can be expressed, but at the same time, taxonomic and non-
closed knowledge is used, e.g., some shipment may in fact originate from the EU;
this information is just not available.

5 Evaluation
In this section, we present some tests showing that a) the huge EL+ ontology
SNOMED CT can be preprocessed for querying in a short period of time, b)
adding rules increases the time of the translation only linearly, and c) querying
time is in comparison to a) and b) in general completely neglectable. We per-
formed the tests on a Mac book air 13 under Mac OS X 10.8.4 with a 1.8 GHz
Intel Core i5 processor and 8 GB 1600 MHz DDR3 of memory. We ran all tests
in a terminal version and Java with the “-XX:+AggressiveHeap” option, and
test results are averages over 5 runs.
We considered SNOMED CT, freely available for research and evaluation9 ,
and added a varying number of non-monotonic rules. These rules were generated
arbitrarily, using predicates from the ontology and additional new predicates (up
to arity three), producing rules with a random number of body atoms varying
from 1 to 10 and facts (rules without body atoms) with a ratio of 1:10. Note
that, due to the translation of the DL part into rules, all atoms literally become
non-DL-atoms. So ensuring that each variable appearing in the rule is contained
in at least one non-negated body atom suffices to guarantee DL-safety for these
rules.
The results are shown in Fig. 4 (containing also a constant line for classifica-
tion of ELK alone and starting with the values for the case without additional
rules), and clearly show that a) preprocessing an ontology with over 300,000
concepts takes less than 70 sec. (time for translator+loading in XSB), b) the
9
http://www.ihtsdo.org/licensing/
time of the translator and loading the file in XSB only grows linearly in the number
of rules with a small slope, in particular in the case of the translator, and c) even
with up to 500,000 added rules the time for translating does not surpass ELK
classification, which itself is really fast [21], by more than a factor of 2.5. All this
data indicates that even with a very large ontology, preprocessing can be
handled very efficiently.

Fig. 4. Preprocessing time for SNOMED with a varying number of Rules (time in seconds vs. number of NM rules and facts, in thousands; curves for ELK, Translator, and XSB).
Finally, we also tested the querying time. To this purpose, we randomly
generated and handcrafted several queries of different sizes and shapes using
SNOMED with a varying number of non-monotonic rules as described before.
In all cases, we observed that the query response time is interactive, observing
longer reply times only if the number of replies is very high because either the
queried class contains many subclasses in the hierarchy or if the arbitrarily gen-
erated rules create too many meaningless links, thus in the worst case requiring
to compute the entire model. Requesting only one solution avoids this problem.
Still, the question of realistic randomly generated rule bodies for testing querying
time remains an issue for future work.

6 Conclusions
We have presented NoHR, the first plug-in for the ontology editor Protégé that
integrates non-monotonic rules and top-down queries with ontologies in the OWL
2 profile OWL 2 EL. We have discussed how this procedure is implemented as
a tool and shown how it can be used to implement a real use case on cargo
shipment inspections. We have also presented an evaluation which shows that
the tool is applicable to really huge ontologies, here SNOMED CT.
There are several relevant approaches discussed in the literature. Most closely
related are probably [15,23], because both build on the well-founded MKNF
semantics [22]. In fact, [15] is maybe closest in spirit to the original idea of

SLG(O) oracles presented in [2], on which the implementation of NoHR is


theoretically founded. It utilizes the CDF framework already integrated in XSB,
but its non-standard language is a drawback if we want to achieve compatibility
with standard OWL tools based on the OWL API. On the other hand, [23],
presents an OWL 2 QL oracle based on common rewritings in the underlying
DL DL-Lite [3]. Less closely related is the work pursued in [8,14] that investigates
direct non-monotonic extensions of EL, so that the main reasoning task focuses
on finding default subset inclusions, unlike this query-centered approach.
Two other related tools are DReW [30] and HD Rules [10], but both are based
on different underlying formalisms to combine ontologies and non-monotonic
rules. The former builds on dl-programs [12] and focuses on datalog-rewritable
DLs [17], and the latter builds on Hybrid Rules [11]. While a more detailed com-
parison is surely of interest, the main problem is that both underlying formalisms
differ from MKNF knowledge bases in the way information can flow between its
two components and how flexible the language is [12,28].
We conclude with pointing out that given the successful application of the
tool to the use-case as well as its evaluation, an obvious next step will be to
try applying it to other use-case domains. This will allow gathering data, which
may then be used for a) further dissemination in particular of query processing,
which would b) stimulate application-driven optimizations and enhancements of
the tool NoHR. Other future directions are extensions to paraconsistency [20] or
more general formalisms [16,24,25].

Acknowledgments. We would like to thank the referees for their comments. We


acknowledge partial support by FCT under project ERRO (PTDC/EIA-CCO/121823/
2010) and under strategic project NOVA LINCS (PEst/UID/CEC/04516/2013). V.
Ivanov was partially supported by a MULTIC – Erasmus Mundus Action 2 grant and
M. Knorr was also partially supported by FCT grant SFRH/BPD/86970/2012.

References
1. Alberti, M., Knorr, M., Gomes, A.S., Leite, J., Gonçalves, R., Slota, M.: Norma-
tive systems require hybrid knowledge bases. In: Procs. of AAMAS, IFAAMAS,
pp. 1425–1426 (2012)
2. Alferes, J.J., Knorr, M., Swift, T.: Query-driven procedures for hybrid MKNF
knowledge bases. ACM Trans. Comput. Log. 14(2), 1–43 (2013)
3. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family
and relations. J. Artif. Intell. Res. (JAIR) 36, 1–69 (2009)
4. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F.
(eds.): The Description Logic Handbook: Theory, Implementation, and Applica-
tions, 3rd edn. Cambridge University Press (2010)
5. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: Procs. of IJCAI
(2005)
6. Baral, C., Gelfond, M.: Logic programming and knowledge representation. J. Log.
Program. 19(20), 73–148 (1994)
7. Boley, H., Kifer, M. (eds.): RIF Overview. W3C Recommendation, February 5,
2013 (2013). http://www.w3.org/TR/rif-overview/

8. Bonatti, P.A., Faella, M., Sauro, L.: EL with default attributes and overriding.
In: Patel-Schneider, P.F., Pan, Y., Hitzler, P., Mika, P., Zhang, L., Pan, J.Z.,
Horrocks, I., Glimm, B. (eds.) ISWC 2010, Part I. LNCS, vol. 6496, pp. 64–79.
Springer, Heidelberg (2010)
9. Calejo, M.: InterProlog: towards a declarative embedding of logic programming
in java. In: Alferes, J.J., Leite, J. (eds.) JELIA 2004. LNCS (LNAI), vol. 3229,
pp. 714–717. Springer, Heidelberg (2004)
10. Drabent, W., Henriksson, J., Maluszynski, J.: Hd-rules: a hybrid system interfacing
prolog with dl-reasoners. In: Procs. of ALPSWS, vol. 287 (2007)
11. Drabent, W., Maluszynski, J.: Hybrid rules with well-founded semantics. Knowl.
Inf. Syst. 25(1), 137–168 (2010)
12. Eiter, T., Ianni, G., Lukasiewicz, T., Schindlauer, R., Tompits, H.: Combining
answer set programming with description logics for the semantic web. Artif. Intell.
172(12–13), 1495–1539 (2008)
13. Gelder, A.V., Ross, K.A., Schlipf, J.S.: The well-founded semantics for general
logic programs. J. ACM 38(3), 620–650 (1991)
14. Giordano, L., Gliozzi, V., Olivetti, N., Pozzato, G.L.: Reasoning about typicality
in low complexity dls: the logics EL⊥ Tmin and DL-Lite c Tmin . In: Procs. of IJCAI
(2011)
15. Gomes, A.S., Alferes, J.J., Swift, T.: Implementing query answering for hybrid
MKNF knowledge bases. In: Carro, M., Peña, R. (eds.) PADL 2010. LNCS,
vol. 5937, pp. 25–39. Springer, Heidelberg (2010)
16. Gonçalves, R., Alferes, J.J.: Parametrized logic programming. In: Janhunen, T.,
Niemelä, I. (eds.) JELIA 2010. LNCS, vol. 6341, pp. 182–194. Springer, Heidelberg
(2010)
17. Heymans, S., Eiter, T., Xiao, G.: Tractable reasoning with dl-programs over
datalog-rewritable description logics. In: Procs of ECAI, pp. 35–40. IOS Press
(2010)
18. Hitzler, P., Krötzsch, M., Parsia, B., Patel-Schneider, P.F., Rudolph, S. (eds.):
OWL 2 Web Ontology Language: Primer (Second Edition). W3C Recommenda-
tion, December 11, 2012 (2012). http://www.w3.org/TR/owl2-primer/
19. Ivanov, V., Knorr, M., Leite, J.: A query tool for EL with non-monotonic rules.
In: Alani, H., Kagal, L., Fokoue, A., Groth, P., Biemann, C., Parreira, J.X., Aroyo,
L., Noy, N., Welty, C., Janowicz, K. (eds.) ISWC 2013, Part I. LNCS, vol. 8218,
pp. 216–231. Springer, Heidelberg (2013)
20. Kaminski, T., Knorr, M., Leite, J.: Efficient paraconsistent reasoning with ontolo-
gies and rules. In: Procs. of IJCAI. IJCAI/AAAI (2015)
21. Kazakov, Y., Krötzsch, M., Simančı́k, F.: The incredible ELK: From polynomial
procedures to efficient reasoning with EL ontologies. Journal of Automated Rea-
soning 53, 1–61 (2013)
22. Knorr, M., Alferes, J.J., Hitzler, P.: Local closed world reasoning with description
logics under the well-founded semantics. Artif. Intell. 175(9–10), 1528–1554 (2011)
23. Knorr, M., Alferes, J.J.: Querying OWL 2 QL and non-monotonic rules. In: Aroyo,
L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist,
E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 338–353. Springer, Heidelberg
(2011)
24. Knorr, M., Hitzler, P., Maier, F.: Reconciling OWL and non-monotonic rules for
the semantic web. In: Procs. of ECAI, pp. 474–479. IOS Press (2012)
25. Knorr, M., Slota, M., Leite, J., Homola, M.: What if no hybrid reasoner is available?
Hybrid MKNF in multi-context systems. J. Log. Comput. 24(6), 1279–1311 (2014)

26. Lifschitz, V.: Nonmonotonic databases and epistemic queries. In: Procs. of IJCAI
(1991)
27. Motik, B., Cuenca Grau, B., Horrocks, I., Wu, Z., Fokoue, A., Lutz, C. (eds.):
OWL 2 Web Ontology Language: Profiles. W3C Recommendation, February 5,
2013. http://www.w3.org/TR/owl2-profiles/
28. Motik, B., Rosati, R.: Reconciling description logics and rules. J. ACM 57(5)
(2010)
29. Slota, M., Leite, J., Swift, T.: Splitting and updating hybrid knowledge bases.
TPLP 11(4–5), 801–819 (2011)
30. Xiao, G., Eiter, T., Heymans, S.: The DReW system for nonmonotonic dl-
programs. In: Procs. of SWWS 2012. Springer Proceedings in Complexity. Springer
(2013)
On the Cognitive Surprise in Risk Management:
An Analysis of the Value-at-Risk (VaR)
Historical

Davi Baccan1(B) , Elton Sbruzzi2 , and Luis Macedo1


1
CISUC, Department of Informatics Engineering, University of Coimbra,
Coimbra, Portugal
{baccan,macedo}@dei.uc.pt
2
CCFEA, University of Essex, Colchester, UK
[email protected]

Abstract. Financial markets are environments in which a variety of


products are negotiated by heterogeneous agents. In such environments,
agents need to cope with uncertainty and with different kinds of risks.
In trying to assess the risks they face, agents use a myriad of differ-
ent approaches to somewhat quantify the occurrence of risks and events
that may have a significant impact. In this paper we address the prob-
lem of risk management from the cognitive science perspective. We com-
pute the cognitive surprise “felt” by an agent relying on a popular risk
management tool known as Value-at-Risk (VaR) historical. We applied
this approach to the S&P500 index from 26-11-1990 to 01-07-2009, and
divided the series into two subperiods, a calm period and a crash period.
We carried out an experiment with twelve different treatments and for
each treatment we compare the intensity of surprise “felt” by the agent
under these two different regimes. This interdisciplinary work contributes
toward the truly understanding and improvement of complex economical
and financial systems, specifically in providing insights on the behaviour
of cognitive agents in those contexts.

Keywords: Agent-based computational economics · Artificial agents ·


Cognitive architectures · Cognitive modelling · Social simulation and
modelling

1 Introduction
Financial markets such as stock markets are complex and dynamic environments
in which a variety of products are negotiated by a very large number of hetero-
geneous agents ([1], [2]). Agents, either human, artificial or hybrid, are hetero-
geneous in the sense that they have, for instance, different preferences, beliefs,
goals, and trading strategies (e.g., [3], [4]). Additionally, in such environments
agents need to cope with uncertainty and with different kinds of risks [5].
Generally speaking, in economical and financial systems, agents usually try
to assess in an objective or subjective way the risks they face. Ideally they would


be able to come up with probabilities, either mathematical or not (e.g., subjec-


tive belief), to the occurrence of events that may have a positive or negative
impact (e.g., [6], [7]) given his/her preferences and considering his/her goals [8].
Furthermore, agents tend to typically trade based on the risk-return trade-off,
i.e., lower (higher) levels of risk are generally associated with lower (higher) levels
of potential returns [9].
There are several different theories and hypotheses that try to explain how
agents behave and how a stock market works (e.g., [3]). In short, there are two dif-
ferent and opposite perspectives. On the one hand, traditional economic theories
(e.g., Efficient Market Hypothesis (EMH) [1]) rely, for instance, on the assumption
that, when confronted with decisions that involve risk, agents are able to correctly
form their probabilistic assessments according to the laws of probability [10], cal-
culating which of the alternative courses of action maximize their expected util-
ity. On the other hand, behavioral economics (e.g., [11]), i.e., the combination of
psychology and economics that aims to understand human decision-making under
risk as well as how this behaviour matters in economic contexts, has documented
that there are deviations, known as behavioral biases, from the so-called ratio-
nal behaviour. These behavioral biases are believed to be ubiquitous to humans,
and several of them are clearly counterproductive from the economics perspective.
Although the discussion regarding these different perspectives is of importance
and fascinating for us, it is out of the scope of this work.
Nevertheless, we claim that, given the nature and complexity of financial
markets together with the sophisticated and complex human decision-making
mechanism (which is influenced by a myriad of intertwined factors), the task of
risk management is quite complex and falls into the category of tasks that require
the application of different and novel approaches in order to be improved
([12], [13], [14]). For instance, there is extensive empirical evidence (e.g., [5])
suggesting that the financial crisis of 2007/2008 was significantly aggravated by
inappropriate risk management systems and tools.
However, perhaps more important for agents than having a good risk management system or tool is to rely on a system or tool that is in line with their behavioral and emotional profile. For instance, the underestimation of the occurrence of a given event may lead a particular agent to make a risky decision (e.g., excessive leverage) that may elicit the surprise emotion in the agent with a high level of intensity, as well as probably result in substantial financial losses [15].
In this paper we address the problem of risk management from the cognitive science perspective by computing the cognitive surprise “felt” by an agent relying on a popular and widely used risk management tool known as the Value-at-Risk (VaR) historical. To this end, we model the VaR historical based on the principles of cognitive emotion theories and compute the cognitive surprise based on an artificial surprise model. We applied this approach to the S&P500 index and divided the series into two subperiods, a calm period and a crash period. We carried out an experiment with twelve different treatments, and for each treatment we compared the intensity of surprise “felt” by the agent under these two different regimes.

2 Value-at-Risk (VaR)
The Value-at-Risk (VaR) tool is one of the most popular financial risk measures, used by financial institutions all over the world [16]. The objective of VaR is to measure the potential for significant losses in a portfolio of financial assets [17]. Generally speaking, we can assume that for a given time horizon t and a confidence level p, VaR is the loss in market value over the time horizon t that is exceeded with probability 1 − p [18]. For example, for a one-day horizon (t = 1) and a confidence level p of 95%, the VaR is the critical loss value that is exceeded with probability 0.05, or 5%. There are several different methods for calculating VaR. Let us briefly present two of them: the statistical and the historical approach [19].
The VaR statistical assumes that the historical returns respect the EMH. The EMH in turn assumes that the series of historical financial returns is Gaussian, with an average value μ of zero and a constant variance σ², i.e., returns ∼ N(0, σ²). Based on the EMH assumptions and on the Gaussian characteristics, the VaR statistical for confidence levels p of 95% and 99% is approximately −2σ and −3σ, respectively. For example, if a series of returns shows a standard deviation of 5%, the VaR statistical for confidence levels of 95% and 99% is approximately −10% and −15%, respectively.
An alternative to the VaR statistical method is to rank the historical returns from the smallest to the largest; this is named the VaR historical. Suppose that the series of T returns is r1, ..., rT ; we say that this series of returns is ranked if r1 ≤ r2 ≤ ... ≤ rT. In this case, the VaR historical is the return at position ⌈(1 − p)T⌉ of the ranked series. For example, for a confidence level p of 99% and T of 250, (1 − p)T = 2.5, so the VaR historical would be r3 ; for p of 95%, it would be r13.
We consider the VaR historical to be the most appropriate method for this work for several reasons. First, the VaR historical method is widely used by practitioners ([20], [21], [22]). Second, it is an easy-to-understand measure that computes the estimation based on historical data and a confidence level (we will explain later, in Section 3.2, the particular importance of confidence). Last but not least, unlike the VaR statistical, the VaR historical is free of any assumption about the distribution of the series of returns [19].
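
To make the ranking procedure concrete, the following Python sketch (ours, for illustration only, with hypothetical return data; not code from the paper) computes the VaR historical of a window of T returns as the return at position ⌈(1 − p)T⌉ of the ranked series:

    import math

    def var_historical(returns, p):
        # Rank the window of returns: r_1 <= r_2 <= ... <= r_T.
        ranked = sorted(returns)
        # Position ceil((1 - p) * T); e.g., T = 250 and p = 0.99 give position 3.
        k = max(math.ceil((1 - p) * len(returns)), 1)
        return ranked[k - 1]

    # Hypothetical daily returns (in %), T = 10:
    window = [-3.1, -1.2, 0.4, 0.9, -0.5, 1.3, -2.0, 0.2, -0.8, 1.1]
    print(var_historical(window, 0.95))  # prints -3.1, the loss threshold at the 95% level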

3 Surprise in Cognitive Science


From the perspective of cognitive emotion theories (e.g., [23], [24]), surprise can be thought of as a belief-disconfirmation signal or a mismatch generated as a result of the comparison between a newly acquired belief and a pre-existing belief. Formally speaking, surprise is a neutral-valence emotion, defined as a peculiar state of mind, usually of brief duration, caused by unexpected events, or proximally by the detection of a contradiction or conflict between a newly acquired and a pre-existing belief ([25], [26]).
Surprise serves many functions and can be considered a key element for survival in a complex and rapidly changing environment. It is closely related
to how beliefs are stored in memory. Our semantic memory, i.e., our general knowledge and concepts about the world, is assumed to be represented in memory through knowledge structures known as schemas (e.g., [27]). A schema is a well-integrated chunk of knowledge or set of beliefs, whose main source of information is abstraction from repeated, personally experienced events or generalizations thereof, i.e., our episodic memory.

3.1 The Surprise Process


Meyer and colleagues [25] proposed a cognitive-psychoevolutionary model of surprise. They claim that surprise-eliciting events elicit a four-step sequence of processes. The first step is the appraisal of an event as unexpected or schema-discrepant. In the second step, if the degree of unexpectedness or schema-discrepancy exceeds a certain threshold, surprise is experienced, ongoing mental processes are interrupted, and resources such as attention are reallocated towards the unexpected event.
The third step is the analysis and evaluation of the unexpected event. It
generally includes a set of subprocesses namely the verification of the schema
discrepancy, the analysis of the causes of the unexpected event, the evaluation
of the unexpected event’s significance for well-being, and the assessment of the
event’s relevance for ongoing action. It is assumed that some aspects of the analysis concerning the unexpected event are stored as part of the schema for this event, so that in the future the analysis of similar events can be significantly reduced in terms of both time and cognitive effort.
The fourth step is the schema update. It involves producing the immediate
reactions to the unexpected event, and/or operations such as the update, exten-
sion, or revision of the schema or sets of beliefs that gave rise to the discrepancy.
The schema change ideally enables one to some extent to predict and control
future occurrences of the schema-discrepant event and, if possible, to avoid the
event if it is negative and uncontrollable, or to ignore the event if it is irrelevant
for action.

3.2 Artificial Surprise


Two models of artificial surprise for artificial agents can be stressed, namely the model proposed by Macedo and Cardoso [28] and the model proposed by Lorini and Castelfranchi [29]. Both models were mainly inspired by the cognitive-psychoevolutionary model of surprise proposed by Meyer and colleagues [25] and are influenced by the analysis of the cognitive causes of surprise from a cognitive science perspective proposed by Ortony and Partridge [30]. A comparative study of these two models is out of the scope of this work; for a detailed description of the similarities and differences between the models please see [26]. The empirical tests we performed provide evidence in favor of using the model proposed by Macedo and Cardoso in our work.
Macedo and colleagues carried out an empirical study [28] with the goal of investigating how to compute the intensity of surprise in an artificial agent.
This study suggests that the intensity of surprise about an event Eg, from a set of mutually exclusive events E1, E2, ..., Em, is a nonlinear function of the difference, or contrast, between its probability/belief and the probability/belief of the highest expected event (Eh) in the set of mutually exclusive events E1, E2, ..., Em.
Formally, let (Ω, A, P) be a probability space, where Ω is the sample space (i.e., the set of possible outcomes of the event), A = A1, A2, ..., An is a σ-field of subsets of Ω (also called the event space, i.e., all the possible events), and P is a probability measure which assigns a real number P(F) to every member F of the σ-field A. Let E = E1, E2, ..., Em, Ei ∈ A, be a set of mutually exclusive events in that probability space with probabilities P(Ei) ≥ 0, such that Σ_{i=1}^{m} P(Ei) = 1. Let Eh be the highest expected event from E. The intensity of surprise about an event Eg, defined as S(Eg), is calculated as S(Eg) = log2(1 + P(Eh) − P(Eg)) (Equation 1). In each set of mutually exclusive events, there is always at least one event whose occurrence is unsurprising, namely Eh.
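
As a small illustration (ours, not the authors' implementation), Equation 1 can be computed directly from the probabilities of the mutually exclusive events:

    import math

    def surprise(probabilities, g):
        # S(E_g) = log2(1 + P(E_h) - P(E_g)), where E_h is the most expected event.
        p_h = max(probabilities)
        return math.log2(1 + p_h - probabilities[g])

    # Two mutually exclusive outcomes of a VaR estimation: correct (index 0,
    # believed with probability 0.99) or incorrect (index 1, probability 0.01).
    print(surprise([0.99, 0.01], 0))  # 0.0, the expected event is unsurprising
    print(surprise([0.99, 0.01], 1))  # ~0.9855, surprise about the unexpected event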

4 Experiments and Results

The goal of our experiment is to compute the cognitive surprise “felt” by an agent relying on a popular risk management tool known as the Value-at-Risk (VaR) historical under different financial settings and regimes.
First, we selected the S&P500, i.e., an index based on market capitalization that includes 500 companies in leading industries in the U.S. economy, from 26-11-1990 to 01-07-2009, a total of 4688 days. The S&P500 index is recognized as one of the most important stock market indexes (perhaps the most important). We then applied a method similar to the work of Halbleib and Pohlmeier [20] to divide the selected S&P500 period into two parts, a calm period from 26-11-1990 to 31-08-2008 (a total of 4478 days), and a crash period from 01-09-2008 to 01-07-2009 (a total of 210 days). The data was obtained free of charge from Yahoo Finance. Figure 1 presents the daily close, daily return, and histogram of daily returns of the S&P500 from 26-11-1990 to 01-07-2009. Figure 2 presents the daily close and the histogram of daily returns of the S&P500 for the calm and crash periods.
We employed two different approaches in our experiment. The first approach consists in specifying a rolling window size and a confidence level, the combination of which is used by the VaR historical to compute the estimation by considering the most recent returns. The window size set contains the values 50, 125, 250, and 500. The confidence level set contains the values 0.95 and 0.99. The combination of the window size values with the confidence level values results in 8 different treatments.
Fig. 1. S&P500: daily close (left), daily return (center), and histogram of daily returns (right).

Fig. 2. S&P500: daily close of the calm period (left), daily close of the crash period (center), and histogram of daily returns of the calm and crash periods (right).

The second approach consists in specifying a decay function so that recent daily returns gain more weight as opposed to old returns. Decay functions are generally used with the objective of emulating, to a certain extent, the human memory process of “forgetting”, as well as to contemplate some findings on
how humans use past experience in decision-making (e.g., [31]), which indicate that, in revising their beliefs, people tend to overweight recent information and underweight prior information. The alpha set contains the values 0.995, 0.99, 0.97, and 0.94. A higher (lower) alpha implies a smaller (higher) level of forgetfulness. Unlike the first approach, we opted to use only a confidence level of 0.99. The reason is that we observed in our initial experiments that the combination of a confidence level of 0.95 with the previous alpha values caused the agent to “forget” too many returns, generating in the end a quite low and poor VaR historical estimation. Therefore, in the second approach we have four treatments.
We conducted an experiment with these twelve different treatments.
The algorithm we used in our experiment works as follows. We first perform initial adjustments to ensure that all iterations are actually carried out within the specified period. For each simulation: (1) for a window size value, windowk ∈ {50, 125, 250, 500}; (2) according to a uniform distribution function, we select a begin day di within the period (for the crash period we fixed di = 01-09-2008); (3) we compute the VaR historical estimations VaR95 and VaR99 based on the windowk daily returns preceding di and the confidence levels p ∈ {0.95, 0.99}, respectively; (4) for each day dk beginning on the day after di, from k = i + 1 to k = i + 210 (for the next 210 days), we check if the daily return of dk ≤ VaR95 and if the daily return of dk ≤ VaR99; (5) we advance the rolling window one day, i.e., i = i + 1; and (6) we go back to step (3). For the alpha approach the algorithm is similar, except for some minor adjustments, specifically in step (1), since the algorithm runs for each alpha value ∈ {0.995, 0.99, 0.97, 0.94}, and in step (3), since we modify the preceding daily returns by applying the current alpha value and compute the VaR historical estimation with a confidence level p of 0.99. All other steps are exactly the same.
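
A minimal sketch of the rolling-window treatment follows (our illustration under simplifying assumptions, not the authors' code; it reuses the var_historical function sketched in Section 2 and assumes a list of daily returns indexed by day):

    import math

    def backtest_surprise(returns, i, window, p, horizon=210):
        # Surprise "felt" on a day whose return breaches the VaR historical estimate;
        # assumes i + 1 >= window (the paper's initial adjustments ensure this).
        breach_surprise = math.log2(1 + p - (1 - p))  # 0.9855 for p = 0.99, 0.9260 for p = 0.95
        surprises = []
        for k in range(i + 1, i + 1 + horizon):
            estimate = var_historical(returns[k - window:k], p)              # step (3), rolling window
            surprises.append(breach_surprise if returns[k] <= estimate else 0.0)  # step (4)
        return surprises  # one surprise value per tested day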
Let us now describe how we address this problem in the context of the artificial surprise model presented in Section 3. We essentially applied the concepts, ideas, and method presented by Baccan and Macedo [32]; this work can be thought of as a continuation and expansion of their initial work to other contexts. We assume, for the sake of the experiment and simplicity, the confidence level p (0.99 or 0.95) as the subjective belief of the agent in the accuracy of the VaR historical estimation. By making this assumption and considering a higher subjective belief, we are endowing the agent with a firm belief in the accuracy of the VaR historical estimation. So, consider an event Eg, the VaR historical estimation, that can assume two mutually exclusive outcomes: it can be either correct (E1), i.e., the daily return is not lower than the estimation, or incorrect (E2), i.e., the daily return is lower than the estimation.
The agent will either “feel” no surprise or a higher intensity of surprise, depending on whether what he/she considered more likely (correct, E1) or less likely (incorrect, E2) actually happened. More precisely, for the confidence level of 0.99 the surprise about event E2 would be 0.9855004, i.e., S(Eg) = log2(1 + 0.99 − 0.01). Similarly, for the confidence level of 0.95 the surprise about event E2 would be 0.9259994, i.e., S(Eg) = log2(1 + 0.95 − 0.05). For each day dk on which the VaR historical estimation is tested in step (4) of the algorithm described above, if the daily return of dk ≤ VaRp, then we compute the cognitive surprise, surprisek.
In the end we have a sequence {surprise1, ..., surprise210}. Afterwards, for each simulation, we compute the cumulative sum of the surprise. That is, we generate a sequence of 210 elements as the partial sums surprise1, surprise1 + surprise2, surprise1 + surprise2 + surprise3, and so forth. In the case of the calm period, we added the cumulative sum of the surprise of each treatment and then averaged it over the number of simulations. The cumulative sum and the averaging make it easier both to observe surprise over time and to compare surprise between the calm and crash periods. The average cumulative sum of the surprise for a given treatment is what is presented in the figures below for the calm period.
We ran 104 independent simulations for each treatment for the calm period
and one simulation for each treatment for the crash period (since the begin day
di = 01-09-2008 is fixed).
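
The aggregation can be sketched as follows (illustrative only): each simulation yields 210 surprise values, whose partial sums are then averaged over the independent runs of the calm period.

    import numpy as np

    def average_cumulative_surprise(simulations):
        # simulations: one list of 210 per-day surprise values per independent run.
        partial_sums = np.cumsum(np.asarray(simulations), axis=1)  # cumulative sum per run
        return partial_sums.mean(axis=0)  # average over the runs of the calm period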
Figures 3 and 4 present the behaviour of all treatments for the calm period and the crash period, respectively. Figure 5 presents a comparison between the different rolling window treatments and alpha treatments for both periods.

Fig. 3. All treatments for the calm period.

Fig. 4. All treatments for the crash period.

Fig. 5. Comparing different treatments for different periods.



5 Discussion
First of all, it is important to bear in mind that financial markets are by their very nature complex and dynamic systems. Such complexity increases significantly if we take into account the sophisticated and complex human decision-making mechanism. As a result, we believe that the task of risk management in finance and economics is indeed quite difficult. Therefore, our goal in this work is not to provide evidence either in favor of or against a particular risk management tool or system. Instead, our goal is to compute, in a systematic and clear way, the cognitive surprise “felt” by an agent relying on the VaR historical during different periods, a calm period and a crash period.
Let us then begin by analyzing some characteristics of the calm period in comparison with the crash period. We can see in Figure 1 that daily returns seem, as expected and in line with the existing literature, to reproduce some statistical regularities, known as stylized facts [33], that are often found in a large set of different assets and markets. Specifically, we can observe that returns do not follow a Gaussian distribution (are not normally distributed), seem to exhibit what is known as fat tails, and reproduce the volatility clustering fact, i.e., high-volatility events tend to cluster in time. Volatility clustering resembles the concept of entropy used in a variety of areas such as information and communication theory. The comparison in Figure 2 between the daily returns of the calm period and those of the crash period allows us to claim that the daily returns of the crash period seem to exhibit fat-tailed distributions. We can also see that during the crash period the S&P500 depreciated almost 50% in value in a short period of time.
We now turn our attention to the analysis of the cognitive surprise. For the calm period, we can observe in Figure 3 that the cumulative surprise for the alpha treatments (right) is higher when compared to all window treatments (left). Additionally, the lower the alpha, the higher the cumulative surprise and, similarly, the larger the window size, the lower the surprise. The cumulative surprise of the window treatments with a confidence level p of 0.95, VaR95, is, in turn, higher when compared to the cumulative surprise with a confidence level p of 0.99, VaR99. Interestingly, if we assume that alpha somewhat emulates the human memory process of “forgetting”, and considering that a lower alpha implies a higher level of forgetfulness, we may argue that an agent should be careful about forgetting the past, at least in the context of a stock market, since the cognitive surprise “felt” by an agent under this treatment is significantly higher than under the window treatments.
For the crash period, we first observe in Figure 4 that, once again, the lower the alpha, the higher the cumulative surprise (right). However, quite contrary to the calm period, the lower the window size, the lower the surprise (left). The cognitive surprise “felt” is higher for agents that rely on a larger window size. This may be explained by the fact that the crash period is indeed a period in which volatility is high. Therefore, a VaR historical based on a larger window size takes more time to adapt to this new and changing environment, in comparison with a VaR historical based on a smaller window size.

When comparing in Figure 5 how treatments behave during the calm period
and the crash period, we can observe that, as expected, the cognitive surprise is
higher during the crash period.
Generally speaking, each day on which the agent “felt” surprise represents a failed VaR historical estimation. Therefore, the higher the surprise, the more often a given VaR historical treatment is wrong. The analysis of the results indicates that it may be quite difficult for an agent not to “feel” surprise when relying on the VaR historical. Indeed, there are several issues with models, like the VaR historical, that take into account historical data for their estimation. These models somewhat assume that the past is a good indicator of what may happen in the future, i.e., that history repeats itself. However, this inductive reasoning often underestimates the probability of extreme returns and, consequently, underestimates the level of risk. Additionally, an essential flaw of this kind of rationale is its failure to truly acknowledge that “absence of evidence is not evidence of absence” and the existence of “unknown unknowns” [34].
Consider, for instance, the turkey paradox [35]. There is a butcher and a turkey. Every day, for let us say 100 days, the butcher feeds the turkey. As time goes by, the turkey increases its belief that the next day it will receive food from the butcher. However, one day, to the “shock” and “surprise” of the turkey, instead of feeding it, the butcher kills the turkey. The same analogy may be applied to the black swan scenario [6] as well as to other complex and risky financial operations that provide small but regular gains, until the day they blow away all the gains, resulting in huge losses [15].
The main contribution of our work resides in the fact that we have applied, in a systematic, clear, and easy-to-reproduce way, the ideas, concepts, and methods described by Baccan and Macedo [32] to the context of risk, uncertainty, and therefore risk management. Our interdisciplinary work is in line with those who claim that there is a need for novel approaches so that complex economic and financial systems may be improved (e.g., [12], [13], [14]). It is, as far as we know, one of the first attempts to apply a cognitive science perspective to risk management.

6 Conclusion and Future Work

In this paper we addressed the problem of risk management from the cognitive science perspective.¹ We computed the cognitive surprise “felt” by an agent relying on a popular risk management tool known as the Value-at-Risk (VaR) historical. We applied this approach to the S&P500 stock market index from 26-11-1990 to 01-07-2009, and divided the series into two subperiods, a calm period and a crash period. We carried out an experiment with twelve different treatments, and for each treatment we compared the intensity of surprise “felt” by the agent under these two different regimes. This interdisciplinary work is in line with a broader movement and contributes toward a true understanding and improvement of complex economic and financial systems, specifically by providing insights into the behaviour of cognitive agents in those risky and uncertain contexts.

¹ This work was supported by FCT, Portugal, SFRH/BD/60700/2009, and by the TribeCA project, funded by FEDER through POCentro, Portugal.
The use of cognitive modeling approaches in the study of complex and dynamic systems is in its early stages. We consider that the use of relatively simple but powerful tools in conjunction with cognitive models, like those discussed in this work, offers a rich and novel set of possibilities. For instance, the use of cognitive agents would allow us to carry out a variety of different agent- and multi-agent-based simulations of significant economic and financial events, so that we would be able to compute individual cognitive surprise and, in the end, the global surprise of agents regarding some event. This may provide contributions toward the understanding of the behaviour of agents individually as well as of the system as a whole. Last but not least, it would be interesting to compare the global surprise with other market sentiment indexes (e.g., VIX, the “fear” indicator).

References
1. Fama, E.F.: Efficient capital markets: A review of theory and empirical work. The
Journal of Finance 25(2), 383–417 (1970)
2. Fama, E.F.: Two pillars of asset pricing. American Economic Review 104(6),
1467–1485 (2014)
3. Lo, A.W.: Reconciling efficient markets with behavioral finance: The adaptive mar-
kets hypothesis. Journal of Investment Consulting 7, 21–44 (2005)
4. Treleaven, P., Galas, M., Lalchand, V.: Algorithmic trading review. Commun. ACM
56(11), 76–85 (2013)
5. Lo, A., Mueller, M.: WARNING: physics envy may be hazardous to your wealth!.
Journal of Investment Management 8, 13–63 (2010)
6. Taleb, N.N.: The Black Swan: The Impact of the Highly Improbable, 1st edn.
Random House, April 2007
7. Meder, B., Lec, F.L., Osman, M.: Decision making in uncertain times: what can
cognitive and decision sciences say about or learn from economic crises? Trends in
Cognitive Sciences 17(6), 257–260 (2013)
8. Markowitz, H.: Portfolio selection. Journal of Finance 7, 77–91 (1952)
9. Kahneman, D.: The myth of risk attitudes. The Journal of Portfolio Management
36(1), 1 (2009)
10. Chiodo, A., Guidolin, M., Owyang, M.T., Shimoji, M.: Subjective probabili-
ties: psychological evidence and economic applications. Technical report, Federal
Reserve Bank of St. Louis (2003)
11. Kahneman, D., Tversky, A.: Prospect theory: An analysis of decision under risk.
Econometrica 47(2), 263–291 (1979)
12. Farmer, J.D., Foley, D.: The economy needs agent-based modelling. Nature
460(7256), 685–686 (2009)
13. Bouchaud, J.: Economics needs a scientific revolution. Nature 455(7217), 1181
(2008)
14. Gatti, D., Gaffeo, E., Gallegati, M.: Complex agent-based macroeconomics: a man-
ifesto for a new paradigm. Journal of Economic Interaction and Coordination 5(2),
111–135 (2010)
15. Taleb, N.N.: Antifragile: Things That Gain from Disorder. Reprint edition edn.
Random House Trade Paperbacks, New York (2014)
16. Kawata, R., Kijima, M.: Value-at-risk in a market subject to regime switching.
Quantitative Finance 7(6), 609–619 (2007)
17. Alexander, C., Sarabia, J.M.: Quantile Uncertainty and Value-at-Risk Model Risk.
Risk Analysis 32(8), 1293–1308 (2012)
18. Duffie, D., Pan, J.: An Overview of Value at Risk. The Journal of Derivatives 4(3),
7–49 (1997)
19. Jorion, P.: Value at Risk: The New Benchmark for Managing Financial Risk, 3rd
edn. McGraw-Hill (2006)
20. Halbleib, R., Pohlmeier, W.: Improving the value at risk forecasts: Theory and
evidence from the financial crisis. Journal of Economic Dynamics and Control
36(8), 1212–1228 (2012)
21. David Cabedo, J., Moya, I.: Estimating oil price Value at Risk using the historical
simulation approach. Energy Economics 25(3), 239–253 (2003)
22. Hendricks, D.: Evaluation of value-at-risk models using historical data. Economic
Policy Review, 39–69, April 1996
23. Reisenzein, R., Hudlicka, E., Dastani, M., Gratch, J., Hindriks, K., Lorini, E.,
Meyer, J.J.: Computational modeling of emotion: Towards improving the inter-
and intradisciplinary exchange. IEEE Transactions on Affective Computing 99(1),
1 (2013)
24. Reisenzein, R.: Emotions as metarepresentational states of mind: Naturalizing the
belief-desire theory of emotion. Cognitive Systems Research 10(1), 6–20 (2009)
25. Meyer, W.U., Reisenzein, R., Schutzwohl, A.: Toward a process analysis of emo-
tions: The case of surprise. Motivation and Emotion 21, 251–274 (1997)
26. Macedo, L., Cardoso, A., Reisenzein, R., Lorini, E., Castelfranchi, C.: Artificial
surprise. In: Handbook of Research on Synthetic Emotions and Sociable Robotics:
New Applications in Affective Computing and Artificial Intelligence, pp. 267–291
(2009)
27. Baddeley, A., Eysenck, M., Anderson, M.C.: Memory, 1 edn. Psychology Press,
February 2009
28. Macedo, L., Reisenzein, R., Cardoso, A.: Modeling forms of surprise in artificial
agents: empirical and theoretical study of surprise functions. In: 26th Annual Con-
ference of the Cognitive Science Society, pp. 588–593 (2004)
29. Lorini, E., Castelfranchi, C.: The cognitive structure of surprise: looking for basic
principles. Topoi: An International Review of Philosophy 26, 133–149 (2007)
30. Ortony, A., Partridge, D.: Surprisingness and expectation failure: what’s the dif-
ference? In: Proceedings of the 10th International Joint Conference on Artificial
Intelligence, vol. 1, pp. 106–108. Morgan Kaufmann Publishers Inc., Milan (1987)
31. Griffin, D., Tversky, A.: The weighing of evidence and the determinants of confi-
dence. Cognitive Psychology 24(3), 411–435 (1992)
32. Baccan, D., Macedo, L., Sbruzzi, E.: Towards modeling surprise in economics and
finance: a cognitive science perspective. In: STAIRS 2014. Frontiers in Artificial
Intelligence and Applications, vol. 264, Prague, Czech Republic, pp. 31–40 (2014)
33. Cont, R.: Empirical properties of asset returns: stylized facts and statistical issues.
Quantitative Finance 1(2), 223 (2001)
34. Rumsfeld, D.: DoD News Briefing (2002)
35. Taleb, N.N.: Fooled by Randomness: The Hidden Role of Chance in Life and in
the Markets, 2 edn. Random House, October 2008
Logic Programming Applied to Machine Ethics

Ari Saptawijaya1,2(B) and Luís Moniz Pereira1


1
NOVA Laboratory for Computer Science and Informatics (NOVA LINCS),
Departamento de Informática, Faculdade de Ciências e Tecnologia,
Universidade Nova de Lisboa, Lisboa, Portugal
[email protected], [email protected]
2
Faculty of Computer Science, Universitas Indonesia, Depok, Indonesia

Abstract. This paper summarizes our investigation on the application of LP-based reasoning to machine ethics, a field that emerges from the need to imbue autonomous agents with the capacity for moral decision-making. We identify morality viewpoints (concerning moral permissibility and the dual-process model) as studied in moral philosophy and psychology, which are amenable to computational modeling. Subsequently, various LP-based reasoning features are applied to model these identified morality viewpoints, via classic moral examples taken off-the-shelf from the literature.

1 Introduction
The need for systems or agents that can function in an ethically responsible manner is becoming a pressing concern, as they become ever more autonomous and act in groups, amidst populations of other agents, including humans. Its importance has been emphasized as a research priority in AI with funding support [26]. Its field of enquiry, named machine ethics, is interdisciplinary, and is not just important for equipping agents with some capacity for moral decision-making, but also for helping to better understand morality, via the creation and testing of computational models of ethical theories.
Several logic-based formalisms have been employed to model moral theories or particular morality aspects, e.g., deontic logic in [2], non-monotonic reasoning in [6], and the use of Inductive Logic Programming (ILP) in [1]; some of them only abstractly, whereas others also provide implementations (e.g., using ILP-based systems [1], an interactive theorem prover [2], and answer set programming (ASP) [6]). Despite the aforementioned logic-based formalisms, Logic Programming (LP) itself has been rather little explored. The potential and suitability of LP, and of computational logic in general, for machine ethics is identified and discussed at length in [11], on the heels of our work. LP permits declarative knowledge representation of moral cases with a sufficient level of detail to distinguish one moral case from other similar cases. It provides a logic-based programming paradigm with a number of practical Prolog systems, thus allowing us not only to address morality issues in an abstract logical formalism, but also via a Prolog implementation as a proof of concept and a testing ground for experimentation. Furthermore, LP is also equipped with various reasoning features,
as identified in the paragraph below, whose applications to machine ethics are promising but still unexplored. This paper summarizes our integrative investigation of the appropriateness of various LP-based reasoning features for machine ethics, not just abstractly, but also by furnishing a proof-of-concept implementation for the morality issues at hand.
We identify conceptual morality viewpoints, which are covered in two moral-
ity themes: (1) moral permissibility, taking into account viewpoints such as the
Doctrines of Double Effect (DDE) [15], Triple Effect (DTE) [10], and Scanlon’s
contractualist moral theory [23]; and (2) the dual-process model [3,14], which
stresses the interaction between deliberative and reactive behaviors in moral
judgment. The mapping of all these considered viewpoints into LP-based reasoning benefits from its features and their integration, such as abduction with integrity constraints (ICs) [22], preferences over abductive scenarios [4], probabilistic reasoning [7], updating [21], counterfactuals [20], and the LP tabling technique [25].
We show, in Section 2, how these various LP-based reasoning features are employed to model the aforementioned morality viewpoints, including: (1) the use of a priori ICs and a posteriori preferences over abductive scenarios to capture deontological and utilitarian judgments; (2) probabilistic moral reasoning, to reason about actions, under uncertainty, that might have occurred, and thence provide judgment adhering to moral principles within some prescribed uncertainty level. This permits capturing a form of argumentation (wrt. Scanlon's contractualism [23]) in courts, through presenting different evidence as a consideration of whether an exception can justify a verdict of guilty (beyond reasonable doubt) or not guilty; (3) the use of Qualm, which combines LP abduction, updating, and counterfactuals, supported by LP tabling mechanisms (based on [20–22]), to examine moral permissibility wrt. DDE and DTE, via counterfactual queries. Finally, Qualm is also employed to experiment with the issue of moral updating, allowing other (possibly overriding) moral rules (themselves possibly subsequently overridden) to be adopted by an agent, on top of those it currently follows.

2 Modeling Morality with Logic Programming

2.1 Moral Permissibility with Abduction, a Priori ICs and a Posteriori Preferences

In [17], moral permissibility is modeled through several cases of the classic trol-
ley problem [5], by emphasizing the use of ICs in abduction and preferences over
abductive scenarios. The cases, which include moral principles, are modeled in
order to deliver appropriate moral decisions that conform with those the major-
ity of people make, on the basis of empirical results in [9]. DDE [15] is utilized
in [9] to explain the consistency of judgments, shared by subjects from demo-
graphically diverse populations, on a series of trolley dilemmas. In addition to
DDE, we also consider DTE [10].

Each case of the trolley problem is modeled individually; for their details the reader is referred to [17]. The key points of their modeling are as follows. DDE and DTE are modeled via a priori ICs and a posteriori preferences. Possible decisions are modeled as abducibles, encoded in Acorda by even loops over default negation. Moral decisions are therefore accomplished by satisfying a priori ICs, computing abductive stable models from all possible abductive solutions, and then appropriately preferring amongst them (by means of rules), a posteriori, just some models, on the basis of their abductive solutions and consequences. Such preferred models turn out to conform with the results reported in the literature.

Capturing Deontological Judgment via a Priori ICs. In this application, ICs are used for two purposes. First, they are utilized to force the goal in each case (like in [9]), by observing the desired end goal resulting from each possible decision. Such an IC thus enforces all available decisions to be abduced, together with their consequences, from all possible observable hypothetical end goals. The second purpose of ICs is to rule out impermissible actions, viz., actions that involve intentional killing in the process of reaching the goal, enforced by the IC: false ← intentional_killing. The definition of intentional_killing depends on the rules in each case considered and on whether DDE or DTE is to be upheld. Since this IC serves as the first filter of abductive stable models, by ruling out impermissible actions, it affords us just those abductive stable models that contain only permissible actions.

Capturing Utilitarian Judgment via a Posteriori Preferences. Additionally, one can further prefer, amongst permissible actions, those resulting in the greater good. That is, whereas a priori ICs can be viewed as providing an agent's reactive behaviors, generating intuitively intended responses that comply with deontological judgment (enacted by ruling out the use of intentional harm), a posteriori preferences amongst permissible actions provide instead a more involved reasoning about action-generated models, capturing utilitarian judgment that favors welfare-maximizing behaviors (in line with the dual-process model [3]).
In this application, a preference predicate (e.g., based on a utility function concerning the number of people who died) is defined to select those abductive stable models [4] containing decisions with the greater good of overall consequences. The reader is referred to [17] for the results of various trolley problem cases.

2.2 Probabilistic Moral Reasoning


In [8], probabilistic moral reasoning is explored, where an example is contrived to reason about actions, under uncertainty, and thence provide judgment adhering to moral rules within some prescribed uncertainty level. The example takes a variant of the Footbridge case within the context of a jury trial in court, in order to proffer verdicts beyond reasonable doubt: suppose a board of jurors in a court is faced with a case where the actual action of an agent shoving the man onto the track was not observed. Instead, the jurors are just presented with the
fact that the man on the bridge died on the side track and that the agent was seen on the bridge on that occasion. Is the agent guilty (beyond reasonable doubt), in the sense of violating DDE, of shoving the man onto the track intentionally?
To answer this, abduction is enacted to reason about the verdict, given the available evidence. Considering the active goal judge, to judge the case, two abducibles are available: verdict(guilty_brd) and verdict(not_guilty), where guilty_brd stands for ‘guilty beyond reasonable doubt’. Depending on how probable each verdict is (the value of which is determined by the probability p_int_shove(P) of intentional shoving), either verdict(guilty_brd) or verdict(not_guilty) is abduced as the preferred solution.
The probability with which the shoving is performed intentionally is causally influenced by the evidence and its attending truth values. Two pieces of evidence are considered, viz., (1) whether the agent was running on the bridge in a hurry; and (2) whether the bridge was slippery at the time. The probability p_int_shove(P) of intentional shoving is therefore determined by the existence of evidence, expressed as dynamic predicates evd_run/1 and evd_slip/1, whose sole argument is true or false, standing for the evidence that the agent was running in a hurry and that the bridge was slippery, resp.
Based on this representation, different judgments can be delivered, subject to the available (observed) evidence and its attending truth value. By considering the standard probability of proof beyond reasonable doubt (here the value of 0.95 is adopted [16]) as a common ground for the probability of guilty verdicts to be qualified as ‘beyond reasonable doubt’, a form of argumentation (à la Scanlon's contractualism [23]) may take place through presenting different evidence (via updating of observed evidence atoms, e.g., evd_run(true), evd_slip(false), etc.) as a consideration to justify an exception. Whether the newly available evidence is accepted as a justification for an exception, defeating the judgment based on the previously presented evidence, depends on its influence on the probability p_int_shove(P) of intentional shoving, and thus eventually influences the final verdict. That is, it depends on whether this probability is still within the agreed standard of proof beyond reasonable doubt. The reader is referred to [8], which details a scenario capturing this moral jurisprudence viewpoint.
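
Schematically (our illustration, not the actual probabilistic LP encoding of [8]), the delivered verdict amounts to comparing the probability of intentional shoving, given the presented evidence, against the agreed standard of proof:

    def verdict(p_int_shove, standard=0.95):
        # 'Guilty beyond reasonable doubt' only if the probability of intentional
        # shoving, given the presented evidence, meets the standard of proof [16].
        return 'guilty_brd' if p_int_shove >= standard else 'not_guilty'

    # Newly presented evidence updates p_int_shove and may defeat a prior judgment:
    print(verdict(0.97))  # guilty_brd
    print(verdict(0.80))  # not_guilty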

2.3 Modeling Morality with QUALM

Distinct from the two previous applications, Qualm emphasizes the interplay
between LP abduction, updating and counterfactuals, supported furthermore by
their joint tabling techniques.

Counterfactuals in Morality. We revisit moral permissibility wrt. DDE and DTE, but now applying counterfactuals. Counterfactuals may provide a general way to examine DDE in dilemmas, like the classic trolley problem, by distinguishing between a cause and a side-effect as a result of performing an action to achieve a goal. This distinction between causes and side-effects may explain the permissibility of an action in accordance with DDE. That is, if some morally wrong effect E happens to be a cause for a goal G that one wants to achieve by performing
an action A, and E is not a mere side-effect of A, then performing A is impermissible. This is expressed by the counterfactual form below, in a setting where action A is performed to achieve goal G: “If not E had been true, then not G would have been true.”
The evaluation of this counterfactual form identifies permissibility of action A
from its effect E, by identifying whether the latter is a necessary cause for goal G or
a mere side-effect of action A: if the counterfactual proves valid, then E is instru-
mental as a cause of G, and not a mere side-effect of action A. Since E is morally
wrong, achieving G that way, by means of A, is impermissible; otherwise, not.
Note, the evaluation of counterfactuals in this application is considered from the
perspective of agents who perform the action, rather than from that of observers.
Moreover, the emphasis on causation in this application focuses on agents’ delib-
erate actions, rather than on causation and counterfactuals in general.
We demonstrate in [18] the application of this counterfactual form in machine
ethics. First, we use counterfactual queries to distinguish moral permissibility
between two off-the-shelf military cases from [24], viz., terror bombing vs. tacti-
cal bombing, according to DDE. In the second application, we show that coun-
terfactuals may as well be suitable to justify permissibility, via a process of
argumentation (wrt. Scanlon contractualism [23]), using a scenario built from
cases of the trolley problem that involve both DDE and DTE. Alternatively, we
show that moral justification can also be addressed via ‘compound counterfac-
tuals’ – Had I known what I know today, then if I were to have done otherwise,
something preferred would have followed – for justifying with hindsight a moral
judgment that was passed under lack of current knowledge.
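
As a schematic illustration (ours, not the Qualm encoding) of how the counterfactual settles DDE permissibility in such cases: the action is impermissible when the morally wrong effect E is instrumental as a cause of the goal G, i.e., when the counterfactual “if not E had been true, then not G would have been true” proves valid.

    def impermissible_under_dde(effect_is_morally_wrong, counterfactual_valid):
        # Impermissible when the morally wrong effect E is a necessary cause of goal G,
        # i.e. the counterfactual proves valid, rather than E being a mere side-effect.
        return effect_is_morally_wrong and counterfactual_valid

    # Terror bombing (standard reading of the case): civilian deaths are the means
    # to the goal, so the counterfactual is valid -> impermissible.
    print(impermissible_under_dde(True, True))    # True
    # Tactical bombing: civilian deaths are a mere side-effect of destroying the
    # target, so the counterfactual is not valid -> not ruled out by DDE.
    print(impermissible_under_dde(True, False))   # False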

Moral Updating. Moral updating (and evolution) concerns the adoption of new
(possibly overriding) moral rules on top of those an agent currently follows. Such
adoption often happens in the light of situations freshly faced by the agent, e.g.,
when an authority contextually imposes other moral rules, or due to some cul-
tural difference. In [12], moral updating is illustrated in an interactive storytelling
(using Acorda), where the robot must save the princess imprisoned in a castle,
by defeating either of two guards (a giant spider or a human ninja), while it
should also attempt to follow (possibly conflicting) moral rules that may change
dynamically as imposed by the princess (for the visual demo, see [13]).
The storytelling is reconstructed in this paper using Qualm, to particularly
demonstrate: (1) The direct use of LP updating so as to place a moral rule
into effect; and (2) The relevance of contextual abduction to rule out tabled but
incompatible abductive solutions, in case a goal is invoked by a non-empty initial
abductive context (the content of this context may be obtained already from
another agent, e.g., imposed by the princess). A simplified program modeling
the knowledge of the princess-savior robot in Qualm is shown below, where
fight/1 is an abducible predicate:
guard(spider). guard(ninja). human(ninja).
survive_from(G) ← utilVal(G, V), V > 0.6. utilVal(spider, 0.4). utilVal(ninja, 0.7).
intend_savePrincess ← guard(G), fight(G), survive_from(G).
intend_savePrincess ← guard(G), fight(G).

The first rule for intend_savePrincess corresponds to a utilitarian moral rule (wrt. the robot's survival), whereas the second one corresponds to a ‘knight’ moral, viz., to intend the goal of saving the princess at any cost (irrespective of the robot's survival chances). Since each rule in Qualm is assigned a unique name in its transform (based on the rule name fluent in [21]), the name of each rule for intend_savePrincess may serve as a unique moral rule identifier for updating, by toggling the rule's name, say via the rule name fluents #rule(utilitarian) and #rule(knight), resp. In the subsequent plots, the query ?- intend_savePrincess is referred to, representing the robot's intent to save the princess.
In the first plot, when both rule name fluents are retracted, the robot does not adopt any moral rule to save the princess, i.e., the robot has no intent to save the princess, and thus the princess is not saved. In the second (restart) plot, in order to maximize its survival chance in saving the princess, the robot updates itself with the utilitarian moral: the program is updated with #rule(utilitarian). The robot thus abduces fight(ninja) so as to successfully defeat the ninja instead of confronting the humongous spider.
The use of tabling in contextual abduction is demonstrated in the third (start again) plot. Assuming that the truth of survive_from(G) implies the robot's success in defeating (killing) guard G, the princess argues that the robot should not kill the human ninja, as it violates the moral rule she follows, say a ‘Gandhi’ moral, expressed by the following rule in her knowledge (the first three facts in the robot's knowledge are shared with the princess): follow_gandhi ← guard(G), human(G), not fight(G). That is, the princess abduces not fight(ninja) and imposes this abductive solution as the initial (input) abductive context of the robot's goal (viz., intend_savePrincess). This input context is inconsistent with the tabled abductive solution fight(ninja), and as a result, the query fails: the robot may argue that the imposed ‘Gandhi’ moral conflicts with its utilitarian rule (in the visual demo [13], the robot reacts by aborting its mission). In the final plot, as the princess is not saved yet, she further argues that she definitely has to be saved, by now additionally imposing on the robot the ‘knight’ moral. This amounts to updating the rule name fluent #rule(knight) so as to switch on the corresponding rule. As the goal intend_savePrincess is still invoked with the input abductive context not fight(ninja), the robot now abduces fight(spider) in the presence of the newly adopted ‘knight’ moral. Unfortunately, it fails to survive, as confirmed by the failure of the query ?- survive_from(spider).
The plots in this story reflect a form of deliberative employment of moral
judgments within Scanlon’s contractualism. For instance, in the second plot,
the robot may justify its action to fight (and kill) the ninja due to the utilitar-
ian moral it adopts. This justification is counter-argued by the princess in the
subsequent plot, making an exception in saving her, by imposing the ‘Gandhi’
moral, disallowing the robot to kill a human guard. In this application, rather
than employing updating, this exception is expressed via contextual abduction
with tabling. The robot may justify its failing to save the princess (as the robot
leaving the scene) by arguing that the two moral rules it follows (viz., utilitarian
and ‘Gandhi’) are conflicting wrt. the situation it has to face. The argumenta-
tion proceeds, whereby the princess orders the robot to save her whatever risk
it takes, i.e., the robot should follow the ‘knight’ moral.

3 Conclusion and Future Work


The paper summarizes our investigation on the application of LP-based reasoning to the terra incognita of machine ethics, a field that is now becoming a pressing concern and receiving wide attention. Our research makes a number of original inroads, exhibiting a proof of possibility for modeling morality viewpoints systematically using a combination of various LP-based reasoning features (such as LP abduction, updating, preferences, probabilistic LP and counterfactuals) afforded by state-of-the-art tabling mechanisms, through moral examples taken off-the-shelf from the literature. Given the broad dimension of the topic, our contributions touch on only a small subset of morality issues. Nevertheless, they prepare and open the way for additional research towards employing various features of LP-based reasoning in machine ethics. Several topics can be further explored in the future, as summarized below.
So far, our application of counterfactuals in machine ethics is based on the
evaluation of counterfactuals in order to determine their validity. It would be interesting to explore in the future other aspects of counterfactual reasoning relevant to moral reasoning. First, we can consider assertive counterfactuals: rather than evaluat-
ing the truth validity of counterfactuals, they are asserted (known) as being a
valid statement. The causality expressed by such a valid counterfactual may be
useful for refining moral rules, which can be achieved through incremental rule
updating. Second, we may extend the antecedent of a counterfactual with a rule,
instead of just literals, allowing to express exception in moral rules, such as “If
killing the giant spider had been done by a noble knight, then it would not have
been wrong”. Third, we can imagine the situation where the counterfactual’s
antecedent is not given, though its conclusion is, the issue being that the con-
clusion is some moral wrong. In this case, we want to abduce the antecedent in
the form of interventions that would prevent some wrong: “What could I have
done to prevent a wrong?”.
This paper contemplates the individual realm of machine ethics: it stresses
individual moral cognition, deliberation, and behavior. A complementary realm
stresses collective morals, and emphasizes instead the emergence, in a popula-
tion, of evolutionarily stable moral norms, of fair and just cooperation, to the
advantage of the whole evolved population. The latter realm is commonly studied
via Evolutionary Game Theory by resorting to simulation techniques, typically
with pre-determined conditions, parameters, and game strategies (see [19] for
references). The bridging of the gap between the two realms [19] would appear
to be promising for future work, namely, how the study of the individual cognition of morally interacting multi-agents (in the context of this paper, by using LP-based reasoning features) is applicable to the evolution of populations of such agents, and vice versa.

Acknowledgments. Both authors acknowledge the support from FCT/MEC NOVA LINCS PEst UID/CEC/04516/2013. Ari Saptawijaya acknowledges the support from FCT/MEC grant SFRH/BD/72795/2010.

References
1. Anderson, M., Anderson, S.L.: EthEl: Toward a principled ethical eldercare robot.
In: Procs. AAAI Fall 2008 Symposium on AI in Eldercare (2008)
2. Bringsjord, S., Arkoudas, K., Bello, P.: Toward a general logicist methodology for
engineering ethically correct robots. IEEE Intelligent Systems 21(4), 38–44 (2006)
3. Cushman, F., Young, L., Greene, J.D.: Multi-system moral psychology. In: Doris,
J.M. (ed.) The Moral Psychology Handbook. Oxford University Press (2010)
4. Dell’Acqua, P., Pereira, L.M.: Preferential theory revision. Journal of Applied Logic
5(4), 586–601 (2007)
5. Foot, P.: The problem of abortion and the doctrine of double effect. Oxford Review
5, 5–15 (1967)
6. Ganascia, J.-G.: Modelling ethical rules of lying with answer set programming.
Ethics and Information Technology 9(1), 39–47 (2007)
7. Anh, H.T., Kencana Ramli, C.D.P., Damásio, C.V.: An implementation of
extended P-Log using XASP. In: Garcia de la Banda, M., Pontelli, E. (eds.) ICLP
2008. LNCS, vol. 5366, pp. 739–743. Springer, Heidelberg (2008)
8. Han, T.A., Saptawijaya, A., Pereira, L.M.: Moral reasoning under uncertainty. In:
Bjørner, N., Voronkov, A. (eds.) LPAR-18 2012. LNCS, vol. 7180, pp. 212–227.
Springer, Heidelberg (2012)
9. Hauser, M., Cushman, F., Young, L., Jin, R.K., Mikhail, J.: A dissociation between
moral judgments and justifications. Mind and Language 22(1), 1–21 (2007)
10. Kamm, F.M.: Intricate Ethics: Rights, Responsibilities, and Permissible Harm.
Oxford U. P (2006)
11. Kowalski, R.: Computational Logic and Human Thinking: How to be Artificially
Intelligent. Cambridge U. P (2011)
12. Lopes, G., Pereira, L.M.: Prospective storytelling agents. In: Carro, M., Peña, R.
(eds.) PADL 2010. LNCS, vol. 5937, pp. 294–296. Springer, Heidelberg (2010)
13. Lopes, G., Pereira, L.M.: Visual demo of “Princess-saviour Robot” (2010). http://
centria.di.fct.unl.pt/∼lmp/publications/slides/padl10/quick moral robot.avi
14. Mallon, R., Nichols, S.: Rules. In: Doris, J.M. (ed.) The Moral Psychology Hand-
book. Oxford University Press (2010)
15. McIntyre, A.: Doctrine of double effect. In: Zalta, E.N. (ed.) The Stanford Encyclo-
pedia of Philosophy. Center for the Study of Language and Information, Stanford
University, Fall 2011 edition (2004). http://plato.stanford.edu/archives/fall2011/
entries/double-effect/
16. Newman, J.O.: Quantifying the standard of proof beyond a reasonable doubt: a
comment on three comments. Law, Probability and Risk 5(3–4), 267–269 (2006)
17. Pereira, L.M., Saptawijaya, A.: Modelling morality with prospective logic. In:
Anderson, M., Anderson, S.L. (eds.) Machine Ethics, pp. 398–421. Cambridge U.
P (2011)
18. Pereira, L.M., Saptawijaya, A.: Abduction and beyond in logic programming with
application to morality. Accepted in “Frontiers of Abduction”, Special Issue in
IfCoLog Journal of Logics and their Applications (2015). http://goo.gl/yhmZzy
19. Pereira, L.M., Saptawijaya, A.: Bridging two realms of machine ethics. In: White,
J.B., Searle, R. (eds.) Rethinking Machine Ethics in the Age of Ubiquitous Tech-
nology. IGI Global (2015)
20. Pereira, L.M., Saptawijaya, A.: Counterfactuals in Logic Programming with Appli-
cations to Agent Morality. Accepted at a special volume of Logic, Argumentation
& Reasoning (2015). http://goo.gl/6ERgGG (preprint)
21. Saptawijaya, A., Pereira, L.M.: Incremental tabling for query-driven propagation
of logic program updates. In: McMillan, K., Middeldorp, A., Voronkov, A. (eds.)
LPAR-19 2013. LNCS, vol. 8312, pp. 694–709. Springer, Heidelberg (2013)
22. Saptawijaya, A., Pereira, L.M.: TABDUAL: a Tabled Abduction System for Logic
Programs. IfCoLog Journal of Logics and their Applications 2(1), 69–123 (2015)
23. Scanlon, T.M.: What We Owe to Each Other. Harvard University Press (1998)
24. Scanlon, T.M.: Moral Dimensions: Permissibility, Meaning, Blame. Harvard Uni-
versity Press (2008)
25. Swift, T.: Tabling for non-monotonic programming. Annals of Mathematics and
Artificial Intelligence 25(3–4), 201–240 (1999)
26. The Future of Life Institute. Research Priorities for Robust and Beneficial Arti-
ficial Intelligence (2015). http://futureoflife.org/static/data/documents/research
priorities.pdf
Intelligent Information Systems
Are Collaborative Filtering Methods Suitable for Student
Performance Prediction?

Hana Bydžovská

CSU and KD Lab Faculty of Informatics, Masaryk University, Brno, Czech Republic
[email protected]

Abstract. Researchers have been focusing on the prediction of students' behavior for many years. Different systems take advantage of such revealed information and try to attract, motivate, and help students to improve their knowledge. Our goal is to predict student performance in particular courses at the beginning of the semester based on the student's history. Our approach is based on the idea of representing students' knowledge as the set of grades of their passed courses and finding the most similar students. Collaborative filtering methods were utilized for this task and the results were verified on historical data originating from the Information System of Masaryk University. The results show that this approach is similarly effective as commonly used machine learning methods such as Support Vector Machines.

Keywords: Student performance · Prediction · Collaborative filtering methods · Recommender system

1 Introduction

Students have to accomplish all the study requirements defined by their university.
The most important is to pass all mandatory courses and to select elective and volun-
tary courses that they are able to pass. Masaryk University offers a vast number of
courses to its students. Therefore, it is very difficult for students to make a good
decision. It is important for us to understand students' behavior in order to guide them
through their studies to graduation. Our goal is to design an intelligent module
integrated into the Information System of Masaryk University that will help students
select suitable courses and warn them against too difficult ones. To realize the module,
we need to be able to predict whether a student will succeed or fail in an investigated
course. We need this information at the beginning of a term, when we have no
information about students' knowledge, skills or enthusiasm for any particular course.
We also do not want to obtain the information directly from students using
questionnaires. Since questionnaires tend to have a low response rate, we use only
verifiable data from the university information system.
We have drawn inspiration from techniques utilized in recommender systems.
Nowadays, the usage of collaborative filtering (CF) methods [5] spreads over many
areas, including the educational environment. Walker et al. [9] designed a system called
Altered Vista that was specifically aimed at teachers and students who reviewed web
resources targeted at education. The system implemented CF methods in order to
recommend web resources to its users. Many researchers have focused on e-learning, e.g.
Loll et al. [6] designed a system that enables students to solve their exercises and to
criticize their schoolmates’ solutions. The system based on students’ answers could
reveal difficult tasks and recommend good solutions to enhance students’ knowledge.
In this paper, we report on the possibility of estimating student performance in par-
ticular courses based only on knowledge of students' previously passed courses. We
utilize different CF methods to estimate the final prediction. The preliminary work can be seen
in [1]. Now we explore the most suitable settings of the approach in detail and com-
pare the results with our previous approach using classification algorithms [2].

2 Our Previous Approach

Many researchers have successfully used a machine learning approach to predict students'


performance [7]. In order to characterize students, we collected various data
about students that are stored in the Information System of Masaryk University [2].
The study-related data contained attributes such as gender, year of birth, year of ad-
mission, and the number of credits gained from passed courses. The social data described
students' behavior and co-operation with other students, e.g. the weighted average grades
of their friends, importance in the sociogram computed, for example, from the commu-
nication statistics, students' publication co-authoring, and comments among students.
We also utilized algorithms implemented in Weka [10] with different data sets in
order to obtain the best possible results. We utilized algorithms for regression and
classification, and we also computed the mean absolute error from the confusion matrix.
In order to lower the number of attributes, we employed feature selection (FS)
algorithms. Support Vector Machines (SMO) and Random Forests were the most suitable
methods in combination with FS algorithms. The comparison of this approach with
the results obtained by CF methods on the same data set can be seen in Section 4.

3 Experiment

Our hypothesis is that students’ knowledge can be characterized by the grades of


courses in which students enrolled during their studies. Based on this information we can
select students with similar interests and knowledge and subsequently predict whether
a particular student has sufficient skills needed for a particular course.
For our purposes, each student can be represented with a vector of grades of
courses passed in one of the student’s studies. In order to confirm our hypothesis, we
selected 62 courses with different success rates that were offered to students in the
years 2010 – 2013 at Masaryk University. The students for whom we were not able to
give any prediction – students without any history in the system and without any
passed course – were omitted from the experiment. The extracted data set comprised
3,423 students enrolled in at least one of the 62 courses and their 42,635 grades.

Our aim was to predict the grades of students enrolled in the investigated courses
in the year 2012 based on the results of similar students enrolled in the same courses
in the years 2010 and 2011. Then we could verify the predictions against the real grades
and evaluate the methods and the settings. Finally, we selected the most suitable method
and verified it on data about students enrolled in the same courses in the year 2013.

Similarity of Students. For each student, we constructed four vectors of grades
characterizing their knowledge. The values were computed with respect to the number of
repetitions of each course. We considered only the last grade (NEWEST), the grades of each
attempt in the last year (YEAR), only the last grade of each repetition (LAST), and
all grades (ALL). For example, a student failed a course in the first year using three
attempts and got the grades 444. The student had to repeat the course next year. Sup-
posing he or she got the grades 442, the student’s values for this particular course
were the following: NEWEST: 2, YEAR: 4+4+2, LAST: 4+2, ALL: 4+4+4+4+4+2.
Vectors of grades were compared by five methods. Mean absolute difference
(MAD) and Root mean squared difference (RMSD) measure the mean difference of
the investigated student's grades and the grades of other students in their shared courses.
The lower the value, the better the result is. The other methods return values near 1
for the best results. Cosine similarity (COS) and Pearson’s correlation coefficient
(PC) define the similarity of grades of shared courses. Jaccard's coefficient (JC)
defines the ratio of shared courses to the union of both students' courses. Supposing that students' knowledge
can be represented with passed courses, it was very important to calculate the overlap
of students’ courses.
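As an illustration only (the paper does not provide code; the dictionary-based
representation of a grade vector is our assumption), the five measures can be computed
over the shared courses of two students as follows:

import math

def similarities(a, b):
    # a, b: dicts mapping course id -> grade (e.g. the NEWEST representation)
    shared = set(a) & set(b)
    union = set(a) | set(b)
    jc = len(shared) / len(union) if union else 0.0           # Jaccard's coefficient
    if not shared:
        return {"JC": jc}                                      # no shared courses to compare
    diffs = [a[c] - b[c] for c in shared]
    mad = sum(abs(d) for d in diffs) / len(diffs)              # mean absolute difference
    rmsd = math.sqrt(sum(d * d for d in diffs) / len(diffs))   # root mean squared difference
    cos = (sum(a[c] * b[c] for c in shared) /
           (math.sqrt(sum(a[c] ** 2 for c in shared)) *
            math.sqrt(sum(b[c] ** 2 for c in shared))))        # cosine similarity
    ma = sum(a[c] for c in shared) / len(shared)
    mb = sum(b[c] for c in shared) / len(shared)
    sa = math.sqrt(sum((a[c] - ma) ** 2 for c in shared))
    sb = math.sqrt(sum((b[c] - mb) ** 2 for c in shared))
    cov = sum((a[c] - ma) * (b[c] - mb) for c in shared)
    pc = cov / (sa * sb) if sa and sb else 0.0                 # Pearson's correlation
    return {"JC": jc, "MAD": mad, "RMSD": rmsd, "COS": cos, "PC": pc}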

Neighborhood Selection. We selected several methods to compute a suitable


neighborhood:

• Top x, where x ∈ [1; 50] with step 1; (the analysis [4] indicates that the neighbor-
hood of 20 - 50 neighbors is usually optimal).
• More similar than the threshold y, where y ∈ [0; 1] with step 0.1.
• We also utilized the idea of a baseline user [8]. We selected into the neighborhood
only those students that were more similar to the investigated student than the
investigated student is to the baseline user. We decided to calculate two types of baseline user:
─ Average student – we characterized an average student by the average grades of
the courses in which the investigated student was enrolled.
─ Uniform student – we characterized a uniform student by grades with the value 2.5
(the average grade across all courses) for all the courses in which the investigated
student was enrolled.

Grade Prediction. Once the neighborhood was defined, we could make a prediction.


We used different approaches to estimate grades from grades of students in the neigh-
borhood: mean, median, and the majority class. We also utilized significance
weighting [3], as well as its extension using the average grades of the compared students (sig.
weighting +), and a scheme lowering the importance of students with only a few co-ratings [4].
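A minimal sketch of the neighborhood selection rules and of the simplest estimation
approaches described above is given below; the function names and the Top-25 default
are illustrative, and the significance-weighting variants are omitted:

def top_x(similar_students, x=25):
    # similar_students: list of (student_id, similarity); keep the x most similar
    return sorted(similar_students, key=lambda s: s[1], reverse=True)[:x]

def more_similar_than_baseline(similar_students, baseline_similarity):
    # keep only students more similar to the investigated student than the
    # investigated student is to the baseline (average or uniform) user
    return [s for s in similar_students if s[1] > baseline_similarity]

def predict_grade(neighbour_grades, method="median"):
    # neighbour_grades: grades of the neighbourhood students in the investigated course
    if not neighbour_grades:
        return None
    if method == "mean":
        return sum(neighbour_grades) / len(neighbour_grades)
    if method == "median":
        g = sorted(neighbour_grades)
        mid = len(g) // 2
        return g[mid] if len(g) % 2 else (g[mid - 1] + g[mid]) / 2
    return max(set(neighbour_grades), key=neighbour_grades.count)  # majority class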

4 Results

Mean absolute error (MAE) represents the size of the prediction error. Exact
grade prediction is very difficult, and even a less fine-grained prediction can be sufficient.
Therefore, we also predicted the grades as good (1) / bad (2) / failure (4) or just suc-
cess or failure. The results of the CF methods were compared with our previous work
described in Section 2 where the predictions were obtained using classification algo-
rithms (CA). We used a confusion matrix for calculating MAE. We mined study-
related data and data about social behavior of students.
The comparison of both approaches can be seen in Table 1. Although both the ap-
proaches used different data from the information system and utilized different
processing, the results showed that their performance was very similar. The only
significant difference can be seen in grade prediction, where CF methods were
slightly better. We consider the accuracy of 78.5% for student success or failure pre-
diction reliable enough considering that we did not know students’ skills or enthu-
siasm for courses. The MAE of the good / bad / failure prediction was around 0.6. We consider
an MAE of less than one degree in the modified grade scale to be very satisfactory. Even
in the grade (1, 1.5, 2, 2.5, 3, and 4) prediction, MAE was around 0.7 which means
only slightly more than one degree in the grade scale. In general, these results were
positive but the grade prediction was still not trustworthy.
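For reference, the MAE reported here can be derived from a confusion matrix as the
class-distance-weighted average of its entries; the sketch below assumes the modified
grade scale mentioned above and is not the authors' code:

def mae_from_confusion(matrix, labels=(1, 1.5, 2, 2.5, 3, 4)):
    # matrix[i][j]: number of cases with actual grade labels[i] predicted as labels[j]
    total, error = 0, 0.0
    for i, actual in enumerate(labels):
        for j, predicted in enumerate(labels):
            total += matrix[i][j]
            error += matrix[i][j] * abs(actual - predicted)
    return error / total if total else 0.0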

Table 1. Comparison of approaches

Approach   Year   Grade (MAE)   Good/bad/failure (MAE)   Success/failure (accuracy)
CA         2012      0.67               0.58                     81.04%
           2013      0.84               0.61                     78.72%
CF         2012      0.64               0.57                     80.44%
           2013      0.68               0.64                     78.58%

The advantage of the CF approach is that all information systems store the data
about students’ grades. Therefore, this approach can be used in all systems. Our pre-
vious approach was based on mining data obtained from the information system. However,
not all systems store data about the social behavior of students. We proved that these
data improve the accuracy of the results significantly [2].

5 Discussion

The settings of the CF approach that reached the best average results can be seen in
Table 2. As the results show, PC worked properly in combination with the uniform
student for selecting a proper neighborhood and significance weighting with an exten-
sion using average grades of compared students for the final prediction. On the other
hand, for MAD, a Top x function was the best option for selecting the neighborhood
and median for the final prediction. Both the approaches reached very similar results
in all tasks and we consider them to be trustworthy. We also investigated the most
suitable x for these tasks. We searched for the minimal x with the best possible results.
We derived x = 25 to be the best choice generally for all methods and settings. The
most suitable classification algorithms were SMO and Random Forests (Table 3).

Table 2. The settings of the CF approach that reached the best average results

Task               Sim. function   Neighborhood      Estimation approach
Grade              PC              Uniform student   Sig. weighting +
Good/bad/failure   PC              Uniform student   Sig. weighting +
Success/failure    MAD             Top 25            Median

Table 3. The settings of the classification algorithms that reached the best average results

Task               Classification algorithm   Feature selection algorithm
Grade              SMO                        InfoGainAttributeEval
Good/bad/failure   SMO                        OneRAttributeEval
Success/failure    Random Forests             5 attributes selected by each FS algorithm for each course

We also investigated the influence of different details of grades described in


Section 3. The conclusion was that only the NEWEST grade was expressive enough
for a satisfactory prediction. More detailed information about the grades did not
improve the results significantly.

6 Conclusion

In this paper, we used CF methods for student modeling. Our experiment provides
evidence that the CF approach is also suitable for student performance prediction. The
data set comprised 62 courses taught over 4 years, 3,423 students and
their 42,635 grades. We confirmed our hypothesis that students' knowledge can be
sufficiently characterized only by their previously passed courses, which should cover
their knowledge of the field of study. We processed data about students' grades stored
in the Information System of Masaryk University to be able to estimate students’
interests, enthusiasm and prerequisites for passing enrolled courses at the beginning
of each term. For each investigated student, we searched for students enrolled in the
same courses in the last years who were the most similar ones to the investigated stu-
dent. Based on their study results, we predicted the students’ performance.
We compared the results with the results obtained by classification algorithms that
researchers usually utilize for student performance prediction. The results were almost
the same. The main advantage of the CF approach is that all university information sys-
tems store the data about students’ grades needed for the prediction. On the other
hand, this approach is not suitable if we have no information about the history of the
particular students. Now, we are able to predict the student success or failure with the
accuracy of 78.5%, whether the grade will be good, bad, or failure with the MAE of
0.6, and the exact grade with an MAE of 0.7. We consider the results to be very satis-
factory, and the CF approach can be considered as expressive as the commonly used clas-
sification algorithms.
Based on this approach we can recommend suitable voluntary courses for each stu-
dent with respect to his or her interests and skills. We hope that this information will
also encourage students to study hard when they have to enroll in a mandatory course
that seems to be too difficult for them. Moreover, teachers can utilize this information
to identify potentially weak students and help them before they are at risk of failing
the course. This approach can also be used beneficially in an intelligent tutoring sys-
tem as a basic estimation of students' potential before they start to work with the
system in the investigated course.

Acknowledgement. We thank Michal Brandejs, Lubomír Popelínský, and all colleagues of


Knowledge Discovery Lab, and also IS MU development team for their assistance. This work
has been partially supported by Faculty of Informatics, Masaryk University.

References
1. Bydžovská, H.: Student performance prediction using collaborative filtering methods. In:
Conati, C., Heffernan, N., Mitrovic, A., Verdejo, M. (eds.) AIED 2015. LNCS, vol. 9112,
pp. 550–553. Springer, Heidelberg (2015)
2. Bydžovská, H., Popelínský, L.: The Influence of social data on student success prediction.
In: Proceedings of the 18th International Database Engineering & Applications Sympo-
sium, pp. 374–375 (2014)
3. Herlocker, J.L., Konstan, J.A., Borchers, A., Riedl, J.: An algorithmic framework for per-
forming collaborative filtering. In: Proceedings of the 22nd Annual International ACM
SIGIR Conference, pp. 230–237 (1999)
4. Herlocker, J.L., Konstan, J.A., Riedl, J.: Explaining collaborative filtering recommenda-
tions. In: Proceedings of the ACM Conference on Computer Supported Cooperative Work,
pp. 241–250 (2000)
5. Jannach, D., Zanker, M., Felfernig, A., Friedrich, G.: Recommender Systems: An Intro-
duction. Cambridge University Press (2010)
6. Loll, F., Pinkwart, N.: Using collaborative filtering algorithms as elearning tools. In:
Proceedings of the 42nd Hawaii International Conference on System Sciences (2009)
7. Marquez-Vera, C., Romero, C., Ventura, S.: Predicting school failure using data mining.
In: Pechenizkiy, M., et al. (eds.) EDM, pp. 271–276 (2011)
8. Matuszyk, P., Spiliopoulou, M.: Hoeffding-CF: neighbourhood-based recommendations
on reliably similar users. In: Dimitrova, V., Kuflik, T., Chin, D., Ricci, F., Dolog, P.,
Houben, G.-J. (eds.) UMAP 2014. LNCS, vol. 8538, pp. 146–157. Springer, Heidelberg
(2014)
9. Walker, A., Recker, M.M., Lawless, K., Wiley, D.: Collaborative Information Filtering:
A Review and an Educational Application. International Journal of Artificial Intelligence
in Education 14(1), 3–28 (2004)
10. Witten, I., Frank, E., Hall, M.: Data Mining: Practical Machine Learning Tools and
Techniques, 3rd edn. Morgan Kaufmann Publishers (2011)
Intelligent Robotics
A New Approach for Dynamic Strategic
Positioning in RoboCup Middle-Size League

António J.R. Neves, Filipe Amaral, Ricardo Dias, João Silva,


and Nuno Lau

Intelligent Robotics and Intelligent Systems Lab,


IEETA/DETI – University of Aveiro, Aveiro, Portugal
[email protected]

Abstract. Coordination in multi-robot or multi-agent systems has been


receiving special attention in the last years and has a prominent role in
the field of robotics. In the robotic soccer domain, the way that each
team coordinates its robots, individually and together, in order to per-
form cooperative tasks is the basis of its strategy and largely dictates
the success of the team in the game. In this paper we propose the use
of Utility Maps to improve the strategic positioning of a robotic soccer
team. Utility Maps are designed for different set piece situations, mak-
ing them more dynamic and easily adaptable to the different strategies
used by the opponent teams. Our approach has been tested and success-
fully integrated in normal game situations to perform passes in free-play,
allowing the robots to choose, in real-time, the best position to receive
and pass the ball. The experimental results obtained, as well as the anal-
ysis of the team performance during the last RoboCup competition show
that the use of Utility Maps increases the efficiency of the team strategy.

1 Introduction
RoboCup (“Robot Soccer World Cup”) is a scientific initiative with an annual
international meeting and competition that started in 1997. The aim is to promote
worldwide developments in Artificial Intelligence, Robotics and Multi-agent
systems. Robot soccer represents one of the attractive domains promoted by
RoboCup for the development and testing of multi-agent collaboration tech-
niques, computer vision algorithms and artificial intelligence approaches, only
to name a few.
In the RoboCup Middle Size League (MSL), autonomous mobile soccer
robots must coordinate and collaborate for playing and winning a game of soc-
cer, similar to the human soccer games. They have to assume dynamic roles in
the field, to share information about visible objects of interest or obstacles and
to position themselves in the field so that they can score goals and prevent the
opponent team from scoring. Decisions such as game strategies, positioning and
team coordination play a major role in the MSL soccer games.
This paper introduces Utility Maps as a tool for the dynamic positioning of
soccer robots on the field and for opportunistic passing between robots, under
different situations that will be presented throughout the paper. As far as the
authors know, no previous work has been presented about the use of Utility
Maps in the Middle Size League of RoboCup.
The paper is structured into 8 sections, the first of them being this Introduction.
In Section 2 we present a summary of the work already done on strategic posi-
tioning. Section 3 introduces the use of Utility Maps in the software structure of
the CAMBADA MSL team. In Section 4 we describe the construction of Util-
ity Maps. Section 5 describes the use of Utility Maps for the positioning of the
robots in defensive set pieces. In Section 6 we present the use of Utility Maps
in offensive set pieces, while Section 7 presents their use in Free Play. Finally,
Section 8 introduces some measures of the impact of the Utility Maps on the
performance of the team and discusses a series of results that prove the efficiency
of the team strategy, based on Utility Maps.

2 Related Work

Strategic positioning is a topic with broad interest within the RoboCup com-
munity. As teams participating in the RoboCup Soccer competitions gradually
managed to solve the most basic tasks involved in a soccer game, such as loco-
motion, ball detection and ball handling, the need of having smarter and more
efficient robotic soccer players arose. Team coordination and strategic position-
ing are nowadays the key factors when it comes to winning a robotic soccer game.
The first efforts for achieving coordination in multi-agent soccer teams have
been presented in [2] [3]. Strategic Positioning with Attraction and Repulsion
(SPAR) takes into account the positions of other agents as well as that of the
ball. The following forces are evaluated when taking a decision regarding the
positioning of an agent: repulsion from opponents and team members, attraction
to the active team member and ball and attraction to the opponents’ goal.
In the RoboCup Soccer Simulation domain, Situation Based Strategic Posi-
tion (SBSP) [4] is a well known technique used for the positioning of the software
agents. The positioning of an agent only takes into consideration the ball posi-
tion, as focal point, and does not consider other agents. However, if all agents are
assumed to always devote their attention to the ball position, then cooperative
behavior can be achieved indirectly. An agent defines its base strategic position
based on the analysis of the tactic and team formation. Its position is adjusted
accordingly to ball pose and game situation. This approach has been adapted to
the Middle Size League constraints and has been presented in [5].
In [6] a method for Dynamic Positioning based on Voronoi Cells (DPVC) was
introduced. The robotic agents are placed based on attraction vectors. These
vectors represent the attraction of the players towards objects, depending on
the current state of the game and roles.
The Delaunay Triangulation formations (DT) [7] divide the soccer pitch into
several triangles based on given training data. Each training datum affects only
the divided region to which it belongs. A map is built from a focal point, such
as ball position, to the positioning of the agents.

For more than 20 years, grid-based representations have been used in robotics
in order to show different kinds of spatial information, allowing a more accurate
and simplified world perception and modelling [8] [9]. Usually, this type of rep-
resentation is oriented to a specific goal. Utility functions have been presented
before [10] [11] as a tool for role choosing within multi-agent systems.
Taking into account the successful use of this approach, a similar idea was
applied to the CAMBADA agents, but with different functionalities and purpose.
The aim of the proposed approach was to improve the collective behavior in some
specific game situations. Utility Maps have been developed as support tools for
the positioning of soccer robots in defensive and offensive set pieces, as well as
in freeplay passes situations.

3 Utility Maps of CAMBADA MSL Team


The general architecture of the CAMBADA robots has been described in [1].
The decision about the strategic positioning of the robots is taken by the high-
level agent. This is a process that, at each cycle, is responsible for the high-level
control of the robots, which is divided into several stages. The first stage is the
sensor fusion, executed by an integrator module, with the objective of gathering
the noisy information from the sensors and from its team mates and updating
the state of the world. This world state will be used by the high-level decision
and coordination modules.
In the second stage of the high-level control of the robots, the agent has to
decide how to act given the state of the world that it built. At the higher level
it assumes a Role and operates on the field with a given attitude, for example, as
role Striker. A detailed description about the Roles used in CAMBADA team
can be found in [13]. The actions it can take are defined by lower level Behaviors,
which define the orders to be sent to the actuators in order to fulfill a task, for
example, a Move behavior, to reach a given point on the field.
During a MSL game, there are three possible game situations: defensive set
pieces when for some reason (a fault, ball outside the field or a valid goal) the
game stops and the ball belongs to the opponent team; offensive set pieces
when for some reason the game stops and the ball belongs to our team and free
play when the ball is moving on the field after a game stop.
In defensive set pieces only one role is involved, role Barrier. The robot
will stay in this role until the ball is considered to be in game. The ball is in game
when it moves more than 20 cm or if 10 seconds have passed since the start signal
was given by the referee. After this point, the game enters free play mode
and new roles will be assigned to the robots, as described next.
In offensive set pieces two roles are involved, role Replacer and role
Receiver. The robot closest to the ball will assume the role Replacer and
all the others, except for the goalkeeper, will assume the role Receiver. After
the ball has been passed, the robot that will receive the ball becomes Striker
and all the other robots are Midfielders. After a successful pass or when our
robots detect that the ball has been gathered by the opponent team, the game
state changes to free play.

When the game is in free play, a robot can assume one of two roles: Striker
or Midfielder, depending on the relative robot position regarding the ball.

4 Building the Utility Maps

In this paper we propose an approach for dynamic strategic positioning in MSL


based on Utility Maps. The position of each robot that takes part in a specific
game situation is dynamically obtained based on the information about the
environment around the robot, namely its position on the field, the position of
obstacles and the ball position.
The CAMBADA robots use a catadioptric vision system, often named omni-
directional vision system. The algorithms for detecting the objects of interest are
presented in [14]. The information acquired by the vision system is merged with
other information of the robot to build the worldstate information, namely its
position on the field and a list of valid balls and a list of valid obstacles. A detailed
description of the algorithms used to build the worldstate is presented in [15].
The information about the obstacles in the current version of the CAMBADA
robot worldstate is a list of objects containing their absolute positions on the
field and their classification of being team mates or opponents. The algorithm
for obstacles detection and identification is described in [15].
The Utility Map is constructed by merging the relevant information about the
environment, namely the team mates' positions, obstacles and ball positions, using
the information of the robot and the information shared by its colleagues [12].
The first step to obtain the Utility Map is to build an occupancy map that
gives the robot a global idea about the state of the world around it. Then,
depending on the game situation and the role in which the Utility Map will be
used, a Field of Vision (FOV) is calculated on top of the occupancy map. FOV
represents the area that is considered visible from the point in which it was
calculated. For example, it is possible to calculate a FOV from the ball, from
the robot itself, from the goal, etc. An example of a FOV calculated from robot
number 3 is shown on Fig. 1.
Finally, the Utility Map is created taking into consideration the occupancy
map, the FOVs and some conditions, restrictions and metrics for decisions
depending on the game situation. Taking as example the offensive set pieces,
there are some restrictions included in the process of building the Utility Map.
The two main restrictions are that the robots cannot be inside the goal areas and
they have to receive the ball from at least 2 meters from the ball. Moreover, it
is possible to combine three metrics to build the Utility Maps in order to decide
the best positions to receive the ball. One is the free space between the pass
line and the closest obstacle. The second one is the weighted average between
the distance to the ball, the distance to the opponent goal, the rotation angle
for a shot on target and the distance from the point on the map to the position
of the robot. Finally, the third metric is the angle of each map position to the
opponent goal.

The use of these Utility Maps allows the robots to easily take decisions regard-
ing their positioning by simply choosing the local maximum on the maps. The Util-
ity Maps are calculated locally on the robots and are part of their worldstate, so
that they are easily accessible by any behavior or role. In terms of implementa-
tion, the TCOD library (http://roguecentral.org/doryen/libtcod/) has been used. The library provides built-in toolkits
for management of height maps, which in the context of this work are used as
Utility Maps, and field of view calculations. It takes, on average, 4ms to update
the necessary maps in each cycle of the agent software execution. The robots are
currently working with a cycle of 20 ms, controlled by the vision process that
works at 50 frames per second [14].
The identified opponent robots lead to hills in the map at their positions, with
some persistence to improve the stability of the decisions based on the maps. It
takes 5 agent cycles (100 milliseconds) for a new obstacle to reach the maximum
cost level. In the end, the map is normalized, thus always holding values between
0.0 and 1.0.
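The fragment below sketches, under our own simplifying assumptions (the grid
resolution and the class interface are illustrative, not CAMBADA code), how an
obstacle cost layer with the 5-cycle persistence and the final normalization could be
maintained:

import numpy as np

CYCLES_TO_MAX = 5  # 100 ms at the 20 ms agent cycle

class ObstacleLayer:
    # keeps a grid of obstacle costs with temporal persistence (illustrative sketch)
    def __init__(self, rows, cols):
        self.cost = np.zeros((rows, cols))

    def update(self, obstacle_cells):
        # obstacle_cells: grid indices (row, col) of opponent robots seen this cycle
        target = np.zeros_like(self.cost)
        for row, col in obstacle_cells:
            target[row, col] = 1.0                       # hill centre on the obstacle
        # move at most 1/CYCLES_TO_MAX per cycle towards the target cost
        step = np.clip(target - self.cost, -1.0 / CYCLES_TO_MAX, 1.0 / CYCLES_TO_MAX)
        self.cost = np.clip(self.cost + step, 0.0, 1.0)
        peak = self.cost.max()                           # return a map normalized to [0, 1]
        return self.cost / peak if peak > 0 else self.cost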

5 Positioning in Defensive Set Pieces


In defensive set pieces the main objective is to prevent the opponent team from
performing a pass and to gain ball possession. In order to prevent the opponent
players from passing the ball, our robots must be positioned between the ball and the
possible receivers from the opponent team and follow them while they move.
To calculate the base position for the Barriers, the role assumed by the
robots during the opponent set pieces, we use a Delaunay Triangulation (DT)
[7] to interpolate all possible robot positions on the field, depending on the ball
position. On top of that, the restrictions imposed by the rules are applied. These
restrictions are: a minimum distance of 3 m to the ball, except in a drop ball situation,
when the required distance is only 1 m, and only one player inside our penalty
area. This player does not need to respect the previous rule as long as it is inside
the penalty area.
Figure 1 shows the tool used to configure the DT positions. Here, the ball
position is used as a triangle vertex and each vertex represents the given training
data. Each vertex produces output values for the position of the robots for that
triangle. When the ball is inside a triangle, the position of the agent is calculated
using the interpolation algorithm described in [7].
It is possible to configure each one of the Barriers to dynamically cover
the opponent robots or to simply stay in the base positions given by the DT
configuration. When there are no obstacles on the field, the robots only use
this position. In the presence of obstacles, the positions of the cover robots are
obtained from a Utility Map, following the approach proposed in this paper.
In order to avoid any possible error in the identification of obstacles, it is
necessary to filter the information received from different agents. Obstacles close
to each other are merged using a clustering algorithm, and obstacles too close to
a team mate are ignored, unless that team mate sees it. The filtered information
is then used to build the map.

Fig. 1. On the left, the FOV calculated from robot number 3. The ball is represented by
a small circle and the robots as larger circles. The circles with numbers are considered
team mates. The red areas are considered visible from the point of view of the robot.
On the right, the configuration tool used for the positioning of our robots in opponent
set pieces and during free play (DT).
From each cluster of obstacles, a valley is carved in the direction of the ball. After
that, a predefined height map that defines the priorities of the positions (see Figure 2)
is added to the calculated map. This map takes into consideration that it is more
important to cover the opponent robots in the direction of our goal, rather than in
the direction of their goal. Finally, all the restrictions (minimum distance to the ball,
positions inside the field and avoidance of the penalty areas) are added.
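As a simplified illustration only (the actual map combines the carved valleys, the
priority height map of Figure 2 and the rule restrictions), a cover position can be seen
as a point on the segment from an opponent towards the ball that respects the minimum
distance to the ball; the helper below and its configurable cover distance are hypothetical:

import math

MIN_BALL_DIST = 3.0  # meters imposed by the rules in opponent set pieces

def cover_position(ball, opponent, cover_distance=1.0):
    # point on the segment opponent -> ball, cover_distance meters from the opponent,
    # pulled back so that the minimum distance to the ball is respected
    dx, dy = ball[0] - opponent[0], ball[1] - opponent[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return opponent
    ux, uy = dx / length, dy / length
    d = min(cover_distance, max(0.0, length - MIN_BALL_DIST))
    return (opponent[0] + ux * d, opponent[1] + uy * d)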
In Fig. 2 we can see a game situation where robots number 3, 4 and 5 are
configured to cover the opponent robots. The best position given by the Utility
Map for each robot is represented in red. As intended, these positions are between
the ball and the opponent robots. Robot 2 is in its base position provided by
the DT configuration.
The distance between our team robot and the opponent robot that it is trying
to cover can be configured (in the configuration file of the robot). The human
coach can specify, in the same configuration file, which robots are allowed to
perform covering.

6 Positioning in Offensive Set Pieces

To configure our set pieces we use a graphical tool (Figure 3) that implements
an SBSP algorithm. The field is divided into 10 zones. Each zone defines a set of
positions for the Replacer (the role of the robot closer to the ball and responsible
for putting the ball in play) and Receivers (the role of the other robots, except
the Goalie). The position to kick by default in a situation where there is no
Receiver available can also be configured. The position of the Receivers can
be absolute or relative to the ball. We can also define if the Receiver needs to
have a clear line between its position and the ball and an option to force the
Receiver to be aligned with the goal. The priority for each Receiver is indicated
as well, in case more than one is configured and available for a specific region.
The one with the highest priority will be tested first for receiving a
pass. The action to be performed by the Replacer towards that specific Receiver is
also configurable. This action can be a pass, a cross or none. In the last case,
the Receiver will never be considered as an option. It is possible to configure
each one of the possible set pieces (corner, free kick, drop ball, throw in and
kick off) differently for each one of the regions.

Fig. 2. On the left, the cover priorities height map used to define the priority of the cover
positions on the field. Red is the highest priority and blue is the lowest. On the right, an
example of a cover Utility Map. As can be seen, the best positions for robots number 3, 4
and 5 are between the ball and the robots of the opponent team (red color).
When the set pieces are configured using this tool, the opponent
team is not taken into account. After the opponent team has positioned itself,
our configured positions may be positions where the receiver is not able to receive
a pass. To deal with these situations, it is necessary to have an alternative
reception position that is calculated dynamically, taking into account the
opponent team. A Utility Map is used to calculate the alternative position for
the Receiver.
All the constraints imposed by the rules, namely the minimum distance to the
ball (2 m) and not entering the goal areas, are taken into account for the con-
struction of the Receiver Utility Map. The field is divided into two zones for the
application of different metrics to calculate the utility value for each position.
On our side of the field, only one metric is used. This metric is the distance
to the halfway line. Three metrics are used on the opponent side of the field.
One is the free space between the pass line and the closest obstacle. The sec-
ond one is the weighted average between the distance to the ball, the distance
to the opponent goal, the rotation angle for a shot on target and the distance
from the point on the map to the position of the robot. Finally, the third metric
is the angle of each map position to the opponent goal. The weights for the
second metric are easily configurable in the configuration file of the robot. These
metrics are only applied within a circle whose radius is also defined in the con-
figuration file. This circle is centered on the position of the ball in the set piece,
and only the positions that have FOV from the ball (positions where the ball
can be passed) are considered.
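A hedged sketch of the second metric (the weighted average) and of the rotation angle
it uses is shown below; the weight names stand in for the configurable values mentioned
in the text, and the code is not the team's implementation:

import math

def rotation_for_shot(ball, point, goal):
    # angle (rad) a robot receiving at 'point' must rotate, from facing the incoming
    # ball to facing the opponent goal (our reading of the 'rotation angle' metric)
    recv = math.atan2(point[1] - ball[1], point[0] - ball[0])
    shot = math.atan2(goal[1] - point[1], goal[0] - point[0])
    return abs(math.atan2(math.sin(shot - recv), math.cos(shot - recv)))

def weighted_metric(point, ball, robot, goal, w):
    # w: dict of configurable weights (hypothetical keys)
    terms = {
        "dist_ball": math.dist(point, ball),
        "dist_goal": math.dist(point, goal),
        "rotation": rotation_for_shot(ball, point, goal),
        "dist_robot": math.dist(point, robot),
    }
    return sum(w[k] * terms[k] for k in terms) / sum(w.values())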

Fig. 3. On the left, the configuration tool used for our set pieces. On the right, an example
of an alternative positioning Utility Map for a Receiver, calculated for Robot number 2.
CAMBADA is attacking towards the blue goal. The black line goes from the ball to the
alternative position indicated by Robot number 2 to receive the ball.

The robots move to the best position extracted from the Utility Map only
after the referee gives the start signal, to prevent the opponent robots from following
them. In Fig. 3 all the Receivers are sharing that they have a clear line to receive
the ball (lineClear is information associated with each robot). Robot number 4 is
the Replacer, already chosen to pass the ball to robot number 2. The pass line
it is trying to create is represented by the black line. Robot number 2 will move
to that position to receive the ball.

7 Positioning in Freeplay

Free-play passes are a true challenge in terms of coordination among robots,


being thus much more complex to achieve. This is mainly due to the com-
plete freedom that the opponent team has to approach our Striker or cover
a Midfielder. Since the robot that wants to make the pass has the ball in
its possession, its movement is very limited. Taking this into consideration, the
development of a Utility Map to estimate the best position of the Midfielders
on the field is more than necessary in order to improve the capability of per-
forming passes in free play.
As in the opponent set pieces, a Delaunay Triangulation (DT) is used to calculate
the base positions for the Midfielders. On top of that, the restrictions from
the rules are applied, namely the avoidance of the goal areas.
The algorithm to calculate the Utility Map for the Midfielders is similar
to the one described for our set pieces. An example of a Utility Map in a free
play game situation is presented in Fig. 4. The best positions on the field to
receive the ball are in red. Only positions inside the field are considered, as well
as only positions outside both penalty areas. Positions near the opponent corners
(dead angles) are also avoided. A minimum distance of 2 meters to the Striker
is required and a preference is given to positions near the last chosen position,
near the strategic position returned by DT and also near the actual Receiver.
A FOV for the ball is also required. With these constraints, the free-play receiver
robot will be constantly adapting to the changes of the opponent formation and
the ball position.

Fig. 4. On the left, the free-play Utility Map calculated by the Midfielders. The best
positions on the field to receive the ball are in red. On the right, the free-play Utility
Map used by the Striker to choose the best position to perform a pass or a kick
to the goal, when dribbling the ball. The robot will dribble to the positions in red.
The Striker, the robot holding the ball or closest to it, also uses a Utility
Map for selecting the best position to perform a pass or to kick towards the goal.
The calculated map (Figure 4) deals with the constraints regarding a generic
dribble behavior, complying with the current MSL rules, which do not allow a
robot to dribble for more than 3 meters. Some areas of the field have less utility,
namely both penalty areas, areas outside the field and areas outside a 3 meter radius
circle centered on the point where the ball was grabbed. More priority is given
to the areas close to the limits of the field, since it is more advantageous to kick to
the goal from there.
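A minimal sketch of the dribble constraints described above (field dimensions,
coordinate frame and the side-line preference shaping are assumptions, not CAMBADA's
configuration; penalty-area exclusion is omitted for brevity):

import math

MAX_DRIBBLE = 3.0  # meters allowed by the current MSL rules

def striker_utility(point, grab_point, field_length=18.0, field_width=12.0):
    # utility of dribbling towards 'point' after grabbing the ball at 'grab_point';
    # the field is assumed centered at the origin, x along its length
    half_l, half_w = field_length / 2.0, field_width / 2.0
    if abs(point[0]) > half_l or abs(point[1]) > half_w:   # outside the field
        return 0.0
    if math.dist(point, grab_point) > MAX_DRIBBLE:         # beyond the 3 m dribble circle
        return 0.0
    return abs(point[1]) / half_w                          # prefer areas near the field limits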

8 Results and Discussions


CAMBADA won third place in the last RoboCup MSL competition. After a
thorough analysis of the log files and game videos, it is safe to say that the
approach we present in this paper for the dynamic strategic positioning of the
robots has had a major contribution to the success of the team. We present in
this section the analysis of the presented approach and we discuss its impact on
the performance of the team. While a clear distinction between the performance
of the team prior to the use of Utility Maps and its current performance cannot
be made due to the continuously evolving dynamism of the MSL soccer games
and the improvements of each of the participating teams each year, the following
results prove that the use of Utility Maps had a major contribution for bringing
the robotic soccer game as close to the human soccer games as possible.
Looking at the examples of the Utility Maps presented above regarding the
three game situations, we can confirm that the maps were correctly built, since
the positions of the robots in a specific game situation are the intended positions
that maximize the success of the game.
The video that we submitted together with this paper represents, in our view,
the best experimental results to show the effectiveness of the proposed approach
for dynamic strategic positioning. Moreover, we analysed the videos and log files
from the RoboCup games. This analysis reveals that the team reached a 70%
success rate in the interception of the ball in defensive set pieces (see Table 1),
performed 64% successful passes in offensive set pieces (see Table 2) and a high
percentage of successful passes in free play, these last situations being hard to
analyze due to the high dynamism of the games.

Table 1. Defensive set piece cover efficiency during the last two games of
RoboCup 2014. According to the rules, the defending team has to be at least 3 meters
from the ball; in the situations counted in the "<3m" column (short passes by the
opponent) it is impossible to intercept the ball.

Game        Opponent      <3m   Intercepted: Yes   Intercepted: No   % Success
Semi-final  Tech United    11          10                 3              77
3rd place   MRL            14           4                 3              57
Total                      25          14                 6              70

Table 2. Attacking set piece efficiency during the last two games of RoboCup 2014.
We consider the success of the ball reception after a pass.

Game        Opponent      Pass: Yes   Pass: No   % Success
Semi-final  Tech United       10          8          56
3rd place   MRL               15          6          71
Total                         25         14          64
Looking at Table 1, and considering that most of the times when the attacking
team made a short pass it was forced into it by not having another pass
option, we have a success rate of 70% in defensive set piece situations.
Looking further into the unsuccessful situations, the problem was clearly
identified and it is not related to the cover positions obtained from the Utility
Maps. The problem was rather the transition from the Barrier role into the
Midfielder or Striker role, situations where the cover positions are not used.
This is still an open issue to be addressed in the near future.
In the last two games played by the team at RoboCup 2014, the semi-final and the
third-place game - which were probably the most dynamic games - there was a total
of 45 defensive set piece situations. This is an average of about 22 defensive set pieces
per game, in a game of 30 minutes, which means a defensive set piece situation
every 1 minute and 20 seconds.
In Fig. 5 we can see two game situations where the CAMBADA robots are in
strategic positions. By being in those positions, the CAMBADA robots do not
allow the attacking team to perform a pass in a proper way.
In the same two games, there was a total of 39 offensive set piece situations.
This is an average of about 20 offensive set pieces per game, in a game of 30 minutes,
which means an offensive set piece situation every 1 minute and 30 seconds.
In 64% of the situations the robots were able to properly receive the ball. Looking
further into the unsuccessful situations, there were cases where the ball was
passed to a position far from the Receiver
and it was lost, and some other cases where the reception was not properly done,
mainly due to misalignment of the Receiver. These situations were not due
to a wrong positioning given by the Utility Map. We counted a total of only 3
interceptions of the ball by the opponent team for long passes.
In Fig. 6 we can see two game situations where the CAMBADA robots are
in strategic positions, which allows them to receive the ball with success.

Fig. 5. Defensive game situations during RoboCup 2014. The CAMBADA team has blue
markers and is the defending team.

Fig. 6. Offensive game situations during RoboCup 2014. The CAMBADA team has blue
markers and is the attacking team.
Based upon this study, we are convinced that the use of Utility Maps is an
advantageous approach in extremely dynamic environments, such as the one of
robotic soccer. Without great complexity being added to the structure of the
agents, as it was described in the previous sections, it was possible to introduce
the desired dynamism that led to the increase of the team competitiveness and
improved its overall performance.

References
1. Neves, A., Azevedo, J., Lau, N., Cunha, B., Silva, J., Santos, F., Corrente, G.,
Martins, D.A., Figueiredo, N., Pereira, A., Almeida, L., Lopes, L.S., Pedreiras, P.:
CAMBADA soccer team: from robot architecture to multiagent coordination,
chapter 2, pp. 19–45. I-Tech Education and Publishing, Vienna, January 2010
2. Veloso, M., Bowling, M., Achim, S., Han, K., Stone, P.: The cmunited-98 champion
small robot team (accessed February 27, 2014)
3. Stone, P.: Layered Learning in Multiagent Systems: A Winning Approach to
Robotic Soccer. MIT Press (2000)
4. Reis, L.P., Lau, N., Oliveira, E.C.: Situation based strategic positioning for coordi-
nating a team of homogeneous agents. In: Hannebauer, M., Wendler, J., Pagello, E.
(eds.) ECAI-WS 2000. LNCS (LNAI), vol. 2103, pp. 175–197. Springer, Heidelberg
(2001)
5. Lau, N., Lopes, L.S., Corrente, G.: CAMBADA: information sharing and team
coordination. In: Proc. of the 8th Conference on Autonomous Robot Systems and
Competitions, Portuguese Robotics Open - ROBOTICA 2008, pp. 27–32, Aveiro,
Portugal, April 2008
6. Dashti, H.A.T., Aghaeepour, N., Asadi, S., Bastani, M., Delafkar, Z., Disfani, F.M.,
Ghaderi, S.M., Kamali, S.: Dynamic positioning based on voronoi cells (DPVC).
In: Bredenfeld, A., Jacoff, A., Noda, I., Takahashi, Y. (eds.) RoboCup 2005. LNCS
(LNAI), vol. 4020, pp. 219–229. Springer, Heidelberg (2006)
7. Akiyama, H., Noda, I.: Multi-agent positioning mechanism in the dynamic envi-
ronment. In: Visser, U., Ribeiro, F., Ohashi, T., Dellaert, F. (eds.) RoboCup 2007:
Robot Soccer World Cup XI. LNCS (LNAI), vol. 5001, pp. 377–384. Springer,
Heidelberg (2008)
8. Elfes, A.: Using occupancy grids for mobile robot perception and navigation. Com-
puter 22(6), 46–57 (1989)
9. Rosenblatt, J.K.: Utility fusion: map-based planning in a behavior-based system.
In: Zelinsky, A. (ed.) Field and Service Robotics, pp. 411–418. Springer, London
(1998)
10. Chaimowicz, L., Campos, M.F.M., Kumar, V.: A paradigm for dynamic coordination
of multiple robots. Auton. Robots 17(1), 7–21 (2004)
11. Spaan, M.T.J., Groen, F.C.A.: Team coordination among robotic soccer players.
In: Kaminka, G.A., Lima, P.U., Rojas, R. (eds.) RoboCup 2002. LNCS (LNAI),
vol. 2752, pp. 409–416. Springer, Heidelberg (2003)
12. Santos, F., Almeida, L., Lopes, L.S., Azevedo, J.L., Cunha, M.B.: Communicating
among robots in the robocup middle-size league. In: Baltes, J., Lagoudakis, M.G.,
Naruse, T., Ghidary, S.S. (eds.) RoboCup 2009. LNCS, vol. 5949, pp. 320–331.
Springer, Heidelberg (2010)
13. Lau, N., Lopes, L.S., Corrente, G., Filipe, N., Sequeira, R.: Robot team coordi-
nation using dynamic role and positioning assignment and role based setplays.
Mechatronics 21(2), 445–454 (2011)
14. Trifan, A., Neves, A.J.R., Cunha, B., Azevedo, J.L.: UAVision: a modular time-
constrained vision library for soccer robots. In: Bianchi, R.A.C., Akin, H.L.,
Ramamoorthy, S., Sugiura, K. (eds.) RoboCup 2014. LNCS, vol. 8992, pp. 490–501.
Springer, Heidelberg (2015)
15. Silva, J., Lau, N., Neves, A.J.R., Azevedo, J.L.: World modeling on
an MSL robotic soccer team. Mechatronics 21(2), 411–422 (2011)
Intelligent Wheelchair Driving: Bridging the Gap
Between Virtual and Real Intelligent Wheelchairs

Brígida Mónica Faria1,2,3, Luís Paulo Reis2,4, Nuno Lau3,5,


António Paulo Moreira6,7, Marcelo Petry2,7,8, and Luís Miguel Ferreira3
1
ESTSP/IPP - Escola Sup. Tecnologia de Saúde do Porto,
Inst Politécnico do Porto, Vila Nova de Gaia, Portugal
[email protected]
2
LIACC – Lab. Inteligência Artificial e Ciência de Computadores, Porto, Portugal
[email protected], [email protected]
3
IEETA - Inst. Engenharia Electrónica e Telemática de Aveiro, Aveiro, Portugal
{nunolau,luismferreira}@ua.pt
4
Dep. de Sistemas de Informação, EEUM - Escola de Engenharia da Universidade do Minho,
Guimarães, Portugal
5
DETI/UA – Dep de Electrónica, Telecomunicações e Informática da Univ. Aveiro,
Aveiro, Portugal
6
FEUP - Faculdade de Engenharia, Universidade do Porto, Porto, Portugal
[email protected]
7
INESC TEC - INESC Tecnologia e Ciência, Porto, Portugal
8
UFSC - Universidade Federal de Santa Catarina, Blumenau, Brazil

Abstract. Wheelchairs are important locomotion devices for handicapped and


senior people. With the increase in the number of senior citizens and of people
with physical disabilities, there is a growing demand for safer and more
comfortable wheelchairs. Thus, the new Intelligent Wheelchair (IW)
concept was introduced. Like many other robotic systems, the main capabilities
of an intelligent wheelchair should be: autonomous navigation with safety, flex-
ibility and capability of avoiding obstacles; intelligent interface with the user;
communication with other devices. In order to achieve these capabilities a good
testbed is needed on which trials and users’ training may be safely conducted.
This paper presents an extensible virtual environment simulator of an intelligent
wheelchair to fulfill that purpose. The simulator combines the main features of
robotic simulators with those built for training and evaluation of prospective
wheelchair users. Experiments with the real prototype allowed having results
and information to model the virtual intelligent wheelchair. Several experiments
with real users of electric wheelchairs (suffering from cerebral palsy) and po-
tential users of an intelligent wheelchair were performed. The System Usability
Scale score allowed assessing the users' perception of the usability of the
IW in the virtual environment. The mean score was 72, indicating a satisfactory
level of usability. From the experiments it was possible to conclude that the
virtual intelligent wheelchair and environment are usable instruments to test and
train potential users.

Keywords: Intelligent wheelchair · Intelligent robotics · Intelligent simulation ·


Virtual reality · Multimodal interface

1 Introduction

Recently, virtual reality has attracted much interest in the field of motor rehabilitation
engineering [1]. Virtual reality has been applied to provide safe and interesting train-
ing scenarios with near-realistic environments for subjects to interact with [2]. The
performance of elements within these virtual environments proved to be representa-
tive of the elements’ abilities in the real world and their real-world skills showed sig-
nificant improvements following the virtual reality training [3-5]. Until now, electric
powered wheelchair simulators have mainly been developed to either facilitate patient
training and skill assessment [6] [7] or assist in testing and development of semi-
autonomous intelligent wheelchairs [8]. While in the training simulators the focus is
on user interaction and immersion, the main objective of the robotics simulators is the
accurate simulation of sensors and physical behaviour. The simulator presented here
addresses the need to combine these approaches. It is a simple design that provides
the user training ability while supporting a number of sensors and ensuring physically
feasible simulation for intelligent wheelchairs’ development. The simulator is a part
of a larger project where the IntellWheels prototype will include all typical IW capa-
bilities, like facial expression recognition based command, voice command, sensor
base command, advanced sensorial capabilities, the use of computer vision as an aid
for navigation, obstacle avoidance, intelligent planning of high-level actions and
communication with other devices.
The experiments with real wheelchair users allowed us to assess the usability
of the virtual intelligent wheelchair and virtual environment. These users,
besides using a wheelchair to move around, are also potential users of an intelligent
wheelchair. In fact, they suffer from cerebral palsy which is a group of permanent
disorders in the development of movement and posture.
This paper is organized into five sections, the first being this introduction.
Section two presents an overview of the methodologies for wheelchair
simulation and the criteria used to select the platform to simulate the IW and the environ-
ment. Special attention is given to USARSim, which was the platform chosen to
produce the simulation. In section three a brief description of the IntellWheels project
is presented. Section four presents the experiments and results and, finally, the last
section presents the conclusions and future work.

2 Methodologies for Wheelchair Simulation

Assistive technologies are defined as any product, instrument, equipment or adapted


technology specially designed to improve the functional levels of the disabled person.
Resorting to these products can help reduce the limitations in mobility [9] [10] [11]
Electric powered wheelchairs are an enabling technology, used by people with
a wide range of diseases, including cerebral palsy [13]. Simulators allow users
to test and train several of these assistive technologies [14]. This section contains an
overview of the methodologies available for developing a wheelchair simulator.

The methodologies that typically are concerned with rendering 2D and 3D are
known as graphics engines. Usually they are aggregated inside game engines, using
specific libraries for rendering. Examples of graphics engines are OpenSceneGraph
[15], Object-Oriented Graphics Rendering Engine (OGRE) [16], jMonkey Engine
[17] and Crystal Space [18]. Physics engines are software applications with the
objective of simulating the physical reality of objects and the world. Bullet [19], Havok
[20], Open Dynamics Engine (ODE) [21] and PhysX [22] are examples of physics
engines. These engines also contribute to robotics simulation through more realistic
robot motion generation. Game engines are software frameworks that de-
velopers use to create games. Game engines normally include a graphics engine
and a physics engine. Collision detection/response, sound, scripting, animation,
artificial intelligence, networking, streaming, memory management, localization
support and scene graph are also functionalities included in this kind of engine.
Examples of game engine are Unreal Engine 3 [23], Blender Game Engine [24],
HPL Engine [25] and Irrlicht Engine [26]. A robotics simulator is a platform to
develop software for robot modelling and behaviour simulation in a virtual envi-
ronment. In several cases it is possible to transfer the application developed in
simulation to the real robots without any extra modification. In the literature there
are several commercial examples of robotics simulators: AnyKode Marilou (for
mobile robots, humanoids and articulated arms) [27]; Webots (for educational pur-
poses; a large choice of simulated sensors and actuators is available to prepare
each robot) [28]; Microsoft Robotics Developer Studio (MRDS) (allows easy
access to simulated sensors and actuators) [29]; Workspace 5 (a Windows-based
environment that allows the creation, manipulation and modification of images in
3D CAD and several ways of communication) [30]. Non-commercial robotics
simulators are also available: SubSim [31]; SimRobot [32]; Gazebo [33]; USARSim
[34]; Simbad [35] and SimTwo [36]. A comparison of the most used 3D robotics
simulator according several criteria was presented by Petry et al. [37]. Petry et al.
[37] also presented the requirements and characteristics for simulation of intelligent
wheelchairs and in particularly to the IntellWheels prototype.
USARSim, an acronym for Unified System for Automation and Robot Simulation, is
a high-fidelity simulation of robots and environments based on the Unreal Tournament
game engine [34]. It was initially created as a research tool for the simulation of
Urban Search And Rescue (USAR) robots and environments, aimed at the study of
human-robot interaction (HRI) and multi-robot coordination [37]. USARSim is the
basis for the RoboCup Rescue Virtual Robot competition (RoboCup) as well as for
the IEEE Virtual Manufacturing Automation Competition (VMAC) [34].
Nowadays, the simulator uses the Unreal Engine UDK and NVIDIA's PhysX physics
engine. The version used to develop IntellSim was Unreal Engine 2.5 with the Karma
physics engine (both integrated into the Unreal Tournament 2004 game), which
maintain and render the virtual environment and model the physical behaviour of its
elements, respectively.

3 IntellWheels Project
The IntellWheels project aims at providing a low-cost platform for electric wheelchairs
in order to transform them into intelligent wheelchairs. The simulated environment
allows testing and training potential users of the intelligent wheelchair and selecting
the most appropriate interface, among the available possibilities, for a specific user.
After the first set of experiments [38], it was necessary to improve the realism of the
simulated environment and of the behaviour of the IW. For that reason, and in order
to keep the cost of the IntellWheels project as low as possible, USARSim was chosen
for the new simulator. There were other reasons for choosing USARSim: it has
advanced support for wheeled robots, allowing their independent configuration; it
allows importing objects and robots modelled in different platforms, which facilitates,
for instance, the modelling of the wheelchair; and it is possible to program robots and
control them over the network, which enables mixed reality [39].
One of the main objectives of the project is also the creation of a development plat-
form for intelligent wheelchairs [40], entitled IntellWheels Platform (IWP). The
project's main focus is the research and design of a multi-agent platform, enabling easy
integration of different sensors, actuators, devices for extended interaction with the
user [41], navigation methods and planning techniques and methodologies for intelli-
gent cooperation to solve problems associated with intelligent wheelchairs [42].
The IntellWheels platform allows the system to work in real mode (the IW has a
real body), simulated (the body of the wheelchair is virtual) or mixed reality (real IW
with perception of real and virtual objects). In real mode it is necessary to connect the
system (software) to the IW hardware. In the simulated mode, the software is con-
nected to the IWP simulator. In the mixed reality mode, the system is connected to
both (hardware and simulator). Several types of input devices were used in this
project to allow people with different disabilities to be able to drive the IW. The in-
tention is to offer the patient the freedom to choose the device they find most
comfortable and safe to drive the wheelchair. These devices range from traditional
joysticks and accelerometers to commands expressed by speech, facial expressions, or
a combination of them. Moreover, these multiple inputs for interaction with the IW
can be integrated with a control system responsible for the decision of enabling or
disabling any kind of input, in case of any observed conflict or dangerous situation.
To compose the set of hardware necessary to provide the wheelchair with the ability
to avoid obstacles, follow walls, map the environment and detect holes and unevenness
in the ground, two side bars were designed, constructed and placed on the wheelchair.
These bars incorporate 16 sonars and a laser range finder. Two encoders, coupled to
the wheels, were also included to allow odometry.
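For illustration, wheel odometry of this kind can be computed from incremental encoder counts as in the following minimal Python sketch; the encoder resolution is a hypothetical value and the wheel dimensions are taken from Table 1, so this is only an approximation of the kind of calculation involved, not the IntellWheels implementation.

    import math

    WHEEL_DIAMETER = 0.315   # m, big (rear) wheel diameter, Table 1
    WHEEL_BASE = 0.535       # m, distance between rear wheels, Table 1
    TICKS_PER_REV = 2048     # assumed encoder resolution (hypothetical)

    METERS_PER_TICK = math.pi * WHEEL_DIAMETER / TICKS_PER_REV

    def update_odometry(pose, d_ticks_left, d_ticks_right):
        # Dead-reckoning update from incremental encoder ticks.
        # pose is (x, y, theta) in metres/radians; returns the new pose.
        x, y, theta = pose
        d_left = d_ticks_left * METERS_PER_TICK
        d_right = d_ticks_right * METERS_PER_TICK
        d_center = 0.5 * (d_left + d_right)         # distance travelled by the chair
        d_theta = (d_right - d_left) / WHEEL_BASE   # change in heading
        x += d_center * math.cos(theta + 0.5 * d_theta)
        y += d_center * math.sin(theta + 0.5 * d_theta)
        return (x, y, theta + d_theta)

    # Example: equal tick counts on both wheels produce straight-line motion
    print(update_odometry((0.0, 0.0, 0.0), 1000, 1000))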

3.1 IntellWheels Simulator


The system module named IntellWheels Simulator [43] or, more recently, IntellSim,
allows the creation of a virtual world where one can simulate the environment of a
building (e.g. a floor of a hospital), as well as wheelchairs and generic objects (tables,
doors and other objects). The purpose of this simulator is essentially to support the
testing of algorithms, to analyse and test the modules of the platform, and to safely
train users of the IW in a simulated environment.

The virtual intelligent wheelchair was modelled using the program 3DStudioMax
[44]. The visual part, which appears on the screen, was imported into the UnrealEditor
as separate static mesh (*.usx) files. The model was then added to USARSim by
writing appropriate UnrealScript classes and modifying the USARSim configuration
file. The physics properties of the model were described in the UnrealScript language,
using one file for each part of the robot. The model has fully autonomous caster
wheels and two differential steering wheels. In the simulation it is equipped with a
camera, a front sonar ring, an odometry sensor and encoders. Fig. 1 shows different
perspectives of the real and virtual wheelchair.

Fig. 1. The real and virtual prototype of the IW

An important factor affecting the simulation of any model in UT2004 is its mass
distribution and the associated inertial properties. These were calculated using
estimated masses of the different parts of the real chair with batteries (70 kg) and
literature values for average human body parameters (60 kg). The values obtained for
the center of mass and the inertia tensor were used to calculate the required torque for
the two simulated motors, using the manufacturer's product specification as a guideline.
The sensors used in the simulated wheelchair are the same as those used in the real
IW. As in the real prototype, 16 sonars and a laser range finder were placed in two
side bars. Two encoders, coupled to the wheels, were also included to allow odometry.
These sensors provide the wheelchair with the ability to avoid obstacles, follow walls,
map the environment and detect holes and unevenness in the ground. Using the
simulator it was also possible to model rooms with low illumination and noisy
environments and to test the performance of users while driving the wheelchair. The
map was created using the Unreal Editor 3 and is similar to the place where the
patients usually move around. Several components of the map were modelled using
3DStudioMax. In order to increase the realism of the virtual environment, several
animations were implemented using sequence scripts. The simulator runs on a
dual-core PC with a standard gaming dual-view graphics card. Other supported input
devices include keyboard, mouse, mouse-replacement devices and gaming joysticks.
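As an indication of the kind of calculation mentioned above, the following sketch combines the chair (70 kg) and occupant (60 kg) masses into a joint centre of mass and derives a rough per-motor torque for a given acceleration; the component positions and the acceleration target are illustrative assumptions only, not the values used in IntellSim.

    # Combined centre of mass of chair + occupant and a rough per-motor torque.
    # The part positions and the target acceleration are illustrative assumptions.
    CHAIR_MASS = 70.0            # kg, chair with batteries (Sect. 3.1)
    OCCUPANT_MASS = 60.0         # kg, average human body parameters (Sect. 3.1)
    WHEEL_RADIUS = 0.315 / 2.0   # m, big wheel radius (Table 1)

    chair_com = (0.05, 0.30)     # assumed (x forward, z up) position, metres
    occupant_com = (0.10, 0.70)  # assumed (x forward, z up) position, metres

    total_mass = CHAIR_MASS + OCCUPANT_MASS
    com_x = (CHAIR_MASS * chair_com[0] + OCCUPANT_MASS * occupant_com[0]) / total_mass
    com_z = (CHAIR_MASS * chair_com[1] + OCCUPANT_MASS * occupant_com[1]) / total_mass

    target_accel = 0.5           # m/s^2, assumed acceleration on flat ground
    force_total = total_mass * target_accel          # friction and slopes ignored
    torque_per_motor = 0.5 * force_total * WHEEL_RADIUS

    print("combined COM: (%.3f, %.3f) m" % (com_x, com_z))
    print("torque per motor: %.2f N.m" % torque_per_motor)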

4 Experiments and Results

The initial experiments were conducted to gather more information about the real
prototype, with the objective of modelling and simulating the behaviour of the real
wheelchair more precisely in the virtual environment. The final experiments involved
real wheelchair users and potential users of the intelligent wheelchair. Therefore, the
experiments were divided into two components: wheelchair technical information and
the users' opinions about the simulated wheelchair and the modelled virtual
environment.

4.1 Real Intelligent Wheelchair Characteristics


The technical information was obtained using the manual of the electric wheelchair
to which the platform was applied, measurements, and experiments with the real
intelligent wheelchair prototype (Table 1 summarizes that information).

Table 1. Real intelligent wheelchair prototype characteristics

Weight (with batteries):        70 kg
Big wheel diameter:             0.315 m
Small wheel diameter:           0.185 m
Distance between rear wheels:   0.535 m
Distance between axes:          0.48 m
Motor:                          180 W, 24 V, 2.5 A max, 3800 RPM (32:1)
Brakes:                         electromagnetic automatic brakes
Maximum speed:                  7 km/h
Time for a total rotation:      7 s
Sensors:                        16 sonars, 1 laser range finder (Hokuyo URG-04LX),
                                2 encoders (rear wheels)

Fig. 2. Real intelligent wheelchair experiments

Fig. 2 shows some of the experiments performed with the real intelligent wheelchair
prototype. The velocity and the time for a total rotation were analyzed with the new
adaptations applied to the real prototype. The results obtained with the real prototype
were taken into account in the development of the virtual intelligent wheelchair.

4.2 IntellSim Experiments


The IntellSim experiments were performed by patients suffering from cerebral palsy,
all of them wheelchair users. Using IntellSim, several experiments were performed,
changing the conditions of the environment and using the joystick and the manual
control. A map was created integrating paths with different degrees of difficulty. The
overall circuit (Fig. 3) passes through two floors, and the link between floors is a
ramp. The map was divided into three parts. The first part is characterized by simple
and large corridors without any kind of obstacles except pillars. The second part has
narrow corridors, ramps and obstacles. The last part involved a circuit entering three
rooms with different kinds of illumination and noise.

Fig. 3. Overall circuit and snapshots of the first person view during the game

After the experiments, the users responded to the System Usability Scale (SUS) [45],
a simple ten-item Likert scale giving a global view of individual assessments of
usability (with a score between 0 and 100), and to some additional questions about
safety and control while managing the IW, whether it was easy to drive the IW in
tight places, and the attention needed to drive the IW.
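For reference, the SUS score reported below is normally obtained from the ten 1–5 Likert answers as in this short sketch (standard SUS scoring [45]; the example answers are made up).

    def sus_score(answers):
        # Standard SUS scoring: odd items contribute (answer - 1), even items
        # contribute (5 - answer); the sum is scaled to the 0-100 range.
        assert len(answers) == 10
        total = 0
        for i, answer in enumerate(answers, start=1):
            total += (answer - 1) if i % 2 == 1 else (5 - answer)
        return total * 2.5

    # Example with made-up answers on the 1-5 Likert scale
    print(sus_score([4, 2, 4, 2, 5, 1, 4, 2, 4, 3]))   # 77.5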

4.2.1 Sample Characterization


To better understand the results that follow, the sample characterization is presented. It
is important to reinforce that cerebral palsy is defined as a group of permanent disorders
in the development of movement and posture. It causes limitations in daily activities
because of a non-progressive disturbance which occurs in the brain during fetal and
infant development [46]. The motor disorders in cerebral palsy are associated with
deficits of perception, cognition, communication and behaviour. In general, there are
also episodes of epilepsy and secondary musculoskeletal problems [46]. The individuals
included in this study suffer from cerebral palsy and were classified in levels IV (26%)
and V (74%) of the Gross Motor Function Measure (GMF), which are the highest levels
in the cerebral palsy severity scale. The sample was composed of 19 individuals, all
requiring the use of a wheelchair. The mean age was 29 years, with 79% males and
21% females. In terms of school level, 15% did not answer, 10% are illiterate, 16%
have only elementary school, 16% have middle school, 37% have high school and only
5% have a BSc. Regarding the dominant hand, 12 were left-handed, 6 were right-handed
and 1 did not answer. Another question concerned the frequency of use of information
and communication technologies: 27% did not answer, 42% answered rarely, 21%
sometimes, 5% lots of times and 5% always. Aspects related to the experience of using
manual and electric wheelchairs were also surveyed. Table 2 shows the distribution of
answers about autonomy and independence in using the wheelchair and the constraints
presented by these individuals.

Table 2. Experience using wheelchair, autonomy, independence and constraints of the cerebral
palsy users

(n individuals out of 19)

Use manual wheelchair:           no 13 / yes 6
Use electric wheelchair:         no 6  / yes 13
Autonomy using wheelchair:       no 4  / yes 15
Independence using wheelchair:   no 4  / yes 15
Cognitive constraints:           no 8  / yes 11
Motor constraints:               no 0  / yes 19
Visual constraints:              no 11 / yes 8
Auditive constraints:            no 19 / yes 0

4.2.2 IntellSim Results


The results obtained reveal a satisfactory usability of IntellSim in these experiments.
In fact, all users could easily recognize that the virtual environment was a replica of
the cerebral palsy institution that they usually attend.

Table 3. Results of the experiments in IntellSim

IntellSim experiments with patients suffering from cerebral palsy

                                            Mean   Median   Std    Min    Max
Usability and Safety
  Score SUS                                 72.0   70.0     11.7   57.5   95
  Safety managing the IW                    --     Agree    --     Ind    SAgree
  Control of the IW                         --     Agree    --     Ind    SAgree
  Easy to drive the IW in tight places      --     Agree    --     Dis    SAgree
  The IW does not need too much attention   --     Dis      --     SDis   Agree
  Satisfaction                              --     VSatis   --     Ind    VSatis
Performance
  Time (minutes)                            12.6   9.5      8.6    5.6    42.4
  Number of objects collected               12.7   14       3.6    4      15

Legend: SDis – strongly disagree; Dis – disagree; Ind – indifferent; SAgree – strongly agree;
VDiss – very dissatisfied; Diss – dissatisfied; Satis – satisfied; VSatis – very satisfied

The SUS mean score was satisfactory and all the users rated the usability positively
(individual scores of at least 57.5). Overall, the users were very satisfied with the
experience. The users completed the circuit in a median time of 9.5 minutes. The best
time, 5.6 minutes, was achieved by one user, and the worst time, 42.4 minutes, was
obtained by a user with severe difficulties and without autonomy or independence in
driving a wheelchair. The data from the logs allow plotting the circuits after the
experiment, which is a way of analysing the behaviour of the users (Fig. 4).

Fig. 4. Circuits performed by patients suffering from cerebral palsy with joystick

It is interesting to notice that the paths are smooth when using the joystick in manual
mode with IntellSim. These three individuals are autonomous and independent in
driving their own electric wheelchairs with a joystick (level IV of the GMF). The next
three circuit examples (second line in Fig. 4) were executed by the users who took the
longest times. In the left example, the user took 17.7 minutes to collect 14 objects and
finish the circuit. In the middle, the user took 22.35 minutes to collect 12 objects, and
on the right side is the circuit performed by the user who took the longest time, 42.4
minutes, to collect only 9 objects. Although these three examples are the worst in
terms of performance, it should be emphasized that these users are classified in the
most severe degree of the GMF and do not have the autonomy and independence to
drive a conventional wheelchair. However, IntellSim can be used to train these users
and, with appropriate methodologies such as shared or automatic controls, it is
possible to drive the IW in an efficient and effective manner.

5 Conclusions and Future Work

The autonomy and independence of the individual is nowadays a topic of great
attention. The scientific community has developed and presented many prototypes,
such as intelligent wheelchairs; however, most of them are only evaluated
experimentally in the lab, without real potential users. The virtual reality wheelchair
simulator presented here addresses an important gap in wheelchair simulation in that it
can be used both for patient training and evaluation and for the design and development
of semi-autonomous intelligent wheelchairs. Because of its flexibility and the long (and
expanding) list of sensors inherited from the USARSim project, the current system
provides a perfect test bed for the development and testing of intelligent wheelchair
systems. Experimental work with the real IW prototype provided information about
weight, wheel diameters, distance between axes and wheels, motor characteristics,
maximum speed, time for a total rotation, and sensor characteristics and placement,
enabling a realistic model of the virtual wheelchair. The experiments with real patients
suffering from cerebral palsy confirmed the usability of the IW in IntellSim. The
performance results also present evidence that IntellSim can be used as a training and
test tool.
After this exploratory work on bridging the gap between the real and virtual IW,
future work will test the full capabilities of the IW with real patients. The possibility
of driving the IW using a multimodal interface and a shared control that allows
correcting the trajectory of users with severe constraints are some of the issues to be
tested.

Acknowledgments. This work is financed by LIACC (PEst-OE/EEI/UI0027/2014) and ERDF


– European Regional Development Fund through the COMPETE Programme (operational
programme for competitiveness) and by National Funds through the FCT – Fundação para a
Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project
«FCOMP-01-0124-FEDER-037281».

References
1. Holden, M.K.: Virtual environments for motor rehabilitation: review. J. Cyberpsychol
Behav. 8(3), 187–211 (2005)
2. Boian, R.F., Burdea, G.C., Deutsch, J.E., Windter, S.H.: Street crossing using a virtual
environment mobility simulator. In: Proceedings of IWVR, Lausanne, Switzerland (2004)
3. Inman, D.P., Loge, K., Leavens, J.: VR education and rehabilitation. Commun. ACM
40(8), 53–58 (1997)
4. Harrison, A., Derwent, G., Enticknap, A., Rose, F.D., Attree, E.A.: The role of virtual real-
ity technology in the assessment and training of inexperienced powered wheelchair users.
Disabil Rehabil 24(8), 599–606 (2002)
5. Adelola, I.A., Cox, S.L., Rahman, A.: VEMS - training wheelchair drivers, Assistive
Technology, vol. 16, pp. 757–761. IOS Press (2005)
6. Desbonnet, M., Cox, S.L., Rahman, A.: Development and evaluation of a virtual reality
based training system for disabled children. In: Sharkey, P., Sharkeand, R., Lindström, J.-I.
(eds.) The Second European Conference on Disability, Virtual Reality and Associated
Technologies, Mount Billingen, Skvde, Sweden, pp. 177–182 (1998)
7. Niniss, H., Inoue, T.: Electric wheelchair simulator for rehabilitation of persons with motor
disability, Symp Virtual Reality VIII, Belém (PA). Brazilian Comp. Society (BSC) (2006)
8. Röfer, T.: Strategies for using a simulation in the development of the Bremen autonomous
wheelchair. In: 12th European Simulation Multi Conference 1998 Simulation: Past,
Present and Future, ESM 1998, Manchester, UK, pp. 460–464 (1998)

9. Tefft, D., Guerette, P., Furumasu, J.: Cognitive predictors of young children’s readiness
for powered mobility. Dev. Medicine and Child Neurology 41(10), 665–670 (1999)
10. Faria, B.M., Silva, A., Faias, J., Reis, L.P., Lau, N.: Intelligent wheelchair driving: a com-
parative study of cerebral palsy adults with distinct boccia experience. In: Rocha, Á.,
Correia, A.M., Tan, F., Stroetmann, K. (eds.) New Perspectives in Information Systems
and Technologies, Volume 2. AISC, vol. 276, pp. 329–340. Springer, Heidelberg (2014)
11. Palisano, R.J., Tieman, B.L., Walter, S.D., Bartlett, D.J., Rosenbaum, P.L., Russell, D., et
al.: Effect of environmental setting on mobility methods of children with cerebral palsy.
Developmental Medicine & Child Neurology 45(2), 113–120 (2003)
12. Wiart, L., Darrah, J.: Changing philosophical perspectives on the management of children
with physical disabilities-their effect on the use of powered mobility. Disability & Rehabil-
itation 24(9), 492–498 (2002)
13. Edlich, R.F., Nelson, K.P., Foley, M.L., Buschbacher, R.M., Long, W.B., Ma, E.K.: Tech-
nological advances in powered wheelchairs. Journal Of Long-Term Effects of Medical
Implants 14(2), 107–130 (2004)
14. Faria, B.M., Teixeira, S.C., Faias, J., Reis, L.P., Lau, N.: Intelligent wheelchair simulator
for users’ training: cerebral palsy children’s case study. In: 8th Iberian Conf. on Informa-
tion Systems and Technologies, vol. I, pp. 510–515 (2013)
15. Wang, R., Qian, X.: OpenSceneGraph 3 Cookbook. Packt Pub. Ltd., Birmingham (2012)
16. Koranne, S.: Handbook of Open Source Tools, West Linn. Springer, Oregon (2010)
17. jMonkeyEngine (2012). http://jmonkeyengine.com/ (current May 2014)
18. Space, C.: Crystal Space user manual, Copyright Crystal Space Team. http://www.
crystalspace3d.org/main/Documentation#Stable_Release_1.4.0 (current May 2014)
19. Gu, J., Duh, H.B.L.: Handbook of Augmented Reality. Springer, Florida (2011)
20. Havok, Havok (2012). http://havok.com/ (current May 2014)
21. Smith, R.: Open Dynamics Engine (2007). http://www.ode.org/ (current May 2014)
22. Rhodes, G.: Real-Time Game Physics, in Introduction to Game Development, Boston,
Course Technology, pp. 387–420 (2010)
23. Busby, J., Parrish, Z., Wilson, J.: Mastering Unreal Technology. Sams Publishing, Indian-
apolis (2010)
24. Flavell, L.: Beginning Blender - Open Source 3D Modeling, Animation and Game Design.
Springer, New York (2010)
25. Games, F.: Frictional Games (2010). http://www.frictionalgames.com/site/about (current
May 2014)
26. Kyaw, A.S.: Irrlicht 1.7 Realtime 3D Engine – Beginner’s Guide. Packt Publishing Ltd.,
Birmingham (2011)
27. Marilou, April 2012. http://doc.anykode.com/frames.html?frmname=topic&frmfile=index.
html (current May 2014)
28. Cyberbotics, Webots Reference Manual, April 2012. http://www.cyberbotics.com/
reference.pdf (current May 2014)
29. Johns, K., Taylor, T.: Microsoft Robotics Developer Studio. Wrox, Indiana (2008)
30. Workspace, Workspace Robot Simulation, WAT Solutions (2012). http://www.workspacelt.
com/ (current May 2014)
31. Boeing, A., Braunl, T.: SubSim: an autonomous underwater vehicle simulation package.
In: 3rd Int. Symposium on Autonomous Minirobots for Research and Edutainment, Fukui, Japan
32. Laue, T., Röfer, T.: SimRobot - development and applications. In: Proceedings of the
International Conference on Simulation, Modeling and Programming for Autonomous
Robots, Venice, Italy (2008)
33. Gazebo, Gazebo. http://gazebosim.org/ (current May 2014)

34. Carpin, S., Lewis, M., Wang, J., Balakirsky, S., Scrapper, C.: USARSim: a robot simulator
for research and education. In: Proceedings of the IEEE International Conference on
Robotics and Automation, Roma, Italy (2007)
35. Hugues, L., Bredeche, N.: Simbad Project Home, May 2011. http://simbad.sourceforge.
net/ (current May 2014)
36. Costa, P.: SimTwo - A Realistic Simulator for Robotics, March 2012. http://paginas.
fe.up.pt/~paco/pmwiki/index.php?n=SimTwo.SimTwo (current May 2014)
37. Petry, M., Moreira, A.P., Reis, L.P., Rossetti, R.: Intelligent wheelchair simulation: re-
quirements and architectural issues. In: 11th International Conference on Mobile Robotics
and Competitions, Lisbon, pp. 102–107 (2011)
38. Faria, B.M., Vasconcelos, S., Reis, L.P., Lau, N.: Evaluation of Distinct Input Methods of
an Intelligent Wheelchair in Simulated and Real Environments: A Performance. The Offi-
cial Journal of RESNA (Rehabilitation Engineering and Assistive Technology Society of
North America) 25(2), 88–98 (2013). USA
39. Namee, B.M., Beaney, D., Dong, Q.: Motion in Augmented Reality Games: An engine for
creating plausible physical interactions in augmented reality games. International Journal
of Computer Games Technology (2010)
40. Braga, R., Petry, M., Moreira, A., Reis, L.P.: A development platform for intelligent
wheelchairs for disabled people. In: 5th Int. Conf Informatics in Control, Automation and
Robotics, vol. 1, pp. 115–121 (2008)
41. Reis, L.P., Braga, R.A., Sousa, M., Moreira, A.P.: IntellWheels MMI: a flexible interface
for an intelligent wheelchair. In: Baltes, J., Lagoudakis, M.G., Naruse, T., Ghidary, S.S.
(eds.) RoboCup 2009. LNCS, vol. 5949, pp. 296–307. Springer, Heidelberg (2010)
42. Braga, R., Petry, M., Moreira, A., Reis, L.P.: Platform for intelligent wheelchairs using
multi-level control and probabilistic motion model. In: 8th Portuguese Conf. Automatic
Control, pp. 833–838 (2008)
43. Braga, R.A., Malheiro, P., Reis, L.P.: Development of a realistic simulator for robotic
intelligent wheelchairs in a hospital environment. In: Baltes, J., Lagoudakis, M.G.,
Naruse, T., Ghidary, S.S. (eds.) RoboCup 2009. LNCS, vol. 5949, pp. 23–34.
Springer, Heidelberg (2010)
44. Murdock, K.L.: 3ds Max 2011 Bible. John Wiley & Sons, Indianapolis (2011)
45. Brooke, J.: SUS: A quick and dirty usability scale, in Usability evaluation in industry,
pp. 189–194. Taylor and Francis, London (1996)
46. Rosenbaum, P., Paneth, N., Leviton, A., Goldstein, M., Bax, M., Damiano, D., Dan, B.,
Jacobsson, B.: A report: the definition and classification of cerebral palsy April 2006.
Developmental Medicine & Child Neurology - Supplement 49(6), 8–14 (2007)
A Skill-Based Architecture for Pick
and Place Manipulation Tasks

Eurico Pedrosa(B) , Nuno Lau, Artur Pereira, and Bernardo Cunha

Department of Electronics, Telecommunications and Informatics, IEETA, IRIS,


University of Aveiro, Aveiro, Portugal
{efp,nunolau,artur}@ua.pt, [email protected]

Abstract. Robots can play a significant role in product customization


but they should leave a repetitive, low intelligence paradigm and be able
to operate in unstructured environments and take decisions during the
execution of the task. The EuRoC research project addresses this issue
by posing as a competition to motivate researchers to present their solu-
tion to the problem. The first stage is a simulation competition where
Pick & Place type of tasks are the goal and planning, perception and
manipulation are the problems. This paper presents a skill-based archi-
tecture that enables a simulated moving manipulator to solve these tasks.
The heuristics that were used to solve specific tasks are also presented.
Using computer vision methods and the definition of a set of manipula-
tion skills, an intelligent agent is able to solve them autonomously. The
work developed in this project was used in the simulation competition of the
EuRoC project by team IRIS and enabled the team to reach the 5th rank.

1 Introduction

The trend in industry is clearly for higher levels of customization of products,


which must be enabled by fast customization and adaptability of the produc-
tion lines to different requirements. Robots can play a significant role in this
customization but they should leave a repetitive, low intelligence paradigm and
be able to operate in unstructured environments and take decisions during the
execution of the task. The European Robotics Challenges (EuRoC) is a research
project based on a robotics competition that aims to present solutions to the
European manufacturing industry. Exploring our healthy competitive nature,
we try to build and develop a robot, or robots, that accomplishes a task better
than the competition. Usually, the final outcome is a push at the boundary
of our knowledge; take, for example, the DARPA Grand Challenge, from which
self-driving cars became a reality, e.g. [12].
The EuRoC project is divided into three challenges, each with different moti-
vations and objectives. Our focus is the Shop Floor Logistics and Manipulation
challenge, or Challenge 2 (C2), a challenge that presents tasks to be solved by a
mobile robot with manipulation capabilities on an industrial shop floor. The first
stage of the challenge was a simulation contest where the contesting teams had
to solve several tasks in a Simulation Environment in order to score (Sect. 2). In


the end, the best fifteen teams became candidates for entering a second stage.
The Simulation Environment of EuRoC C2 exposes an interface with the
simulation using the Robot Operating System (ROS) communications mid-
dleware. To address this requirement we propose a System Architecture that is
mapped into ROS without loss of generality (Sect. 3). The analysis of the
properties of the environment allowed us to design a generic agent that can be
used to solve any task (Sect. 4). On top of this architecture and agent design, the
logistics and manipulation tasks are solved using computer vision methods and
a set of manipulation skills ruled by several heuristics (Sect. 5). Some related
work is presented in Sect. 6 and final conclusions in Sect. 7.

2 Shop Floor Logistics and Manipulation

The motivation of this challenge is to bring a mobile robot onto the shop floor
for dexterous manipulation and logistics tasks. Enabling an autonomous robot
to operate in an unstructured environment and establishing a safe and effective
human-robot interaction are two of the main research issues to be addressed.
The first stage consists of a sequence of tasks to be performed in a simu-
lated environment, with increasing difficulty in the problems to be solved. The
Simulation Environment consists of a Light-Weight Robot (LWR), with a two-
jaw gripper, mounted on a moving XY axis on a table top, and a fixed mast
with a Pan & Tilt (PT) actuator. Additionally, a vision system made up of a pair
of RGB and depth cameras is installed on the PT and another on the Tool Center
Point (TCP). The objects to be manipulated assume a basic shape, i.e. cylinder
or box, or a compound of basic shapes. An overview is depicted in Fig. 1.
All tasks include a Pick & Place (P&P) scenario where objects (e.g. Fig. 2)
have to be picked from unknown locations and placed on target locations.
The actions required to accomplish a task include: perception, to locate the objects;
manipulation, to pick and place them; and planning, to move the arm to the
target locations. To demonstrate these problems, four different tasks are con-
sidered.

P&P. The goal is to pick all objects in the working space and place them in the
proper location without any particular order. The task contains three objects of
different shape and color on the table (e.g. Figures 2c, 2b and 2d). The poses of the
objects in the environment are unknown, but their properties (color and shape
composition) and corresponding place zones are given. The LWR base cannot
move. Scoring is achieved by picking an object and placing it in the correct zone.

P&P with Significant Errors and Noise. This is the same task as P&P but
with the difference that there are significant errors in the robot precision and
also significant noise in all sensors. This implies an operation of the LWR in an
imprecise and uncalibrated environment.

Fig. 1. Overview of the simulation environment. 1) Mast with a PT, 2) LWR, 3) Gripper, 4) XY Axis, 5) Object to pick, 6) Place zone.

Mobile P&P with Typical Errors and Noise. This is based on the P&P
task, but with typical calibration errors and sensor noise. The LWR can now
use the XY axis to move. This task introduces the concept of mobility, meaning
that pick and place positions may not be within reach of the LWR when the
additional axes are not considered.

Loose Assembly of a Puzzle. In this task, a puzzle made up of pieces like
the ones depicted in Fig. 2a and Fig. 2e, initially scattered across the table at
unknown locations, has to be loosely assembled. Each puzzle part is composed of basic
blocks and has to be placed into a puzzle of size 4x4. To allow parts to be pushed,
the puzzle fixture has two fixed sides. Scoring is achieved by covering the puzzle
with the correct blocks.

3 System Architecture Overview


The tasks to be accomplished were different. However, while studying the prob-
lems to be solved, common subtasks were identified. In order to take advantage
of this fact, a system architecture to solve all tasks was developed. This architec-
ture, shown in Fig. 3, is divided into three components: Simulation Environment,
Interface Node, and Agent.
The Interface Node, supplied by the EuRoC partners to all teams, provides
an abstraction of the Simulation Environment. The purpose of this abstraction

Fig. 2. Examples of objects that appear in the simulated environment. The puzzle
parts are only examples of the set of valid shapes. Objects may vary on scale and color.

is to allow a future replacement of the Simulation Environment with a Real


Environment, where a real robot in a real shop floor would be used instead with
minimum modifications of the Agent.
The proposed Agent is composed of a Solver, multiple Skills and a Sensory
Data Adapter. The Solver is responsible for making decisions on how to solve
the current task based on the current sensory data and available Skills. A Skill
is the capacity of doing a particular task like, for instance, picking an object or
moving the end-effector of the manipulator to a desired pose. The role of the
Sensory Data Adapter is to convert the sensor data format transmitted by the
Interface Node to the format used inside the Agent.
The provided interface with the Simulation Environment is implemented
using ROS [6]. This communications middleware is a publish-subscribe infras-
tructure where a topic is a communication channel that forwards messages
from the publisher to the subscriber. A node, which is a system process, can
subscribe or advertise n topics, with n ≥ 0. In addition, it provides a request-
response infrastructure through services.
The Interface Node, as the bridge between the Simulation Environment and
the Agent, is a node that provides a continuous stream of sensory data and a set
of functions to interact with the simulation effectors. The sensory data stream
includes the images from the cameras and effectors feedback (i.e. telemetry),
and is sent through topics. The provided functions include: inverse and forward
kinematics calculation, joint motion planning and execution, and operation mode
control. All functions are available as services.
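As a rough illustration of how an Agent-side node can consume this interface, the rospy sketch below subscribes to a sensory topic and calls a service; the topic name, service name and message/service types are hypothetical placeholders, not the actual EuRoC interface definitions.

    import rospy
    from sensor_msgs.msg import Image
    from std_srvs.srv import Trigger

    def image_callback(msg):
        # Sensory data arrives as a continuous stream on a topic
        rospy.loginfo("received image %dx%d", msg.width, msg.height)

    if __name__ == "__main__":
        rospy.init_node("agent_example")
        # Hypothetical topic published by the Interface Node
        rospy.Subscriber("/interface/tcp_camera/rgb", Image, image_callback)
        # Hypothetical service exposed by the Interface Node (request-response);
        # the real interface uses its own service types for kinematics and motion
        rospy.wait_for_service("/interface/plan_and_move")
        plan_and_move = rospy.ServiceProxy("/interface/plan_and_move", Trigger)
        result = plan_and_move()
        rospy.loginfo("service succeeded: %s", result.success)
        rospy.spin()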
The Sensory Data Adapter is a ROS node that converts the joint state
information into a transformation tree using the ROS tf package. This allows the user
to keep track of the multiple coordinate frames over time. The telemetry informa-
tion is only provided upon request, i.e. by service call. The reason for this choice

Fig. 3. Diagram of the system architecture (Simulation Environment; Interface Node with its sensor and effector interfaces; Agent composed of Sensory Data Adapter, Skills and Solver).

is based on the fact that the telemetry information is published at a rate of
1 kHz, which would introduce a computational overhead on all nodes that require
this information. This way, only the Sensory Data Adapter suffers from this
overhead; other nodes needing this information may request it at a desired rate.
Each Skill is implemented as a ROS node that exposes itself as a service.
Only a Skill knows how to interact with the simulation effectors, doing it by
using the appropriate functions of the Interface Node.
The Solver is the decision-making module implemented in a single node.
Based on the task to be solved and on an initial analysis of the environment,
it defines a plan to solve the task and executes it using the available skills.

4 Agent Design
The EuRoC project exposes a set of tasks with perception and manipulation
problems to be completed. To solve these tasks we need a proper agent, capable
of perceiving the environment through sensors and acting upon that environment
using actuators [8]. To design such an agent we have to analyze the properties of
the environment, which in this case is a well-defined Simulated Environment, in
order to develop a solution as generic as possible, capable of handling all tasks.
Our proposal is summarized in algorithm form in Fig. 4.
The set O of objects to be manipulated in each task, including their properties
and place zones, is known in advance. The algorithm starts by determining the
order by which the objects have to be manipulated. The order is restricted
by a directed acyclic graph (DAG), which represents a dependency graph between
objects in terms of order of manipulation. Leaves represent objects that need to
be handled first. A dummy object λ is added to represent the graph root.
Once an object is manipulated, it is removed from the graph. Thus, at any
moment in the task execution, the leaves represent the objects that can be manipulated.
When the dummy object is the only one in the graph, the task is completed.

1: G ← BuildOrderGraph(O), S ← buildSearchSpace(G)
2: while leafs(G) ≠ λ do    ▷ λ is the empty root
3: L ← leafs(G)
4: obj ← detectObject(L)
5: while obj is not found do
6: s ← next(S)
7: move(s)
8: obj ← detectObject(L)
9: end while
10: focusObject(obj)
11: plan = MakePlan(obj)
12: success = execute(plan)
13: if success then
14: removeLeaf(G, obj)
15: end if
16: end while

Fig. 4. Generic algorithm to solve a task.
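A direct Python transcription of this loop, with every task- and robot-dependent procedure injected as a callable stub, could look roughly as follows; it mirrors only the structure of Fig. 4 and is not the team's implementation.

    from itertools import cycle

    LAMBDA = "lambda"   # dummy root object

    def solve_task(objects, build_order_graph, build_search_space, leafs,
                   detect_object, move, focus_object, make_plan, execute,
                   remove_leaf):
        # Generic control loop of Fig. 4; every callable is task/robot specific.
        G = build_order_graph(objects)
        S = cycle(build_search_space(G))        # circular list of search poses
        while set(leafs(G)) != {LAMBDA}:        # only the dummy root left -> done
            L = leafs(G)
            obj = detect_object(L)
            while obj is None:                  # keep moving through search poses
                move(next(S))
                obj = detect_object(L)
            focus_object(obj)                   # centre the object in the TCP camera
            plan = make_plan(obj)               # task-dependent planning
            if execute(plan):
                remove_leaf(G, obj)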

If the TCP is over an object, its vision system is in a privileged position to
help in the object's localization and pick-up operation. However, it has to move
there first. On the other hand, despite being in a high position, the vision system
placed on the mast may not be able, even using its pan and tilt capabilities, to
cover all the objects on the table. For instance, an object could be occluded
by the LWR. Thus, both vision systems are used.
To ensure that all objects are eventually detected, a set of search poses S is
calculated by the procedure buildSearchSpace, ensuring that the entire workspace
is covered. To improve the search, the vision system on the PT is used to detect
objects in the environment and the obtained poses are put at the head of S in
the order defined by G. It may not detect any object, but if it does, the system
can gain in overall execution time due to good initial search poses. Thus, each
pose in S represents a region containing one or more objects or a region that
should be locally explored using the TCP vision system. The set S is encoded as
a circular list so that the search poses never end. The algorithm then executes two
nested loops that, by making the TCP move around the search poses, finish
when all objects are placed in their target positions. Each search pose is
explored to see if a leaf object is there. If so, a sequence of steps is performed to
put the object in its target position.
First, the TCP is moved in such a way that the focus axis of the RGB camera
intercepts the geometric center of the object, giving preference to positions in
which the gripper is perpendicular to the object’s plane. This way the object
appears in the center of the image, which results in two benefits: the distortion
of the object in the image is reduced; and it copes better with noise in sensors
and effectors by restricting the problem to a bounded local space.
In order to properly transfer the object to its target position, a plan is com-
puted. A plan is a sequence of actions that allows properly picking the object,
moving the TCP, and placing the object in the target position. Those actions
depend on an a priori calculation of the pick and place poses. The way the plan is
calculated depends on the task being solved and on the disposition of the object
and its target position. For instance, the target position could be non-reachable
from the top, in which case the object has to be picked from a different direction.
If the execution of the plan succeeds, the leaf corresponding to the processed
object is removed from the graph and the algorithm goes to the next iteration.
From this design only two procedures are task dependent, BuildOrderGraph
and MakePlan. This fosters the re-use of software code and facilitates the
task-solving job, since the developer only has to focus on the creation of a plan. To
aid the definition of a plan, a set of skills was defined and implemented.

4.1 Object Detection Pipeline


The objective of the object detection module is, as the name implies, to detect an
object in the environment. It creates information about an object by processing
the incoming sensory data. The input data are the RGB and depth images created by one
of the two available vision systems. The data is processed by several submodules
managed by a pipeline (Fig. 5).

Depth and RGB Integration. Before any processing takes place it is nec-
essary to match every depth value to the corresponding RGB value [3]. The
depth and the RGB images come from different cameras separated by an offset,
thus they do not overlap correctly. To solve this issue, for each depth value in
the depth image, the corresponding 3D space position is calculated and then
re-projected into the image plane of the RGB camera. The registered depth is
then transformed into an organized point cloud in the coordinate frame of the
LWR.
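A minimal NumPy sketch of this registration step is shown below; the intrinsic matrices and the depth-to-RGB extrinsics are assumed to be known, and the code is only an illustration of the re-projection idea, not the implementation used in the pipeline.

    import numpy as np

    def register_depth_to_rgb(depth, K_depth, K_rgb, R, t):
        # depth: HxW array of depths in metres (float); K_depth/K_rgb: 3x3 intrinsics;
        # R, t: rotation (3x3) and translation (3,) from depth to RGB camera frame.
        # Returns a depth image of the same size registered to the RGB camera.
        h, w = depth.shape
        us, vs = np.meshgrid(np.arange(w), np.arange(h))
        # Back-project every depth pixel to a 3D point in the depth camera frame
        x = (us - K_depth[0, 2]) * depth / K_depth[0, 0]
        y = (vs - K_depth[1, 2]) * depth / K_depth[1, 1]
        pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
        # Transform into the RGB camera frame and project with its intrinsics
        pts_rgb = pts @ R.T + t
        z = pts_rgb[:, 2]
        registered = np.zeros_like(depth)
        ok = z > 0
        u = (K_rgb[0, 0] * pts_rgb[ok, 0] / z[ok] + K_rgb[0, 2]).round().astype(int)
        v = (K_rgb[1, 1] * pts_rgb[ok, 1] / z[ok] + K_rgb[1, 2]).round().astype(int)
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        registered[v[inside], u[inside]] = z[ok][inside]
        return registered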

Height Filter. The purpose of the Height Filter is to reduce the search space for
objects of interest using the fact that they are on top of a table (e.g. Fig. 6a). A
filter is applied to the pointcloud where positions with z value below a threshold
are set to not-a-number (NaN), identifying a non-searchable position. The output
of this block is a mask defined by the filtered pointcloud that identifies the
searchable areas in the RGB image (Fig. 6b).

Fig. 5. Object detection pipeline.



Fig. 6. View of several stages in the object detection pipeline.

Color Segmentation. The color of the object is a known attribute; it is homogeneous
but subject to different light conditions (e.g. shadows). Using color segmentation
we can further identify the object of interest. The segmentation is applied in the
HSV color space to reduce the influence of different light conditions [10]. The
output of this block is a mask defined by the segmentation (Fig. 6c).

Blob Extraction. At this stage, it is expected that the input mask defines one
or more undefined shape areas. More than one blob can happen if two, or more,
objects share the same color. However, we are only interested in one object, thus
only the blob with the biggest area is considered. This is a safe action because
different objects have different shapes and can be disambiguated by matching
the shape from the blob contours with the correct object. To extract a blob, the
contours of the segmented areas are calculated by a contour detection function
[11]. The output mask is a set of points that delimits the area of the blob.

Pose Estimation. After the identification of the object blob in the image we
calculate its position (x, y, z) and orientation θ. We start by calculating the
rotated bounding box that best fits the selected blob (e.g. Fig. 6d). The center of
the rectangle in the image provides us with enough information to extract its posi-
tion, because we have a registered point cloud where each (u, v) coordinate of the
RGB image has a corresponding (x, y, z). Furthermore, from the rotation of the
rectangle we can now extract the orientation of the object relative to the LWR.

Morphologic Extraction. The goal is to extract a useful shape from the blob
(e.g. Fig. 6e). All objects are treated as polygons, even the cylinder, which is a
circle when viewed from above – a circle can be approximated by a polygon. The
shape is obtained by a function that approximates the blob to a polygon [2]. The
polygon shape can then be used to disambiguate an object detection.
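The colour segmentation, blob extraction, pose estimation and polygon approximation stages can be reproduced with standard OpenCV calls roughly as follows; the HSV thresholds and the approximation tolerance are illustrative assumptions.

    import cv2
    import numpy as np

    def detect_object(rgb_bgr, lower_hsv, upper_hsv):
        # Segment a known colour, keep the biggest blob and estimate its 2D pose.
        # Returns ((cx, cy), angle, polygon) in image coordinates, or None.
        hsv = cv2.cvtColor(rgb_bgr, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, lower_hsv, upper_hsv)             # colour segmentation
        contours = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                    cv2.CHAIN_APPROX_SIMPLE)[-2]  # OpenCV 3 and 4
        if not contours:
            return None
        blob = max(contours, key=cv2.contourArea)                 # biggest blob only
        (cx, cy), (w, h), angle = cv2.minAreaRect(blob)           # rotated bounding box
        polygon = cv2.approxPolyDP(blob, 0.02 * cv2.arcLength(blob, True), True)
        return (cx, cy), angle, polygon.reshape(-1, 2)

    # Example call with an assumed HSV range for a reddish object
    # result = detect_object(image, np.array([0, 120, 70]), np.array([10, 255, 255]))

The object's 3D position and its orientation relative to the LWR would then be obtained from the registered point cloud at the returned image centre, as described above.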

4.2 Agent Skills

A Skill is the capability to perform a specialized action with a semantic meaning.


Performing an action will always imply a physical motion by one or more actua-
tors. In the Simulated Environment we have several actuators, already described,
that are controlled by joints, either rotational or linear. All joints can be actuated
at the same time to produce motion, meaning that a skill is not necessarily

bound to one actuator. For example, moving the LWR and the XY axis at the
same time can be considered a single skill.
To solve any task we need a set of skills that is sufficient for the tasks ahead,
while at the same time keeping their number as low as possible. The idea is to create
skills that can be used as atoms of more complex skills. To solve a task we need
the capability to move the LWR so that its TCP is in the desired position, e.g. a
search pose to perceive the environment. Objects need to be picked and placed,
so these actions are also necessary. It may not always be possible to position an
object with a place action; for example, the Gripper may hit a wall when placing
right next to it. This can be solved by pushing the object to its rightful place.

Simple Move. This skill uses only the LWR actuator. Its objective is to put the
LWR’s TCP on the requested pose. This action requires at first the calculation of
the inverse-kinematics, and then the control of the joints to the desired position.

Move XY Axis. The reach of the LWR can be increased by using the XY
axis to move it. This skill only requires x, y positions to move. The values are
directly applied to the joints.

Pick Object. The LWR and Gripper are used together to provide this skill.
It expects as input a pick pose that must correspond to the tip of the Gripper
end-effector. The skill will do the necessary calculation to transform the input
pose to the corresponding TCP pose to move the LWR. Then it takes the proper
actions to safely grab the object.

Place Object. This skill uses the LWR and Gripper to place the object in the
requested pose. The input pose is adjusted to the TCP and then a safe position
is calculated to ensure that the object will hover above the floor before the actual
placement. Then the object is lowered at a reduced velocity to prevent any
undesirable collisions due to high accelerations.

Push Object. When it is not possible to place an object without a harmful


collision a push strategy can be applied. Assuming that the object is near its
final position, we can use the Gripper to push the object until it reaches the final
destination. This is achieved by defining a path between two points to be covered
by the end-effector of the Gripper. If an object is in the path, as a product of
the LWR motion the object will be pushed. The skill can request a motion with
force limits; if the limit is not reached, it means the object could be pushed to its
final position.

5 Experiments with Solving Tasks


To evaluate the proposed architecture for logistics and manipulation tasks, we
set out to solve the already described tasks: all P&P tasks and the Loose Assembly
of a Puzzle. As stated, to solve a task we have to concentrate our efforts on the
object ordering and on planning. Most of the planning involves finding the correct
pick and place positions and then making use of the available skills to accomplish that.

5.1 Pick and Place Tasks

In these tasks, the order by which the objects are picked and placed is not
relevant and so graph G does not contain any order dependency. Thus, once an
object is detected it can be picked and placed in its target location.
The agent assumes all objects are pickable. This implies that there is a part
of the object that the gripper is able to grab. Since the shapes of the objects are
known in advance, the grab pose, defined in the object’s frame of reference, can
be pre-determined. The gripper’s grabbing pose can be obtained by merging
this information with the rotated bounding box (RBB) that encloses the object,
estimated by the object detection module. For basic shape objects, the grab pose
coincides with the center of the RBB. For compound objects, the grab pose can
be obtained from the center of the RBB, adding half of its width and subtracting
half of its height. This approach worked well even in the presence of significant
noise.
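The grab-pose rule just described can be written compactly as below; offsetting along the box's own axes is our reading of the rule and is given only as an illustration.

    import math

    def grab_position(center, width, height, angle, compound):
        # Grab point from a rotated bounding box (center, width, height, angle in rad).
        # Basic shapes: the RBB centre. Compound objects: offset by +width/2 and
        # -height/2 expressed along the box's own axes (our reading of the rule).
        cx, cy = center
        if not compound:
            return cx, cy
        ca, sa = math.cos(angle), math.sin(angle)
        dx, dy = 0.5 * width, -0.5 * height
        return cx + dx * ca - dy * sa, cy + dx * sa + dy * ca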
The preferred way to pick an object is to apply a top-down trajectory to the
gripper (Fig. 7a). However, this may not be possible due to constraints in the
freedom of motion of the LWR. In such cases, an angular adjustment is applied
to make the pick operation possible (Fig. 7c). This drawback can be mitigated
when the LWR’s movement along the XY axes is available, since the LWR can
be moved to a better position to pick the object of interest.
After an object is picked it must be placed in its target location. The most
efficient move is to place it keeping the gripper orientation. But, again, it may
not be possible. When this happens, a fake placement is done instead: an appro-
priated location is chosen, the object is placed there, and a failure signal is
returned so that the object is picked again. Similarly as before, the availability
of robot movement along the XY axes can avoid this drawback.

5.2 Loose Assembly of a Puzzle

The assembly of the puzzle cannot be done by inserting the parts from above:
the margin of error for the positioning is very thin, making a top-down approach
difficult. Instead, using the puzzle fixture boundary and the already-placed parts,
a pushing approach can be used. To

Fig. 7. Example of an out-of-reach situation for the preferable pick pose.



find the right order we search all permutations for ordering the objects in their
insertion in the puzzle. The valid solutions must comply with the following rule:
the pushing of the object, horizontally or vertically, is not prevented by any
other object. After a solution is found the graph G is built.
The pick position is selected from one of the convex terminations of the part
that has a width smaller than the Gripper maximum width. The parts of the
puzzle are always assembled from cubes with a width smaller than the gripper
maximum width, hence the pick position must consider a convex termination
with the width of a cube. To calculate the pick position the vertices of the
polygon shape are used. The vertices v are properly ordered and ring accessible
by vi . The main idea is to search for an edge ei = (vi , vi+1 ) that is part of the
frame of reference. An edge ei is a candidate when:
$\lVert e_i \rVert \approx \ell \;\wedge\; \angle(e_{i-1}, e_i) \approx \angle(e_i, e_{i+1}) \approx \frac{\pi}{2}$

where $\ell$ is the length of a cube edge. Approximate values are considered to handle
errors. Afterward, all candidates go through a final validation. To recognize the
object's frame of reference, which is needed to correctly place the object, we
assume that the vertex vi is the origin; then, the number of blocks to the left,
right, top and bottom is compared with the original object shape definition.
Once an edge is selected, the pick position is given by the sum of the normalized
edges ei and ei+1, and the orientation is given by the normal direction of ei.
The XY axis is available for this task, thus any object can be picked in the
preferred way.
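A sketch of this candidate-edge test over the ordered polygon vertices is given below; the tolerance values are illustrative assumptions.

    import math

    def edge_candidates(vertices, cube_edge, tol_len=0.1, tol_ang=0.2):
        # Return indices i for which e_i = (v_i, v_{i+1}) satisfies the candidate
        # condition: length close to the cube edge and right angles with the
        # neighbouring edges (tolerances are illustrative).
        n = len(vertices)

        def edge(i):
            (x0, y0), (x1, y1) = vertices[i % n], vertices[(i + 1) % n]
            return (x1 - x0, y1 - y0)

        def angle(u, v):
            dot = u[0] * v[0] + u[1] * v[1]
            nu, nv = math.hypot(*u), math.hypot(*v)
            return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

        candidates = []
        for i in range(n):
            e_prev, e_i, e_next = edge(i - 1), edge(i), edge(i + 1)
            length_ok = abs(math.hypot(*e_i) - cube_edge) < tol_len * cube_edge
            # angles with the neighbouring edges should be close to pi/2
            ang_ok = (abs(angle(e_prev, e_i) - math.pi / 2) < tol_ang and
                      abs(angle(e_i, e_next) - math.pi / 2) < tol_ang)
            if length_ok and ang_ok:
                candidates.append(i)
        return candidates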
The next step is to define the set of actions to position the puzzle part in
its rightful place. An offset is added to the final position p of the puzzle part in
the puzzle fixture po = p + offset to prevent an overlap of parts. The offset takes
into account how the parts are connected. After it is placed at po , it has to be
pushed towards p. The first push is towards the closest fixed axis and the next
one is against the supporting piece – or axis. Doing a single pushing sequence may
not be enough. For some reason the piece may get stuck, therefore, the sequence
must be repeated. Detecting a stuck piece is simple, since the TCP reports the
applied force and when a force above a threshold is detected a termination is
triggered.

6 Related Work

ROS has become the robot middleware of choice for researchers and the industry.
For example, MoveIt! [1] is a mobile manipulation software suitable for research
and industry. In addition to manipulation, the creation of behaviors is also
a topic of interest, e.g. ROSCo [5]. Task-level programming of robots is an
important exercise for industrial applications. The authors of SkiROS [7] propose
a paradigm based on a hierarchy of movement primitives, skills and planning. For
P&P tasks, the authors of [9] propose a manipulation planner under continuous
grasps and placements, while a decomposition of the tasks is proposed by [4].

7 Conclusion
The tasks to be accomplished were different. However, while studying the prob-
lems to be solved, we identified common subtasks. In order to take advantage
of that fact, a general system architecture to solve all tasks was developed.
Additionally, the architecture works seamlessly in the ROS infrastructure. This
solution allowed our team to achieve the 5th rank.

Acknowledgments. This research is supported by the European Union’s FP7 under


EuRoC grant agreement CP-IP 608849; and by the Foundation for Science and Tech-
nology in the context of UID/CEC/00127/2013 and Incentivo/EEI/UI0127/2014.

References
1. Chitta, S., Sucan, I., Cousins, S.: Moveit! [ros topics]. IEEE Robotics Automation
Magazine 19(1), 18–19 (2012)
2. Douglas, D.H., Peucker, T.K.: Algorithms for the reduction of the number of points
required to represent a digitized line or its caricature. Cartographica: The Inter-
national Journal for Geographic Information and Geovisualization (1973)
3. Henry, P., Krainin, M., Herbst, E., Ren, X., Fox, D.: RGB-D mapping: Using
Kinect-style depth cameras for dense 3D modeling of indoor environments. I. J.
Robotic Res. 31(5), 647–663 (2012)
4. Lozano-Pérez, T., Jones, J.L., Mazer, E., O’Donnell, P.A.: Task-level planning of
pick-and-place robot motions. IEEE Computer 22(3), 21–29 (1989)
5. Nguyen, H., Ciocarlie, M., Hsiao, K., Kemp, C.: Ros commander (rosco): Behavior
creation for home robots. In: ICRA, pp. 467–474 (May 2013)
6. Quigley, M., Gerkey, M., Conley, K., Faust, J., Foote, T., Leibs, J., Berger, E.,
Wheeler, R., Ng, A.: ROS: An open-source robot operating system. In: ICRA
Workshop on Open Source Software, Kobe, Japan (May 2009)
7. Rovida, F., Chrysostomou, D., Schou, C., Bøgh, S., Madsen, O., Krüger, V., Ander-
sen, R.S., Pedersen, M.R., Grossmann, B., Damgaard, J.S.: Skiros: A four tiered
architecture for task-level programming of industrial mobile manipulators. In: 13th
Internacional Conference on Intelligent Autonomous System, Padova (July 2013)
8. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 3rd edn., Pren-
tice Hall (2010)
9. Simeon, T., Cortes, J., Sahbani, A., Laumond, J.P.: A manipulation planner for
pick and place operations under continuous grasps and placements. In: Proceedings
of the Robotics and Automation, ICRA 2002, vol. 2, pp. 2022–2027 (2002)
10. Sural, S., Qian, G., Pramanik, S.: Segmentation and histogram generation using
the hsv color space for image retrieval. In: Proceedings of the 2002 International
Conference on Image Processing 2002, vol. 2, pp. II-589–II-592 (2002)
11. Suzuki, S., Abe, K.: Topological structural analysis of digitized binary images by
border following. Computer Vision, Graphics and Image Processing 30(1), 32–46
(1985)
12. Urmson, C., Baker, C.R., Dolan, J.M., Rybski, P.E., Salesky, B., Whittaker, W.,
Ferguson, D., Darms, M.: Autonomous Driving in Traffic: Boss and the Urban
Challenge. AI Magazine 30(2), 17–28 (2009)
Adaptive Behavior of a Biped Robot
Using Dynamic Movement Primitives

José Rosado1(), Filipe Silva2, and Vítor Santos3


1 Department of Computer Science and Systems Engineering, Coimbra Institute of Engineering, IPC, 3030-199 Coimbra, Portugal
[email protected]
2 Department of Electronics, Telecommunications and Informatics, Institute of Electronics and Telematics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal
[email protected]
3 Department of Mechanical Engineering, Institute of Electronics and Telematics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal
[email protected]

Abstract. Over the past few years, several studies have suggested that adaptive
behavior of humanoid robots can arise based on phase resetting embedded in
pattern generators. In this paper, we propose a movement control approach that
provides adaptive behavior by combining the modulation of dynamic movement
primitives (DMP) and interlimb coordination with coupled phase oscillators.
Dynamic movement primitives (DMP) represent a powerful tool for motion
planning based on demonstration examples. This approach is currently used as a
compact policy representation well-suited for robot learning. The main goal is
to demonstrate and evaluate the role of phase resetting based on foot-contact in-
formation in order to increase the tolerance to external perturbations. In particu-
lar, we study the problem of optimal phase shift in a control system influenced
by delays in both sensory information and motor actions. The study is per-
formed using the V-REP simulator, including the adaptation of the humanoid
robot’s gait pattern to irregularities on the ground surface.

Keywords: Biped locomotion · Adaptive behavior · Movement primitives · Interlimb coordination · Phase resetting

1 Introduction

The coordination within or between legs is an important element for legged systems
independently of their size, morphology and number of legs. Evidence from
neurophysiology indicates that pattern generators in the spinal cord contribute to
rhythmic movement behaviors and sensory feedback modulates proper coordination
dynamics [1], [2]. In this context, several authors studied the role of phase shift and
rhythm resetting. Phase resetting is a common strategy known to have several advan-
tages in legged locomotion, namely by endowing the system with the capability to
switch among different gait patterns or to restore coordinated patterns in the face of


disturbances. In human biped walking the maintenance of reciprocal out-of-phase


motions of the legs is critical for stable and efficient gait patterns [3], [4].
In the same line of thought, coordination dynamics is important for humanoid ro-
bots operating in real world environments. Further, this dynamics often needs to be
adapted to account for variations in the environment conditions and external perturba-
tions. Over the past few years different approaches to coordination have been applied
to biped locomotion robots in which the emergence and change of coordination pat-
terns are governed by dynamical equations [5], [6]. These authors have explored the
role of phase resetting for adaptive walking based on foot-contact information using
theoretical models and physical robots. Adaptation of the interlimb parameters largely
restores symmetry of the gait cycle with inherent advantages for stability. In other
words, the adjustment of the phase between legs helps to substantially increase the
range of parameters (e.g., average speed) and the tolerance to disturbances for which
stable walking is possible.
In this paper, we propose a movement control approach that provides adaptive be-
havior by combining the modulation of dynamic movement primitives (DMP) and
interlimb coordination with coupled phase oscillators. DMP appeared as a powerful
tool for motion planning based on demonstration examples. This approach is currently
used as a compact policy representation well-suited for robot learning. Here, rhythmic
DMP are employed as trajectory representations learned in task-space from a single
demonstration. Once learned, new movements are generated by simply modifying the
parameters of the DMP. Adaptive biped locomotion based on phase resetting, which
is the main focus of this paper, is studied and evaluated using the ASTI robot model
in the V-REP simulation software [10]. The main goal is to demonstrate and evaluate
the role of phase resetting based on foot-contact information in order to increase the
tolerance to external perturbations. In particular, we study the problem of optimal
phase shift in a control system influenced by delays in both sensory information and
motor actions.
The remainder of the paper is organized as follows: Section 2 describes the pro-
posed approach for trajectory formation based on rhythmic movement primitives
learned in the task space and their modulation capabilities. Section 3 presents the
interlimb coordination strategy based on coupled phase oscillators and phase resetting
embedded with the movement control. In Section 4 the applicability of these concepts
is demonstrated by numerical simulations. Section 5 concludes the paper and
discusses future work.

2 Rhythmic Movement Primitives

2.1 Trajectory Formation


Dynamical system movement primitives have become a robust policy representation,
for both discrete and periodic movements, that facilitates the process of learning and
improving the desired behavior [7]. The basic idea behind DMP is to use an analyti-
cally well-understood dynamical system with convenient stability properties and
modulate it with nonlinear terms such that it achieves a desired point or limit cycle

attractor. The approach was originally proposed by Ijspeert et al. [8] and, since then,
other mathematical variants have been proposed [9].
In the case of rhythmic movement, the dynamical system is defined in the form of
a linear second order differential equation that defines the convergence to the goal g
(baseline, offset or center of oscillation) with an added nonlinear forcing term f that
defines the actual shape of the encoded trajectory. This model can be written in first-
order notation as follows:

τz = α z [β z (g − y ) − z ] + f
τy = z , (1)

where τ is a time constant and the parameters α_z, β_z > 0 are selected and kept fixed such that the system converges to the oscillations given by f around the goal g in a critically damped manner. The forcing function f (nonlinear term) can be defined as a
normalized combination of fixed basis functions:

$$f(\phi) = \frac{\sum_{i=1}^{N} \psi_i\, \omega_i}{\sum_{i=1}^{N} \psi_i}\; r, \qquad \psi_i(\phi) = \exp\!\big(h_i \left(\cos(\phi - c_i) - 1\right)\big), \tag{2}$$

where ωi are adjustable weights, r characterizes the amplitude of the oscillator, ψi are
von Mises basis functions, N is the number of periodic kernel functions, hi > 0 are the
widths of the kernels and ci equally spaced values from 0 to 2π in N steps (N, hi and ci
are chosen a priori and kept fixed). The phase variable φ bypasses explicit dependen-
cy on time by introducing periodicity in a rhythmic canonical system. This is a simple
dynamical system that, in our case, is defined by a phase oscillator:

τφ = Ω , (3)

where Ω is the frequency of the canonical system. In short, there are two main com-
ponents in this approach: one providing the shape of the trajectory patterns (the trans-
formation system) and the other providing the synchronized timing signals (the
canonical system). In order to encode a desired demonstration trajectory ydemo as a
DMP, the weight vector has to be learned with, for example, statistical learning tech-
niques such as locally weighted regression (LWR) given their suitability for online
robot learning.
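To make the formulation in (1)–(3) concrete, the sketch below implements a minimal one-dimensional rhythmic DMP in Python. The class name, the Euler integration, the kernel widths and the default parameter values are illustrative assumptions and not the implementation used by the authors; the weight fitting follows the usual locally weighted regression recipe for rhythmic DMP.

```python
import numpy as np

class RhythmicDMP:
    """Minimal one-dimensional rhythmic DMP (equations (1)-(3)); illustrative sketch only."""

    def __init__(self, n_kernels=25, alpha_z=25.0, beta_z=6.25, tau=1.0,
                 g=0.0, r=1.0, omega=2*np.pi):
        self.N, self.alpha_z, self.beta_z, self.tau = n_kernels, alpha_z, beta_z, tau
        self.g, self.r, self.omega = g, r, omega                        # offset, amplitude, frequency
        self.c = np.linspace(0.0, 2*np.pi, n_kernels, endpoint=False)   # kernel centres
        self.h = np.full(n_kernels, 2.5*n_kernels)                      # kernel widths (assumed)
        self.w = np.zeros(n_kernels)                                    # weights to be learned

    def _psi(self, phi):
        # von Mises-like periodic kernels of equation (2)
        return np.exp(self.h*(np.cos(phi - self.c) - 1.0))

    def forcing(self, phi):
        psi = self._psi(phi)
        return self.r*np.dot(psi, self.w)/(np.sum(psi) + 1e-10)

    def fit(self, y, yd, ydd, phi):
        """Locally weighted regression of the weights from one demonstration
        (position y, velocity yd, acceleration ydd sampled at phases phi)."""
        f_target = self.tau**2*ydd - self.alpha_z*(self.beta_z*(self.g - y) - self.tau*yd)
        for i in range(self.N):
            psi_i = np.exp(self.h[i]*(np.cos(phi - self.c[i]) - 1.0))
            self.w[i] = np.sum(psi_i*self.r*f_target)/(np.sum(psi_i*self.r**2) + 1e-10)

    def step(self, phi, y, z, dt):
        """One Euler step of the canonical system (3) and the transformation system (1)."""
        phi = (phi + self.omega/self.tau*dt) % (2*np.pi)
        zd = (self.alpha_z*(self.beta_z*(self.g - y) - z) + self.forcing(phi))/self.tau
        yd = z/self.tau
        return phi, y + yd*dt, z + zd*dt
```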

2.2 Extension to Multiple Degrees-of-Freedom


The extension of the previous concepts to multiple degrees-of-freedom (DOF) is
commonly performed by sharing one canonical system among all DOFs, while main-
taining a set of transformation systems and forcing terms for each DOF. In brief, the
canonical system provides the temporal coupling among DOFs, the transformation

system achieves the desired attractor dynamics for each individual DOF and the re-
spective forcing terms modulate the shape of the produced trajectories.
The adaptation of learned motion primitives to new situations becomes difficult
when the demonstrated trajectories are available in the joint space. The problem oc-
curs because, in general, a change in the primitive’s parameters does not correspond
to a meaningful effect on the given task. Having this in mind, the proposed solution is
to learn the DMP in task space and relate their parameters to task variables. To con-
cretely formulate the dynamical model, a task coordinate system is fixed to the hip
section that serves as a reference frame where tasks are presented. The y-axis
is aligned with the direction of movement, the z-axis is oriented downwards and the
x-axis points towards the lateral side to form a direct system.
In line with this, a total of six DMP are learned to match the Cartesian trajectories
of the lower extremities of both feet (end-effectors), using a single demonstration. It is
worth noting that a DMP contains one independent dynamical system per dimension of
the space in which it is learned. At the end, the outputs of these DMP are converted,
through an inverse kinematic algorithm, to the desired joint trajectories used as refer-
ence input to a low-level feedback controller. Fig. 1 shows the close match between
the reference signals (solid lines) and the learned ones (dashed lines) as defined in the
reference frame. The gray shaded regions show the phases of double-support.
Once the complete desired movement {y, ẏ, ÿ} is learned (i.e., encoded as a DMP),
new trajectories with similar characteristics can be easily generated. In this work, the
DMP parameters resulting from the previous formulation (i.e., amplitude, frequency
and offset) are directly related to task variables, such as step length, hip height, foot
clearance and forward velocity. For example, the frequency is used for speed up or
slow down the motion, the amplitudes of the DMP associated with the y- and z-
coordinates are used to modify the step length and the hip height (or foot clearance) of
the support leg (or swing leg), respectively.

[Figure 1 plot: left-foot DMP task-space trajectories — x, y and z position (m) versus time (s).]

Fig. 1. Result of learning the single-demonstration: the task is specified by the x, y and z-
coordinates of the robot’s foot in the reference frame. Reference signal (solid line) and trained
signal (dashed line) are superimposed. Gray shaded regions show double-support phases.

3 Adaptive Biped Locomotion

3.1 Modulation of the DMP Parameters


The formulation of DMP includes a few parameters which allow changing the learned
behavior. This subsection provides examples of how new movements can be generat-
ed by simply modifying the parameters of a rhythmic DMP. These parameters can
potentially be used to adapt the learned movement to new situations in order, for ex-
ample, to adapt the final goal position, the movement amplitude or the duration of the
movement. Therefore, the amplitude, frequency and offset parameters will be mod-
ified by scaling the corresponding parameters r, Ω and g, respectively.
Fig. 2 shows the time evolution of the rhythmic motion associated to a canonical
system with frequency Ω = 2 Hz and whose weights ωi are learned to fit two different
trajectories: the first is the sum of the first two harmonics of a rectangular wave de-
fined in the interval between t = 0 and t = 8s ; the second trajectory is the sum of the
first three harmonics of a triangular wave defined in the interval between t = 8s and
t = 16 s. The reference input signal is superimposed and vertical dashed lines mark events where the parameters are changed: the instant at which the learned signal doubles the amplitude r (at t = 4 s), the change of the reference signal (at t = 8 s) and the modulation of the baseline g (at t = 12.5 s). The main observation is that the changes of parameters result in smooth variations of the trajectory y(t) to be reproduced by the robot.
An example of the use of DMP modulation in biped locomotion is shown in Fig. 3.
The original signal that was used to train the DMP was modified in order to change
the relative step length of each leg and the corresponding foot clearances. This strate-
gy allows the robot to turn with a smooth curve around an obstacle placed on its path
(a video is available at http://www.youtube.com/watch?v=Y3Y-6WNhxHE).
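As an illustration only, the snippet below (reusing the hypothetical RhythmicDMP sketch from Section 2.1) reproduces this kind of online modulation: the amplitude r is doubled at t = 4 s and the baseline g is shifted at t = 12.5 s, while the integration simply continues with the new values, which is what yields the smooth transients; the numerical values are arbitrary.

```python
# Illustrative modulation of a previously fitted rhythmic DMP (values are arbitrary).
dmp = RhythmicDMP()            # assume dmp.fit(...) was already called on a demonstration
phi, y, z = 0.0, dmp.g, 0.0
dt, t = 0.002, 0.0
while t < 16.0:
    if abs(t - 4.0) < dt/2:
        dmp.r *= 2.0           # double the oscillation amplitude
    if abs(t - 12.5) < dt/2:
        dmp.g += 0.2           # shift the baseline (centre of oscillation)
    phi, y, z = dmp.step(phi, y, z, dt)
    t += dt
```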

[Figure 2 plot: position y(t) versus time (s) — learned signal and reference signal.]
Fig. 2. Time courses of the rhythmic dynamical system with continuous learning (solid line)
and the input reference signal (dashed-line). The vertical dashed-lines in the plot indicate, from
left to right, the instant in which the learned signal doubles the amplitude r (at t = 4 s), change
the reference signal (at t = 8 s) and modulates the baseline g (at t = 12.5 s).

[Figure 3 plot: y (m) versus x (m) — COG path and turning curve, with annotated transitions from straight to turning, from turning left to turning right, and from turning back to straight.]

Fig. 3. View of the movement path of the robot’s COG projected on the ground and the corres-
ponding turning curve. The black box represents an obstacle placed on the path.

3.2 Interlimb Coordination


DMP exhibit a desirable property in the context of robot learning from demonstration:
the system does not depend on an explicit time variable, giving them the ability to
handle spatial or temporal perturbations. This property makes them attractive in order
to create smooth kinematics control policies that can robustly replicate and adapt
demonstrations. However, adaptation of inter-limb parameters is essential to restore
the symmetry of the gait cycle in order to reduce the likelihood of becoming unstable.
For example, whenever a leg is constrained by external perturbations, compensatory reactions in the other leg are expected so as to restore the out-of-phase relationship between the legs.
In this study, one canonical system per leg and multiple transformation systems as-
sociated with the x-, y- and z-coordinates of the robot’s end-effectors are adopted.
Intra-limb coordination results from planning trajectories in the Cartesian space, con-
straining the leg to act as one unit. Phase coordination between legs is provided by
two separate canonical oscillators coupled such that the left and the right limbs move
180 degrees out-of-phase. At the same time, phase resetting of the oscillator phase is
based on foot-contact information (a kinematic event) that depends on force sensors
placed on the feet.

As a result, the dynamics of the phase oscillators in (3), for the left and the right
leg, are modified according to:

$$\tau \dot{\phi}_{left} = \Omega - K_\phi \sin\!\left(\phi_{left} - \phi_{right} - \pi\right) - \left(\phi_{left} - \phi^{contact}\right)\,\delta\!\left(t - t_{left}^{contact} - \Delta t\right)$$
$$\tau \dot{\phi}_{right} = \Omega - K_\phi \sin\!\left(\phi_{right} - \phi_{left} - \pi\right) - \left(\phi_{right} - \phi^{contact}\right)\,\delta\!\left(t - t_{right}^{contact} - \Delta t\right) \tag{4}$$

where $K_\phi > 0$ is the coupling strength parameter, $\phi^{contact}$ is the phase value to be reset when the foot touches the ground, $\delta(\cdot)$ is the Dirac delta function, $t_i^{contact}$ (i = left, right) is the time when the foot touches the ground and $\Delta t$ is a factor used to study the influence of delays in both sensory information and motor control.
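One possible discrete-time reading of equation (4) is sketched below: the sinusoidal coupling is integrated with an Euler step, and the Dirac impulse is applied as an instantaneous phase correction at the (possibly delayed) foot-contact instant. Function and variable names are our own, and the caller is assumed to raise the contact flags Δt after the measured contact time.

```python
import numpy as np

def step_coupled_phases(phi_left, phi_right, dt, Omega, K_phi, tau,
                        phi_contact, contact_left=False, contact_right=False):
    """Euler step of the two coupled phase oscillators with phase resetting (eq. (4))."""
    d_left = (Omega - K_phi*np.sin(phi_left - phi_right - np.pi))/tau
    d_right = (Omega - K_phi*np.sin(phi_right - phi_left - np.pi))/tau
    phi_left, phi_right = phi_left + d_left*dt, phi_right + d_right*dt
    # The Dirac term integrates to a jump of -(phi - phi_contact)/tau at the contact
    # event; for tau = 1 this resets the corresponding phase exactly to phi_contact.
    if contact_left:
        phi_left -= (phi_left - phi_contact)/tau
    if contact_right:
        phi_right -= (phi_right - phi_contact)/tau
    return phi_left % (2*np.pi), phi_right % (2*np.pi)
```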

4 Numerical Simulations

The applicability of these concepts is demonstrated by numerical simulations performed in V-REP, the Virtual Robot Experimentation Platform [10], using the ASTI robot model available in its libraries. Two specific experiments are conducted to demonstrate the important role of phase resetting in achieving adaptive locomotion subject to perturbations.

4.1 Robustness Against Disturbing Forces


In the first experiment, an external force is applied to the trunk section of the humano-
id robot in two situations: a horizontal force is applied in the direction of the movement or, instead, in the backward direction. Specifically, after the robot has achieved steady-state stable walking, a horizontal force is applied for 0.1 s at its center-of-
gravity (COG). The instant in which this external force is applied varies, from the
moment the left foot leaves the ground to the instant when the same foot touches the
ground, in intervals of 50 ms. In both cases, the maximum force tolerated by the robot without falling was measured, with and without phase resetting. Fig. 4 shows the increase in the tolerated forces with phase resetting for the backward and the forward force application, respectively. It is worth noting that the tolerance to disturbing forces is greatly affected by the phase value to be reset, requiring its optimization.
The result of applying a force to the robot, with and without phase resetting, is observed in the variation of the COG velocity along the direction of movement (see Fig. 5). Here a force was applied at around 11.6 s and we can see that, without phase resetting, the robot lost stability with a large increase in the COG velocity, leading to a fall a few seconds later (blue curve). With phase resetting, the impact produces a moderate increase in the COG velocity, but after a few seconds the normal cyclic pattern is recovered. Also, the coupling between the phase oscillators recovers the phase offset of 180º between the legs. In fact, as we can see on the black curve, the phase resetting produces an increase in the phase offset to around 205º at the moment of the force application, and the coupling returns this offset to 180º after a few seconds.

[Figure 4 plot: perturbation tolerance (N) versus timing (s) of the force application, with phase resetting at foot contact — backward and forward forces.]

Fig. 4. Additional tolerance to perturbation forces applied at different instants of the movement
cycle when using phase resetting.

[Figure 5 plot: COG velocity (m/s) and phase difference (deg) versus time (s) — without phase resetting, with phase resetting, and the phase difference.]

Fig. 5. Velocity of the COG in the direction of the motion with a perturbation force applied at
11.5s without and with the use of phase resetting; the time course of the phase difference
between the oscillators is represented in a different vertical axis.

4.2 Adaptation to Irregular Terrains


Biped walking in irregular terrains depends on prediction about when the swing foot
touches the ground. In the first experiment, the humanoid robot walks over a level
surface when it finds a small step, 2 cm high, used to approximate irregularities of the environment. The learned DMP phase is reset online to properly incorporate the
sensory information from the force sensors mounted on the robot feet. More concrete-
ly, the dynamic event corresponds to foot-contact information at the instant of impact
of the swing foot with the ground. Fig. 6 illustrates snapshots of the robot walking
response without and with phase reset. It was found that using the locomotion pattern
as defined by the learned DMP, without any modulation, the robot tolerates step

irregularities up to 0.5 cm in height. Here, the proposed strategy is to change the phase
of the canonical system to a value corresponding to the point of ground contact on the
normal signal generation (when there’s no early contact with the ground).
In the second experiment, it is examined how the phase reset of the canonical oscillator provides changes in the DMP that allow the robot to overcome a set of irregularities that resemble the steps of a small staircase. These consist of two consecutive steps up followed by two steps down, each one 2 cm high. Besides this, the robot is also assumed to receive visual information regarding the stairs' location and height in order to modify the basic gait pattern (foot clearance and step size). Fig. 7 shows the path the robot has to go through and a sequence of captured images of the robot stepping on the first step, followed by the second step; after a few steps on this one, the first step down followed by the final step down takes the robot back to the ground level. As in the previous example, a phase reset is applied as soon as the robot senses that the foot has hit the ground sooner than expected.

Fig. 6. Snapshots of the robot walking on a level surface when it finds a small step 2 cm high
that disturbs its balance (top: without phase resetting; bottom: with phase resetting). Numerical
simulations performed in V-REP [10].

Fig. 7. Top: full view of the path the robot has to go through; center and bottom: sequence of
the robot walking through the path. Numerical simulations performed in V-REP [10] (a video
of this experiment is available at: http://www.youtube.com/watch?v=WjBq27hJAJE).

5 Conclusions

This paper presents a study in which online modulation of the DMP parameters and interlimb coordination through phase coupling provide adaptive biped locomotion with improved robustness to external perturbations. By using the DMP in the task space, new tasks are easily accomplished by modifying simple DMP parameters that directly relate to the task, such as step length, velocity and foot clearance. By introducing coupling between the legs and using phase resetting we have shown that adaptation to irregularities in the terrain is successful. The phase resetting methodology also allowed increasing the tolerance to external perturbations, such as forces that push or pull the robot along the direction of the movement. Future work will address problems like the role of phase resetting and of changes in the DMP parameters under sudden changes of the trunk mass, stepping up stairs, and climbing up and down ramps. Demonstrations of human behavior will be collected using a VICON system and used to train the DMP in new tasks.

Acknowledgements. This work is partially funded by FEDER through the Operational Pro-
gram Competitiveness Factors - COMPETE and by National Funds through FCT - Foundation
for Science and Technology in the context of the project FCOMP-01-0124-FEDER-022682
(FCT reference Pest-C/EEI/UI0127/ 2011).

References
1. Hultborn, H., Nielsen, J.: Spinal control of locomotion – from cat to man. Acta Physiologica
189(2), 111–121 (2007)
2. Grillner, S.: Locomotion in vertebrates: central mechanisms and reflex interaction.
Physiological Reviews 55(2), 247–304 (1975)
3. Aoi, S., Ogihara, N., Funato, T., Sugimoto, Y., Tsuchiya, K.: Evaluating the functional roles of
phase resetting in generation of adaptive human biped walking with a physiologically based
model of the spinal pattern generator. Biological Cybernetics 102, 373–387 (2010)
4. Yamasaki, T., Nomura, T., Sato, S.: Possible functional roles of phase resetting during
walking. Biological Cybernetics 88, 468–496 (2003)
5. Aoi, S., Tsuchiya, K.: Locomotion control of a biped robot using nonlinear oscillators.
Autonomous Robots 19(3), 219–232 (2005)
6. Nakanishi, J., Morimoto, J., Endo, G., Cheng, G., Schaal, S., Kawato, M.: A framework
for learning biped locomotion with dynamical movement primitives. International Journal
of Humanoid Robots (2004)
7. Argall, B., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from dem-
onstration. Robotics and Autonomous Systems 57(5), 469–483 (2009)
8. Ijspeert, A., Nakanishi, J., Schaal, S.: Movement imitation with nonlinear dynamical sys-
tems in humanoid robots. In: Proceedings of the 2002 IEEE International Conference on
Robotics and Automation, pp. 1398–1403 (2002)
9. Ijspeert, A., Nakanishi, J., Hoffmann, H., Pastor, P., Schaal, S.: Dynamical movement primi-
tives: learning attractor models for motor behaviors. Neural Computation 25, 328–373 (2013)
10. Rohmer, E., Singh, S., Freese, M.: V-REP: A versatile and scalable robot simulation
framework. In: IEEE/RSJ International Conference on Intelligent Robots and Systems,
pp. 1321–1326 (2013)
Probabilistic Constraints for Robot Localization

Marco Correia, Olga Meshcheryakova, Pedro Sousa, and Jorge Cruz(B)

NOVA Laboratory for Computer Science and Informatics,


DI/FCT/UNL, Caparica, Portugal
{mvc,jcrc}@fct.unl.pt

Abstract. In robot localization problems, uncertainty arises from many


factors and must be considered together with the model constraints.
Probabilistic robotics is the classical approach for dealing with hard
robotic problems that relies on probability theory. This work describes
the application of probabilistic constraint techniques in the context of
probabilistic robotics to solve robot localization problems. Instead of
providing the most probable position of the robot, the approach charac-
terizes all positions consistent with the model and their probabilities (in
accordance with the underlying uncertainty). It relies on constraint pro-
gramming to get a tight covering of the consistent regions combined with
Monte Carlo integration techniques that benefit from such reduction of
the sampling space.

Keywords: Probabilistic robotics · Constraint programming · Robot localization

1 Introduction
Uncertainty plays a major role in modeling most real-world continuous systems
and, in particular, robotic systems. A reliable framework for decision support
must provide an expressive mathematical model for a sound integration of the
system and uncertainty.
Stochastic approaches associate a probabilistic model to the problem and rea-
son on approximations of the most likely scenarios. In highly nonlinear problems
such approximations may miss relevant satisfactory scenarios leading to erro-
neous decisions. In contrast, constraint programming (CP) approaches reason
on safe enclosures of all consistent scenarios. Model-based reasoning and what-if
scenarios are supported through safe constraint techniques, which only eliminate
scenarios that do not satisfy model constraints. However, safe reasoning based
exclusively on consistency may be inappropriate to sufficiently reduce the space
of possibilities on large uncertainty settings.
This paper shows how probabilistic constraints can be used for solving global
localization problems providing a probabilistic characterization of the robot posi-
tions (consistent with the environment) given the uncertainty on the sensor
measurements.



2 Probabilistic Robotics
Probabilistic robotics [1] is a generic approach for dealing with hard robotic
problems that relies on probability theory to reason with uncertainty in robot
perception and action. The idea is to model uncertainty explicitly, representing
information by probability distributions over all space of possible hypotheses
instead of relying on best estimates.
Probabilistic approaches are typically more robust in the face of sensor limita-
tions and noise, and often scale much better to unstructured environments. How-
ever, the required algorithms are usually less efficient when compared with non-
probabilistic algorithms, since entire probability densities are considered instead
of best estimates. Moreover, the computation of probability densities requires
working exclusively with parametric distributions or discretizing the probability
space representation.
In global localization problems, a robot is placed somewhere in the environ-
ment and has to localize itself from local sensor data. The probabilistic paradigm
maintains, over time, the robot's location estimate, which is represented by a
probability density function over the space of all locations. Such an estimate is
updated whenever new information is gathered from sensors, taking into account
its underlying uncertainty.
A generic algorithm known as Bayes filter [2] is used for probability esti-
mation. The Bayes filter is a recursive algorithm that computes a probability
distribution at a given moment from the distribution at the previous moment
according to the new information gathered. Two major strategies are usually
adopted for the implementation of Bayes filters in continuous domains: Gaussian
filters and nonparametric filters.
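For reference, the recursion can be written for a discretized state space as in the short sketch below; bel is an array of cell probabilities, and motion_model and measurement_model are placeholders for the problem-specific conditional distributions (the names are ours, not from [2]).

```python
import numpy as np

def bayes_filter_step(bel, u, z, motion_model, measurement_model):
    """One prediction/correction step of a discrete Bayes filter (illustrative sketch).

    bel[j]                 : prior belief of state j
    motion_model(i, j, u)  : p(x_t = i | x_{t-1} = j, control u)
    measurement_model(z, i): p(z | x_t = i)
    """
    n = len(bel)
    # Prediction: propagate the previous belief through the motion model.
    bel_bar = np.array([sum(motion_model(i, j, u)*bel[j] for j in range(n))
                        for i in range(n)])
    # Correction: weight by the measurement likelihood and normalize.
    posterior = bel_bar*np.array([measurement_model(z, i) for i in range(n)])
    return posterior/posterior.sum()
```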
Gaussian techniques share the idea that probabilities are represented by
multivariate normal distributions. Among these techniques the most popular are
(Extended) Kalman Filters [3,4] which are computationally efficient but inad-
equate for problems where distributions are multimodal and subject to highly
nonlinear constraints.
Nonparametric techniques [5,6] approximate continuous probabilities by a
finite number of values. Representatives of these techniques for robot localiza-
tion problems are Grid and Monte Carlo Localization algorithms [7,8]. Both
techniques do not make any assumptions on the shape of the probability distri-
bution and have the property that the approximation error converges uniformly
to zero as the number of values used to represent the probabilistic space
goes to infinity. The computational cost is determined by the granularity of
the approximation (the number of values considered) which is not easy to tune
depending both on the model constraints and on the underlying uncertainty.

3 Constraint Programming
Continuous constraint programming [9,10] has been widely used to model safe
reasoning in applications where uncertainty on the values of the variables is
modeled by intervals including all their possibilities. A Continuous Constraint

Satisfaction Problem (CCSP) is a triple ⟨X, D, C⟩ where X is a tuple of n real variables x1, · · · , xn, D is a Cartesian product of intervals X1 × · · · × Xn (a box), where each Xi is the domain of xi, and C is a set of numerical constraints
(equations or inequalities) on subsets of the variables in X. A solution of the
CCSP is a value assignment to all variables satisfying all the constraints in C.
The feasible space F is the set of all CCSP solutions within D.
Continuous constraint reasoning relies on branch-and-prune algorithms [11]
to obtain sets of boxes that cover the feasible space F . These algorithms begin
with an initial crude cover of the feasible space (D) which is recursively refined
by interleaving pruning and branching steps until a stopping criterion is satisfied.
The branching step splits a box from the cover into sub-boxes (usually two). The
pruning step either eliminates a box from the covering or reduces it into a smaller
(or equal) box maintaining all the exact solutions. Pruning is achieved by per-
forming constraint propagation [12] based on interval analysis methods [13].

Algorithm 1. probDist(D, C, G)
 1  S ← Branch&Prune(D, C);
 2  ∀ 1≤i1≤g1, …, 1≤in≤gn : Mi1,…,in ← 0;
 3  P ← 0;
 4  foreach B ∈ S do
 5      ⟨i1, …, in⟩ ← getIndex(B);
 6      Mi1,…,in ← MCIntegrate(B);
 7      P ← P + Mi1,…,in;
 8  if P = 0 then return M;
 9  ∀ 1≤i1≤g1, …, 1≤in≤gn : Mi1,…,in ← Mi1,…,in / P;
10  return M;

Probabilistic Constraint Programming. In classical CCSPs, uncertainty is modeled by intervals that represent the domains of the variables. Nevertheless, this paradigm cannot distinguish between different scenarios and all combinations of values within such an enclosure are considered equally plausible. In this work an extension of the classical CP paradigm is used to support probabilistic reasoning. Probabilistic constraint programming [14] associates a probabilistic space to the classical CCSP by defining an appropriate density function. A probabilistic constraint space is a pair ⟨⟨X, D, C⟩, f⟩, where ⟨X, D, C⟩ is a CCSP and f a p.d.f. defined in Ω ⊇ D such that $\int_\Omega f(x)\,dx = 1$. The constraints C specify an event H whose probability can be computed by integrating f over its feasible space, $P(H) = \int_H f(x)\,dx$. The probabilistic constraint framework performs constraint
propagation to get a box cover of the region of integration H and compute the
overall integral by summing up the contributions of each box in the cover. In this
work, classical Monte Carlo methods [15] are used to estimate the value of the
integrals at each box, by randomly selecting N points in the multidimensional
space and averaging the function values at these points. The success of this tech-
nique relies on the reduction of the sampling space where a pure Monte Carlo
method is not only hard to tune but also impractical in small error settings.
Probability distributions are computed by algorithm 1 which assumes a grid
over the feasible region and computes a conditional probability distribution of
the random vector X given the event H that satisfies all constraints in C. The
grid is specified by the input G = ⟨g1, . . . , gn⟩, which is an array that defines
the number of partitions considered at each dimension. The output matrix M is
the conditional probability at each grid cell. The algorithm first computes a grid

box cover S for the feasible space of the model constraints C (line 1). Function Branch&Prune (see [14] for details) is used with a grid-oriented parametrization, i.e., it splits the boxes in the grid and chooses to process only those boxes that are not yet inside a grid cell, stopping when there are no more eligible boxes. Matrix M is initialized to zero (line 2), as well as the normalization factor P that will contain, in the end, the overall sum of all non-normalized parcels (line 3). For each box B in the cover S (lines 4-7), its corresponding index of the matrix cell is identified (line 5) and its probability is computed by function MCIntegrate (which implements the Monte Carlo method to compute the contribution of B) and assigned to the value in that cell (line 6). The normalization factor is updated (line 7) and used in the end to normalize the computed probabilities (line 9).
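A much simplified Python rendering of Algorithm 1 is given below; the branch-and-prune cover and the Monte Carlo integration are assumed to be provided elsewhere (here as callables), so this is only a sketch of the bookkeeping, not the RealPaver-based implementation used by the authors.

```python
import numpy as np

def prob_dist(boxes, grid_index, mc_integrate, grid_shape):
    """Grid-based conditional probability distribution (sketch of Algorithm 1).

    boxes           : box cover of the feasible space returned by branch-and-prune
    grid_index(B)   : tuple index of the grid cell containing box B
    mc_integrate(B) : Monte Carlo estimate of the integral of the pdf over box B
    """
    M = np.zeros(grid_shape)
    P = 0.0
    for B in boxes:                 # lines 4-7 of Algorithm 1
        m = mc_integrate(B)
        M[grid_index(B)] += m       # accumulate the mass of every box falling in the cell
        P += m
    if P == 0.0:                    # no probability mass inside the feasible space
        return M
    return M/P                      # normalization into a conditional distribution (line 9)
```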

4 Probabilistic Constraints for Robot Localization


The location of the robot is confined to a box that characterises the environ-
ment’s coordinate system on which prior knowledge is represented as a set of
segments (walls). A robot pose is a triplet ⟨x, y, α⟩ where (x, y) defines its location and α ∈ [0, 2π] characterises its heading direction. This work focuses on the information gathered by a ladar, which provides a panoramic view of the environment, gathering distance measurements within a given maximum range δmax (for a given direction it provides the distance to the closest wall in that direction).
We consider n ladar measurements, each represented as a pair: the angle relative
to the robot heading direction and the recorded distance.
Figure 1(above) illustrates 3 environments confined to the box [0,1000]×[0,1000]
and their respective robot poses. The robot is pictured as a small circle centered
at its location coordinates (with radius 30) and its heading direction is shown
by the inner tick. The straight dotted lines illustrate the distance measurements
that would be captured by a ladar from the robot pose. It considers 7 measure-
ments with direction angles covering the ladar panoramic range and a maximum
distance range δmax = 300 (dotted circle).
In general, the information provided by the robot sensors is subject to dif-
ferent sources of noise. We assume that all ladar measurement errors are inde-
pendent and normally distributed with 0 mean and σ standard deviation. For n
ladar measurements, their joint error probability density is the Gaussian function $\left(\sigma\sqrt{2\pi}\right)^{-n} e^{-\frac{1}{2\sigma^2}\sum_i \epsilon_i^2}$, where $\epsilon_i$ is the error committed in the i-th measurement. Without loss of generality, other pdfs could be used to model the uncertainty of other sensor measurements.
Figure 1(below) illustrates the results (projected in the xy plane). It shows
the grid over the initial coordinates box and the xy cells consistent with the
measurements with grey levels that reflect the computed probability.
The Gaussian pdf for the current pose ⟨x, y, α⟩ of the robot given n ladar
measurements (used by the function MCIntegrate) is computed as follows. For
each ladar measurement i, the direction of the observation αi is given by the
robot angle α plus the relative angle of the measurement. The predicted value
for the distance is computed as the distance from the robot pose to the closest

object in the direction αi (or is δmax if such distance exceeds the maximum
ladar range). The difference between the predicted and the distance recorded
by the ladar is the measurement error and its square is accumulated for all
measurements and used to compute the pdf.
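A sketch of this measurement model is shown below; ray_to_segment_distance is a hypothetical geometric helper (not code from the paper) that returns the distance from a point to a wall segment along a given direction, or infinity when the ray misses it.

```python
import math

def pose_density(pose, scans, walls, sigma, d_max, ray_to_segment_distance):
    """Gaussian pdf value of a pose given n ladar measurements (illustrative sketch).

    pose  : (x, y, alpha)
    scans : list of (relative_angle, measured_distance) pairs
    walls : list of wall segments ((x1, y1), (x2, y2))
    """
    x, y, alpha = pose
    sq_err = 0.0
    for rel_angle, measured in scans:
        beam = alpha + rel_angle
        # Predicted distance: closest wall hit along the beam, capped at the ladar range.
        predicted = min([ray_to_segment_distance(x, y, beam, s) for s in walls] + [d_max])
        sq_err += (predicted - measured)**2
    n = len(scans)
    return (sigma*math.sqrt(2.0*math.pi))**(-n)*math.exp(-sq_err/(2.0*sigma**2))
```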
The specialized function that narrows a domain box according to the ladar
measurements maintains a set of numerical constraints that can be enforced
over the variables of the problem and then calls the generic interval arithmetic
narrowing procedure [16]. The numerical constraints may result from each ladar
measurement. Firstly a geometric function is used to determine which of the
walls in the map can eventually be seen by a robot positioned in the box with
an angle of vision within the range of the robot pose angle plus the relative
angle of the ladar measurement. If no wall can be seen, the predicted distance
is the maximum ladar range δmax and a constraint is added to enforce the error
between the predicted and the ladar measurement not to exceed a predefined
threshold ε_max. If it is only possible to see a single wall, an adequate numerical
constraint is enforced to restrain the error between the ladar measurement and
the predicted distance for a pose ⟨x, y, α⟩. Notice that whenever there is the
possibility of seeing more than one wall it cannot be decided which constraint to
enforce, and the algorithm proceeds without associating any constraint to the
ladar measurement.

5 Experimental Results
The probabilistic constraint framework was applied on a set of global localiza-
tion problems covering different simulated environments and robot poses and
are illustrative of the potential and limitations of the proposed approach. The
algorithms were implemented in C++ over the RealPaver constraint solver [16]
and the experiments were carried out on an Intel Core i7 CPU at 2.4 GHz.
Our grid approach adopted as reference a grid granularity commonly used in
indoor environments [1]: 15 cm for the xy dimensions (10 units represents 1.5
cm), and 4 degrees for the rotational dimension. Based on previous experience
on the hybridization of Monte Carlo techniques with constraint propagation we
adopted a small sampling size of N = 100.
In all our experiments increasing the sampling size did not improve the qual-
ity of the results. Similarly the reference value of 4 degrees for the grid size of the
rotational dimension was fixed since coarser grids prevented constraint pruning
and finer grids increased the computation time without providing better results.
In the following problems, to illustrate the effect of the resolution of the xy grid
we consider, apart from the reference grid size, a 4 times coarser grid (60 cm)
and a 4 times finer grid (3.75 cm).
Fig. 1(left) illustrates a problem where, from the given input, the robot loca-
tion can be circumscribed to a unique compact region. The results obtained
with the coarse grid clearly identify a single cell enclosing the simulated robot
location. The obtained enclosure for the heading direction is guaranteed to be
between 40 and 48 degrees (not shown). With the reference and the fine-grained
grids the results were similar. The CPU time was 5s, 10s and 30s for increasing
grid resolutions.

[Figure 1 plots: three 1000×1000 environments (above) and the corresponding computed xy solutions (below).]

Fig. 1. (above) environment and robot poses with the distance measurements cap-
tured by the robot sensors; (below) computed solutions given the environment and the
measurements.

Fig. 1(center) illustrates a problem where local symmetry of the environ-


ment (with 4 similar rooms) makes it impossible to localize the robot into a
unique compact region, with several possible alternatives coexisting. In this case, all
the consistent locations were identified despite the adopted grid resolution. The
predicted heading directions point towards the respective room entrance. The
CPU time was 5s, 8s and 13s for increasing grid resolutions.
Fig. 1(right) illustrates a problem where there is a continuum of indistinguishable locations for the robot (along the corridor). Again, despite the adopted grid resolution, all the consistent locations were successfully identified. Notice that the possible locations are represented as two sets of adjacent cells and the probability of the cells on the left is larger because there the robot may be facing two opposite directions (left or right), whereas within the right cells the robot must be heading left. This example reflects the limitations of grid approaches in representing a continuum of possibilities - the CPU time severely degrades for increasing grid resolutions: about 4s, 16s and 117s.

6 Conclusions
In this paper we propose the application of probabilistic constraint programming
to probabilistic robotics. We show how the approach can be used to support
sound reasoning in global localization problems integrating prior knowledge on
the environment with the uncertainty information gathered by the robot sensors.
Preliminary experiments on a set of simulated problems highlighted the potential
and limitations of the approach.
In the future the authors aim to extend the approach to address kinematic
constraints and their underlying uncertainty. Probabilistic constraint reasoning

has the potential to combine all sources of uncertainty information providing a


valuable probabilistic characterization of the set of robot poses consistent with
the kinematic constraints.

References
1. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics. The MIT Press (2006)
2. Särkkä, S.: Bayesian filtering and smoothing. Cambridge University Press (2013)
3. Kalman, R.E.: A new approach to linear filtering and prediction problems. ASME
Journal of Basic Engineering (1960)
4. Julier, S.J., Uhlmann, J.K.: Unscented filtering and nonlinear estimation.
Proceedings of the IEEE, 401–422 (2004)
5. Kaplow, R., Atrash, A., Pineau, J.: Variable resolution decomposition for robotic
navigation under a pomdp framework. In: IEEE Robotics and Automation,
pp. 369–376 (2010)
6. Arulampalam, M., Maskell, S., Gordon, N.: A tutorial on particle filters for online
nonlinear/non-gaussian bayesian tracking. IEEE Trans. Signal Proc. 50, 174–188
(2002)
7. Wang, Y., Wu, D., Seifzadeh, S., Chen, J.: A moving grid cell based mcl algorithm
for mobile robot localization. In: IEEE Robotics and Biomimetics, pp. 2445–2450
(2009)
8. Dellaert, F., Fox, D., Burgard, W., Thrun, S.: Monte carlo localization for mobile
robots. In: IEEE Robotics and Automation, pp. 1322–1328 (1999)
9. Lhomme, O.: Consistency techniques for numeric CSPs. In: Proc. of the 13th IJCAI
(1993)
10. Benhamou, F., McAllester, D., van Hentenryck, P.: CLP(intervals) revisited. In:
ISLP, pp. 124–138. MIT Press (1994)
11. Hentenryck, P.V., McAllester, D., Kapur, D.: Solving polynomial systems using a
branch and prune approach. SIAM Journal Numerical Analysis 34, 797–827 (1997)
12. Benhamou, F., Goualard, F., Granvilliers, L., Puget, J.F.: Revising hull and box
consistency. In: Procs. of ICLP, pp. 230–244. MIT (1999)
13. Moore, R.: Interval Analysis. Prentice-Hall, Englewood Cliffs (1966)
14. Carvalho, E.: Probabilistic Constraint Reasoning. PhD thesis, FCT/UNL (2012)
15. Hammersley, J., Handscomb, D.: Monte Carlo Methods. Methuen, London (1964)
16. Granvilliers, L., Benhamou, F.: Algorithm 852: Realpaver an interval solver using
constraint satisfaction techniques. ACM Trans. Mathematical Software 32(1),
138–156 (2006)
Detecting Motion Patterns in Dense Flow
Fields: Euclidean Versus Polar Space

Andry Pinto(B) , Paulo Costa, and Antonio Paulo Moreira

Robotics and Intelligent Systems - INESCTEC and Faculty of Engineering,


University of Porto, Porto, Portugal
{andry.pinto,paco,amoreira}@fe.up.pt

Abstract. This research studies motion segmentation based on dense


optical flow fields for mobile robotic applications. The optical flow is usually represented in the Euclidean space; however, finding the most suitable motion space is a relevant problem because techniques for motion
analysis have distinct performances. Factors like the processing-time and
the quality of the segmentation provide a quantitative evaluation of the
clustering process. Therefore, this paper defines a methodology that eval-
uates and compares the advantage of clustering dense flow fields using
different feature spaces, for instance, Euclidean and Polar space. The
methodology resorts to conventional clustering techniques, Expectation-
Maximization and K-means, as baseline methods. The experiments con-
ducted in this paper proved that the K-means clustering is suitable
for analyzing dense flow fields.

Keywords: Visual perception · Optical flow · Motion segmentation

1 Introduction

The work presented by this research studies and compares techniques for motion
analysis and segmentation using dense optical flow fields. Motion segmentation is
the process of dividing an image into different regions using motion information
in a way that each region presents homogeneous characteristics. Two techniques are considered in this research, the Expectation-Maximization (EM) and K-means. The performance and the behavior of these techniques are well-known in
the scientific community since they have been applied in countless applications of
machine learning; however, this paper presents their performance for clustering
dense motion fields obtained by a realistic robotic application [5]. The optical
flow technique [7] is used in this paper because it estimates dense flow fields in a short time, which makes it suitable for robotic applications without specialized computing devices; however, the quality of the flow fields that are obtained is lower when
compared to the most recent methods.
Four different feature spaces are considered in this research: the motion vector
is represented in Cartesian space or Polar space, and the feature can have the
positional information of the image location. Mathematically, this is represented


by the following features: flc = (x̄, ȳ, ū, v̄), flp = (x̄, ȳ, m̄, φ̄), f c = (u, v) and f p = (m, φ), where (x, y) is the image location, (u, v) is the flow vector in Cartesian space and (m, φ) is the flow vector in Polar space (magnitude and angle). In most cases, a normalization is performed [3] due to the different physical meanings of the feature's components, e.g. $\bar{v} = \frac{v - \operatorname{mean}(v)}{\sigma_v}$, where σ_v is the standard deviation of the component v. As can be noticed, the influence of this
normalization on the segmentation procedure is also analyzed.
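As an illustration of how these feature spaces can be assembled from a dense flow field, the sketch below builds any of the four variants and standardizes each component before clustering; scikit-learn's K-means is used only as a stand-in for the standard implementations mentioned in the text, and the function name is our own.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_features(u, v, space="f_p", normalize=True):
    """Build one of the four feature spaces from a dense flow field u, v (H x W arrays)."""
    h, w = u.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    m, phi = np.hypot(u, v), np.arctan2(v, u)          # polar representation of the flow
    spaces = {"f_c":  [u, v],                          # Cartesian flow only
              "f_p":  [m, phi],                        # Polar flow only
              "f_lc": [xs, ys, u, v],                  # Cartesian flow + image location
              "f_lp": [xs, ys, m, phi]}                # Polar flow + image location
    feats = np.stack([c.ravel() for c in spaces[space]], axis=1).astype(float)
    if normalize:                                      # per-component standardization
        feats = (feats - feats.mean(axis=0))/(feats.std(axis=0) + 1e-10)
    return feats

# Example: two-cluster motion segmentation of a flow field with K-means.
# labels = KMeans(n_clusters=2, n_init=10).fit_predict(build_features(u, v, "f_lc"))
```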
Therefore, the contributions of this article include: a study of motion analysis in dense optical flow fields for practical use in a mobile robot, where the goal is to segment different objects according to their motion coherence; a comparison of the most suitable feature spaces for clustering dense flow fields; and extensive qualitative and quantitative evaluations considering several baseline, pixel-wise clustering techniques, namely K-means and Expectation-Maximization.

2 Related work

The work [3] evaluates the performance of several clustering methods, namely K-means, self-tuning spectral clustering and nonlinear dimension reduction. The authors argue that one of the most important factors for clustering dense flow fields is the proper choice of the distance measure. They considered the feature space to be formed by the pixel coordinates and motion vectors, whose values are normalized by taking into consideration the mean and standard deviation of each feature. Results show the difficulty of segmenting dense flow fields because no technique clearly outperformed the others and, thereby, the choice of the most suitable clustering technique and distance metric must be investigated for a specific context and environment.
flow fields and uses long term point trajectories is presented in [1]. By clustering
trajectories over time, it is possible to use a metric that measures the distance
between these trajectories. The work in [4] proposes a sparse approach for detect-
ing salient regions in the sequences. Feature points are tracked over time in order
to pursue saliency detection as violation of co-visibility. The evaluation of the
method shows that it cannot achieve a real-time computational performance
since it took 32.6 seconds to process a single sequence. The work in [2] addresses the
problem of motion detection and segmentation in dynamic scenes with small
camera movements and resorting to a set of moving points. They use the Lucas-Kanade optical flow to compute the sparse flow field (features obtained by the Harris corner detector). Afterwards, these points are clustered using a variable bandwidth
mean-shift technique and, finally, the cluster segmentation is conducted using
graph cuts.

3 Practical Results

An extensive set of experiments was conducted as part of this research. They


aimed to analyze and understand the behavior of two parametric techniques for

motion analysis in a robotic and surveillance context [7]. The EM and the K-
means are used in this research as baselines for segmenting dense optical flow
fields and they were implemented as standard functions. In the first experiment,
the assessment was performed using an objective (quantitative) and subjective
(qualitative) evaluation. The objective metric F-score [6] provides quantitative
quality evaluations of the clustering results since it combines precision and recall (their harmonic mean) and reaches its best value at 1. The baseline methods provide
a pixel-wise segmentation and factors such as the computational effort and the
quality of the visual clustering are considered. Experiments were performed 1
considering four feature spaces: flc , flp , f c and f p .
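For clarity, the pixel-wise F-score used below can be computed as in the short helper that follows (our own sketch, not the authors' evaluation code); it is the harmonic mean of precision and recall against the annotated ground-truth mask.

```python
import numpy as np

def f_score(pred_mask, gt_mask):
    """Pixel-wise F-score of a predicted motion mask against a ground-truth mask."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    tp = np.logical_and(pred, gt).sum()      # true positives
    fp = np.logical_and(pred, ~gt).sum()     # false positives
    fn = np.logical_and(~pred, gt).sum()     # false negatives
    precision = tp/(tp + fp + 1e-10)
    recall = tp/(tp + fn + 1e-10)
    return 2.0*precision*recall/(precision + recall + 1e-10)
```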
The results start by demonstrating the segmentation performed by the EM
and K-means in several testing sequences that capture a real surveillance scenario
(indoor). Figures 1(a), 1(b) and 1(c) depict only three dense flow fields that were
obtained from these sequences. Using the EM and the K-means for clustering
the flow field represented in f p resulted in figures 2(d) to 2(f) and 3(d)
to 3(f), respectively. Figures 2(a) to 2(c) and 3(a) to 3(c) depict results for the
Cartesian space.
As can be noticed, the segmentation conducted by the EM in f c does not produce a suitable segmentation because the clusters of people appear larger and they have spatially isolated regions that are meaningless (hereafter called clustering noise). This issue is most evident in figure 2(c); however, the same flow field segmented in Polar space originated a result that represents the person's movement more faithfully, since it is less affected by meaningless and isolated regions, see figure 2(f). On the other hand, the visual illustration of the motion segmentation conducted by the K-means in f p is similar to that in f c. A conclusive qualitative analysis of these results is not possible; however, independently of the feature space that is considered (f p or f c), the result of the K-means is better than that of the EM since the person's movements are more faithfully depicted. In addition, the clustering noise of the K-means is lower than that of the EM for these two feature spaces. The K-means is simpler than the EM; however, it is a powerful technique to cluster the input dataset.


Fig. 1. Figures of the first row depict dense flow fields that were obtained from the
technique proposed in [7]. The HSV color space is used to represent the direction (color)
and magnitude (saturation) of the flow.

1
The results in this section were obtained using an I3-M350 2.2GHz and manually
annotated images.


Fig. 2. Comparison between the EM in f c (first row) and EM in f p (second row).


Motion segmentation for the flow fields represented in figures 1(a), 1(b) and 1(c).


Fig. 3. Comparison between the K-means in f c (first row) and K-means in f p (second
row). Motion segmentation for the flow fields represented in figures 1(a), 1(b) and 1(c).

The visual illustration of the motion segmentation based on K-means produced inconclusive results in terms of the best feature space. Therefore, quantitative evaluations were conducted based on manually annotated images that represent the ground truth of the segmentation. Table 1 presents the results evaluated using the objective metric F-score, for the spaces f c and f p. The superscripts "c" and "p" represent the segmentation result in Cartesian and Polar space. The motion segmentation conducted by the K-meansc was close to the K-meansp, although with a lower amount of noise. This table confirms the visual illustration of the previous results since the EMp had an F-score that is on average 0.178 higher than the EMc, while the K-meansc produced clusters with an F-score that is 0.040 higher than the K-meansp. This means that the F-score of the EMc and
EMp was 0.630 and 0.809, while that of the K-meansc and K-meansp was 0.893 and 0.852 (on average). Therefore, the Polar feature space is a clear advantage for the EM technique, while the quality of the segmentation produced by the K-means is not so affected by the feature space. Both clustering techniques were able to characterize the two motion models present in each trial, although the EM technique produces clusters affected by a higher level of noise. This may be caused by a process that is more complex in nature, since it is an iterative scheme that computes the posterior probabilities and the log-likelihood; therefore, it is less robust to noisy data than the K-means, which is a simpler technique. In addition, Table 1 shows that the quality of the segmentation obtained by the EM in flc was substantially better when compared to the experiments with f c. In detail, the performance of the EM increased by 43.3%; however, the average performance of the K-means was similar to the result obtained in f c. Generally, the flc makes it possible for the EM and K-means to achieve a better segmentation quality (disregarding the first sequence). These trials show that the EM and K-means are not suitable for the Polar feature space with information about the image location, flp, since results are inconclusive (only some trials reported an improved quality).

Table 1. F-score - Performance comparison between the EM and K-means in f c, f p, flc and flp. Parameters such as the precision ("Pre.") and the recall ("Rec.") are presented. Sequences 1, 2 and 3 represent the results for the flow fields in figures 1(a), 1(b) and 1(c), respectively.

Sequence EMc K-meansc EMp K-meansp


Pre. Rec. F-score Pre. Rec. F-score Pre. Rec. F-score Pre. Rec. F-score
1 0.404 1.000 0.575 0.981 0.944 0.962 0.797 0.936 0.861 0.881 0.928 0.904
2 0.631 0.999 0.774 1.000 0.853 0.921 0.875 0.917 0.896 0.924 0.837 0.879
3 0.455 0.994 0.624 0.543 0.926 0.685 0.496 0.894 0.638 0.535 0.885 0.667
4 0.638 0.996 0.778 0.987 0.866 0.922 0.805 0.939 0.867 0.923 0.934 0.929
5 0.367 0.998 0.537 0.999 0.852 0.919 0.715 0.905 0.799 0.852 0.846 0.849
6 0.330 0.999 0.496 0.998 0.899 0.946 0.668 0.974 0.792 0.828 0.947 0.884
Sequence EMcl K-meanscl EMpl K-meanspl
1 0.912 0.927 0.919 0.578 0.916 0.708 0.717 0.936 0.812 0.303 0.976 0.463
2 0.934 0.897 0.915 0.999 0.853 0.920 0.589 0.969 0.733 0.906 0.836 0.870
3 0.882 0.968 0.923 0.979 0.948 0.963 0.418 0.975 0.585 0.874 0.937 0.904
4 0.879 0.998 0.935 0.977 0.920 0.947 0.922 0.900 0.911 0.910 0.901 0.906
5 0.697 0.972 0.812 0.998 0.858 0.923 0.923 0.726 0.813 0.843 0.871 0.857
6 0.885 0.951 0.917 0.998 0.901 0.947 0.909 0.866 0.887 0.822 0.942 0.878

Finally, the computational performance was evaluated for the EM and K-


means in both f c and f p . The computation of the techniques took, on average,
7.127 seconds (EMc ), 3.787 seconds (EMp ), 0.142 seconds (K-meansc ) and 0.122
seconds (K-meansp ). As can be seen, the Polar feature space accelerates the

convergence of the clustering in both techniques since the processing time is sub-
stantially reduced, especially for the EM case whose processing time is reduced
by 46.9% while the processing time of the K-means is reduced by 14.1%.

4 Conclusion
This paper addressed an important research topic for motion perception and analysis, because the segmentation produces poor results when the feature space is not properly adjusted. This compromises the ability of the mobile robot to understand its surrounding environment. An extensive set of experiments was conducted as part of this work and several factors were considered and studied, such as the space (Cartesian and Polar) and the dimensionality of the feature vector. Results prove that choosing a good feature space for the detection of motion patterns is not a trivial problem since it influences the performance of the Expectation-Maximization and K-means. The latter technique in Cartesian space revealed the best performance for motion segmentation of flow fields (with a resolution of 640×480). It originates a good visual segmentation (evaluated using the F-score metric) in a reduced period of time, since it took 0.122 seconds to compute.
This work was funded by the project FCOMP - 01-0124-FEDER-022701.

References
1. Brox, T., Malik, J.: Object segmentation by long term analysis of point trajectories.
In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS,
vol. 6315, pp. 282–295. Springer, Heidelberg (2010)
2. Bugeau, A., Pérez, P.: Detection and segmentation of moving objects in complex
scenes. Computer Vision and Image Understanding 113(4), 459–476 (2009)
3. Eibl, G., Brandle, N.: Evaluation of clustering methods for finding dominant optical
flow fields in crowded scenes. In: International Conference on Pattern Recognition,
pp. 1–4 (December 2008)
4. Georgiadis, G., Ayvaci, A., Soatto, S.: Actionable saliency detection: Independent
motion detection without independent motion estimation. In: IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), pp. 646–653 (2012)
5. Pinto, A.M., Costa, P.G., Correia, M.V., Paulo Moreira, A.: Enhancing dynamic
videos for surveillance and robotic applications: The robust bilateral and temporal
filter. Signal Processing: Image Communication 29(1), 80–95 (2014)
6. Melamed, I.D., Green, R., Turian, J.P.: Precision and recall of machine translation. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Companion Volume of the Proceedings of HLT-NAACL, NAACL-Short 2003, pp. 61–63. Association for Computational Linguistics, Stroudsburg (2003)
7. Pinto, A.M., Paulo Moreira, A., Correia, M.V., Costa, P.G.: A flow-based motion
perception technique for an autonomous robot system. Journal of Intelligent and
Robotic Systems, 1–25 (2013) (in press)
Swarm Robotics Obstacle Avoidance:
A Progressive Minimal Criteria Novelty
Search-Based Approach

Nesma M. Rezk(B) , Yousra Alkabani, Hassan Bedour, and Sherif Hammad

Computer and Systems Department Faculty of Engineering,


Ain Shams University, Cairo, Egypt
{nesma.rezk,yousra.alkabani,hassan bedour,sherif.hammad}@eng.asu.edu.eg

Abstract. Swarm robots are required to explore and search large areas. In order to cover the largest possible area while keeping communications, the robots try to maintain a hexagonal formation while moving. Obstacle avoidance is an extremely important task for swarm robotics, as it saves robots from hitting objects and being damaged.
This paper introduces the novelty search evolutionary algorithm to the swarm robots' multi-objective obstacle avoidance problem, in order to overcome deception and reach better solutions.
This work could teach robots how to move in different environments with 2.5% obstacle coverage while keeping their connectivity above 82%. The percentage of robots that reached the goal was more than 97% in 70% of the environments and more than 90% in the rest of the environments.

Keywords: Maintaining formation · Novelty search · Obstacle avoidance

1 Introduction
Our main interest in this work is to teach swarm robots, by using the novelty search evolutionary algorithm, how to reach a certain goal while maintaining formation and avoiding obstacles.

2 Related Work
2.1 Novelty Search
Lehman and Stanley proposed a new change in genetic algorithms [1]. Instead of calculating a fitness function and selecting the individuals with the best fitness values, the individuals whose behaviour is more novel than that of the other individuals are selected to be added to the new generation.
Novelty search can be easily implemented on top of most evolutionary algorithms. Basically, the fitness value is replaced with what is called a novelty metric. The novelty metric (the sparseness of an individual) is calculated as the average distance between the behavior vector of the individual and those of its k nearest neighbors, drawn from the population and an archive.
Pure novelty search is not enough to reach solutions in large search spaces, as the algorithm will spend a lot of time exploring behaviors that do not meet the goal. So, novelty search can overcome deception, but it cannot work alone without the guidance of the fitness value.
In MCNS (Minimal Criteria Novelty Search), only individuals that have a fitness value greater than a minimal criteria are assigned their novelty score [2]; otherwise, their novelty score is zero. Individuals with a zero novelty value are only used for reproduction if no other individuals meet the minimal criteria. It is clear that MCNS acts as random search until individuals that meet the minimal criteria are reproduced. So, MCNS should be seeded with an initial population that meets the minimal criteria.
Progressive minimal criteria novelty search (PMCNS) was proposed by Gomes et al. to overcome the need of seeding the MCNS algorithm with such an initial population [3]. The minimal criteria is a dynamic fitness threshold, initially set to zero. The fitness threshold progressively increases over the generations to keep the search from exploring irrelevant solutions. In each generation, the new criteria is found by determining the value of the P-th percentile of the fitness scores in the current population. This means that P percent of the fitness values fall under the minimal criteria.
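The following sketch, written under our own assumptions (plain numpy, Euclidean distances, and genomes used directly as behaviour vectors), illustrates how the PMCNS score described above can be computed: the sparseness of an individual is the average distance to its k nearest neighbours in the population plus the archive, and the novelty score is zeroed whenever the fitness does not clear the P-th percentile of the current generation.

```python
# Illustrative sketch of PMCNS scoring; not the implementation used by the authors.
import numpy as np

def sparseness(behaviour, pool, k=15):
    """Average distance from one behaviour vector to its k nearest neighbours."""
    d = np.linalg.norm(pool - behaviour, axis=1)
    return np.sort(d)[1:k + 1].mean()            # index 0 is the zero self-distance

def pmcns_scores(behaviours, fitnesses, archive, percentile=50, k=15):
    pool = np.vstack([behaviours, archive]) if len(archive) else np.asarray(behaviours)
    threshold = np.percentile(fitnesses, percentile)   # progressive minimal criteria
    scores = []
    for b, f in zip(behaviours, fitnesses):
        novelty = sparseness(b, pool, k)
        scores.append(novelty if f >= threshold else 0.0)  # below the criteria -> zero novelty
    return np.array(scores)
```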

2.2 Evolutionary Obstacle Avoidance


Hettiarachchi and Spears proposed an evolutionary algorithm for the offline learning of swarm robots [4]. The genetic algorithm teaches the robots how to reach their goal while preserving their hexagonal formation and avoiding obstacles at the same time. As discussed in [5], the hexagonal formation is the best formation to obtain optimum coverage with the least number of robots and an efficient multi-hop communication network.
The Lennard-Jones force law is used to control the robot. The robot moves according to the net force calculated from the forces exerted upon it by the other robots and the environment. As the robot moves in the environment, it interacts with three types of objects: robots, obstacles and a goal. It is required to set the parameters of three copies of the force law, one for each object type, in order to keep the robot at distance R from its neighbor robots (which leads to the hexagonal formation), keep it away from obstacles, and reach the goal.
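As an illustration of this control scheme, the sketch below sums three copies of a Lennard-Jones-style force law. The classical 12-6 form and the force cap are our own assumptions, not the exact parameterisation of [4]; the per-object parameter sets (epsilon, sigma, f_max) are precisely the quantities the evolutionary algorithm is meant to tune.

```python
# Hedged sketch of force-law control; the 12-6 form is a stand-in for the generalised law of [4].
import numpy as np

def lj_force(r, epsilon, sigma, f_max):
    """12-6 Lennard-Jones force magnitude at distance r (positive = repulsive)."""
    f = 24.0 * epsilon / r * (2.0 * (sigma / r) ** 12 - (sigma / r) ** 6)
    return float(np.clip(f, -f_max, f_max))

def net_force(robot_pos, neighbours, obstacles, goal, params):
    """Sum the three copies of the force law: neighbour robots, obstacles and the goal."""
    total = np.zeros(2)
    for kind, points in (("robot", neighbours), ("obstacle", obstacles), ("goal", [goal])):
        for p in points:
            delta = np.asarray(robot_pos) - np.asarray(p)
            r = np.linalg.norm(delta)
            if r > 0:
                total += lj_force(r, **params[kind]) * delta / r
    return total  # the robot moves along this net force vector
```

In this sketch, each genome would encode the three parameter dictionaries in params (one per object type).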
An evolutionary algorithm is used to optimize the parameters of the three copies of the force law. The penalty function (1) is a minimizing multi-objective fitness function. The function consists of three components: a penalty for not reaching the goal, a penalty for collisions, and a penalty for lack of cohesion.

$$Penalty = P_{collisions} + P_{connectivity} + P_{not\ reaching\ the\ goal} \qquad (1)$$



3 Applying PMCNS
To apply PMCNS, we have several issues to handle. The penalty function of our problem is a minimizing function: it needs to minimize the penalty for not reaching the goal, the penalty for non-cohesion and the penalty for collisions. The PMCNS algorithm equations assume that the fitness function is a maximizing function: they allow individuals with fitness values higher than the minimal criteria to be selected for the next generation. We have two options. The first option is to change all the equations of the algorithm so that it becomes a maximal-criteria algorithm instead of a minimal-criteria one.
The second option (which was adopted in this work) is to invert all the penalty values, changing the problem from a minimizing to a maximizing problem. The minimum value becomes the maximum value and the maximum value becomes the minimum value. Equation (2) shows how the penalty value is inverted.

inverted_penalty(i) = max_penalty − penalty(i)    (2)

where max_penalty is the maximum penalty of the current generation. penalty(i) is subtracted from the current generation's maximum penalty so that the inverted penalty values keep the same range as the original penalty values.
We need to decide how to capture the behavior vector and how to apply PMCNS to a multi-objective problem. To fill the behaviour vector, we need to make the vector express how the controller behaves throughout the simulation. We chose the genomes of the individual to express the behaviour vector, for two reasons. The first reason is that the genomes are the parameters of the three versions of the force law; those parameters decide how the robot behaves with respect to robots, obstacles, and the goal, so they express the behaviour of the controller. The second reason is that Cuccu and Gomez stated that the simplest way to fill the behavior vector is to fill it with the individual's genomes [6].

4 Experiments and Results


4.1 PMCNS Experiment
This experiment was held to compare objective-based search to progressive minimal criteria novelty search. The environments generated for the evaluation module contain 40 robots and 90 obstacles, as in the experiment held by Hettiarachchi and Spears [4]. The evolution was run for 80 generations. Each individual was evaluated 20 times. The evaluation module of the genetic algorithm was distributed over 30 computers to gain speedup [7].
There are differences in the way the two algorithms act. The objective-based search starts faster than PMCNS, but PMCNS could reach lower penalty values than the objective-based search. Fig. 1 shows the minimum penalty found in all generations during evolution, for both objective-based search and PMCNS with percentile = 50%.

Fig. 1. Minimum penalty of each generation for both objective-based and novelty
search evolutionary algorithms.

4.2 Evaluation Module Experiments

The next experiments were held to examine the changes that can be made in the evaluation module to reach better solutions. Since the previous experiment showed that PMCNS can perform better than objective-based search, we used PMCNS in the rest of the experiments. In these experiments, the PMCNS evolutionary algorithm with percentile = 50% was used.
The target of these experiments is to train robots to move in environments with obstacle coverage less than or equal to 2.5%. So, the environments for training contained 40 robots and 50 obstacles. The diameter of an obstacle is 0.2 units, while the diameter of a robot is 0.02 units. The arena dimensions are 9 x 7 units, so the obstacles cover 2.5% of the environment. Robots are initially placed at the bottom left of the arena and the goal is located at the top right of the arena. Obstacles are randomly placed in the arena. There is an area around the nest where no obstacles are placed, to prevent proximity collisions.
The robot can sense the goal at any distance. A penalty is added at the end of the simulation (1500 time steps) if less than 80% of the robots reached the goal area. The goal area is 4R from the center of the goal. Each individual was evaluated 20 times. Most of these settings follow the experimental settings of Hettiarachchi and Spears [4]. For each experiment, the individual with the best penalty value was tested with 20, 40, 60, 80, and 100 robots moving in environments containing 10, 20, 30, 40, and 50 obstacles, corresponding to obstacle coverages of 0.5, 1, 1.5, 2 and 2.5% of the environment. So, the total number of performance experiments is 25. Each experiment is evaluated 50 times.
1. First Penalty Function (Penalty Experiment 1)
In this experiment, a robot can sense neighbor robots at distance 1.5R, where R is the desired separation between robots. A robot attracts its neighbors if they are 1.5R away, and repels them if they are closer than 1.5R. A penalty for non-cohesion is added if fewer or more than 6 robots are found at distance R from the robot.
Obstacles are sensed at a distance equal to twice the obstacle diameter, measured from the center of the robot to the center of the obstacle, and the robot starts to interact with an obstacle at that distance. A penalty is added for a collision if the distance between the center of the robot and the center of the obstacle is less than the robot diameter.
2. Second Penalty Function (Penalty Experiment 2)
In this experiment, we changed the distances for acting and for adding the cohesion penalty (robot to robot interaction). A robot can sense neighbor robots at distance 1.5R, where R is the desired separation between robots. A robot attracts its neighbors if they are R away and repels them if they are closer than R. A penalty for non-cohesion is added if fewer or more than 6 robots are found at distance R from the robot.
3. Applying a Harder Problem on the Third Penalty Function (Penalty Experiment 3)
In the previous two experiments, we noticed that the results are worst in the 50-obstacle environments, so we held a new experiment with the same settings as penalty experiment 2, but with training environments containing 60 obstacles instead of 50, to see whether harder training environments would make the robots behave better in easier environments.

Fig. 2. Summary of the penalty evaluation experiments results.

Penalty Experiments Results. Experiment 1 is the best in the percentage of robots that reached the goal, but the worst in the minimum percentage of robots that remained connected and in the number of collisions. Experiment 2 is much better than experiment 1 in the minimum percentage of robots that remained connected, but fewer robots reached the goal. Experiment 3 is the same as experiment 2, except that the training environments were harder. Experiment 3 showed the best results for all objectives.
Fig. 2 shows the results of the experiments graphically. Reachability tells us how many times, in the reachability performance experiments, the percentage of robots that reached the goal was over 97%. Connectivity tells us how many times, in the connectivity performance experiments, the minimum percentage of robots that remained connected was over 90%. The number of collisions shows the total number of collisions found in each experiment.

5 Conclusion and Future Work

This work shows that progressive minimal criteria novelty search can behave differently from objective-based search. It can reach better solutions than objective-based search for a multi-objective task for swarm robots. However, since the task is deceptive, we believe that novelty search algorithms can still yield better solutions and better evolutionary behavior than those reached in this work.
The purpose of the upcoming experiments is to find the part of the fitness function that was affected by the deception of the multi-objective problem, describe it in the behavior vector, and apply novelty search to it. Otherwise, we shall show that the way the fitness value was calculated in this work, using the multi-objective function, was able to overcome the deception of the problem, and that novelty search would not add much value.
This work examined different changes in the evaluation module of the multi-objective genetic algorithm settings. These changes can enhance one objective, but another objective may get worse. This work showed that using a harder problem for learning during the evolutionary algorithm gives better solutions for easier problems.

References
1. Lehman, J., Stanley, K.: Improving evolvability through novelty search and self-
adaptation. In: Proceedings of the 2011 IEEE Congress on Evolutionary Computa-
tion (CEC), Piscataway, NJ, US (2011)
2. Lehman, J., Stanley, K.: Revising the evolutionary computation abstraction: mini-
mal criteria novelty search. In: Proceedings of the Genetic and Evolutionary Com-
putation Conference (GECCO), New York, US (2010)
3. Gomes, J., Urbano, P., Christensen, A.L.: Progressive minimal criteria nov-
elty search. In: Pavón, J., Duque-Méndez, N.D., Fuentes-Fernández, R. (eds.)
IBERAMIA 2012. LNCS, vol. 7637, pp. 281–290. Springer, Heidelberg (2012)
4. Hettiarachchi, S., Spears, W., Spears, D.: Physicomimetics, chapter 14, pp. 441–473.
Springer, Heidelberg (2011)
5. Prabhu, S., Li, W., McLurkin, J.: Hexagonal lattice formation in multi-robot sys-
tems. In: 11th International Conference on Autonomous Agents and Multiagent
Systems (AAMAS) (2012)
6. Cuccu, G., Gomez, F.: When novelty is not enough. In: Di Chio, C., Cagnoni, S., Cotta,
C., Ebner, M., Ekárt, A., Esparcia-Alcázar, A.I., Merelo, J.J., Neri, F., Preuss, M.,
Richter, H., Togelius, J., Yannakakis, G.N. (eds.) EvoApplications 2011, Part I. LNCS,
vol. 6624, pp. 234–243. Springer, Heidelberg (2011)
7. Rezk, N., Alkabani, Y., Bedour, H., Hammad, S.: A distributed genetic algorithm
for swarm robots obstacle avoidance. In: IEEE 9th International Conference on
Computer Engineering and Systems (ICCES), Cairo, Egypt (2014)
Knowledge Discovery
and Business Intelligence
An Experimental Study on Predictive Models
Using Hierarchical Time Series

Ana M. Silva1,2 , Rita P. Ribeiro1,3(B) , and João Gama1,2


1
LIAAD-INESC TEC, University of Porto, Porto, Portugal
{201001541,jgama}@fep.up.pt, [email protected]
2
Faculty of Economics, University Porto, Porto, Portugal
3
Faculty of Sciences, University Porto, Porto, Portugal

Abstract. Planning strategies play an important role in companies' management. In the decision-making process, one of the most important goals is sales forecasting, which is important for stock planning, shop space maintenance, promotions, etc. Sales forecasting uses historical data to make reliable projections for the future. In the retail sector, data has a hierarchical structure: products are organized in hierarchical groups that reflect the business structure. In this work we present a case study, using real data from a leading Portuguese retail company. We experimentally evaluate standard approaches for sales forecasting and compare them against models that explore the hierarchical structure of the products. Moreover, we evaluate different methods to combine predictions for the different hierarchical levels. The results show that exploiting the hierarchical structure present in the data systematically reduces the error of the forecasts.

Keywords: Data mining · Hierarchical time-series · Forecasting in retail

1 Introduction
Nowadays, with increasing competitiveness, it is important for companies to adopt management strategies that allow them to gain an advantage over the competition. In the retail sector, in particular, there is an evident relationship among different time series. The problem presented here is related to a leading Portuguese company in the retail sector, in the electronics area. As we see in Figure 1, the total sales of the company can be divided into five business units: Home Appliances (U51), Entertainment (U52), Wifi (U53), Image (U54) and Mobile (U55), each of which can be divided across 137 different stores.
In this paper, we propose a predictive model that estimates the monthly sales revenue for all stores of this company. We then compare this flat model, which ignores the hierarchy present in the time series, with three other ways, existing in the literature, to combine the obtained forecasts by exploring their hierarchical structure (e.g. [1,2,3,4]): bottom-up, top-down, and the combination of predictions made at different levels of the hierarchy. The experimental results obtained with our case study confirm that taking advantage of the hierarchical structure present in the data leads to an improvement of the models' performance, as it reduces the error of the forecasts.

Fig. 1. Hierarchical structure of our time series data

This paper is organized as follows: in Section 2 the related work is presented; in Section 3 we describe how the forecast model was built and how we compare the different hierarchical models; finally, in Section 4 we present the conclusions and future work.

2 Related Work
Increasingly, data mining techniques have been applied to time series analysis [5]. In the retail sector, where the time series display well-defined trend and seasonality components, the use of learning algorithms such as Artificial Neural Networks (ANN) proves to be more efficient than the application of traditional methods, since they can capture the non-linear dynamics associated with these components and their interactions [6]. However, to apply these algorithms we must study the best way to present the data. Some studies show that, as a rule, ANNs are most efficient when applied to time series with trend and seasonality correction [7]. In addition, in the retail sector there are normally several variables that can somehow explain fluctuations in sales. In [8], ANNs are applied to the daily sales forecasting of a company in the shoe industry, using as explanatory variables: month of the year, day of the week, holidays, promotions or special events, sales period, weeks pre/post Christmas and Easter, the average temperature, the turnover index in retail sale of textiles, clothing, footwear and leather articles, and the daily sales of the previous seven days with trend and seasonality correction.
In [9], the predictive power of ANNs is compared with that of Support Vector Regression machines [10] (SVRs); for this second algorithm, the linear kernel function is also compared with the Gaussian kernel function. In that work, these learning algorithms are applied to five different artificial time series: stationary, with additive seasonality, with linear trend, with linear trend and additive seasonality, and with linear trend and multiplicative seasonality. The results showed that the SVR with Gaussian kernel function is the most efficient algorithm in the forecasting of time series without trend. However, in series with trend, its predictions are disastrous, while the ANNs and the SVRs with linear kernel function produce robust predictions, even without any pre-processing of the data.
In hierarchical databases, it is frequently useful to explore the dependency relationships between different time series, thus ensuring consistency among the forecasts of time series belonging to different levels. In [1], a methodology is presented that explores the different levels of aggregation hierarchies and the predictions for different time periods. In this case, the forecasts of the next elements of a time series are obtained by aggregating the predictions of the descendant series in the hierarchy associated with this dimension.
Normally, when climbing up the hierarchy the forecast error decreases, since at these levels some of the deviations and fluctuations are neutralized. The two most commonly used strategies are bottom-up and top-down. With the first method, the forecasts are calculated at the lower level of the hierarchy and then aggregated to provide the predictions of the higher-level series; in the second, the forecasts are calculated at the top level and then disaggregated to the lower levels [2]. In this second case, there is no universal form of breakdown, but there are several methodologies that can be adopted [3]. Although the top-down algorithm is easy to build and produces reliable predictions at aggregated levels, going down in the hierarchy can lead to a loss of information related to the dynamics of the descendant series, and the distribution of the predictions over the lower levels is not always easy to accomplish. With the bottom-up algorithm, the loss of information is not so large, but there are many series to predict and the noise present in the data of the lower hierarchies is often high. However, there is no consensus on the best approach.
More recently, a method was proposed that produces better results when compared to the application of the bottom-up and top-down algorithms. This method consists in the calculation of independent forecasts at all levels of the hierarchy, followed by the application of a regression model to optimize the combination of these predictions [4]. The main advantage of this approach is that the predictions obtained for the time series of all levels can come from the application of any learning algorithm, which allows the use of all available data as well as the inherent dynamics of each time series. After that, the forecasts are revised and transformed by applying a weighted average that uses all the other forecasts.
To better explain how this method works, let us consider the time series hierarchy illustrated in Figure 2.
The authors [4] start by proposing the translation of the time series hierarchy
to a matrix notation where:

– each line i represents a node in the hierarchy in breadth traversal order;


– each column j represents a node of the bottom level;
– each position (i, j) of the matrix is 1 if the time series contained in the
bottom level node j contributes to the time series in node i; otherwise, it
should be 0.

Fig. 2. Example of a hierarchical relationship with two levels

Thus, the hierarchy shown in Figure 2 is represented by the following matrix S (cf. Equation 1).

$$
S = \begin{pmatrix}
1 & 1 & 1 & 1 & 1\\
1 & 1 & 1 & 0 & 0\\
0 & 0 & 0 & 1 & 1\\
1 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 1
\end{pmatrix} \qquad (1)
$$
The authors [4] also showed that, assuming that the forecasting errors of the hierarchy follow the same distribution as the aggregated data, it is possible to obtain reasonable forecasts by solving the expression shown in Equation 2,

$$\tilde{Y}_t(h) = S(S^{T}S)^{-1}S^{T}\,\hat{Y}_t(h) \qquad (2)$$

where $\tilde{Y}_t(h)$ represents the recalculated prediction for series h at time t, $\hat{Y}_t(h)$ represents the prediction obtained independently for the time series h at time t, and S is the matrix that represents the hierarchy of the time series.
The calculation of $S(S^{T}S)^{-1}S^{T}$ gives us the weights we need to perform the forecast adjustment. Considering the matrix S (cf. Equation 1), we obtain the weights matrix (cf. Equation 3) corresponding to all series of the different hierarchy nodes.
$$
S(S^{T}S)^{-1}S^{T} = \begin{pmatrix}
0.58 & 0.30 & 0.28 & 0.10 & 0.10 & 0.10 & 0.14 & 0.14\\
0.31 & 0.51 & -0.20 & 0.17 & 0.17 & 0.17 & -0.10 & -0.10\\
0.27 & -0.21 & 0.48 & -0.07 & -0.07 & 0.07 & 0.24 & 0.24\\
0.10 & 0.18 & -0.08 & 0.72 & -0.27 & -0.27 & -0.04 & -0.04\\
0.10 & 0.18 & -0.08 & -0.27 & 0.72 & -0.27 & -0.04 & -0.04\\
0.10 & 0.18 & -0.08 & -0.27 & -0.27 & 0.72 & -0.04 & -0.04\\
0.15 & -0.09 & 0.24 & -0.03 & -0.03 & -0.03 & 0.62 & -0.38\\
0.15 & -0.09 & 0.24 & -0.03 & -0.03 & -0.03 & -0.38 & 0.62
\end{pmatrix} \qquad (3)
$$

For example, the forecast value for the time series AA would be obtained by using the weights of the fourth line of this weights matrix, as shown in Equation 4.

$$\tilde{Y}_{AA} = 0.10\,\hat{Y}_{Total} + 0.18\,\hat{Y}_{A} - 0.08\,\hat{Y}_{B} + 0.72\,\hat{Y}_{AA} - 0.27\,\hat{Y}_{AB} - 0.27\,\hat{Y}_{AC} - 0.04\,\hat{Y}_{BA} - 0.04\,\hat{Y}_{BB} \qquad (4)$$

We should note that the negative weights are associated with the time series that do not directly influence the considered time series. These coefficients are negative, instead of null, since we want to extract the effect of these series from the series at the top levels.
The same authors made available in R [11] the package hts [12], which implements an algorithm that automatically returns predictions for all hierarchical levels based on the idea described previously. However, it only allows the use of a linear model to predict each series.
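As a concrete illustration of Equations (1)-(4), the numpy snippet below (our own toy example, not the hts implementation) builds the summing matrix S of Figure 2 and reconciles a vector of independently obtained forecasts, ordered Total, A, B, AA, AB, AC, BA, BB; the forecast values are made up for the example.

```python
# Toy illustration of the optimal-combination reconciliation of [4].
import numpy as np

S = np.array([[1, 1, 1, 1, 1],
              [1, 1, 1, 0, 0],
              [0, 0, 0, 1, 1],
              [1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 1, 0],
              [0, 0, 0, 0, 1]])

W = S @ np.linalg.inv(S.T @ S) @ S.T          # weights matrix of Equation (3)

y_hat = np.array([100.0, 60.0, 45.0, 20.0, 25.0, 18.0, 22.0, 21.0])  # independent forecasts
y_tilde = W @ y_hat                           # revised, coherent forecasts (Equation 2)
# After revision the hierarchy is consistent: y_tilde[3:].sum() equals y_tilde[0].
```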

3 The Case Study

3.1 Data Description

The challenge lies in building a model to forecast the sales revenue of the company, per month. Additionally, this forecast should be made by store - there are 137 stores across the country - and by business unit, i.e., the forecast is not intended for the whole store, but for a particular set of products. In this particular business, there are five business units: Home Appliances (U51), Entertainment (U52), Wifi (U53), Image (U54) and Mobile (U55). The goal is, at the 15th day of each month, to forecast the sales revenue of the following month.
This is a regression problem - since it is the forecast of a continuous variable - and the available data will be used to train the algorithms, i.e., we have a supervised learning process.
We have monthly aggregated data from January 2011 until December 2014. We keep the last six months of 2014 to evaluate our models, using the remaining
data for training. Due to the reduced number of instances, we decided to use a
growing window with a time horizon of two months. This means that to predict
the sales of July 2014, we use as training window all the data until May 2014.
Then, the training window grows by incorporating the data of June 2014, to
predict the sales of August 2014, and so on.
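A schematic version of this growing-window protocol is sketched below; the fit and predict callbacks and the pandas-based layout are our own assumptions, not the code used in the study.

```python
# Sketch of a growing window with a two-month forecasting horizon.
import pandas as pd

def growing_window(df, fit, predict, first_month="2014-07", last_month="2014-12", horizon=2):
    """df: monthly data indexed by a pandas PeriodIndex; fit/predict: user callbacks."""
    forecasts = {}
    for month in pd.period_range(first_month, last_month, freq="M"):
        train_end = month - horizon                 # e.g. May 2014 for a July 2014 forecast
        model = fit(df[df.index <= train_end])      # train on all data up to train_end
        forecasts[month] = predict(model, month)
    return forecasts
```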

3.2 Non-hierarchical Model

From a first analysis of our data, we noticed that the sales evolution of each business unit in each store exhibits very different behaviours. In these conditions, it is unfeasible to obtain a single model with a good performance for every store and business unit. Thus, in order to avoid very large errors, we started by applying the k-means algorithm [13] to our stores, setting the number of clusters to three. Our aim was to cluster the stores by the three main areas of the country: north, center and south. The stores that constituted the centroid of each cluster were used to tune the learning methods' parameters for the time series corresponding to the total and to the business units of each store. For the tuning process, we used the function experimentalComparison from the package DMwR [14], which chooses the best parameters, i.e. those that minimize the mean squared error (MSE).
In a first modelling approach, we applied the Autoregressive Integrated Moving Average (arima) model - from the R package forecast [15] - to obtain the sales forecast for each business unit in each store and for each store total. Still, no good results were obtained.
In this context, and since we have a huge variety of time series with different features, we decided to apply different learning algorithms to each time series and use a simple ensemble to combine the models' predictions. The final prediction was obtained by summing the prediction made by the best model, i.e. the model that achieved the lowest MSE estimate, weighted by 0.75, with the prediction made by the second best model, weighted by 0.25. The learning algorithms used in our ensemble were: Artificial Neural Networks - from the R packages nnet [16] and caret [17] -, Support Vector Machines - from the R packages e1071 [18] and kernlab [19] - and Random Forests - from the R package randomForest [20]. These non-linear learning algorithms showed better results in comparison to the linear arima model, and the ensemble showed better results when compared with each learning algorithm alone.
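The combination rule of this simple ensemble can be sketched as follows (our own illustration, assuming the fitted models expose a scikit-learn-style predict method and that their MSE estimates are already available).

```python
# Weighted two-model ensemble: 0.75 for the lowest estimated MSE, 0.25 for the second lowest.
import numpy as np

def ensemble_predict(models, mse_estimates, X):
    order = np.argsort(mse_estimates)        # ascending: best model first
    best, second = models[order[0]], models[order[1]]
    return 0.75 * best.predict(X) + 0.25 * second.predict(X)
```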
Using this modelling strategy, and based on [8], we also conducted a study to see if adding or changing variables in the original time series would explain the sales fluctuations and, thus, positively influence the results of the sales forecast. After some tests, the following changes to the original time series led to overall better results. The sales values were normalized to the [0, 1] interval. New variables were added with information on the month of the year, the number of Saturdays/Sundays in the month, an indication of the Easter month, and the promotional campaigns and their time intervals of impact on the sales. It was also verified that the results became worse when the sales of earlier periods were used - either in the original format or corrected for trend and/or seasonality - so they were not used. It was also found that there was no significant correlation between the considered variables.
There were some recent stores for which we did not have information from the beginning of 2011. For these stores, we found the oldest store with the most similar behaviour and used it to predict the new one.

3.3 Hierarchical Models

Our flat modelling approach, described in the previous section, of predicting sales for each store, each business unit, and the total of the company showed some drawbacks. The sum of the forecasts of the lower-level series does not correspond exactly to the value of the upper level. In this context, we found that it would be useful to explore the hierarchies present in the database, which also helps to build a model that is more consistent across the time series forecasts of the different hierarchical levels. Therefore, based on the forecasts obtained with our base model, we considered the following four different modelling approaches.

Non-hierarchical Model (NonHierarch): model that predicts the sales for each store, each business unit, and the total of the company, ignoring the hierarchical relationship.
Bottom-up Model (BottomUp): model obtained using a bottom-up approach, which means that the predictions made for the lower level of the hierarchy are used to forecast the levels above; for each level, the prediction is given by the sum of the predictions made in the level below (see the sketch after this list).
Top-Down Model (TopDown): model obtained using a top-down approach, which means that the forecasts made for the total of the company are used to predict the sales of the lower hierarchical levels; the prediction uses an a priori measure of the weight of each business unit in the total of sales, based on the history of the considered month.
Hierarchical Combination Model (HierarchComb): model that uses all the predictions obtained independently for each hierarchical level, and applies to them the regression model suggested by [21] to optimize the combination of the obtained predictions.
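The two classical combinations referenced in the list can be sketched as follows; the a priori shares used by TopDown are assumed here to be historical proportions that sum to one.

```python
# Toy sketch of the BottomUp and TopDown combinations for one store.
import numpy as np

def bottom_up(unit_forecasts):
    """Store total = sum of the independent business-unit forecasts."""
    return float(np.sum(unit_forecasts))

def top_down(total_forecast, historical_shares):
    """Distribute the total forecast by the a priori weight of each business unit."""
    shares = np.asarray(historical_shares, dtype=float)
    return total_forecast * shares / shares.sum()
```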

3.4 Experimental Results


The comparative results of the Mean Absolute Percentage Error (MAPE)
obtained for the total sales of the company and for each business unit are shown
in Table 1. This error metric was used in order to compare the results obtained
with the error rates previously set by the company.
From the analysis of Table 1, we verify that for the series corresponding to
the total sales of the company, the months of July, October and November (i.e.
50% of the test set) are better predicted by HierarchComb, while the remaining
months are better predicted by the model BottomUp - 17% - and NonHierarch and
TopDown - 33%. These two last models are, in fact, the same because TopDown
uses the forecast obtained for the total of sales. Regarding the business units
level, the models BottomUp and HierarchComb are the best in the same number
of forecasts - 33% each - followed by models NonHierarch and TopDown, in 17%
of the forecasts, each.
The results for each store, by business unit, are illustrated in Figure 3. We also show the results for the total of each store, obtained according to each model.
In fact, looking at Figure 3, there is mainly a reduction of the highest error rates when using the model HierarchComb in comparison with the model NonHierarch. Moreover, except for some small rounding errors, the predictions obtained by the models NonHierarch and BottomUp are equal, since the model BottomUp uses all the independent forecasts of the bottom level. On the other hand, the error rate obtained by the model TopDown is higher than the others, which can be justified by the noise introduced in the disaggregation process.
In order to verify if the results obtained by the different models were statistically significant at the bottom level, we applied a hypothesis test. Since we have a large sample size, the Central Limit Theorem allows us to consider that the sample follows an approximately Normal distribution. So, visually, we observe

Table 1. MAPE of sales forecast per business unit and total sales of the company by
four modelling approaches to combine predictions of the different hierarchical levels.

Business Unit  Approach      Jul      Aug      Sept     Oct      Nov      Dec
U51            NonHierarch   2.62%    13.31%   6.08%    0.91%    2.89%    1.96%
U51            BottomUp      2.71%    3.81%    3.81%    3.82%    6.54%    1.34%
U51            TopDown       2.46%    6.32%    4.21%    2.71%    5.42%    2.10%
U51            HierarchComb  1.93%    9.42%    5.82%    0.46%    3.15%    1.04%
U52            NonHierarch   12.42%   5.10%    2.31%    2.37%    16.36%   15.22%
U52            BottomUp      6.24%    12.68%   2.67%    3.28%    7.02%    13.42%
U52            TopDown       4.12%    6.76%    2.52%    4.02%    9.51%    14.22%
U52            HierarchComb  9.22%    4.97%    2.39%    2.36%    15.21%   13.11%
U53            NonHierarch   4.91%    23.94%   6.30%    2.21%    9.63%    5.89%
U53            BottomUp      2.98%    8.10%    6.56%    3.06%    6.78%    15.60%
U53            TopDown       5.02%    8.43%    5.78%    1.98%    5.59%    7.20%
U53            HierarchComb  4.76%    22.71%   7.21%    2.15%    8.52%    6.26%
U54            NonHierarch   9.30%    1.41%    3.51%    1.52%    3.25%    4.02%
U54            BottomUp      1.47%    1.70%    2.70%    2.89%    2.87%    4.30%
U54            TopDown       3.56%    2.87%    3.87%    2.81%    4.56%    3.89%
U54            HierarchComb  8.67%    1.37%    3.70%    1.40%    4.02%    4.07%
U55            NonHierarch   1.87%    14.48%   2.07%    0.03%    6.76%    11.89%
U55            BottomUp      13.22%   9.86%    4.05%    1.91%    4.49%    12.15%
U55            TopDown       4.20%    11.21%   3.42%    1.32%    5.02%    12.67%
U55            HierarchComb  2.04%    10.17%   4.12%    0.12%    6.05%    11.65%
Total          NonHierarch   1.51%    9.18%    0.56%    0.16%    1.85%    4.42%
Total          BottomUp      1.58%    2.24%    0.93%    0.45%    2.42%    5.02%
Total          TopDown       1.51%    9.18%    0.56%    0.16%    1.85%    4.42%
Total          HierarchComb  1.46%    5.76%    0.68%    0.14%    1.23%    6.33%

that TopDown has higher error rates, and we also know that for this hierarchical level the models NonHierarch and BottomUp are equal. Therefore, and since parametric tests are more powerful than non-parametric tests, we used the t-test for paired samples, considering the differences between the pairs of error rate observations. The null hypothesis states that the mean of these differences is null, against the alternative hypothesis that the mean of the differences is higher than zero, which means that the errors obtained by the model HierarchComb are lower than those obtained by the model BottomUp. We got a p-value less than 1%, so, for this level of significance, we reject the null hypothesis. We conclude that the model HierarchComb produces better forecasts than the model BottomUp, and the differences are statistically significant. In fact, graphically, the model HierarchComb particularly reduces the larger errors, which have a great impact on the average.
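The test just described can be reproduced along the following lines; this is a sketch assuming two aligned vectors of per-series error rates and SciPy's paired t-test, with the one-sided p-value derived from the two-sided one.

```python
# Paired, one-sided t-test: H1 states that BottomUp errors exceed HierarchComb errors.
from scipy import stats

def paired_one_sided(err_bottom_up, err_hierarch_comb):
    t, p_two_sided = stats.ttest_rel(err_bottom_up, err_hierarch_comb)
    p_one_sided = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
    return t, p_one_sided
```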
Fig. 3. Distribution of MAPE of sales forecast for all the stores, in business units U51 to U55 and for the total by store, by the four modelling approaches: NonHierarch, BottomUp, TopDown and HierarchComb.

4 Conclusions and Future Work


The main goal of this paper is sales forecasting. We present a case study using real data from a leading Portuguese company of the non-food retail sector. We study predictive models to obtain predictions for the monthly sales revenue, by business unit, for all stores of the company. In our study we compare the standard flat approach, which ignores the hierarchical structure, and 3 different models that exploit the hierarchical structure. Our results show that when we descend in the hierarchy, the error rates tend to increase, given that at higher levels we have a more uniform history. Our results confirmed that exploiting the hierarchical structure of the time series leads to more accurate forecasts. Namely, the approach proposed by [21], which combines the predictions made at each hierarchy level, produced globally better results for all hierarchical levels. This type of model tends to reduce the higher error rates and it can be seen as a valuable alternative when compared with the flat approach and the bottom-up and top-down approaches. As future work, we plan to study how the forecasting variance depends on the inherent interactions between the time series of the same level. Moreover, as there are many real world problems where data presents a hierarchical structure, we intend to explore the application of this methodology to other real problems.

Acknowledgments. This work was supported by the European Commission through


the project MAESTRA (Grant number ICT-2013-612944).

References
1. Ferreira, N., Gama, J.: Análise exploratória de hierarquias em base de dados
multidimensionais. Revista de Ciências da Computação 7, 24–42 (2012)
2. Fliedner, G.: Hierarchical forecasting: issues and use guidelines. Industrial
Management and Data Systems 101, 5–12 (2001)
3. Gross, C.W., Sohl, J.E.: Disaggregation methods to expedite product line forecast-
ing. Journal of Forecasting 9(3) (1990)
4. Hyndman, R.J., Ahmed, R.A., Athanasopoulos, G., Shang, H.L.: Optimal com-
bination forecasts for hierarchical time series. Computational Statistics & Data
Analysis 55, 2579–2589 (2011)
5. Azevedo, J.M., Almeida, R., Almeida, P.: Using data mining with time series data
in short-term stocks prediction: A literature review. International Journal of Intel-
ligence Science 2, 176 (2012)
6. Alon, I., Qi, M., Sadowski, R.J.: Forecasting aggregate retail sales: A compari-
son of artifcial neural networks and traditional methods. Journal of Retailing and
Consumer Services, 147–156 (2001)
7. Zhang, G.P.: Neural networks for retail sales forecasting. In: Encyclopedia of
Information Science and Technology (IV), pp. 2100–2104. Idea Group (2005)
8. Sousa, J.: Aplicação de redes neuronais na previsão de vendas para retalho.
Master’s thesis, Faculdade de Engenharia da Universidade do Porto (2011)
9. Crone, S.F., Guajardo, J., Weber, R.: A study on the ability of support vector
regression and neural networks to forecast basic time series patterns. In: Bramer,
M. (ed.) Artificial Intelligence in Theory and Practice. LNCS, vol. 217, pp. 149–158.
Springer, Boston (2006)

10. Drucker, H., Burges, C.J.C., Kaufman, L., Smola, A.J., Vapnik, V.: Support vector
regression machines. In: Advances in Neural Information Processing Systems 9,
December 2–5, NIPS, Denver, CO, USA, pp. 155–161 (1996)
11. R Core Team: R: A Language and Environment for Statistical Computing. R Foun-
dation for Statistical Computing, Vienna, Austria (2014)
12. Hyndman, R.J., Wang, E., with contributions from Ahmed, R.A. and Shang, H.L. to earlier versions of the package: hts: Hierarchical and grouped time series. R package version 4.4 (2014)
13. Hartigan, J.A., Wong, M.A.: A k-means clustering algorithm. JSTOR: Applied
Statistics 28(1), 100–108 (1979)
14. Torgo, L.: Data Mining with R, learning with case studies. Chapman and Hall/CRC
(2010)
15. Hyndman, R.J., with contributions from Athanasopoulos, G., Razbash, S., Schmidt, D., Zhou, Z., Khan, Y., Bergmeir, C., Wang, E.: forecast: Forecasting functions for time series and linear models. R package version 5.6 (2014)
16. Ripley, B.: nnet: Feed-forward Neural Networks and Multinomial Log-Linear Mod-
els R package version 7.3-8 (2014)
17. Kuhn, M.: caret: Classification and Regression Training. R package version 6.0-35
(2014)
18. Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F.: e1071: Misc
Functions of the Department of Statistics (e1071), TU Wien. R package version
1.6-4 (2014)
19. Karatzoglou, A., Smola, A., Hornik, K., Zeileis, A.: kernlab - an S4 package for
kernel methods in R. Journal of Statistical Software 11(9), 1–20 (2004)
20. Breiman, L., Cutler, A., Liaw, A., Wiener, M.: Random forests for classification
and regression. R package version 4.6-10 (2014)
21. Hyndman, R.J., Athanasopoulos, G.: Optimally reconciling forecasts in a hierarchy.
Foresight: The International Journal of Applied Forecasting (35), 42–48 (2014)
Crime Prediction Using Regression
and Resources Optimization

Bruno Cavadas1,2 , Paula Branco3,4(B) , and Sérgio Pereira5


1
Instituto de Investigação e Inovação em Saúde,
Universidade do Porto, Porto, Portugal
[email protected]
2
Instituto de Patologia e Imunologia Molecular da,
Universidade do Porto, Porto, Portugal
3
LIAAD - INESC TEC, Porto, Portugal
[email protected]
4
DCC - Faculdade de Ciências, Universidade do Porto, Porto, Portugal
5
ALGORITMI Centre, University of Minho, Braga, Portugal
[email protected]

Abstract. Violent crime is a well known social problem affecting both the quality of life and the economic development of a society. Its prediction is therefore an important asset for law enforcement agencies since, due to budget constraints, the optimization of resources is of extreme importance.
In this work, we tackle both aspects: prediction and optimization. We propose to predict violent crime using regression and to optimize the distribution of police officers through an Integer Linear Programming formulation that takes into account the previous predictions. Although some of the optimization data are synthetic, we propose it as a possible approach for the problem. Experiments showed that Random Forest performs best among the evaluated learners, after applying the SmoteR algorithm to cope with the rare extreme values. The most severe violent crime rates were predicted for southern states, in accordance with state reports. Accordingly, these were the states with more police officers assigned during the optimization.

Keywords: Violent crime · Prediction · SmoteR · Regression · Optimization

1 Introduction
Violent crime is a severe problem in society. Its prediction can be useful for law enforcement agents to identify problematic regions to patrol. Additionally, it can be valuable information to optimize the available resources ahead of time.
In the United States of America (USA), according to the Uniform Crime Reports (UCR) published by the Federal Bureau of Investigation (FBI) [1], violent crimes imply the use of force or the threat of using force, such as rape, murder, robbery, aggravated assault, and non-negligent manslaughter. In 2013, 1,163,146 violent crimes were reported, with an average of 367.9 per 100k inhabitants. This was equivalent to one violent crime every 27.1 seconds. In 2012, according to the United States Department of Labor [2], there were 780,000 police officers and detectives in the USA, with a median salary of $56,980 per year. Therefore, the optimization of the distribution of police officers can be useful to reduce costs, while guaranteeing the safety of the population.
In this paper, the contributions are twofold. Firstly, we propose to predict the violent crime per 100k population using regression. To the best of our knowledge, this is the first time that such a problem is tackled in this way. Moreover, we pre-process the data using the SmoteR algorithm to improve the predictions on the most critical values: the extremely high ones. Given the predictions, we also propose an Integer Linear Programming formulation for the optimization of the distribution of police officers across states. This distribution takes into account the crime severity, population, density and budget of the states.
The remainder of the paper is organized as follows. In Section 2 a brief survey of related work is presented. Materials and methods are exposed in Section 3, including the description of the data set, the prediction-related procedures and the optimization scheme. Then, in Section 4, results are presented and discussed, while in Section 5 the main conclusions are pointed out.

2 Related Work
Crime prediction has been extensively studied throughout the literature due to its relevance to society. These studies employ diverse machine learning techniques to tackle the crime forecasting problem.
Nath [3] combined K-means clustering and a weighting algorithm, considering a geographical approach, for the clustering of crimes according to their types. Liu et al. [4] proposed a search engine for extracting, indexing, querying and visualizing crime information using spatial, temporal, and textual information and a scoring system to rank the data. Shah et al. [5] went a step further and proposed CROWDSAFE for real-time and location-based crime incident searching and reporting, taking into account Internet crowd sourcing and portable smart devices. The automatic prediction of crime events based on the extraction of Twitter posts has also been reported [6].
Regarding the UCI data set used in this work, Iqbal et al. [7] compared Naive Bayesian and decision tree methods by dividing the data set into three classes based on the risk level (low, medium and high). In this study, decision trees outperformed the Naive Bayesian algorithms, but the pre-processing procedures were rudimentary. Shojaee et al. [8] applied a more rigorous data processing methodology for a binary class and used two different feature selection methods with a wider range of learning algorithms (Naive Bayesian, decision trees, support vector machines, neural networks and K-Nearest neighbors). In these studies no class balancing methodologies were employed. Other approaches, such as fuzzy association rule mining [9] and case-based editing [10], have also been explored.

After prediction, the optimization of resources can be achieved by several strategies. Donovan et al. [11] used integer linear programming for the optimization of fire-fighting resources, solving one of the most common constraints faced by fire managers. The same strategy was used by Caulkins et al. [12] in the optimization of software system security measures given a fixed budget.
Regarding the problem of police officer optimization, Mitchell [13] used a P-median model to determine the patrol areas in California, while Daskin [14] applied a Backup Coverage Model to maximize the number of areas covered. More recently, Li et al. [15] relied on the concept of “crime hot-spots” to create a cross entropy approach to produce randomized optimal patrol routes.

3 Materials and Methods


3.1 Data Set Description
The Communities and Crime Unnormalized Data Set 1 provides information on several crimes in the USA, combining socio-economic and law enforcement data from the '90 Census, the 1990 Law Enforcement Management and Admin Stats survey and the 1995 FBI UCR. It includes 2215 examples, with 124 numeric attributes and 1 nominal attribute. It also contains 4 non-predictive attributes with information about the community name, county, code and fold. Among the several possible target variables, we chose the number of violent crimes per 100k population.

3.2 Prediction
We started by pre-processing the data set. The violent crime is our target variable, thus we removed all the other 17 possible target variables contained in the data set. We also eliminated all the examples that had a missing value in our target variable and removed all the attributes that had more than 80% of missing values. The data set contained four non-predictive attributes, which we also eliminated. Finally, we removed one more example that still had a missing value, and normalized all the remaining attributes.
Although this problem was previously tackled as a classification task, we opted for addressing it as a regression task. This is an innovative aspect of our proposal, and this choice is also based on the fact that we will use the numeric results obtained with the predictions for solving an optimization problem. Therefore, it makes sense to use a continuous variable throughout the work, instead of discretizing the target variable and later recovering a numeric value.
Another challenge involving this data set is the high number of attributes. To address this problem we applied the same feature selection scheme with two different percentages. The scheme applies a hierarchical clustering analysis, using the Pearson Correlation Coefficient. This step removes a percentage of the features less correlated with the target variable. Then, a Random Forest (RF) learner is applied to compute the importance of the remaining features, based on their impact in the Mean Squared Error. A percentage of the most important features is then selected. Two different sets of features were selected by applying different percentages in this scheme. For one of the pre-processed data sets we aimed at keeping 50% of the original features, and for the other the goal was to select only 30% of the original features. This way we obtained two data sets with 52 and 32 features, corresponding to the 50% and 30% percentages.

1 Available at the UCI repository at
https://archive.ics.uci.edu/ml/datasets/Communities+and+Crime+Unnormalized.
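A simplified version of this two-stage selection is sketched below; it is our own approximation (scikit-learn's impurity-based importance stands in for the MSE-impact measure of the R randomForest package, and the clustering step is reduced to a plain correlation ranking), and the cut-off parameters are illustrative.

```python
# Hedged sketch: correlation filtering followed by Random Forest importance ranking.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def select_features(X: pd.DataFrame, y: pd.Series, keep_corr=0.7, keep_importance=0.5):
    corr = X.apply(lambda col: abs(col.corr(y)))                 # Pearson correlation with target
    stage1 = corr.sort_values(ascending=False).index[: int(len(corr) * keep_corr)]
    rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X[stage1], y)
    importance = pd.Series(rf.feature_importances_, index=stage1).sort_values(ascending=False)
    return list(importance.index[: int(len(importance) * keep_importance)])
```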
In our regression problem we are interested in predicting the number of violent crimes per 100k inhabitants. However, we are more concerned with the errors made in the higher values of the target variable, i.e., the consequences of missing a high value of violent crime by predicting it as low are worse than the reverse type of error. The extreme high values of the violent crime variable are the most important and yet the least represented in the data set. When addressed as a classification problem, this is clearly a problem with imbalanced classes, where the most important class has few examples. The SmoteR algorithm is a proposal to address this type of problem within regression, which was presented in [16,17]. This proposal uses the notion of utility-based regression [18] and relies on the definition of a relevance function. The relevance function expresses the user preferences regarding the importance assigned to the range of the target variable. Ribeiro [18] proposes automatic methods for estimating the relevance function of the target variable. We have used those methods because they correspond to our specific concerns: the extreme rare values are the most important. The essential idea of the SmoteR algorithm is to balance the data set by under-sampling the most frequent cases and over-sampling the rare extreme examples. The over-sampling strategy generates new synthetic examples by interpolating existing rare cases. More details can be obtained in [16,17]. The motivation for applying this procedure is to force the learning systems to focus on the rare extreme cases, which would be difficult to achieve with the original imbalanced data. Our experiments included several variants of SmoteR, which were applied to the two pre-processed data sets. The SmoteR variants used in the experiments included all combinations of the following parameters: under-sampling percentage of 50% and 100%; over-sampling percentage of 200% and 400%; number of neighbours equal to 5.
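A simplified sketch of this resampling strategy is given below; it is not the reference SmoteR implementation of [16,17] (in particular, the relevance function is taken as given and the target of a synthetic case is obtained by simple interpolation rather than a distance-weighted average).

```python
# Simplified SmoteR-style resampling: under-sample common cases, interpolate rare ones.
import numpy as np

def smoter_like(X, y, relevance, thr=0.8, under=0.5, over=2.0, k=5, seed=0):
    rng = np.random.default_rng(seed)
    rare = np.where(relevance >= thr)[0]         # extreme, highly relevant cases
    common = np.where(relevance < thr)[0]

    keep = rng.choice(common, size=int(len(common) * under), replace=False)

    new_X, new_y = [], []
    for _ in range(int(len(rare) * over)):       # e.g. over=2.0 for 200% over-sampling
        i = rng.choice(rare)
        d = np.linalg.norm(X[rare] - X[i], axis=1)
        j = rare[rng.choice(np.argsort(d)[1:k + 1])]   # one of the k nearest rare cases
        frac = rng.random()
        new_X.append(X[i] + frac * (X[j] - X[i]))      # interpolated synthetic example
        new_y.append(y[i] + frac * (y[j] - y[i]))      # interpolated target value

    X_out = np.vstack([X[keep], X[rare], np.array(new_X)])
    y_out = np.concatenate([y[keep], y[rare], np.array(new_y)])
    return X_out, y_out
```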
For the prediction task we used three learning algorithms: Support Vector Machines (SVM), RF and Multivariate Adaptive Regression Splines (MARS). More details on the tested parameters and on the evaluation are given in Section 4.1.

3.3 Optimization Through Integer Linear Programming

Given the predicted violent crime per 100k population, we propose to optimize the distribution of the available police officers by state. We present our proposal as a proof of concept, since more detailed data and insight into the problem would be needed to implement a more realistic solution. Given that the number of officers by state is an integer quantity, Integer Linear Programming is used. To solve the optimization problem, the branch-and-bound algorithm was applied.

Problem Formulation. We considered as resources a certain amount of police


officers to freely distribute by the states of the USA. The optimization takes
into account the predictions on violent crime per 100k population to assign
more officers by the states with more violent criminality. This assignment is
constrained by an ideal number of officers that each state would like to receive
and the available budget. However, every state should receive a minimum amount
of officers to guarantee the security of its citizens.
In the data set, the instances correspond to communities, with several of
them belonging to the same state. Since we wanted to distribute officers by
state, the mean violent crime prediction was calculated for each state.
The optimization problem was defined as,

$$
\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^{m} s_i x_i \\
\text{subject to} \quad & \sum_{i=1}^{m} x_i = N; \quad x_i \le H_i; \\
& x_i \ge f_i H_i; \quad c_i x_i \le B_i; \\
& x_i \in \mathbb{N}
\end{aligned}
$$

where i ∈ {1, ..., m} indexes each of the m states, with m = 46, x_i is the number
of officers to assign to state i, s_i is the violent crime prediction for state i, H_i
is the ideal number of officers for state i, f_i is the fraction of the ideal number of
officers that each state accepts as the minimum, c_i is the cost that each state
must pay for each officer, and B_i is the available budget of each state.
The ideal number of officers was defined as a function of the violent crime
prediction of the state and its population (number of citizens), since bigger
populations with more violent crime have higher demands for police
officers. To this end, the violent crime predictions were scaled (ss_i) to the interval
[v_l, v_h]. This way, the scaled prediction acts as a proportion of the population. However, since some
populations have millions of citizens, this value was divided by 100 to get more
realistic estimates for the ideal number of officers. So,

$$H_i = \frac{ss_i \, p_i}{100} \quad (1)$$
where p_i is the real population of state i.
The minimum number of officers was defined as a fraction of the ideal number,
taking into account the crime predictions. Defining a lower (l_b)
and an upper (u_b) bound for this fraction, the previously scaled violent crime
predictions are linearly mapped to the interval [l_b, u_b]. Knowing that ss_i lies in the
interval [v_l, v_h], the fraction of the ideal number of officers is calculated as,

$$f_i = \frac{ss_i - v_l}{v_h - v_l} \,(u_b - l_b) + l_b \quad (2)$$
The budget was defined as a function of the population and its density. Such
definition is based on the intuition that a small and less dense population needs
less budget and fewer officers than a big and highly dense population. However, the
population numbers are several orders of magnitude higher than the density values, which
would make the effect of density negligible. So, we have rescaled both population
and density to the range [0, 100] (ps_i and ds_i). Moreover, the budget of each
state is a share of the total national budget (B_T). So, B_i was calculated as

$$B_i = \frac{(ds_i + a \cdot ps_i)\, B_T}{\sum_{i=1}^{m} (ds_i + a \cdot ps_i)} \quad (3)$$

where a > 0 is a parameter to tune the weight of the density and population
in the budget calculation.
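The sketch below shows how Eqs. (1)-(3) and the ILP can be combined and solved with an off-the-shelf mixed-integer solver. The authors used R with the lpSolve package; this Python/SciPy version, the random per-officer costs and the default bound values (which match the settings reported later in Section 4.1) are only an illustrative assumption of how the pieces fit together.

```python
# Illustrative sketch (not the authors' R/lpSolve code): allocate officers by state
# by maximizing sum_i s_i * x_i subject to the constraints of the ILP formulation.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def allocate_officers(s, p, dens, N, B_T, a=1.0,
                      v=(0.125, 0.7), f_bounds=(0.08, 0.12), cost=None, seed=0):
    s, p, dens = (np.asarray(z, float) for z in (s, p, dens))
    m = len(s)
    rng = np.random.default_rng(seed)
    c = cost if cost is not None else rng.uniform(5, 15, m)   # cost per officer (cf. Section 4.1)
    scale = lambda z, lo, hi: lo + (z - z.min()) / (z.max() - z.min()) * (hi - lo)
    ss = scale(s, *v)                                         # scaled crime predictions in [v_l, v_h]
    H = ss * p / 100.0                                        # Eq. (1): ideal number of officers
    f = (ss - v[0]) / (v[1] - v[0]) * (f_bounds[1] - f_bounds[0]) + f_bounds[0]   # Eq. (2)
    ps, ds = scale(p, 0, 100), scale(dens, 0, 100)
    B = (ds + a * ps) / (ds + a * ps).sum() * B_T             # Eq. (3): per-state budget
    # milp minimizes, so negate the objective to maximize sum_i s_i x_i
    constraints = [LinearConstraint(np.ones((1, m)), N, N)]   # sum_i x_i = N
    bounds = Bounds(lb=f * H, ub=np.minimum(H, B / c))        # f_i H_i <= x_i <= min(H_i, B_i/c_i)
    res = milp(c=-s, constraints=constraints, bounds=bounds,
               integrality=np.ones(m))                        # every x_i is integer
    if not res.success:
        raise ValueError("infeasible allocation for the given budget and bounds")
    return np.round(res.x), H, B
```

The budget constraint c_i x_i <= B_i is folded into the upper bound because c_i > 0, so min(H_i, B_i / c_i) captures both restrictions at once.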

4 Experimental Analysis
We have divided our problem, and analysis, into two sub-problems: prediction
and optimization. In this section, we describe the tools, metrics, and evaluation
methodology for each sub-problem. Then we focus on the results of each sub-problem.

4.1 Experimental Setup

Prediction. The main goal of our experiments is to select one of the two pre-
processed data sets, a smoteR variant (in case it has a positive impact) and a
model (among SVM, RF and MARS) to apply in the optimization task.
The experiments were conducted with R software. Table 1 summarizes the
learning algorithms that were used and the respective parameter variants. All
combinations of parameters were tried for the learning algorithms, which led to
4 SVM variants, 6 RF variants and 8 MARS variants.
We started by splitting each data set into train and test sets, approximately
corresponding to 80% and 20% of the data. The test set was held apart to be used
in the optimization, after predicting its crime severities. This set was randomly
built with stratification and with the condition of including at least one example
for each possible state of the USA.
In imbalanced domains, it is necessary to use adequate metrics since tra-
ditional measures are not suitable for assessing the performance. Most of these
specific metrics, such as precision and recall, exist for classification problems. The
notions of precision and recall were adapted to regression problems with non-
uniform relevance of the target values by Torgo and Ribeiro [19] and Ribeiro [18].
We will use the framework proposed by these authors to evaluate and compare
our results. More details on this formulation can be obtained in [18].
All the described alternatives were evaluated according to the F-measure
with β = 1, which means that the same importance was given to both precision
and recall scores. The values of F1 were estimated by means of 3 repetitions of a
10-fold Cross Validation process and the statistical significance of the observed
paired differences was measured using the non-parametric pairwise Wilcoxon
signed-rank test.

Table 1. Regression algorithms, parameter variants, and respective R packages.

Learner Parameter Variants R package


MARS nk = {10, 17}, degree = {1, 2}, thresh = {0.01, 0.001} earth [20]
SVM cost = {10, 150}, gamma = {0.01, 0.001} e1071 [21]
Random Forest mtry = {5, 7}, ntree = {500, 750, 1500} randomForest [22]

Optimization. In the optimization sub-problem the objective was to assign
to each state a certain number of police officers, given the total budget, the
total number of available officers, and the violent criminality predictions. The
optimization was carried out in R software, with the package “lpSolve”.
The values for the population and the density are real values, obtained from
the estimates for 2014 [23]. However, the total budget, the number of available
police officers, and the individual cost of the officers by state were defined by
us. Although they are not real values, they serve as proof of concept. The cost
of each officer by state was drawn once, uniformly at random, from the interval
[5, 15]; the same values were then used in all experiments. Additionally, the
values for ub and lb were set to 0.12 and 0.08, while vl and vh were set to 0.125
and 0.7, respectively.

4.2 Results and Discussion

Prediction. We started by examining the results obtained with all the param-
eters selected for the two pre-processed data sets, the three types of learners and
the smoteR variants. All combinations of parameters were tested by means of 3
repetitions of a 10-fold cross validation process. Figure 1 shows these results.
We have also analysed the statistical significance of the differences observed
in the results. Table 2 contains the several p-values obtained when comparing the
SmoteR variants and the different learners, using the non-parametric pairwise
Wilcoxon signed rank test with Bonferroni correction for multiple testing.
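To illustrate this kind of comparison, the snippet below runs a pairwise Wilcoxon signed-rank test with Bonferroni correction over per-fold F1 scores. The per-fold values are synthetic stand-ins and the variant names are taken from Table 2; the original analysis was performed in R.

```python
# Illustrative pairwise comparison of resampling variants (not the original R analysis).
# The per-fold F1 scores below are synthetic stand-ins for the 3 x 10-fold CV results.
import numpy as np
from itertools import combinations
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
scores = {                                   # 30 per-fold F1 values per variant (synthetic)
    "none":      rng.normal(0.30, 0.05, 30),
    "S.o2.u0.5": rng.normal(0.45, 0.05, 30),
    "S.o2.u1":   rng.normal(0.46, 0.05, 30),
}
pairs = list(combinations(scores, 2))
for a, b in pairs:
    _, p = wilcoxon(scores[a], scores[b])    # paired, non-parametric test over folds
    p_adj = min(1.0, p * len(pairs))         # Bonferroni correction for multiple testing
    print(f"{a} vs {b}: adjusted p = {p_adj:.3g}")
```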
The p-value for the differences between the two data sets (with 30% and 50%
of the features) was 0.17. Therefore, we chose the data set with fewer features to
continue to the optimization problem. This choice was mainly due to: i) the lack
of statistically significant differences; and ii) the smaller size of the data (fewer features
can explain the target variable well, so we chose the most efficient alternative).

Table 2. Pairwise Wilcoxon signed rank test with Bonferroni correction for the SmoteR
strategies (left) and the learning systems (right).

SmoteR strategies:
Strategies   none      S.o2.u0.5  S.o2.u1  S.o4.u0.5
S.o2.u0.5    1.3e-14   -          -        -
S.o2.u1      < 2e-16   1          -        -
S.o4.u0.5    2.3e-16   1          1        -
S.o4.u1      < 2e-16   0.18       1        1

Learning systems:
Learners  svm      rf
rf        < 2e-16  -
mars      0.077    < 2e-16

Fig. 1. Results from 3 × 10-fold CV by learning system and SmoteR variant. (none-
original data; S-smoteR; ox-x × 100% over-sampling; uy-y × 100% under-sampling)

Regarding the SmoteR strategy, Figure 1 and Table 2 provide clear evidence
of the advantages of this procedure. Moreover, we also observed that the
differences between the several variants of this procedure are not statistically
significant. Therefore, we opted for the variant that leads to a smaller
data set and, consequently, a lower run time. For the optimization sub-problem
we chose to use the SmoteR variant with an over-sampling percentage of 200% and
an under-sampling percentage of 100%. The learning system that provides the best
performance is clearly RF. With this learner, there are almost no differences
among the several variants tested.
Considering these results, we chose the following setting to generate a model
for the optimization sub-problem:

– Pre-processing to remove missing values and select 30% of the most relevant
features;
– Apply the smoteR strategy with parameters k=5; over-sampling percent-
age=200; under-sampling percentage=100;
– RF model with parameters: mtry=7; ntree=750.

After generating the model we obtained the predictions for the test set which
was held apart to use in the optimization sub-problem. These predictions were
used as input of the optimization task.

Optimization. Several parameter settings were tested. We verified that,
with a high budget and a high number of available officers, the states with more criminality
are assigned more officers. When the weight of the population increases, the
most populated states, such as California, receive more police officers. When
this weight is decreased, those states lose officers, while, for instance, Vermont
obtains its ideal number, although its population is one of the lowest.
Table 3 shows the results of distributing 500,000 police officers, with a budget
of 8,000,000, and a = 1. Figure 2 shows the same results in a map of the USA,
where brighter red is associated with higher criminality, and the radius of the
circles is proportional to the number of officers assigned to the state. The color
of the circle indicates which restriction limited the number of officers. Therefore,
green means that the state received the ideal number, the minimum is repre-
sented in blue, yellow means that the budget of the state did not allow more

Table 3. Distribution of 500,000 police officers by state, subjected to a total budget of 8,000,000.

State Crime Prediction Budget Min. Off. Ideal Off. Dist. Off. Cost
NJ 676.8 320390.4 1597 18614 18614 182659.9
PA 276.1 323802.5 1279 15991 1279 11413.7
OR 638.3 87076.7 678 7951 7951 59607.0
NY 893.1 504835.7 4445 49991 41007 504830.6
MO 381.3 12559.3 123 1503 123 1280.8
MA 408.1 233862.6 842 10283 842 5457.7
IN 726.6 164404.1 1247 14420 14420 162921.4
TX 821.1 648307.0 5643 64214 64214 477378.9
CA 850.2 948629.7 8369 94781 68083 948620.3
KY 655.6 104572.9 769 8996 8996 66277.2
AR 928.2 64349.2 691 7726 5885 64346.6
CT 355.4 146317.7 413 5090 413 6109.4
OH 542.8 90044.9 587 6997 587 8778.9
NH 312.6 33606.1 142 1760 142 2027.5
FL 1313.2 503583.7 6432 67716 52441 503581.4
WA 557.3 168051.3 1089 12954 12954 120927.3
LA 1721.0 109854.6 1994 19765 11893 109847.5
WY 528.1 1846.3 87 1036 87 778.2
NC 1315.6 247220.2 3221 33900 27118 247214.5
MS 1089.4 65686.5 808 8801 7149 65677.9
VA 851.7 208745.0 1798 20363 14106 208734.1
SC 1171.5 119505.3 1397 15028 10543 119498.9
WI 325.7 136544.5 629 7793 629 5271.4
TN 712.7 160819.6 1219 14128 10940 160817.3
UT 794.8 61723.4 599 6850 4253 61723.1
OK 488.6 86322.2 545 6560 545 5960.0
ND 342.0 6056.3 83 1026 83 618.2
AZ 500.5 155514.1 962 11554 962 11214.4
CO 791.7 121546.9 1087 12432 12432 84710.8
WV 551.2 39327.5 283 3371 283 3485.0
RI 440.9 110983.5 138 1680 138 1327.7
AL 1452.9 113590.5 1738 17915 7786 113576.2
GA 1254.6 248026.3 3120 33144 26054 248017.1
ID 444.8 28553.6 216 2616 216 2020.0
ME 275.9 23475.0 133 1663 133 1942.2
KS 1286.4 60751.9 920 9724 9724 52508.6
SD 568.5 8853.4 133 1585 1403 8851.3
NV 920.6 58244.7 656 7350 7350 56186.7
IA 556.7 67617.0 479 5696 2243 15039.1
MD 1271.9 191051.4 1872 19832 13217 191045.6
MN 862.1 125640.6 1191 13465 13465 96440.4
NM 835.6 39199.9 443 5031 3950 39199.7
DE 1161.9 48882.5 268 2891 2891 16497.7
VT 517.2 8881.3 92 1097 92 548.1
AK 932.3 64349.2 694 7752 7752 45227.2
DC 3044.8 926793.2 553 4612 4612 67407.4

Fig. 2. Map of the USA representing the level of violent criminality by state, the
amount of police officers assigned, and the restriction that imposed that number. White
states are not represented in the data set.

officers, and white means that the state received an intermediate number of officers,
less than the ideal or the maximum allowed by the budget, but higher than the
minimum. It is possible to observe that ten states received the ideal number of
officers. Some of them were associated with low or moderate levels of criminality,
but the density or the population was high, such as New Jersey or Texas. Others
are less populated, such as Oregon, but their ideal number of officers was also
lower than that of other states constrained by the budget. The violent crime rate was
particularly important in Kansas: with a lower density and population,
its budget allowed the state to receive the ideal number of officers. It is also
possible to observe that the states with more violent criminality reached the
number of officers allowed by their budget, such as Alabama or South Carolina.
Accordingly, many states with less criminality received the minimum number of
officers that they would accept (North Dakota), or values between the minimum
and the ideal, without being constrained by the budget (Iowa). This behaviour
may be desirable, since having too many officers in states with less criminality
may be a waste of resources. The influence of crime severity can be perceived
when comparing Arizona with Nevada. The former has a larger population,
higher density and a larger budget than the latter, but received fewer officers because of
its lower criminality rating.
According to the FBI [1], the region with the most violent crime incidents is the
South, followed by the West, Midwest and Northeast. It is interesting to notice
that, in Figure 2, more severe criminality was predicted for the southern states.
These were also the states that received more police officers.

5 Conclusions
In this paper, we proposed a pipeline for predicting violent crime and a resources
optimization scheme. Prediction encompasses feature selection through correla-
tion and feature importance analysis, over-sampling of the rare extreme values
of the target variable and regression. Among the evaluated learning systems, RF
presented the best performance. This pipeline itself is one of the contributions
of this work, given that, to the best of our knowledge, this problem in this data
set had never been approached as a regression task. Having the predictions, we propose a
decision support scheme through the optimization of police officers across states,
while taking into account the violent crime predictions, population, density and
budget of the states. This contribution is presented as a proof of concept, since
some of the parameters were synthesized and may not correspond to the real
scenario. Nevertheless, our results show a higher crime burden in states located
in the southern part of the USA compared with the states in the north. For
this reason, southern states tend to receive a higher assignment of police officers.
These predictions are in accordance with some national reports, and although
some parameters of the optimization are not completely realistic, it seems to
work as expected.
This work, although limited to the United States, can be easily applied to
various other countries. So, as future work we consider that it would be inter-
esting to apply the proposed framework in other countries or regions.

Acknowledgments. This work is financed by the FCT – Fundação para a Ciência
e a Tecnologia (Portuguese Foundation for Science and Technology) within project
UID/EEA/50014/2013. Sérgio Pereira and Paula Branco were supported by scholar-
ships from the Fundação para a Ciência e Tecnologia (FCT), Portugal (scholarships
number PD/BD/105803/2014 and PD/BD/105788/2014). We would like to thank the
useful comments of Manuel Filipe Santos, Paulo Cortez, Rui Camacho and Luis Torgo.

References
1. FBI, Crime in the United States 2013 (2014). http://www.fbi.gov/about-us/cjis/
ucr/crime-in-the-u.s/2013/crime-in-the-u.s.-2013 (accessed: January 21, 2015)
2. Labor-Statistics, B.: United States Department of Labor - Bureau of Labor Statis-
tics: Police and detectives (2012). http://www.bls.gov/ooh/protective-service/
police-and-detectives.htmtab-1 (accessed: January 21, 2015)
3. Nath, S.V.: Crime pattern detection using data mining. In: 2006 IEEE/WIC/ACM
International Conference on Web Intelligence and Intelligent Agent Technology
Workshops, WI-IAT 2006 Workshops, pp. 41–44. IEEE (2006)
4. Liu, X., Jian, C., Lu, C.-T.: A spatio-temporal-textual crime search engine. In:
Proceedings of the 18th SIGSPATIAL International Conference on Advances in
Geographic Information Systems, pp. 528–529. ACM (2010)
5. Shah, S., Bao, F., Lu, C.-T., Chen, I.-R.: Crowdsafe: crowd sourcing of crime
incidents and safe routing on mobile devices. In: Proceedings of the 19th ACM
SIGSPATIAL International Conference on Advances in Geographic Information
Systems, pp. 521–524. ACM (2011)

6. Wang, X., Gerber, M.S., Brown, D.E.: Automatic crime prediction using events
extracted from twitter posts. In: Yang, S.J., Greenberg, A.M., Endsley, M. (eds.)
SBP 2012. LNCS, vol. 7227, pp. 231–238. Springer, Heidelberg (2012)
7. Iqbal, R., Murad, M.A.A., Mustapha, A., Panahy, P.H.S., Khanahmadliravi, N.:
An experimental study of classification algorithms for crime prediction. Indian
Journal of Science and Technology 6(3), 4219–4225 (2013)
8. Shojaee, S., Mustapha, A., Sidi, F., Jabar, M.A.: A study on classification learn-
ing algorithms to predict crime status. International Journal of Digital Content
Technology and its Applications 7(9), 361–369 (2013)
9. Buczak, A.L., Gifford, C.M.: Fuzzy association rule mining for community crime
pattern discovery. In: ACM SIGKDD Workshop on Intelligence and Security Infor-
matics, p. 2. ACM (2010)
10. Redmond, M.A., Highley, T.: Empirical analysis of case-editing approaches for
numeric prediction. In: Innovations in Computing Sciences and Software Engineer-
ing, pp. 79–84. Springer (2010)
11. Donovan, G., Rideout, D.: An integer programming model to optimize resource
allocation for wildfire containment. Forest Science 49(2), 331–335 (2003)
12. Caulkins, J., Hough, E., Mead, N., Osman, H.: Optimizing investments in security
countermeasures: a practical tool for fixed budgets. IEEE Security & Privacy 5(5),
57–60 (2007)
13. Mitchell, P.S.: Optimal selection of police patrol beats. The Journal of Criminal
Law, Criminology, and Police Science, 577–584 (1972)
14. Daskin, M.: A maximum expected covering location model: formulation, properties
and heuristic solution. Transportation Science 17(1), 48–70 (1983)
15. Li, L., Jiang, Z., Duan, N., Dong, W., Hu, K., Sun, W.: Police patrol service
optimization based on the spatial pattern of hotspots. In: 2011 IEEE International
Conference on Service Operations, Logistics, and Informatics, pp. 45–50. IEEE
(2011)
16. Torgo, L., Ribeiro, R.P., Pfahringer, B., Branco, P.: SMOTE for regression.
In: Reis, L.P., Correia, L., Cascalho, J. (eds.) EPIA 2013. LNCS, vol. 8154,
pp. 378–389. Springer, Heidelberg (2013)
17. Torgo, L., Branco, P., Ribeiro, R.P., Pfahringer, B.: Resampling strategies for
regression. Expert Systems (2014)
18. Ribeiro, R.P.: Utility-based Regression. PhD thesis, Dep. Computer Science, Fac-
ulty of Sciences - University of Porto (2011)
19. Torgo, L., Ribeiro, R.: Precision and recall for regression. In: Gama, J., Costa,
V.S., Jorge, A.M., Brazdil, P.B. (eds.) DS 2009. LNCS, vol. 5808, pp. 332–346.
Springer, Heidelberg (2009)
20. Milborrow, S.: earth: Multivariate Adaptive Regression Spline Models. Derived
from mda:mars by Trevor Hastie and Rob Tibshirani (2012)
21. Dimitriadou, E., Hornik, K., Leisch, F., Meyer, D., Weingessel, A.: e1071: Misc
Functions of the Department of Statistics (e1071), TU Wien (2011)
22. Liaw, A., Wiener, M.: Classification and regression by randomforest. R News 2(3),
18–22 (2002)
23. U.S.C. Bureau, Population Estimates (2012). http://www.census.gov/popest/
data/index.html (accessed: January 23, 2015)
Distance-Based Decision Tree Algorithms
for Label Ranking

Cláudio Rebelo de Sá1,3(B), Carla Rebelo3, Carlos Soares2,3, and Arno Knobbe1

1 LIACS, Universiteit Leiden, Leiden, The Netherlands
{c.f.de.sa,a.j.knobbe}@liacs.leidenuniv.nl
2 Faculdade de Engenharia, Universidade do Porto, Porto, Portugal
[email protected]
3 INESC TEC Porto, Porto, Portugal

Abstract. The problem of Label Ranking is receiving increasing attention from
several research communities. The algorithms that have been developed/adapted
to treat rankings as the target object follow two different approaches:
distribution-based (e.g., using the Mallows model) or
correlation-based (e.g., using Spearman's rank correlation coefficient).
Decision trees have been adapted for label ranking following both
approaches. In this paper we evaluate an existing correlation-based approach
and propose a new one, Entropy-based Ranking trees. We then
compare and discuss the results with a distribution-based approach. The
results clearly indicate that both approaches are competitive.

1 Introduction
Label Ranking (LR) is an increasingly popular topic in the machine learning
literature [7, 8, 18, 19, 24]. LR studies the problem of learning a mapping from
instances to rankings over a finite number of predefined labels. It can be consid-
ered as a natural generalization of the conventional classification problem, where
only a single label is requested instead of a ranking of all labels [6]. In contrast
to a classification setting, where the objective is to assign examples to a specific
class, in LR we are interested in assigning a complete preference order of the
labels to every example.
There are two main approaches to the problem of LR: methods that trans-
form the ranking problem into multiple binary problems and methods that were
developed or adapted to treat the rankings as target objects, without any trans-
formation. An example of the former is the ranking by pairwise comparison approach
of [11]. Examples of algorithms that were adapted to deal with rankings as the
target objects include decision trees [6,23], naive Bayes [1] and k -Nearest Neigh-
bor [3,6].
Some of the latter adaptations are based on the statistical distribution of rankings
(e.g., [5]), while others are based on rank correlation measures (e.g., [19,23]). In
this paper we carry out an empirical evaluation of decision tree approaches for LR
based on correlation measures and compare them to a distribution-based approach.


We implemented and analyzed the algorithm previously presented in [17]. We
also propose a new decision tree approach for LR, based on the previous one,
which uses information gain as splitting criterion. The results clearly indicate
that both are viable LR methods and are competitive with state of the art
methods.

2 Label Ranking

The Label Ranking (LR) task is similar to classification. In classification, given
an instance x from the instance space X, the goal is to predict the label (or
class) λ to which x belongs, from a pre-defined set L = {λ1 , . . . , λk }. In LR, the
goal is to predict the ranking of the labels in L that are associated with x [11].
A ranking can be represented as a total order over L defined on the permutation
space Ω. In other words, a total order can be seen as a permutation π of the set
{1, . . . , k}, such that π(a) is the position of λa in π.
As in classification, we do not assume the existence of a deterministic X → Ω
mapping. Instead, every instance is associated with a probability distribution over
Ω [6]. This means that, for each x ∈ X, there exists a probability distribution
P(·|x) such that, for every π ∈ Ω, P(π|x) is the probability that π is the ranking
associated with x. The goal in LR is to learn the mapping X → Ω. The training
data is a set of instances D = {xi , πi }, i = 1, . . . , n, where xi is a vector
containing the values xji , j = 1, . . . , m of m independent variables describing
instance i and πi is the corresponding target ranking.
Given an instance xi with label ranking πi , and the ranking π̂i predicted by
an LR model, we evaluate the accuracy of the prediction with a loss function on
Ω. One such function is the number of discordant label pairs,

D(π, π̂) = #{(a, b)|π(a) > π(b) ∧ π̂(a) < π̂(b)}

If normalized to the interval [−1, 1], this function is equivalent to Kendall's
τ coefficient [12], which is a correlation measure where D(π, π) = 1 and
D(π, π^{−1}) = −1 (π^{−1} denotes the inverse order of π).
The accuracy of a model can be estimated by averaging this function over a
set of examples. This measure has been used for evaluation in recent LR studies
[6, 21] and, thus, we will use it here as well. However, other correlation measures,
like Spearman’s rank correlation coefficient [22], can also be used.
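As a concrete illustration (a Python assumption on our part; the paper's experiments were run in R), Kendall's τ between a predicted and a true ranking can be computed directly from the rank vectors:

```python
# Evaluate a predicted ranking against the true one with Kendall's tau.
# Rankings are encoded as in the paper: pi[a] is the position of label a.
from scipy.stats import kendalltau

pi_true = [1, 3, 2, 4]      # true ranking of labels lambda_1..lambda_4
pi_pred = [1, 2, 3, 4]      # predicted ranking
tau, _ = kendalltau(pi_true, pi_pred)
print(f"Kendall tau = {tau:.2f}")   # 1.0 for identical rankings, -1.0 for reversed ones
```

Averaging this value over a set of test examples gives the accuracy estimate used in the experiments.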

2.1 Ranking Trees


One of the advantages of tree-based models is how they can clearly express infor-
mation about the problem, because their structure is relatively easy to interpret
even for people without a background in learning algorithms. It is also possible
to obtain information about the importance of the various attributes for the
prediction depending on how close to the root they are used. The Top-Down
Induction of Decision Trees (TDIDT) algorithm is commonly used for induction
of decision trees [13]. It is a recursive partitioning algorithm that iteratively splits
data into smaller subsets which are increasingly more homogeneous in terms of
the target variable (Algorithm 1).
It starts by determining the split that optimizes a given splitting criterion.
A split is a test on one of the attributes that divides the dataset into two disjoint
subsets. For instance, given a numerical attribute x2 , a split could be x2 ≥ 5.
Without a stopping criterion, the TDIDT algorithm only stops when the nodes
are pure, i.e., when the value of the target attribute is the same for all examples
in the node. This usually leads the algorithm to overfit, i.e., to generate models
that fit not only to the patterns in the data but also to the noise. One approach
to address this problem is to introduce a stopping criterion in the algorithm that
tests whether the best split is significantly improving the quality of the model.
If not, the algorithm stops and returns a leaf node. This node is represented
by the prediction that will be made for new examples that fall into that node.
This prediction is generated by a rule that solves potential conflicts in the set
of training examples that are in the node. In classification, the prediction rule
is usually the most frequent class among the training examples. If the stopping
criterion is not verified, then the algorithm is executed recursively for the subsets
of the data obtained based on the best split.

Algorithm 1. TDIDT algorithm

BestSplit = Test of the attributes that optimizes the SPLITTING CRITERION
if STOPPING CRITERION == TRUE then
    Determine the leaf prediction based on the target values of the examples in D
    Return a leaf node with the corresponding LEAF PREDICTION
else
    LeftSubtree = TDIDT(D_¬BestSplit)
    RightSubtree = TDIDT(D_BestSplit)
end if

An adaptation of the TDIDT algorithm for the problem of learning rankings
has been proposed [23], called Ranking Trees (RT), which is based on the
clustering trees algorithm [2]. Adaptation of this algorithm for label ranking
involves an appropriate choice of the splitting criterion, stopping criterion and
the prediction rule.

Splitting Criterion. The splitting criterion is a measure that quantifies the qual-
ity of a given partition of the data. It is usually applied to all the possible splits
of the data that can be made based on individual tests of the attributes.
In RT the goal is to obtain leaf nodes that contain examples with target rankings
as similar to each other as possible. To assess the similarity between
the rankings of a set of training examples, we compute the mean correlation
between them, using Spearman’s correlation coefficient. The quality of the split
is given by the weighted mean correlation of the values obtained for the subsets,
where the weight is given by the number of examples in each subset.

Table 1. Illustration of the splitting criterion

Attribute  Condition           Negated condition
           values  rank corr.  values  rank corr.
x1         a       0.3         b, c    -0.2
           b       0.2         a, c    0.1
           c       0.5         a, b    0.2
x2         <5      -0.1        ≥5      0.1

The splitting criterion of ranking trees is illustrated both for nominal and
numerical attributes in Table 1. The nominal attribute x1 has three values
(a, b and c). Therefore, three binary splits are possible. For the numerical
attribute x2 , a split can be made in between every pair of consecutive values. In
this case, the best split is x1 = c, with a mean correlation of 0.5 for the training
examples that verify the test and a mean correlation of 0.2 for the remaining,
i.e., the training examples for which x1 = a or x1 = b.
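A minimal sketch of this criterion is given below (in Python, as an illustrative assumption; the paper does not provide code). It scores a candidate split by the weighted mean of the average pairwise Spearman correlations of the rankings in each resulting subset.

```python
# Score a candidate split for Ranking Trees: weighted mean of the average
# pairwise Spearman correlation of the target rankings inside each subset.
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr

def mean_pairwise_spearman(rankings):
    if len(rankings) < 2:
        return 1.0                                   # a single ranking is perfectly homogeneous
    return float(np.mean([spearmanr(a, b)[0] for a, b in combinations(rankings, 2)]))

def split_score(rankings, mask):
    left, right = rankings[mask], rankings[~mask]    # subsets induced by the split test
    n = len(rankings)
    return (len(left) * mean_pairwise_spearman(left)
            + len(right) * mean_pairwise_spearman(right)) / n

# Toy usage: rankings of 3 labels for 4 training examples, split by a hypothetical test
rankings = np.array([[1, 2, 3], [1, 3, 2], [3, 2, 1], [2, 3, 1]])
mask = np.array([True, True, False, False])
print(split_score(rankings, mask))
```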

Stopping Criterion. The stopping criterion is used to determine if it is worthwhile
to make a split, in order to avoid overfitting [13]. A split should only be made if the
similarity between examples in the subsets increases substantially. Let S_parent
be the similarity between the examples in the parent node and S_split the weighted
mean similarity in the subsets obtained with the best split. The stopping criterion
is defined in [17] as follows:

$(1 + S_{parent}) \geq \gamma \,(1 + S_{split})$    (1)
Note that the significance of the increase in similarity is controlled by the γ
parameter.

Prediction Rule. The prediction rule is a method to generate a prediction from
the (possibly conflicting) target values of the training examples in a leaf node.
In RT, the method used to aggregate the q rankings in a leaf is based on the mean
ranks of the labels in the training examples that fall into that leaf. The average
rank of each label j is $\bar{\pi}(j) = \sum_i \pi_i(j) / n$. The predicted ranking π̂ is obtained by
assigning ranks to the average ranks $\bar{\pi}(j)$. Table 2 illustrates the prediction rule used in this work.

Table 2. Illustration of the prediction rule.

λ1 λ2 λ3 λ4
π1 1 3 2 4
π2 2 1 4 3
π 1.5 2 3 3.5
π̂ 1 2 3 4
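The aggregation in Table 2 can be written in a few lines; the snippet below (an illustrative Python sketch, not the authors' R implementation) reproduces the example, converting the mean ranks (1.5, 2, 3, 3.5) back into the predicted ranking (1, 2, 3, 4).

```python
# Prediction rule for Ranking Trees: average the ranks of each label over the
# training rankings in a leaf, then rank those averages to get the prediction.
import numpy as np
from scipy.stats import rankdata

leaf_rankings = np.array([[1, 3, 2, 4],    # pi_1 from Table 2
                          [2, 1, 4, 3]])   # pi_2 from Table 2
mean_ranks = leaf_rankings.mean(axis=0)    # (1.5, 2.0, 3.0, 3.5)
prediction = rankdata(mean_ranks)          # (1, 2, 3, 4) -> predicted ranking pi_hat
print(mean_ranks, prediction)
```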

2.2 Entropy Ranking Trees


Decision trees, like ID3 [15], use Information Gain (IG) as a splitting criterion
to look for the best split points.

Information Gain. IG is a statistical property that measures the difference in
entropy between the prior and the posterior state relative to a target variable [13].
In other words, considering a set S of size n_S, and since entropy (H) is a measure of
disorder, IG is basically how much uncertainty in S is reduced after splitting on
attribute A:

$$IG(A, T; S) = H(S) - \frac{|S_1|}{n_S} H(S_1) - \frac{|S_2|}{n_S} H(S_2)$$

where |S_1| and |S_2| are the number of instances on the left side (S_1) and on the
right side (S_2), respectively, of the cut point T in attribute A.
Using the same tree generation algorithm, TDIDT (Section 2.1), we propose
an alternative decision tree approach for ranking data, the Entropy-based
Ranking Trees (ERT). The difference lies in the splitting and stopping
criteria: ERT uses IG to assess the split points and MDLPC [10] as the stopping
criterion. Using the measure of entropy for rankings [20], the splitting and
stopping criteria come in a natural way.
The entropy for rankings [20] is defined as:

$$H_{ranking}(S) = \sum_{i=1}^{K} P(\pi_i, S) \log\!\big(P(\pi_i, S)\big) \log\!\big(k_t(S)\big) \quad (2)$$

where K is the number of distinct rankings in S and $k_t(S)$ is the average
normalized Kendall τ distance in the subset S:

$$k_t(S) = \frac{\sum_{i=1}^{K} \sum_{j=1}^{n} \frac{\tau(\pi_i, \pi_j) + 1}{2}}{K \times n_S}$$

where K is the number of distinct target values in S.
As in Section 2.1, the tree should not be forced to have pure leaves.
Instead, a stopping criterion should be used to avoid overfitting and to be robust to
noise in the rankings. As shown in [20], the MDLPC criterion can be used as a
stopping criterion with the adapted version of entropy, H_ranking. This entropy
measure also works with partial orders; however, in this work, we only use total
orders.
Another ranking tree approach, based on Gini impurity, which will not be
presented in detail in this work, was proposed in [25].
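The snippet below is a literal, illustrative transcription of Eq. (2) and of k_t(S) as printed above, written in Python as an assumption on our part; the reference implementation in [20] may differ in details such as signs and normalization.

```python
# Illustrative computation of the ranking entropy of Eq. (2) for a set of rankings.
import numpy as np
from collections import Counter
from scipy.stats import kendalltau

def ranking_entropy(rankings):
    n = len(rankings)
    distinct = [list(r) for r in {tuple(r) for r in rankings}]      # distinct rankings in S
    K = len(distinct)
    # average normalized Kendall tau distance k_t(S), as printed in the paper
    kt = sum((kendalltau(pi, pj)[0] + 1) / 2
             for pi in distinct for pj in rankings) / (K * n)
    # H_ranking(S): probability-weighted log terms scaled by log(k_t(S))
    counts = Counter(tuple(r) for r in rankings)
    return sum((c / n) * np.log(c / n) * np.log(kt) for c in counts.values())

rankings = [[1, 2, 3], [1, 2, 3], [2, 1, 3], [3, 2, 1]]
print(ranking_entropy(rankings))
```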

3 Experimental Setup
The data sets in this work were taken from KEBI Data Repository in the Philipps
University of Marburg [6] (Table 3). Two different transformation methods were
used to generate these datasets: (A) the target ranking is a permutation of the
classes of the original target attribute, derived from the probabilities generated
by a naive Bayes classifier; (B) the target ranking is derived for each example
from the order of the values of a set of numerical variables, which are no longer
used as independent variables. Although these are somewhat artificial datasets,
they are quite useful as benchmarks for LR algorithms.
The statistics of the datasets used in our experiments are presented in Table 3.
Uπ is the proportion of distinct target rankings for a given dataset.

Table 3. Summary of the datasets

Datasets type #examples #labels #attributes Uπ


authorship A 841 4 70 2%
bodyfat B 252 7 7 94%
calhousing B 20,640 4 4 0.1%
cpu-small B 8,192 5 6 1%
elevators B 16,599 9 9 1%
fried B 40,769 5 9 0.3%
glass A 214 6 9 14%
housing B 506 6 6 22%
iris A 150 3 4 3%
pendigits A 10,992 10 16 19%
segment A 2310 7 18 6%
stock B 950 5 5 5%
vehicle A 846 4 18 2%
vowel A 528 11 10 56%
wine A 178 3 13 3%
wisconsin B 194 16 16 100%

The code for all the examples in this paper has been written in R ([16]).
The performance of the LR methods was estimated using a methodology
that has been used previously for this purpose [11]: the evaluation measure is
Kendall's τ and the performance of the methods was estimated using ten-fold
cross-validation.

4 Results

RT uses a parameter, γ, that can affect the accuracy of the model. A γ ≥ 1 does
not increase the purity of nodes. On the other hand, small γ values will rarely
generate any nodes. We vary γ from 0.50 to 0.99 and measure the accuracy on
several KEBI datasets.
To show to what extent γ affects the accuracy of RT, we show in Figure 1 the
results obtained for some of the datasets in Table 3. From Figure 1 it is clear

Fig. 1. Comparison of the accuracy obtained on some datasets by RT as γ varies from 0.5 to 0.99.

that γ plays an important role in the accuracy of RT. It seems that the best
values lie between 0.95 and 0.98. We will use γ = 0.98 for Ranking Trees (RT).
Table 4 presents the results obtained by the two methods presented here in
comparison with the results for Label Ranking Trees (LRT) obtained in [6]. Even
though LRT performs better in most of the cases presented, both RT and ERT
achieve values close to it and thus give interesting results.
To compare the different ranking methods we use the approach proposed in [4], which
is a combination of Friedman's test and Dunn's Multiple Comparison Procedure
[14]. First we run Friedman's test to check whether the results are different
or not, with the following hypotheses:

Table 4. Results obtained for Ranking Trees on KEBI datasets. (The mean accuracy
is represented in terms of Kendall’s tau, τ )

RT ERT LRT
authorship .879 .890 .882
bodyfat .104 .183 .117
calhousing .181 .292 .324
cpu-small .461 .437 .447
elevators .710 .758 .760
fried .796 .773 .890
glass .881 .854 .883
housing .773 .704 .797
iris .964 .853 .947
pendigits .055 .042 .935
segment .895 .902 .949
stock .854 .859 .895
vehicle .813 .786 .827
vowel .085 .054 .794
wine .899 .907 .882
wisconsin -.039 -.035 .343

Table 5. P-values obtained for the comparison of the 3 methods

         RT      ERT     LRT
RT       -       1.0000  0.2619
ERT      1.0000  -       0.1529
LRT      0.2619  0.1529  -

H0. There is no difference in the mean average correlation coefficients for the three methods.
H1. There are some differences in the mean average correlation coefficients for the three methods.

Using the friedman.test function from the stats package [16] we got a p-value
< 1%, which shows strong evidence against H0.
Now that we know that there are some differences between the three methods,
we test which ones differ from one another with Dunn's Multiple Comparison
Procedure [14]. Using the R package dunn.test [9] with a Bonferroni
adjustment, as in [4], we tested the following hypotheses for each pair of
methods a and b:
H0. There is no difference in the mean average correlation coefficients between a and b.
H1. There is some difference in the mean average correlation coefficients between a and b.
The p-values obtained are presented in Table 5, which indicates that there is no
strong statistical evidence that the methods are different. Another conclusion
is that RT and ERT are essentially equivalent approaches. While RT and ERT
do not seem to outperform LRT in most of the cases studied, from the statistical
tests we can say that both approaches are competitive.
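As an illustration of this testing protocol (the original analysis used R's friedman.test and dunn.test; the Python sketch below is an assumption, using a subset of the Kendall τ values from Table 4):

```python
# Illustrative Friedman test over per-dataset accuracies of the three methods.
# A Dunn post-hoc with Bonferroni adjustment (as done with R's dunn.test) could
# then be applied, e.g. via the third-party scikit-posthocs package.
from scipy.stats import friedmanchisquare

rt  = [.879, .104, .181, .461, .710, .796]   # subset of the Kendall tau values in Table 4
ert = [.890, .183, .292, .437, .758, .773]
lrt = [.882, .117, .324, .447, .760, .890]
stat, p = friedmanchisquare(rt, ert, lrt)
print(f"Friedman chi-square = {stat:.2f}, p-value = {p:.3f}")
```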

5 Conclusions
In this work we implemented a decision tree method for Label Ranking, Ranking
Trees (RT), and proposed an alternative approach, Entropy-based Ranking Trees
(ERT). We also present an empirical evaluation on several datasets of correlation-
based methods, RT and ERT, and compare with the state of the art distribution-
based Label Ranking Trees (LRT). The results indicate that both RT and ERT
are reliable LR methods.
Our implementation of Ranking Trees (RT) shows that the method is a com-
petitive approach in the LR field. We showed that the input parameter, γ, can
have a great impact on the accuracy of the method. The tests performed on
KEBI datasets indicate that the best results are obtained when 0.95 < γ < 1.
The method proposed in this paper, ERT, which uses IG as a splitting criterion,
achieved very similar results to the RT presented in [17]. Statistical tests
indicated that there is no strong evidence that the methods (RT, ERT and
LRT) are significantly different. This means that both RT and ERT are valid
approaches and, since they are correlation-based methods, we can also say that
this kind of approach is worth pursuing.

References
1. Aiguzhinov, A., Soares, C., Serra, A.P.: A similarity-based adaptation of naive
bayes for label ranking: application to the metalearning problem of algorithm
recommendation. In: Pfahringer, B., Holmes, G., Hoffmann, A. (eds.) DS 2010.
LNCS, vol. 6332, pp. 16–26. Springer, Heidelberg (2010)
2. Blockeel, H., Raedt, L.D., Ramon, J.: Top-down induction of clustering trees.
CoRR cs.LG/0011032 (2000). http://arxiv.org/abs/cs.LG/0011032
3. Brazdil, P., Soares, C., Costa, J.: Ranking Learning Algorithms: Using IBL and
Meta-Learning on Accuracy and Time Results. Machine Learning 50(3), 251–277
(2003)
4. Brazdil, P., Soares, C., da Costa, J.P.: Ranking learning algorithms: Using IBL
and meta-learning on accuracy and time results. Machine Learning 50(3), 251–277
(2003). http://dx.doi.org/10.1023/A:1021713901879
5. Cheng, W., Dembczynski, K., Hüllermeier, E.: Label ranking methods based on
the plackett-luce model. In: ICML, pp. 215–222 (2010)
6. Cheng, W., Huhn, J.C., Hüllermeier, E.: Decision tree and instance-based learn-
ing for label ranking. In: Proceedings of the 26th Annual International Confer-
ence on Machine Learning, ICML 2009, June 14–18, Montreal, Quebec, Canada,
pp. 161–168 (2009)
7. Cheng, W., Hüllermeier, E.: Label ranking with abstention: Predicting partial
orders by thresholding probability distributions (extended abstract). Computing
Research Repository, CoRR abs/1112.0508 (2011). http://arxiv.org/abs/1112.0508

8. Cheng, W., Hüllermeier, E., Waegeman, W., Welker, V.: Label ranking with par-
tial abstention based on thresholded probabilistic models. In: Advances in Neural
Information Processing Systems 25: 26th Annual Conference on Neural Informa-
tion Processing Systems 2012. Proceedings of a meeting held December 3–6, Lake
Tahoe, Nevada, United States, pp. 2510–2518 (2012). http://books.nips.cc/papers/
files/nips25/NIPS2012 1200.pdf
9. Dinno, A.: dunn.test: Dunn’s Test of Multiple Comparisons Using Rank Sums,
r package version 1.2.3 (2015). http://CRAN.R-project.org/package=dunn.test
10. Fayyad, U.M., Irani, K.B.: Multi-interval discretization of continuous-valued
attributes for classification learning. In: Proceedings of the 13th International Joint
Conference on Artificial Intelligence, August 28-September 3, Chambéry, France,
pp. 1022–1029 (1993)
11. Hüllermeier, E., Fürnkranz, J., Cheng, W., Brinker, K.: Label ranking by learning
pairwise preferences. Artificial Intelligence 172(16–17), 1897–1916 (2008)
12. Kendall, M., Gibbons, J.: Rank correlation methods. Griffin London (1970)
13. Mitchell, T.: Machine Learning. McGraw-Hill (1997)
14. Neave, H., Worthington, P.: Distribution-free Tests. Routledge (1992). http://
books.google.nl/books?id=1Y1QcgAACAAJ
15. Quinlan, J.R.: Induction of decision trees. Machine Learning 1(1), 81–106 (1986).
http://dx.doi.org/10.1023/A:1022643204877
16. R Development Core Team: R: A Language and Environment for Statistical Com-
puting. R Foundation for Statistical Computing, Vienna, Austria (2010). http://
www.R-project.org ISBN 3-900051-07-0
17. Rebelo, C., Soares, C., Costa, J.: Empirical evaluation of ranking trees on some
metalearning problems. In: Chomicki, J., Conitzer, V., Junker, U., Perny, P. (eds.)
Proceedings 4th AAAI Multidisciplinary Workshop on Advances in Preference
Handling (2008)
18. Ribeiro, G., Duivesteijn, W., Soares, C., Knobbe, A.: Multilayer perceptron for
label ranking. In: Villa, A.E.P., Duch, W., Érdi, P., Masulli, F., Palm, G. (eds.)
ICANN 2012, Part II. LNCS, vol. 7553, pp. 25–32. Springer, Heidelberg (2012)
19. de Sá, C.R., Soares, C., Jorge, A.M., Azevedo, P., Costa, J.: Mining association
rules for label ranking. In: Huang, J.Z., Cao, L., Srivastava, J. (eds.) PAKDD 2011,
Part II. LNCS, vol. 6635, pp. 432–443. Springer, Heidelberg (2011)
20. de Sá, C.R., Soares, C., Knobbe, A.: Entropy-based discretization methods for
ranking data. Information Sciences in Press (2015) (in press)
21. de Sá, C.R., Soares, C., Knobbe, A., Azevedo, P., Jorge, A.M.: Multi-interval
discretization of continuous attributes for label ranking. In: Fürnkranz, J.,
Hüllermeier, E., Higuchi, T. (eds.) DS 2013. LNCS, vol. 8140, pp. 155–169.
Springer, Heidelberg (2013)
22. Spearman, C.: The proof and measurement of association between two things.
American Journal of Psychology 15, 72–101 (1904)
23. Todorovski, L., Blockeel, H., Džeroski, S.: Ranking with predictive clustering trees.
In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) ECML 2002. LNCS (LNAI),
vol. 2430, pp. 444–455. Springer, Heidelberg (2002)
24. Vembu, S., Gärtner, T.: Label ranking algorithms: A survey. In: Fürnkranz, J.,
Hüllermeier, E. (eds.) Preference Learning, pp. 45–64. Springer, Heidelberg (2010)
25. Xia, F., Zhang, W., Li, F., Yang, Y.: Ranking with decision tree. Knowl. Inf. Syst.
17(3), 381–395 (2008). http://dx.doi.org/10.1007/s10115-007-0118-y
A Proactive Intelligent Decision Support System
for Predicting the Popularity of Online News

Kelwin Fernandes1(B), Pedro Vinagre2, and Paulo Cortez2

1 INESC TEC Porto/Universidade Do Porto, Porto, Portugal
2 ALGORITMI Research Centre, Universidade Do Minho, Braga, Portugal
[email protected]

Abstract. Due to the Web expansion, the prediction of online news
popularity is becoming a trendy research topic. In this paper, we propose
a novel and proactive Intelligent Decision Support System (IDSS) that
analyzes articles prior to their publication. Using a broad set of extracted
features (e.g., keywords, digital media content, earlier popularity of news
referenced in the article) the IDSS first predicts if an article will become
popular. Then, it optimizes a subset of the articles features that can
more easily be changed by authors, searching for an enhancement of the
predicted popularity probability. Using a large and recently collected
dataset, with 39,000 articles from the Mashable website, we performed a
robust rolling windows evaluation of five state of the art models. The best
result was provided by a Random Forest with a discrimination power
of 73%. Moreover, several stochastic hill climbing local searches were
explored. When optimizing 1000 articles, the best optimization method
obtained a mean gain improvement of 15 percentage points in terms of
the estimated popularity probability. These results attest the proposed
IDSS as a valuable tool for online news authors.

Keywords: Popularity prediction · Online news · Text mining · Classification · Stochastic local search

1 Introduction
Decision Support Systems (DSS) were proposed in the mid-1960s and involve the
use of Information Technology to support decision-making. Due to advances in
this field (e.g., Data Mining, Metaheuristics), there has been a growing interest
in the development of Intelligent DSS (IDSS), which adopt Artificial Intelligence
techniques to decision support [1]. The concept of Adaptive Business Intelligence
(ABI) is a particular IDSS that was proposed in 2006 [2]. ABI systems combine
prediction and optimization, which are often treated separately by IDSS, in order
to support decisions more efficiently. The goal is to first use data-driven models
for predicting what is more likely to happen in the future, and then use modern
optimization methods to search for the best possible solution given what can be
currently known and predicted.


With the expansion of the Internet and Web 2.0, there has also been a growing
interest in online news, which allows an easy and fast spread of information
around the globe. Thus, predicting the popularity of online news is becoming a
recent research trend (e.g., [3,4,5,6,7]). Popularity is often measured by consid-
ering the number of interactions in the Web and social networks (e.g., number of
shares, likes and comments). Predicting such popularity is valuable for authors,
content providers, advertisers and even activists/politicians (e.g., to understand
or influence public opinion) [4]. According to Tatar et al. [8], there are two main
popularity prediction approaches: those that use features only known after pub-
lication and those that do not use such features. The first approach is more
common (e.g., [3,5,9,6,7]). Since the prediction task is easier, higher prediction
accuracies are often achieved. The latter approach is more scarce and, while a
lower prediction performance might be expected, the predictions are more useful,
allowing (as performed in this work) to improve content prior to publication.
Using the second approach, Petrovic et al. [10] predicted the number of
retweets using features related with the tweet content (e.g., number of hash-
tags, mentions, URLs, length, words) and social features related to the author
(e.g., number of followers, friends, is the user verified). A total of 21 million
tweets were retrieved during October 2010. Using a binary task to discriminate
retweeted from not retweeted posts, a top F-1 score of 47% was achieved when
both tweet content and social features were used. Similarly, Bandari et al. [4]
focused on four types of features (news source, category of the article, subjec-
tivity language used and names mentioned in the article) to predict the number
of tweets that mention an article. The dataset was retrieved from Feedzilla and
covered one week of data. Four classification methods were tested to predict
three popularity classes (1 to 20 tweets, 20 to 100 tweets, more than 100; articles
with no tweets were discarded) and results ranged from 77% to 84% accuracy,
for Naïve Bayes and Bagging, respectively. Finally, Hensinger et al. [11] tested
two prediction binary classification tasks: popular/unpopular and appealing/non
appealing, when compared with other articles published on the same day. The
data covered ten English news outlets over a one-year period. Using
text features (e.g., bag of words of the title and description, keywords) and
other characteristics (e.g., date of publishing), combined with a Support Vector
Machine (SVM), the authors obtained better results for the appealing task when
compared with popular/unpopular task, achieving results ranging from 62% to
86% of accuracy for the former, and 51% to 62% for the latter.
In this paper, we propose a novel proactive IDSS that analyzes online news
prior to their publication. Assuming an ABI approach, the popularity of a can-
didate article is first estimated using a prediction module and then an optimiza-
tion module suggests changes in the article content and structure, in order to
maximize its expected popularity. To the best of our knowledge, there are no previous
works that have addressed such a proactive ABI approach, combining prediction
and optimization for improving the news content. The prediction module uses a
large list of inputs that includes purely new features (when compared with the
literature [4,11,10]): digital media content (e.g., images, video); earlier popular-
ity of news referenced in the article; average number of shares of keywords prior
to publication; and natural language features (e.g., title polarity, Latent Dirich-
let Allocation topics). We adopt the common binary (popular/unpopular) task
and test five state of the art methods (e.g., Random Forest, Adaptive Boosting,
SVM), under a realistic rolling windows scheme. Moreover, we use the trendy Mashable
(mashable.com/) news content, which was not previously studied when predict-
ing popularity, and collect a recent and large dataset related with the last two
years (a much larger time period when compared with the literature). Further-
more, we also optimize news content using a local search method (stochastic hill
climbing) that searches for enhancements in a partial set of features that can be
more easily changed by the user.

2 Materials and Methods


2.1 Data Acquisition and Preparation
We retrieved the content of all the articles published in the last two years from
Mashable, which is one of the largest news websites. All data collection and
processing procedures described in this work (including the prediction and opti-
mization modules) were implemented in Python by the authors. The data was
collected during a two year period, from January 7 2013 to January 7 2015.
We discarded a small portion of special occasion articles that did not follow the
general HTML structure, since processing each occasion type would require a
specific parser. We also discarded very recent articles (less than 3 weeks), since
the number of Mashable shares did not reach convergence for some of these arti-
cles (e.g., with less than 4 days) and we also wanted to keep a constant number of
articles per test set in our rolling windows assessment strategy (see Section 2.3).
After such preprocessing, we ended with a total of 39,000 articles, as shown in
Table 1. The collected data was donated to the UCI Machine Learning repository
(http://archive.ics.uci.edu/ml/).

Table 1. Statistical measures of the Mashable dataset.

Number of articles  Total days  Articles per day (Average / Std. Dev. / Min / Max)
39,000              709         55.00 / 22.65 / 12 / 105

We extracted an extensive set of features (47 in total) from the HTML code
in order to turn this data suitable for learning models, as shown in Table 2.
In the table, the attribute types were classified into: number – integer value;
ratio – within [0, 1]; bool – ∈ {0, 1}; and nominal. Column Type shows within
brackets (#) the number of variables related with the attribute. Similarly to
what is executed in [6,7], we performed a logarithmic transformation to scale
the unbounded numeric features (e.g., number of words in article), while the
nominal attributes were transformed with the common 1-of-C encoding.

We selected a large list of characteristics that describe different aspects of
the article and that were considered possibly relevant to influence the number
of shares. Some of the features depend on particularities of the Mashable
service: articles often reference other articles published in the same service; and
articles have meta-data, such as keywords, data channel type and total number
of shares (when considering Facebook, Twitter, Google+, LinkedIn, Stumble-
Upon and Pinterest). Thus, we extracted the minimum, average and maximum
number of shares (known before publication) of all Mashable links cited in the
article. Similarly, we rank all article keyword average shares (known before pub-
lication), in order to get the worst, average and best keywords. For each of these
keywords, we extract the minimum, average and maximum number of shares.
The data channel categories are: “lifestyle”,“bus”,“entertainment”,“socmed”,
“tech”,“viral” and “world”.
We also extracted several natural language processing features. The Latent
Dirichlet Allocation (LDA) [12] algorithm was applied to all Mashable texts
(known before publication) in order to first identify the five top relevant topics
and then measure the closeness of current article to such topics. To compute the
subjectivity and polarity sentiment analysis, we adopted the Pattern web mining
module (http://www.clips.ua.ac.be/pattern) [13], allowing the computation of
sentiment polarity and subjectivity scores.
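As an illustration of how the topic-closeness features can be derived (the exact pipeline used by the authors is not specified beyond the LDA and Pattern references, so the library choice and toy corpus below are assumptions), a five-topic LDA model can be fitted to the article corpus and its document-topic distribution used directly as the five closeness features:

```python
# Illustrative extraction of the "closeness to top 5 LDA topics" features.
# The corpus here is a toy stand-in for the Mashable article texts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "apple launches new phone with improved camera",
    "stock markets fall as tech shares slide",
    "viral video shows cat playing piano",
]
X = CountVectorizer(stop_words="english").fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(X)
topic_closeness = lda.transform(X)      # one row per article, 5 closeness values (sum to 1)
print(topic_closeness.round(2))
```

The sentiment polarity and subjectivity scores were computed with the Pattern module in the original work; any sentiment lexicon exposing those two scores could play the same role in a re-implementation.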

Table 2. List of attributes by category.

Words:
  Number of words in the title: number (1)
  Number of words in the article: number (1)
  Average word length: number (1)
  Rate of non-stop words: ratio (1)
  Rate of unique words: ratio (1)
  Rate of unique non-stop words: ratio (1)
Links:
  Number of links: number (1)
  Number of Mashable article links: number (1)
  Minimum, average and maximum number of shares of Mashable links: number (3)
Digital Media:
  Number of images: number (1)
  Number of videos: number (1)
Time:
  Day of the week: nominal (1)
  Published on a weekend?: bool (1)
Keywords:
  Number of keywords: number (1)
  Worst keyword (min./avg./max. shares): number (3)
  Average keyword (min./avg./max. shares): number (3)
  Best keyword (min./avg./max. shares): number (3)
  Article category (Mashable data channel): nominal (1)
Natural Language Processing:
  Closeness to top 5 LDA topics: ratio (5)
  Title subjectivity: ratio (1)
  Article text subjectivity score and its absolute difference to 0.5: ratio (2)
  Title sentiment polarity: ratio (1)
  Rate of positive and negative words: ratio (2)
  Pos. words rate among non-neutral words: ratio (1)
  Neg. words rate among non-neutral words: ratio (1)
  Polarity of positive words (min./avg./max.): ratio (3)
  Polarity of negative words (min./avg./max.): ratio (3)
  Article text polarity score and its absolute difference to 0.5: ratio (2)
Target:
  Number of article Mashable shares: number (1)

2.2 Intelligent Decision Support System


Following the ABI concept, the proposed IDSS contains three main modules
(Figure 1): data extraction and processing, prediction and optimization. The
first module executes the steps described in Section 2.1 and it is responsible
for collecting the online articles and computing their respective features. The
prediction module first receives the processed data and splits it into training,
validation and test sets (data separation). Then, it tunes and fits the classifica-
tion models (model training and selection). Next, the best classification model
is stored and used to provide article success predictions (popularity estimation).
Finally, the optimization module searches for better combinations of a subset of
the current article content characteristics. During this search, there is a heavy
use of the classification model (the oracle). Also, some of the newly searched feature
combinations may require recomputing the respective features (e.g., the
average keyword minimum number of shares). In the figure, such dependency is
represented by the arrow between the feature extraction and optimization. Once
the optimization is finished, a list of article change suggestions is provided to
the user, allowing her/him to make a decision.

Fig. 1. Flow diagram describing the IDSS behavior: data extraction and processing (URL retrieval, article retrieval, data selection, feature extraction), prediction (data separation, model training and selection, popularity estimation), followed by optimization and decision.

2.3 Prediction Module

We adopted the Scikit-learn [14] library for fitting the prediction models. Sim-
ilarly to what is done in [10,4,11], we assume a binary classification task,
where an article is considered “popular” if the number of shares is higher than
a fixed decision threshold (D1 ), else it is considered “unpopular”.
In this paper, we tested five classification models: Random Forest (RF);
Adaptive Boosting (AdaBoost); SVM with a Radial Basis Function (RBF) ker-
nel; K-Nearest Neighbors (KNN) and Naïve Bayes (NB). A grid search was used
to search for the best hyperparameters of: RF and AdaBoost (number of trees);
SVM (C trade-off parameter); and KNN (number of neighbors). During this grid
search, the training data was internally split into training (70%) and validation
sets (30%) by using a random holdout split. Once the best hyperparameter value
is selected, the model is fit to all training data.
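The tuning step can be sketched as follows; the tree grid matches the ranges reported in Section 3.1, while the use of AUC as the holdout selection criterion is an assumption of this example rather than a detail stated by the authors.

# Sketch of the tuning step: grid search over the number of trees using a 70%/30%
# random holdout inside the training window, then refit on all training data.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def tune_and_fit_rf(X_train, y_train, n_trees_grid=(10, 20, 50, 100, 200, 400)):
    X_fit, X_val, y_fit, y_val = train_test_split(
        X_train, y_train, test_size=0.3, random_state=0)
    best_auc, best_n = -1.0, n_trees_grid[0]
    for n in n_trees_grid:
        model = RandomForestClassifier(n_estimators=n, random_state=0).fit(X_fit, y_fit)
        auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
        if auc > best_auc:
            best_auc, best_n = auc, n
    return RandomForestClassifier(n_estimators=best_n, random_state=0).fit(X_train, y_train)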

The receiver operating characteristic (ROC) curve shows the performance of


a two class classifier across the range of possible threshold (D2 ∈ [0, 1]) values,
plotting one minus the specificity (x-axis) versus the sensitivity (y-axis) [15]. In
this work, the classification methods assume a probabilistic modeling, where a
class is considered positive if its predicted probability is p > D2 . We computed
several classification metrics: Accuracy, Precision, Recall, F1 score (all using
a fixed D2 = 0.5); and the Area Under the ROC (AUC, which considers all
D2 values). The AUC is the most relevant metric, since it measures the
classifier's discrimination power and is independent of the selected D2 value
[15]. The ideal method presents an AUC of 1.0, while an AUC of 0.5
denotes a random classifier. To achieve a robust evaluation, we adopt a rolling
windows analysis [16]. Under this evaluation, a training window of W consecutive
samples is used to fit the model and then L predictions are performed. Next,
the training window is updated by replacing the L oldest samples with L more
recent ones, in order to fit a new model and perform a new set of L predictions,
and so on.
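A minimal sketch of this rolling windows procedure, assuming X and y are numpy arrays ordered by publication date and that fit_model and evaluate are placeholders for the tuning and metric computations described above:

# Sketch of the rolling windows evaluation: train on W consecutive samples,
# predict the next L, then slide the window by L samples and repeat.
def rolling_windows(X, y, W, L, fit_model, evaluate):
    scores = []
    start = 0
    while start + W + L <= len(y):
        train = slice(start, start + W)
        test = slice(start + W, start + W + L)
        model = fit_model(X[train], y[train])
        scores.append(evaluate(y[test], model.predict_proba(X[test])[:, 1]))
        start += L  # replace the L oldest samples with the L most recent ones
    return scores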

2.4 Optimization
Local search optimizes a goal by searching within the neighborhood of an initial
solution. This type of search suits our IDSS optimization module, since it receives
an article (the initial solution) and then tries to increase its predicted popularity
probability by searching for possible article changes (within the neighborhood of
the initial solution). A simple example of a local search method is hill climbing,
which iteratively searches within the neighborhood of the current solution
and updates that solution when a better one is found, until a local optimum
is reached or the method is stopped. In this paper, we used stochastic hill
climbing [2], which works like pure hill climbing except that worse solutions
can be accepted with probability P. We tested several values of P, ranging
from P = 0 (hill climbing) to P = 1 (Monte-Carlo random search).
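The sketch below illustrates this stochastic hill climbing loop; the neighbors and quality arguments stand for the perturbation and oracle functions described in this section and are assumed interfaces of the example.

# Sketch of stochastic hill climbing: a worse neighbor may be accepted with
# probability P; P = 0 is pure hill climbing, P = 1 behaves as random search.
import random

def stochastic_hill_climbing(solution, neighbors, quality, P, iterations=100):
    best, best_q = solution, quality(solution)
    current, current_q = best, best_q
    for _ in range(iterations):
        candidate = random.choice(neighbors(current))
        candidate_q = quality(candidate)
        if candidate_q > current_q or random.random() < P:
            current, current_q = candidate, candidate_q
        if current_q > best_q:
            best, best_q = current, current_q
    return best, best_q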
For evaluating the quality of the solutions, the local search maximizes the
probability for the “popular” class, as provided by the best classification model.
Moreover, the search is only performed over a subset of features that are more
suitable to be changed by the author (adaptation of content or change in day of
publication), as detailed in Table 3. In each iteration, the neighborhood search
space assumes small perturbations (increase or decrease) in the feature original
values. For instance, if the current number of words in the title is n = 5, then a
search is executed for a shorter (n = 4) or longer (n = 6) title. Since the day of
the week was represented as a nominal variable, a random selection for a different
day is assumed in the perturbation. Similarly, given that the set of keywords (K)
is not numeric, a different perturbation strategy is proposed. For a particular
article, we compute a list of suggested keywords K' that includes words that
appear more than once in the text and that were used as keywords in previous
articles. To keep the problem computationally tractable, we only considered the
best five keywords in terms of their previous average shares. Then, we generate
perturbations by adding one of the suggested keywords or by removing one

of the original keywords. The average performance when optimizing N articles


(i.e., N local searches), is evaluated using the Mean Gain (MG) and Conversion
Rate (CR):
        MG = (1/N) Σ_{i=1}^{N} (Q'_i − Q_i),   CR = U'/U                    (1)

where Q_i denotes the quality (estimated popularity probability) of the original
article i, Q'_i is the quality obtained using the local search, U is the number of
unpopular articles (estimated probability ≤ D2, for all N original articles) and
U' is the number of converted articles (original estimated probability was ≤ D2
but after optimization changed to > D2).
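In code, both measures can be computed directly from the original and optimized probability estimates, for instance as in this small sketch (the arrays q_orig and q_opt are assumed to hold Q_i and Q'_i for the N articles):

# Sketch: Mean Gain and Conversion Rate from the original and optimized
# popularity probabilities of the N articles.
import numpy as np

def mean_gain_and_conversion(q_orig, q_opt, d2=0.5):
    mg = np.mean(q_opt - q_orig)                    # MG = (1/N) sum_i (Q'_i - Q_i)
    unpopular = q_orig <= d2                        # U: originally "unpopular" articles
    converted = unpopular & (q_opt > d2)            # U': converted by the local search
    cr = converted.sum() / max(unpopular.sum(), 1)  # CR = U'/U (guard against U = 0)
    return mg, cr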

Table 3. Optimizable Features.

Feature                               Perturbations
Number of words in the title (n)      n' ∈ {n − 1, n + 1}, n' ≥ 0 ∧ n' ≠ n
Number of words in the content (n)    n' ∈ {n − 1, n + 1}, n' ≥ 0 ∧ n' ≠ n
Number of images (n)                  n' ∈ {n − 1, n + 1}, n' ≥ 0 ∧ n' ≠ n
Number of videos (n)                  n' ∈ {n − 1, n + 1}, n' ≥ 0 ∧ n' ≠ n
Day of week (w)                       w' ∈ [0..7), w' ≠ w
Keywords (K)                          k ∈ {K ∪ i} ∪ {K − j}, i ∈ K' ∧ j ∈ K

3 Experiments and Results


3.1 Prediction

For the prediction experiments, we adopted the rolling windows scheme with
a training window size of W = 10,000 and performing L = 1,000 predic-
tions at each iteration. Under this setup, each classification model is trained
29 times (iterations), producing 29 prediction sets (each of size L). For defining
a popular class, we used a fixed value of D1 = 1,400 shares, which resulted
in a balanced "popular"/"unpopular" class distribution in the first training set
(first 10,000 articles). The selected grid search ranges for the hyperparameters
were: RF and AdaBoost – number of trees ∈ {10, 20, 50, 100, 200, 400}; SVM –
C ∈ {2^0, 2^1, ..., 2^6}; and KNN – number of neighbors ∈ {1, 3, 5, 10, 20}.
Table 4 shows the obtained classification metrics, as computed over the union
of all 29 test sets. In the table, the models were ranked according to their per-
formance in terms of the AUC metric. The left of Figure 2 plots the ROC curves
of the best (RF), worst (NB) and baseline (diagonal line, corresponds to ran-
dom predictions) models. The plot confirms the RF superiority over the NB
model for all D2 thresholds, including more specific (x-axis values near zero,
D2 >> 0.5) or more sensitive (x-axis values near one, D2 << 0.5) trade-offs. For the best
model (RF), the right panel of Figure 2 shows the evolution of the AUC metric

over the rolling windows iterations, revealing an interesting steady predictive


performance over time. The best obtained result (AUC=0.73) is 23 percentage
points higher than the random classifier. While not perfect, an interesting dis-
crimination level, higher than 70%, was achieved.

Table 4. Comparison of models for the rolling window evaluation (best values in bold).

Model                          Accuracy  Precision  Recall  F1    AUC
Random Forest (RF)             0.67      0.67       0.71    0.69  0.73
Adaptive Boosting (AdaBoost)   0.66      0.68       0.67    0.67  0.72
Support Vector Machine (SVM)   0.66      0.67       0.68    0.68  0.71
K-Nearest Neighbors (KNN)      0.62      0.66       0.55    0.60  0.67
Naïve Bayes (NB)               0.62      0.68       0.49    0.57  0.65

Fig. 2. ROC curves (left) and AUC metric distribution over time for RF (right).

Table 5 shows the relative importance (column Rank shows ratio values,
# denotes the ranking of the feature), as measured by the RF algorithm when
trained with all data (39,000 articles). Due to space limitations, the table shows
the best 15 features and also the features that are used by the optimization
module. The keyword related features have a stronger importance, followed by
LDA based features and shares of Mashable links. In particular, the features
that are optimized in the next section (with keywords subset) have a strong
importance (33%) in the RF model.

3.2 Optimization
For the optimization experiments, we used the best classification model (RF),
as trained during the last iteration of the rolling windows scheme. Then, we
selected all articles from the last test set (N = 1, 000) to evaluate the local
search methods. We tested six stochastic hill climbing probabilities (P ∈
{0.0, 0.2, 0.4, 0.6, 0.8, 1.0}). We also tested two feature optimization subsets

Table 5. Ranking of features according to their importance in the RF model (Rank shows the ratio value, # the ranking of the feature).

Avg. keyword (avg. shares): 0.0456 (1)
Avg. keyword (max. shares): 0.0389 (2)
Closeness to top 3 LDA topic: 0.0323 (3)
Article category (Mashable data channel): 0.0304 (4)
Min. shares of Mashable links: 0.0297 (5)
Best keyword (avg. shares): 0.0294 (6)
Avg. shares of Mashable links: 0.0294 (7)
Closeness to top 2 LDA topic: 0.0293 (8)
Worst keyword (avg. shares): 0.0292 (9)
Closeness to top 5 LDA topic: 0.0288 (10)
Closeness to top 1 LDA topic: 0.0287 (11)
Rate of unique non-stop words: 0.0274 (12)
Article text subjectivity: 0.0271 (13)
Rate of unique tokens words: 0.0271 (14)
Average token length: 0.0271 (15)
Number of words: 0.0263 (16)
Day of the week: 0.0260 (18)
Number of words in the title: 0.0161 (31)
Number of images: 0.0142 (34)
Number of videos: 0.0082 (44)

related with Table 3: using all features except the keywords (without keywords)
and using all features (with keywords). Each local search is stopped after 100 iter-
ations. During the search, we store the best results associated with the iterations
I ∈ {0, 1, 2, 4, 8, 10, 20, 40, 60, 80, 100}.
Figure 3 shows the final optimization performance (after 100 iterations) for
variations of the stochastic probability parameter P and when considering the
two feature perturbation subsets. The convergence of the local search (for differ-
ent values of P ) is also shown in Figure 3. The extreme values of P (0 – pure hill
climbing; 1 – random search) produce lower performances when compared with
their neighbor values. In particular, Figure 4 shows that the pure hill climbing
is too greedy, performing a fast initial convergence that quickly gets flat. When
using the without keywords subset, the best value of P is 0.2 for MG and 0.4
for CR metric. For the with keywords subset, the best value of P is 0.8 for both
optimization metrics. Furthermore, the inclusion of keywords-related suggestions
produces a substantial impact in the optimization, increasing the performance
in both metrics. For instance, the MG metric increases from 0.05 to 0.16 in the
best case (P = 0.8). Moreover, Figure 3 shows that the without keywords subset
optimization is an easier task when compared with the with keywords search.
As argued by Zhang and Dimitroff [17], metadata can play an important role
on webpage visibility and this might explain the importance of the keywords in
terms of its influence when predicting (Table 5) and when optimizing popularity
(Figure 3).
For demonstration purposes, Figure 5 shows an example of the interface
of the implemented IDSS prototype. A more recent article (from January 16
2015) was selected for this demonstration. The IDSS, in this case using the
without keywords subset, estimated an increase in the popularity probabil-
ity of 13 percentage points if several changes are executed, such as decreas-
ing the number of title words from 11 to 10. In another example (not shown in
the figure), using the with keywords subset, the IDSS advised a change from
the original keywords K = {"television", "showtime", "uncategorized", "entertainment",
"film", "homeland", "recaps"} to the set K' = {"film", "relationship", "family",
"night"} for an article about the end of the "Homeland" TV show.

Fig. 3. Stochastic probability (P ) impact on the Mean Gain (left) and in the Conver-
sion Rate (right).

Fig. 4. Convergence of the local search under the without keywords (left) and with
keywords (right) feature subsets (y-axis denotes the Mean Gain and x-axis the number
of iterations).

Fig. 5. Example of the interface of the IDSS prototype.



4 Conclusions
With the expansion of the Web, there is a growing interest in predicting online
news popularity. In this work, we propose an Intelligent Decision Support System
(IDSS) that first extracts a broad set of features that are known prior to an article
publication, in order to predict its future popularity, under a binary classification
task. Then, it optimizes a subset of the article features (that are more suitable
to be changed by the author), in order to enhance its expected popularity.
Using a large and recent dataset, with 39,000 articles collected during a 2-year
period from the popular Mashable news service, we performed a rolling win-
dows evaluation, testing five state-of-the-art classification models under distinct
metrics. Overall, the best result was achieved by a Random Forest (RF), with
an overall area under the Receiver Operating Characteristic (ROC) curve of
73%, which corresponds to an acceptable discrimination. We also analyzed the
importance of the RF inputs, revealing the keyword-based features as among the
most important, followed by Natural Language Processing features and previ-
ous shares of Mashable links. Using the best prediction model as an oracle, we
explored several stochastic hill climbing search variants aiming at increasing
the estimated article popularity probability when changing two subsets of the article
features (e.g., number of words in the title). When optimizing 1,000 articles (from
the last rolling windows test set), we achieved a mean gain of 15 percentage points
for the best local search setup. Considering the obtained results,
we believe that the proposed IDSS is quite valuable for Mashable authors.
In future work, we intend to explore more advanced features related to con-
tent, such as trends analysis. Also, we plan to perform tracking of articles over
time, allowing the usage of more sophisticated forecasting approaches.

Acknowledgements. This work has been supported by FCT - Fundação para a


Ciência e Tecnologia within the Project Scope UID/CEC/00319/2013. The authors
would like to thank Pedro Sernadela for his contributions in previous work.

References
1. Arnott, D., Pervan, G.: Eight key issues for the decision support systems discipline.
Decision Support Systems 44(3), 657–672 (2008)
2. Michalewicz, Z., Schmidt, M., Michalewicz, M., Chiriac, C.: Adaptive business
intelligence. Springer (2006)
3. Ahmed, M., Spagna, S., Huici, F., Niccolini, S.: A peek into the future: predicting
the evolution of popularity in user generated content. In: Proceedings of the sixth
ACM international conference on Web search and data mining, pp. 607–616. ACM
(2013)
4. Bandari, R., Asur, S., Huberman, B.A.: The pulse of news in social media: fore-
casting popularity. In: ICWSM (2012)
5. Kaltenbrunner, A., Gomez, V., Lopez, V.: Description and prediction of slashdot
activity. In: Web Conference, LA-WEB 2007, pp. 57–66. IEEE, Latin American
(2007)

6. Szabo, G., Huberman, B.A.: Predicting the popularity of online content. Commu-
nications of the ACM 53(8), 80–88 (2010)
7. Tatar, A., Antoniadis, P., De Amorim, M.D., Fdida, S.: From popularity prediction
to ranking online news. Social Network Analysis and Mining 4(1), 1–12 (2014)
8. Tatar, A., de Amorim, M.D., Fdida, S., Antoniadis, P.: A survey on predicting the
popularity of web content. Journal of Internet Services and Applications 5(1), 1–20
(2014)
9. Lee, J.G., Moon, S., Salamatian, K.: Modeling and predicting the popularity of
online contents with cox proportional hazard regression model. Neurocomputing
76(1), 134–145 (2012)
10. Petrovic, S., Osborne, M., Lavrenko, V.: RT to win! predicting message propagation
in twitter. In: Fifth International AAAI Conference on Weblogs and Social Media
(ICWSM), pp. 586–589 (2011)
11. Hensinger, E., Flaounas, I., Cristianini, N.: Modelling and predicting news
popularity. Pattern Analysis and Applications 16(4), 623–635 (2013)
12. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. Journal of Machine
Learning Research 3, 993–1022 (2003)
13. De Smedt, T., Nijs, L., Daelemans, W.: Creative web services with pattern. In:
Proceedings of the Fifth International Conference on Computational Creativity
(2014)
14. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine
learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
15. Fawcett, T.: An introduction to roc analysis. Pattern Recognition Letters 27(8),
861–874 (2006)
16. Tashman, L.J.: Out-of-sample tests of forecasting accuracy: an analysis and review.
International Journal of Forecasting 16(4), 437–450 (2000)
17. Zhang, J., Dimitroff, A.: The impact of metadata implementation on webpage
visibility in search engine results (part ii). Information Processing & Management
41(3), 691–715 (2005)
Periodic Episode Discovery Over Event Streams

Julie Soulas1,2(B) and Philippe Lenca1,2


1
UMR 6285 Lab-STICC, Institut Mines-Telecom, Telecom Bretagne,
Technopôle Brest Iroise CS 83818, 29238 Brest Cedex 3, France
{julie.soulas,philippe.lenca}@telecom-bretagne.eu
2
Université Européenne de Bretagne, Rennes, France

Abstract. Periodic behaviors are an important component of the life of


most living species. Daily, weekly, or even yearly patterns are observed
in both human and animal behaviors. These behaviors are searched as
frequent periodic episodes in event streams. We propose an efficient algo-
rithm for the discovery of frequent and periodic episodes. Update proce-
dures allow us to take into account that behaviors also change with time,
or because of external factors. The interest of our approach is illustrated
on two real datasets.

1 Introduction

The discovery of patterns based on the temporality in their occurrences is of great


interest in a wide range of applications, such as social interactions analysis [9],
biological sustainability studies [10], elderly people monitoring [15], mobility
data analysis [2], etc. The rhythm of the patterns appearances is studied in order
to determine whether the patterns occur regularly (the time gaps between the
occurrences are bounded [1,16]), periodically (some occurrences form repeating
cycles of time intervals), or mostly in a specific time interval.
Periodicity highlights habits. For example, Li et al. [10] studied the travel
behaviors of animals, building rules such as: “From 6 pm to 6 am, it has 90%
probability staying at location A”. Soulas et al. [15] studied the living habits of
elderly people, and discovered periodic episodes such as "The user has 70% prob-
ability having breakfast around 9:22 ± 40 min". The discovery of such behaviors
enhances understanding of the needs of the living beings, and the factors govern-
ing their behavior. It is also useful to detect anomalies in routines due to major
events such as environmental change [10] or the onset of disorders [15].
The discovery of periodic behaviors and their evolution involves three sub-
problems: (i) the detection of the periods (daily, weekly, etc), (ii) the discovery
of periodic behaviors and (iii) the update of the periods and patterns when new
data arrives. We here focus on sub-problems (ii) and (iii). Period determina-
tion has already been extensively studied [2,10]. Moreover, the experts in the
target applications domains usually have background knowledge on the interest-
ing periods, or require particular periods to be investigated: e.g., the physicians
monitoring elderly people express the need to study daily and weekly behaviors.



The main contributions of the paper are: a new frequent parallel episode min-
ing algorithm on data streams; and a heuristic for the online estimation of the
periodicity of the episodes. The rest of the paper is organized as follows: section 2
presents some prominent related work. Section 3 details the proposition for fre-
quent periodic pattern mining and updating. Experiments (section 4) on two
real datasets illustrate the interest of this approach. Finally, some conclusions
are drawn, and ideas for future work are presented.

2 Related Work
Frequent episode mining has attracted a lot of attention since its introduction
by Mannila et al. [12]. The algorithms (e.g. [11–13,18]) differ from one another
by their target episodes (sequential or parallel), their search strategies (breadth
or depth first search), the considered occurrences (contiguous, minimal, overlap-
ping, etc.), and the way they count support. However, most algorithms consider
only static data. The formalism used in this paper (see section 3.1) is loosely
inspired from the formalism used in [11] and [18].
With the rapidly increasing number of data recording devices (network traffic
monitoring, smart houses, sensor networks, ...), stream data mining has gained
major attention. This evolution led to paradigm shifts. For an extensive problem
statement and review of the current trends, see [6]. In particular, item set [3,17]
and episode [11,13,14] mining in streams have been investigated. The application
context of [14] is close to the behavior we are searching for: the focus is set on the
extraction of human activities from home automation sensor streams. However,
periodicity is not taken into account.
Due to their powerful descriptive and predictive capabilities, periodic pat-
terns are studied in several domains. For instance, Kiran and Reddy [8] discover
frequent and periodic patterns in transactional databases. Periodicity is also
defined and used with event sequences, for example with the study of parallel
episodes in home automation sensor data for the monitoring of elderly peo-
ple [7,15]. These three periodic pattern mining algorithms process only static
data.
To the best of our knowledge, few studies have focused on mining both fre-
quent and periodic episodes over data streams. One can however point out some
rather close studies: Li et al. [10] and Baratchi et al. [2] both use geo-spatial data
in order to detect areas of interest for an individual (respectively eagles and peo-
ple) and periodic movement patterns. They both also determine the period of the
discovered patterns. However, their periodicity descriptions are based on single
events, not episodes.

3 Frequent Periodic Pattern Discovery and Update Over Data Streams

3.1 Problem Statement
Behavioral patterns are searched in the form of episodes (definition 2) in an event
(definition 1) sequence, which is processed using the classical sliding window

framework (length of the window: TW ). Indeed, recent behaviors are observable


in recent events.
Definition 1 (Event). An event is a (e, t) pair, where e is the event label,
taking values in a finite alphabet A; and t is the timestamp.
Definition 2 (Episode, episode length). An episode E is a set of n event
labels {e1 , ..., en } taken from the alphabet A. The length of episode E is n.
Definition 3 (Episode occurrence, occurrence duration). An episode
E = {e1, ..., en} occurs if there are n events whose labels match the n items
in E. Formally, there is an occurrence o of E at time t1 if there exists a
permutation σ on (1, ..., n) and n timestamps t1 ≤ ... ≤ tn such that o =
⟨(eσ(1), t1), ..., (eσ(n), tn)⟩ is a subsequence of the event stream. The duration
of o is δt_o = tn − t1.
The label order in the occurrence is not taken into account: it corresponds
to the episodes referred to as parallel in the problem definition of episode min-
ing [12]. The events making the occurrence may be interleaved with other events.
The events occurring in the vicinity of each other are more likely linked to
a same behavior than distant events. A stricter constraint Tep on the maximal
episode duration can thus be set (TW is used otherwise). Tep exploits expert or
statistical knowledge regarding the expected behavior durations. It also serves
as a heuristic for the reduction of the search space (see section 3.2).
Definition 4 (Minimal occurrence - MO). Let E = {e1, ..., en} be an
episode, occurring on o = ⟨(eσ(1), t1), ..., (eσ(n), tn)⟩. o is a minimal occurrence
if there is no other, shorter occurrence of E within the time span of o.
That is to say, ¬∃ o' = ⟨(eσ'(1), t'1), ..., (eσ'(n), t'n)⟩ such that t1 ≤ t'1, t'n ≤ tn and
t'n − t'1 < tn − t1.
Definition 5 (Time queue - TQ). The time queue of an episode E (noted
T QE ) is the list containing the distinct pairs of beginning and end timestamps
of its minimal occurrences.
We consider here only the minimal occurrences. The support of an episode
is the length of its time queue. An episode is frequent if its support is greater
than a minimal support threshold Smin . Minimal occurrences have convenient
properties for the mining of frequent episodes, namely:
– An episode E has at most one time queue entry that starts (respectively
finishes) at a given timestamp t (observation 1);
– Let E be an episode, and E' a subepisode (subset) of E. For every entry in
the TQ of E, there is at least one entry in the TQ of E' (observation 2);
– As a consequence, the support of E' is greater than or equal to that of E: the
support verifies the downward closure property (observation 3);
– A new event (e, t) can be part of an occurrence of episode E' = {e} ∪ E (where
e ∉ E) if the latest entry in the TQ of E started less than Tep (or TW) ago
(observation 4a). This occurrence is minimal if the latest entry in TQ_E
starts strictly after the latest entry in TQ_E' (observation
4b). This gives a particular importance to the recently observed episodes.

Fig. 1. Example: a segment of an event stream (the events a, b, c, a, c, c, b, a, c, d, b observed at timestamps 50 to 60).

Fig. 2. Example of a histogram representing the observed occurrence times for some
daily habits of an elderly person living in a smart home

Example. Figure 1 presents an example of an event stream segment. The cur-
rent window contains 11 events (a, 50), (b, 51), etc. The last seen event is (b, 60),
and the labels take values in the alphabet A = {a, b, c, d}. With Tep = 3, episode
{a, c} occurs on ⟨(a, 50), (c, 52)⟩, ⟨(c, 52), (a, 53)⟩, ⟨(a, 53), (c, 54)⟩, ⟨(a, 53), (c, 55)⟩,
etc.: ⟨(a, 53), (c, 54)⟩ is minimal, but ⟨(a, 53), (c, 55)⟩ is not. The time queue for
episode {a, b, c} is [(50, 52), (51, 53), (53, 56), (55, 57), (56, 58), (57, 60)], and
its support is 6. (54, 57) is not in the TQ because it does not correspond to a
MO. A one-to-one mapping between a time queue entry and a MO is not guar-
anteed: the time queue entry (53, 56) of episode {a, b, c} corresponds to two MO:
⟨(a, 53), (c, 54), (b, 56)⟩ and ⟨(a, 53), (c, 55), (b, 56)⟩.

Periodicity. Humans and animals tend to follow routines [10,15]. A typical


example of periodic behavior is the daily occurrences of some human activities
of daily living. Figure 2 presents the occurrence times of three such activities,
recorded in a smart home over a six-month period (CASAS dataset, presented in
section 4.1). It highlights some of the characteristics these activities may have:
– The occurrence times vary from one day to the next, and this variability is
user- and activity-dependent: here go to bed is less variable than wake up,
– Activities may have several components. Here, there seems to be two meals
at home a day: a breakfast around 8:00 and a dinner around 18:00,
– Each component has its own preferential occurrence time (mean μ) and
variability (standard deviation σ),
– Some occurrences do not follow the periodic patterns.
This leads us to describe the periodicity of an episode as a distribution of
its relative occurrence times within the period of interest (e.g. 1 day, 1 week).
This is done with Gaussian Mixture Models (GMM), since they take into
account the aforementioned characteristics.

Fig. 3. Lattice corresponding to the example stream of figure 1, when (b, 60) is the last
seen event (each node stores an episode, its time queue and its support s). The update
triggered by the arrival of event (b, 60) is highlighted in boldface.

For a periodicity model {period T , n components (μ1 , σ1 ), ...(μn , σn )}, an


occurrence is expected to occur in every time interval k · T + μi (±σi ), such that
1 ≤ i ≤ n and for every integer k ≥ 0 such that k·T +μi is in the current window.
The quality of a periodicity description is evaluated on its accuracy, that is to
say the proportion of the expected occurrences that were actually observed.
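The accuracy computation can be sketched as follows, under two assumptions of the example: the period is aligned with the window start, and an expected interval counts as satisfied when at least one observed occurrence falls inside it.

# Sketch: accuracy of a periodicity model {period T, components (mu_i, sigma_i)} as the
# proportion of expected intervals k*T + mu_i (+/- sigma_i) containing an occurrence.
def periodicity_accuracy(occ_times, window_start, window_end, period, components):
    expected, observed = 0, 0
    for mu, sigma in components:
        k = 0
        while window_start + k * period + mu <= window_end:
            center = window_start + k * period + mu
            expected += 1
            if any(center - sigma <= t <= center + sigma for t in occ_times):
                observed += 1
            k += 1
    return observed / expected if expected else 0.0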

3.2 Frequent Episode Discovery and Updating


In order to be habits, episodes need to be frequent. However, the periodic
episodes are not necessarily the most frequent ones, which is why the support
threshold should remain rather low. We propose to handle the task of frequent
episode mining over an event stream thanks to a frequent episode lattice (FEL).

Episode Lattice. The frequent episodes and their time queues are stored in
a frequent episode lattice (FEL). The nodes in the FEL correspond either to
length-1 episodes, or to frequent episodes. Length-1 episodes are kept even if they
are not frequent (yet) in order to build longer episodes when they do. The parents
of a node (located at depth d) correspond to its sub-episodes of length d − 1, and
its children to its super-episodes of length d + 1. The edges linking two episodes
are indexed on the only event label that is present in the child episode but
not in the parent. Each node retains the TQ of the corresponding episode and
the GMM description that best fits the episode (see section 3.3). The episode
lattice corresponding to the example in figure 1 is given in figure 3. In spite of
its possibly big edge count, the lattice structure was chosen over the standard
prefix tree because it allows faster episode retrieval and update.

Algorithm 1. Computation of E = E1 ∪ E2's TQ from the TQs of E1 and E2

Input: TQ1 (resp. TQ2), the time queue of E1 (resp. E2), indexed on i (resp. j)
1: i ← 0; j ← 0; TQ ← [ ]; support s ← 0
2: while i < |TQ1| and j < |TQ2| do
3:   if TQ1[i] finishes after TQ2[j] then
4:     Increment j as long as TQ2[j] ends before TQ1[i]
5:   else
6:     Increment i as long as TQ1[i] ends before TQ2[j]
7:   start ← min(TQ1[i][0], TQ2[j][0])
8:   end ← max(TQ1[i][1], TQ2[j][1])
9:   if end − start < Tep then
10:    Add (start, end) to TQ; s ← s + 1 /* New minimal occurrence */
11:  Increment the index of the TQ whose current element started earlier (both if
     TQ1[i][0] == TQ2[j][0])
12: return TQ, s
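For readability, the snippet below gives a straightforward (unoptimized) Python reference for the same time-queue join: it enumerates the candidate spans from every pair of entries and keeps only the minimal ones. It is not a line-by-line transcription of Algorithm 1, which obtains the same result in a single merge pass, and treating the Tep bound as inclusive is an assumption that matches the durations appearing in the example time queue of figure 1.

# Unoptimized reference for the time-queue join of E = E1 U E2: every pair of
# entries yields a candidate span [min(starts), max(ends)]; candidates longer
# than Tep are dropped and non-minimal spans (those containing a strictly
# shorter candidate) are filtered out. On the stream of figure 1 this reproduces
# the time queue of {a, b, c} given in the example.
def join_time_queues(tq1, tq2, t_ep):
    candidates = set()
    for s1, e1 in tq1:
        for s2, e2 in tq2:
            start, end = min(s1, s2), max(e1, e2)
            if end - start <= t_ep:
                candidates.add((start, end))
    minimal = [c for c in candidates
               if not any(o != c and c[0] <= o[0] and o[1] <= c[1]
                          and o[1] - o[0] < c[1] - c[0] for o in candidates)]
    return sorted(minimal), len(minimal)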

Update with a New Event. We keep track of the recently modified nodes
(RMN, the nodes describing an episode that occurred recently, i.e. less than Tep ,
or TW , ago). Indeed (see observations 4a and 4b), the recent occurrences of these
episodes can be extended with new, incoming events to form longer episodes. The
RMN are stored in a collection of lists (nodes at depth 1, depth 2, etc). The TQ
of a newly frequent length-n episode is computed thanks to the time queues of a
length-(n-1) sub-episode and the length-1 episode containing the missing item,
using algorithm 1.
When a new event (e, t) arrives, it can be a new occurrence (and also a MO) of
the length-1 episode {e}. It can also form a new MO of an episode E' = E ∪ {e},
where E is a recently observed episode. The lattice update follows these steps:
1. If label e is new: create a node for episode {e} and link it to the FEL root;
2. Update the time queue of episode {e};
3. If {e} is frequent:
(a) Add it to the RMN list;
(b) For each node N_E in the RMN list, try to build a new occurrence of
E ∪ {e}, following algorithm 2, which takes advantage of observations
1–4. If an episode E' = E ∪ {e} becomes frequent, a new node N_E' is
created and linked to its parents in the lattice. The parents are the
nodes describing the episodes E' \ {e'} for each e' ∈ E, and are accessible
via N_E.parent(e').child(e), where N_E is the node for the known subset
E. Since the RMN list is layered, and explored by increasing node depth,
N_E.parent(e').child(e) is always created before N_E' tries to access it.
The update process of the FEL is illustrated in figure 3 with the arrival of a new
event (b, 60). (b, 60) makes {b} a frequent episode. The nodes in the RMN
list ({a}, {c} and {a, c}) are candidates for extension with the (frequent)
new event. This allows the investigation of episodes {a, b} (extension of {a}),
{b, c} (extension of {c}), and {a, b, c} (extension of {a, c}), which indeed become
frequent.

Algorithm 2. RMN-based update when a new event (e, t) arrives

Input: new event (e, t); recently modified node N_E, characterizing episode E
1: if e ∈ E then
2:   pass /* E cannot be extended with label e: E already contains it */
3: else
4:   if E.lastMO starts before t − Tep then
5:     Remove N_E from RMN list: the last MO of E is too old to be extended
6:   else
7:     if N_E has a child N_E' on label e then /* E' is already frequent */
8:       if E.lastMO starts strictly after E'.lastMO then /* New MO */
9:         Add new entry to N_E'.TQ; Add N_E' to the RMN list
10:      else /* There is already a MO for E' starting in E.lastMO.start:
             there cannot be another one */
11:        pass
12:    else /* E' may become frequent */
13:      TQ, S ← Algorithm1(TQ_E, TQ_{e})
14:      if S ≥ Smin then
15:        Create node N_E' for E'. Link it to its parents.
16:        Add N_E' to the RMN list
17: return /* The FEL is updated with the information from event (e, t) */

Removal of Outdated Information. Events older than TW are outdated, and


their influence in the FEL needs to be removed. The TQ construction makes it
so that its entries are ordered by start timestamp: the entries that need to be
removed are thus at the beginning of the nodes TQ. Moreover, according to
observation 2, for every TQ entry (and thus every outdated entry) there is at
least one (outdated) entry in the TQ of one of the parent nodes. The FEL can
thus be traversed from the root using a breadth-first search algorithm, where
nodes are investigated and updated only if at least one of their parents presents
outdated occurrences. Episodes becoming rare are removed from the FEL.

3.3 Periodicity Discovery


The periodicity of an episode is described thanks to a GMM. Each node in the
FEL is associated with a GMM describing the periodicity of the episode, which
is updated when new MO are observed or occurrences removed. Usually, a GMM
is trained with the Expectation-Maximization algorithm [5] (EM): for each com-
ponent of the GMM (the number of components being a user-given parameter),
and each data point x, the probability that x was generated by the component is
computed. The components characteristics (mean, standard deviation) are then
tweaked to maximize the likelihood of the data point/component attribution.
But streaming data may be non-stationary, the number of components may
evolve, as well as their characteristics. It is not acceptable to ask the user for
the number of components, especially since the suitable number depends on the
considered episode. We here extend EM with heuristics for the addition, removal
and merging of components.

Algorithm 3. Overview of the periodicity update (comp = GMM component): for each new occurrence, if it matches an existing component the distribution is updated (one EM iteration), otherwise a new component is created; empty components are then removed and close components are merged.

Algorithm 3 presents the general workflow for the periodicity update. When
a new MO is detected for the episode, the position of the timestamp in the
period tr = timestamp modulo period is computed. If tr does not match any
of the existing components, i.e. for each component (μ, σ), |tr − μ| > σ, a new
component is added. When outdated data is removed, some components lose
their importance. When too rare, they are removed from the GMM. Finally, when
two components (μ1 , σ1 ), (μ2 , σ2 ) become close to one another, i.e if |μ1 − μ2 | <
a ∗ (σ1 + σ2 ) (with a = 1.5 in the experiments), the two components are merged.
In the general case, a GMM update does not change the model much. Thus, when
the number of components does not change, a single EM iteration is enough
to update the characteristics of the components.
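A compact sketch of these structural heuristics is given below; the component representation (mean, standard deviation, weight), the weight threshold used to decide that a component is too rare, and the initial standard deviation of a new component are assumptions of the example, while the merge test follows the |μ1 − μ2| < a · (σ1 + σ2) rule stated above.

# Sketch of the structural heuristics around EM (component = (mu, sigma, weight)):
# add a component when a new relative time matches none, drop components whose
# weight fell below a threshold, and merge components whose means are closer
# than a * (sigma1 + sigma2). Thresholds and the initial sigma are assumptions.
def adapt_components(components, t_r=None, a=1.5, min_weight=0.05, init_sigma=0.5):
    comps = list(components)
    if t_r is not None and all(abs(t_r - mu) > sigma for mu, sigma, _ in comps):
        comps.append((t_r, init_sigma, min_weight))   # emerging habit: new component
    comps = [c for c in comps if c[2] >= min_weight]  # forget components that became too rare
    merged, used = [], set()
    for i in range(len(comps)):                       # merge close components
        if i in used:
            continue
        mu1, s1, w1 = comps[i]
        for j in range(i + 1, len(comps)):
            if j in used:
                continue
            mu2, s2, w2 = comps[j]
            if abs(mu1 - mu2) < a * (s1 + s2):
                mu1 = (w1 * mu1 + w2 * mu2) / (w1 + w2)  # weight-averaged mean
                s1, w1 = max(s1, s2), w1 + w2
                used.add(j)
        merged.append((mu1, s1, w1))
    return merged  # a single EM iteration would then refine (mu, sigma, weight)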
The interest of this approach was evaluated on synthetic data, generated from known
mixtures of Gaussians evolving with time. The heuristics allow the detection
of the main trends in the data: emergence of new components, disappearance of old
and rare components, and shifts in the characteristics of the components.

4 Experimentation
A prototype was implemented in Python. It was also instrumented to record the
episodes and lattice updates. The instrumentation slows down the experiments:
the execution times given in the next subsections are thus over-estimated.

4.1 Ambient Assisted Living Dataset


The CASAS project [4] uses home automation devices to improve ageing at
home. Over the years, they collected and published several datasets. We present
here our experimentations on the Aruba dataset1 . The house of an elderly woman
was equipped with motion detectors and temperature sensors. The obtained
information was annotated with activities (11 labels, such as Sleeping, House-
keeping, etc.; 22 when dissociating the begin and end timestamp of each activity).
These annotations are our events (12 953 events, from Nov. 2010 to Jun. 2011).
1 http://wsucasas.wordpress.com/datasets/, number 17, consulted on Dec 4th, 2014.

Fig. 4. Execution log for the CASAS Aruba annotation dataset: (a) event count in the window, (b) interesting episode count (frequent and periodic), (c) cumulative execution time.

The dataset was processed using a period of one day, a window TW of 3 weeks,
a minimal support Smin of 15, a maximal episode duration Tep of 30 minutes,
and an accuracy threshold of 70%. The parameter setting was reinforced by a
descriptive analysis of the data (e.g., it showed that most activities last less
than 30 minutes). The results obtained throughout the course of the execution
are given in figure 4. During the first 3 weeks, the sliding window fills with the
incoming events, and the first frequent and periodic episodes appear. Then, the
number of events in the window remains quite stable, but the behaviors keep
evolving. The execution time (figure 4c) shows the scalability of the approach
for this kind of application. The contents of the FEL in the last window are
investigated; some of the periodic episodes with the highest accuracy A are:
– {Sleeping end}: 50 MO, 1 component, μ = 6:00, σ = 2 hours, A = 100%
– {Sleeping end, Meal Preparation begin, Meal Preparation end, Relax begin}:
26 MO, 1 component, μ =6:00, σ =1.45 hours, A = 82%
– {Enter Home begin, Enter Home end}: 61 MO, 1 component, μ =14:00, σ =3
hours, A = 88%
These patterns can be interpreted as habits: the person woke up every morn-
ing around 6:00, and also had breakfast in 82% of the mornings. The third
episode describes a movement pattern: the inhabitant usually goes out of home
at some time (it is another episode), and comes back in the early afternoon.
Figure 5 presents the influence of the minimal support Smin , maximal episode
duration Tep , and window length TW on the maximal size of the FEL and the
execution time. In particular, it shows that the execution time is reasonable and
scalable. The duration of the episodes also has a large impact on the size of the
FEL.

4.2 Travian Game Dataset


Travian is a web-browser game, where players, organized into alliances, fight
for the fulfilling of objectives and the control of territories. The game company
releases each day a snapshot of the server status: it contains information on the
players (villages, alliance membership). These daily updates were collected for
the 2014 fr5 game round, from July 8th to November 23rd . We focus here on the
Fig. 5. Influence of the algorithm configuration on the frequent and periodic episode counts and on the total execution time for the CASAS Aruba dataset: (a)-(b) minimal support, (c)-(d) maximal episode duration, (e)-(f) window duration.

Fig. 6. Execution log for the Travian fr5 alliance membership dataset: (a) event count in the window, (b) interesting episode count (frequent and periodic), (c) cumulative execution time.

players' alliance shifts: the event labels look like "Player P [joined|left] alliance
A". 27,674 such events are recorded, but most labels are rare (25,985 distinct labels).
The dataset was processed with a period of one week, a window TW = six
weeks, a minimal support Smin = 5 and a maximal episode duration Tep = 1 day.
Figure 6 presents the evolution of the window size, episode counts, and execution
time during the mining. The results are fairly different from those of the home
automation dataset, but were explained by a player (picturing a domain expert).
During the first 6 weeks, the window fills rapidly with events: new players register
onto the game, and the diplomacy begins. The players join or switch alliances.
After the 6 weeks, the event count in the window decreases with time. Several
explanations: (i) the opening of a new game round (on August 22nd ) slowed down
the number of new player registrations (players tend to join the most recent game
round); (ii) most players have found an alliance they like: they stop changing
alliances. Until October, little frequent and periodic patterns are detected, but
their number increases rapidly after that. The periodic episodes discovered in
the Sep, 18th – Oct, 30th (maximal count of periodic episodes) contain notably:

– {1SixCentDix8 left Vtrans, 1SixCentDix8 joined iChiefs}: 8 MO, 2 compo-


nents, μ1 = Fri. 0:00, μ2 = Mon. 0:00, σ1 = σ2 = 0, A = 80%
– {1SixCentDix8 left iChiefs, 1SixCentDix8 joined Vtrans}: 8 MO, 2 compo-
nents, μ1 = Sat. 0:00, μ2 =Tue. 0:00, σ1 = σ2 = 0, A = 80%
– {Jill left Bakka, Jill joined LI}: 10 MO, 2 components, μ1 = Mon. 18:00,
σ1 = 1 day, μ2 = Fri. 0:00, σ2 = 0, A = 75%
Some players periodically change alliance: 1SixCentDix8 leaves Vtrans for
iChiefs on Mondays and Fridays, and goes back to Vtrans one day later. Jill goes
from Bakka to LI either on Mondays or Tuesdays, as well as on Fridays. This
actually highlights a strategy that allied alliances (iChiefs and Vtrans on one side,
and Bakka and LI on the other) have developed to share with one another
the effects of artifacts owned by players 1SixCentDix8 and Jill, respectively.

5 Conclusion
Behavior pattern (episode) mining over event sequences is an important data
mining problem, with many applications, in particular for ambient assisted liv-
ing, or wildlife behavior monitoring. Several frequent episode mining algorithms
have been proposed for both static data and data streams. But while periodic-
ity can also be an interesting characteristic for the study of behaviors, very few
algorithms have addressed frequent and periodic patterns. We propose an effi-
cient algorithm to mine frequent periodic episodes in data streams. We briefly
illustrate the interest of this algorithm with two case studies. As a perspective
of this work, the experiments can be extended and applied to other
application domains. It would also be interesting to include a period-determination
algorithm in order to automatically adapt the period to each pattern. Closed
episodes and non-overlapping occurrences could also be investigated.

References
1. Amphawan, K., Lenca, P., Surarerks, A.: Efficient mining top-k regular-frequent
itemset using compressed tidsets. In: Cao, L., Huang, J.Z., Bailey, J., Koh, Y.S.,
Luo, J. (eds.) PAKDD Workshops 2011. LNCS, vol. 7104, pp. 124–135. Springer,
Heidelberg (2012)
2. Baratchi, M., Meratnia, N., Havinga, P.J.M.: Recognition of periodic behavioral
patterns from streaming mobility data. In: Stojmenovic, I., Cheng, Z., Guo, S. (eds.)
MOBIQUITOUS 2013. LNICST, vol. 131, pp. 102–115. Springer, Heidelberg (2014)
3. Calders, T., Dexters, N., Goethals, B.: Mining frequent itemsets in a stream. In:
ICDM, pp. 83–92 (2007)
4. Cook, D.J., Crandall, A.S., Thomas, B.L., Krishnan, N.C.: Casas: A smart home
in a box. IEEE Computer 46(7), 62–69 (2013)
5. Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data
via the EM algorithm. Journal of the Royal Statistical Society. Series B (Method-
ological), 1–38 (1977)
6. Gama, J.: A survey on learning from data streams: current and future trends.
Progress in Artificial Intelligence 1(1), 45–55 (2012)
7. Heierman, E.O., Youngblood, G.M., Cook, D.J.: Mining temporal sequences to
discover interesting patterns. In: KDD Workshop on mining temporal and sequen-
tial data (2004)
8. Kiran, R.U., Reddy, P.K.: Mining periodic-frequent patterns with maximum
items’ support constraints. In: ACM COMPUTE Bangalore Conference, pp. 1–8
(2010)
9. Lahiri, M., Berger-Wolf, T.Y.: Mining periodic behavior in dynamic social net-
works. In: ICDM, pp. 373–382. IEEE Computer Society (2008)
10. Li, Z., Han, J., Ding, B., Kays, R.: Mining periodic behaviors of object movements
for animal and biological sustainability studies. Data Mining and Knowledge Dis-
covery 24(2), 355–386 (2012)
11. Lin, S., Qiao, J., Wang, Y.: Frequent episode mining within the latest time win-
dows over event streams. Appl. Intell. 40(1), 13–28 (2014)
12. Mannila, H., Toivonen, H., Verkamo, A.I.: Discovering frequent episodes in
sequences. In: Fayyad, U.M., Uthurusamy, R. (eds.) KDD, pp. 210–215. AAAI
Press (1995)
13. Patnaik, D., Laxman, S., Chandramouli, B., Ramakrishnan, N.: Efficient episode
mining of dynamic event streams. In: ICDM, pp. 605–614 (2012)
14. Rashidi, P., Cook, D.J.: Mining sensor streams for discovering human activity
patterns over time. In: Webb, G.I., Liu, B., Zhang, C., Gunopulos, D., Wu, X.
(eds.) ICDM 2010, The 10th IEEE International Conference on Data Mining,
Sydney, Australia, December 14–17, 2010, pp. 431–440. IEEE Computer Society
(2010)
15. Soulas, J., Lenca, P., Thépaut, A.: Monitoring the habits of elderly people through
data mining from home automation devices data. In: Reis, L.P., Correia, L.,
Cascalho, J. (eds.) EPIA 2013. LNCS, vol. 8154, pp. 343–354. Springer, Heidelberg
(2013)

16. Surana, A., Kiran, R.U., Reddy, P.K.: An efficient approach to mine periodic-
frequent patterns in transactional databases. In: Cao, L., Huang, J.Z., Bailey, J.,
Koh, Y.S., Luo, J. (eds.) PAKDD Workshops 2011. LNCS, vol. 7104, pp. 254–266.
Springer, Heidelberg (2012)
17. Wong, R.W., Fu, A.C.: Mining top-k frequent itemsets from data streams. Data
Mining and Knowledge Discovery 13(2), 193–217 (2006)
18. Zhu, H., Wang, P., He, X., Li, Y., Wang, W., Shi, B.: Efficient episode mining
with minimal and non-overlapping occurrences. In: ICDM, pp. 1211–1216 (2010)
Forecasting the Correct Trading Actions

Luís Baía1,2(B) and Luís Torgo1,2


1
LIAAD - INESC TEC, Porto, Portugal
luisbaia [email protected], [email protected]
2
DCC - Faculdade de Ciências - Universidade do Porto, Porto, Portugal

Abstract. This paper addresses the problem of decision making in the


context of financial markets. More specifically, the problem of forecast-
ing the correct trading action for a certain future horizon. We study and
compare two different alternative ways of addressing these forecasting
tasks: i) using standard numeric prediction models to forecast the vari-
ation on the prices of the target asset and on a second stage transform
these numeric predictions into a decision according to some pre-defined
decision rules; and ii) use models that directly forecast the right decision
thus ignoring the intermediate numeric forecasting task. The objective
of our study is to determine if both strategies provide identical results
or if there is any particular advantage worth being considered that may
distinguish each alternative in the context of financial markets.

1 Introduction

Many real world applications require decisions to be made based on forecasting


some numeric quantity. Sales forecasting may lead to some important decisions
concerning the production process. Asset price forecasting may lead investors
to buy or sell some financial product. Forecasting the future evolution of some
indicator of a patient may lead a medical doctor to some important treatment
prescriptions. These are just a few examples of concrete applications that fit
this general setting: decisions based on numeric forecasts of some variable. In
most cases the decision process is based on a pre-defined protocol that associates
intervals of the range of the numeric variable with concrete actions/decisions.
This means that once we have a prediction for the numeric variable we will use
some deterministic process to reach the action/decision to be taken. This work is
focused on this particular type of situations in the context of financial markets.
In this domain the goal of investors is to make the correct decision (Sell, Buy
or Hold) at any given point in time. These decisions are taken based on the
investor’s expectations on the future evolution of the asset prices. In this work
we approach this decision problem using predictive models. More specifically, we
will compare two possible ways of trying to forecast what is the correct trading
decision at any point in time.
In our target applications we assume that there are deterministic decision
rules that given the estimated evolution of the prices of the asset will indicate
the trading action to be taken. For instance, a rule could state that if the forecast


of the variation of prices is a 2.5% increase then the correct decision is to buy
the asset as this will allow covering transaction costs and still have some profit.
Given the deterministic mapping from forecasted values into decisions we can
define the prediction task in two different ways. The first consists on obtaining
a numeric prediction model that we can then use to obtain predictions of the
future variation of the prices which are then transformed (deterministically)
into trading decisions (e.g. [1], [2]). The second alternative consists of directly
forecasting the correct trading decisions (e.g. [3], [4], [5]). Which is the best
option in terms of the resulting financial results? To the best of our knowledge
no comparative study was carried out to answer this question. This is the goal of
the current paper: to compare these two approaches and provide experimental
evidence of the advantages and disadvantages of each alternative.

2 Problem Formalization
The problem of decision making based on forecasts of a numerical (continuous)
value can be formalized as follows. We assume there is an unknown function
that maps the values of p predictor variables into the values of a certain numeric
variable Y . Let f be this unknown function that receives as input a vector x
with the values of the p predictors and returns the value of the target numeric
variable Y whose values are supposed to depend on these predictors,

f : R^p → R
    x ↦ f(x).

We also assume that based on the values of this variable Y some decisions
need to be made. Let g be another function that given the values of this target
numeric variable transforms them into actions/decisions,

g : R → A = {a1, a2, a3, ...}
    Y ↦ g(Y).

where A represents a set of possible actions.


In our target applications, functions f and g are very different. Function g is
known and deterministic, in the sense that it is part of the domain background
knowledge. Function f is unknown and uncertain. The only information we have
about function f is an historical record of mappings from x into Y , i.e. a data
set that can be used to learn an approximation of the function f . Given that the
variable Y is numeric this approximation could be obtained using some existing
multiple regression tool. This means that given a data set Dr = {xi , Yi ni=1 } we
can use some regression tool to obtain a model r̂(x) that is an approximation
of f . From an operational perspective this would mean that given a test case
q for which a decision needs to be made we would proceed by first using r̂ to
obtain a prediction for Y and then apply g to this predicted value to get the

predicted action/decision, i.e. q → r̂(q) → g(r̂(q)). In the context of financial


markets the predictors describe the currently observed dynamics of the prices
of some financial asset and the target numeric variable Y represents the future
variation of this price. This means that f is the unknown function that maps the
currently observed price dynamics into a future evolution of the price. On the
other hand g is a deterministic function (typically based on domain knowledge
and risk preferences of traders) that maps the prediction of the future evolution
of prices into one of three possible decisions: Sell, Hold or Buy.
Given the deterministic nature of g we can use an alternative process for
obtaining decisions. More specifically, we can build an alternative data set D_c =
{(x_i, g(Y_i)) : i = 1, ..., n}, where the target variable is the decision associated with each
known Y value in the historical record of data. This means that we have a
nominal target variable, i.e. we are facing a classification task. Once again we
can use some standard classification tool to obtain an approximation ĉ of the
unknown function that maps the predictors into the correct actions/decisions.
Once such a model is obtained we can use it, given a query case q, to directly
estimate the correct decision by applying the learned model to the case, i.e.
q → ĉ(q). This means that given the description of the current dynamics of the
price we will use function ĉ to forecast directly the correct trading action for this
context.
Independently of the approach followed, the final goal of the applications
we are targeting is always to make correct decisions. This means that whatever
process we use to reach a decision, it will be evaluated in terms of the “qual-
ity” of the decisions it generates. In this context, it seems that the classification
approach, by having as target variable the decisions, would be easier to bias
towards optimal actions. However, this approach completely ignores the inter-
mediate numeric variable that is supposed to influence decisions, though one
may argue that information on the relationship between Y and the decisions is
“encoded” when building the training set Dc by using as target the values of
g(Yi ). On the other hand, while the regression approach is focused on obtaining
accurate predictions of Y , it completely ignores questions like eventual different
cost/benefits of the different possible decisions that could be easily encoded into
the classification tasks. All these potential trade-offs motivate the current study.
The main goal of this paper is to compare these two approaches in the context
of financial markets.

3 Material and Methods


This section describes the main issues involved in the experimental compari-
son we will carry out with the goal of comparing the two possible approaches
described in the previous section.

3.1 The Tasks


The problem addressed in this paper is very common in automatic trading sys-
tems where decisions are based on the forecasts of some prediction models.

The decisions to open or close short/long positions are typically the result of
a deterministic mapping from the predicted price variations.
In our experiments, we have used the asset prices of 12 companies. Each
data set has a minimum of 7 years of daily data and a maximum of 30 years.
In order to simplify the study, we will be working with a one-day horizon, i.e.
taking a decision based on the forecasts of the asset variation for one day ahead.
Moreover, we will be working exclusively with the closing prices of each trading
session, i.e. we assume trading decisions are to be made after the markets close.
The decision function for this application receives as input the forecast of the
daily variation of the assets closing prices and returns a trading action. We will
be using the following function in our experiments:

g : R → A = {hold, buy, sell}

          ⎧ buy,   if Y > 0.02
    Y ↦   ⎨ sell,  if Y < −0.02
          ⎩ hold,  otherwise.

This means we are assuming that any variation above 2% will be sufficient
to cover the transaction costs and still obtain some profit. Concerning the data
that will be used as predictors for the forecasting models (either forecasting the
prices variation (Y ) or directly the trading action (A)) we have used the price
variations on recent days as well as some trading indicators, such as the annual
volatility, the Welles Wilder’s style moving average [6], the stop and reverse point
indicator developed by J. Welles Wilder [6], the usual moving average and others.
The goal of this selection of predictors is to provide the forecasting models with
useful information on the recent dynamics of the assets prices.
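As a purely illustrative sketch of this kind of predictor set, the following base-R code builds recent daily variations, a usual moving average and a rolling annualised volatility from a synthetic series of closing prices. The exact indicator definitions used in the paper follow Wilder [6] and would normally be taken from a technical-analysis library; the series, window lengths and feature names below are assumptions.

## Base-R approximations of "recent variations + moving average + volatility"
set.seed(1)
close <- cumprod(1 + rnorm(1000, sd = 0.01)) * 100     # synthetic closing prices

ret <- c(NA, diff(close) / head(close, -1))            # daily variations
sma <- function(x, n) stats::filter(x, rep(1/n, n), sides = 1)   # usual moving average
ann.vol <- function(r, n = 250)                        # rolling annualised volatility
  sqrt(250) * sqrt(stats::filter(r^2, rep(1/n, n), sides = 1))

feats <- data.frame(
  r1    = ret,                        # most recent daily variation
  r2    = c(NA, head(ret, -1)),       # the variation of the day before
  sma10 = as.numeric(sma(close, 10)), # 10-day moving average of the closing price
  vol   = as.numeric(ann.vol(ret)),   # annualised volatility estimate
  Y     = c(tail(ret, -1), NA)        # one-day-ahead variation (the target)
)
feats <- na.omit(feats)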
Regarding the performance metrics we will use to compare each approach, we
will use two metrics that capture important properties of the economic results of
the trading decisions made by the alternative models. More specifically, we will
use the Sharpe Ratio as a measure of the risk (volatility) associated with the
decisions, and the percentage Total Return as a measure of the overall financial
results of these actions. To make our experiments more realistic we will consider
a transaction cost of 2% for each Buy or Sell decision a model may take.
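A hedged sketch of this economic evaluation is given below: given the signals issued by a model and the realised daily variations, it charges a 2% cost per buy or sell decision and summarises the resulting return series. The exact accounting used by the authors (position sizing, compounding, the precise Sharpe Ratio definition) is not specified here, so the function and variable names are illustrative assumptions only.

evaluate <- function(signal, variation, cost = 0.02) {
  pos <- ifelse(signal == "buy", 1, ifelse(signal == "sell", -1, 0))   # position taken
  ret <- pos * variation - cost * (signal != "hold")                   # net daily return
  c(total.return = 100 * (prod(1 + ret) - 1),        # percentage Total Return
    sharpe       = sqrt(250) * mean(ret) / sd(ret))  # annualised Sharpe Ratio
}

## Synthetic usage example (assumed signals and realised variations)
set.seed(2)
v <- rnorm(100, sd = 0.03)
s <- sample(c("buy", "sell", "hold"), 100, replace = TRUE, prob = c(.1, .1, .8))
evaluate(s, v)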
At this stage it is important to remark that the prediction tasks we are facing
have some characteristics that turn them into particularly challenging tasks. One
of the main hurdles results from the fact that interesting events, from a trading
perspective, are rare in financial markets. In effect, large movements of prices are
not very frequent. This means that the data sets we will provide to the models
have clearly imbalanced distributions of the target variables (both the numeric
percentage variations and the trading actions). This imbalance is particularly
harmful because the rare situations are precisely the ones that are most interesting
from a trading perspective, which creates difficulties for most modelling techniques. In
the next section, we will describe some of the measures we have taken to alleviate
this problem.

3.2 The Models


In this section we will list all the model variants that will be used in the experi-
mental comparison. The point is to ensure that both approaches have the same
conditions for a fair comparison. Several variants of each family of models (SVM,
Random Forests, etc.) were tested in order to make sure our conclusions were not
biased by the choice of models. Tables 1 and 2 show all the model
variants used; nearly 182 model variants were tested in total.

Table 1. Regression models used for the experimental comparisons. SVM stands for
Support Vector Machines, KNN for K-nearest neighbours, NNET for Neural Net-
works and MARS for Multivariate Adaptive Regression Spline models

Model            Variants                                                              R Package

SVM              cost={1,5,10}, ε={0.1,0.05,0.01}, tolerance={0.001,0.005},            e1071
                 kernel=linear
SVM              cost={1,10}, ε={0.1,0.05,0.01}, degree={2,3,5}, kernel=polynomial     e1071
Random Forest    ntree={500,750,1000,2000,3000}, mtry={4,5,6}                          randomForest
Trees (pruned)   se={0,0.5,1,1.5,2}, cp=0, minsplit=6                                  DMwR
KNN              k={1,3,5,7,11,15}                                                     DMwR
NNET             size={2,4,6}, decay={0.05,0.1,0.15}                                   nnet
MARS             thresh={0.001,0.0005,0.002}, degree={1,2,3}, minspan={0,1}            earth
AdaBoost         dist={gaussian}, n.trees={10000,20000},                               gbm
                 shrinkage={0.001,0.01}, interaction.depth={1,2}

Table 2. Classification models used for the experimental comparisons. SVM stands
for Support Vector Machines, KNN for K-nearest neighbours, NNET for Neural
Networks and MARS for Multivariate Adaptive Regression Spline models

Model            Variants                                                              R Package

SVM              cost={1,3,7,10}, tolerance={0.001,0.005,0.0005,0.002},                e1071
                 kernel=linear
SVM              cost={1,10}, ε={0.1,0.05}, degree={2,3,4,5}, kernel=polynomial        e1071
Random Forest    ntree={500,750,1000,2000,3000}, mtry={3,4,5}                          randomForest
Trees (pruned)   se={0,0.5,1,1.5,2}, cp=0, minsplit=6                                  DMwR
KNN              k={1,3,5,7,11,15}                                                     DMwR
NNET             size={2,4,6}, decay={0.05,0.1,0.15}                                   nnet
AdaBoost         coeflearn={'Breiman','Freund','Zhu'}, mfinal={500,1000,2000}          boosting
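For concreteness, the sketch below shows how two of the listed variants might be instantiated with the cited R packages (e1071 and randomForest). The parameter values are taken from the tables, but the synthetic training sets and all variable names are illustrative assumptions, not the authors' experimental code.

library(e1071)         # SVM variants of Tables 1 and 2
library(randomForest)  # Random Forest variants

## Synthetic stand-ins for the training sets (assumption, for illustration only)
set.seed(2)
n <- 200
train.reg <- data.frame(matrix(rnorm(n * 5), n, 5))
names(train.reg) <- paste0("x", 1:5)
train.reg$Y <- 0.01 * train.reg$x1 + rnorm(n, sd = 0.02)
train.cls <- transform(train.reg,
                       Y = factor(ifelse(Y > 0.02, "buy",
                                  ifelse(Y < -0.02, "sell", "hold"))))

## One regression variant from Table 1: SVM with a linear kernel
svm.reg <- svm(Y ~ ., data = train.reg, kernel = "linear",
               cost = 5, epsilon = 0.05, tolerance = 0.001)

## One classification variant from Table 2: Random Forest
rf.cls <- randomForest(Y ~ ., data = train.cls, ntree = 1000, mtry = 4)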

The predictive tasks we are facing have two main difficulties: (i) the fact
that the distribution of the target variables is highly imbalanced, with the more
relevant values being less frequent; and (ii) the fact that there is an implicit order-
ing among the decisions. The first problem causes most modelling techniques to

focus on cases (the most frequent) that are not relevant for the application goals.
The second problem is specific to classification tasks as these algorithms do not
distinguish among the different types of errors, whilst in our target application
confusing a Buy decision with a Hold decision is less serious than confusing it
with a Sell.
These two problems lead us to consider several alternatives to our base mod-
elling approaches described in Tables 1 and 2. For the first problem of imbalance
we have considered the hypothesis of using resampling to balance the distri-
bution of the target variable before obtaining the models. In order to do that,
we have used the Smote algorithm [7]. This method is well known for classi-
fication models, consisting basically of oversampling the minority classes and
under-sampling the majority ones. The goal is to modify the data set in order
to ensure that each class is similarly represented. Regarding the regression tasks
we have used the work by Torgo et al. [8], where a regression version of Smote
was presented. Essentially, the concept is the same as in classification, using a
method to try to balance the continuous distribution of the target variable by
oversampling and under-sampling different ranges of its domain.
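A minimal usage sketch of the classification case is shown below, using the SMOTE() implementation from the DMwR package [7]. The two-class synthetic data, the chosen resampling percentages and the rare-signal definition are assumptions for illustration; the paper's three-signal setup and the regression variant of [8] are not reproduced here.

library(DMwR)   # provides the SMOTE() implementation of [7]

## Synthetic stand-in with a rare signal (assumption, for illustration only)
set.seed(3)
d <- data.frame(x1 = rnorm(500), x2 = rnorm(500))
d$Y <- factor(ifelse(0.01 * d$x1 + rnorm(500, sd = 0.02) > 0.03, "buy", "hold"))
table(d$Y)                     # heavily imbalanced towards "hold"

## Oversample the rare class and undersample the frequent one
d.bal <- SMOTE(Y ~ ., data = d, perc.over = 300, perc.under = 150)
table(d.bal$Y)                 # much more balanced class distribution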
We have thoroughly tested the hypothesis that using resampling before
obtaining the models would boost the performance of the different models we have
considered for our tasks. Our experiments confirmed that resampling led the
models to issue more Buy and Sell signals (the ones that are less frequent but
more interesting). However, this increased number of signals was accompanied
by an increased financial risk that frequently led to very poor economic results,
with very few exceptions.
Regarding the second problem of the order among the classes we have also
considered a frequently used approach to this issue. Namely, we have used a
cost-benefit matrix that allows us to distinguish between the different types of
classification errors. Using this matrix, and given a probabilistic classifier, we
can predict for each test case the class that maximises the utility instead of the
class that has the highest probability.
We have used the following procedure to obtain the cost-benefit matrices for
our tasks. Correctly predicted buy/sell signals have a positive benefit estimated
as the average return of the buy/sell signals in the training set. On the other
hand, in the case of incorrectly predicting a true hold signal as buy (or sell),
we assign it minus the average return of the buy (or sell) signals. Basically, the
benefit associated to correctly predicting one rare signal is entirely lost when the
model suggests an investment when the correct action would be doing nothing.
In the extreme case of confusing the buy and sell signals, the penalty will be
minus the sum of the average return of each signal. Choosing such a high penalty
for these cases will eventually change the model to be less likely to make this
type of very dangerous mistakes. Considering the case of incorrectly predicting
a true sell (or buy) signal as hold, we also charge for it, but in a less severe way.
Therefore, the average of the sell (or buy) signal is considered, but divided by
two. This division was our way of “teaching” the model that it is preferable to
miss an opportunity to earn money rather than making the investor lose money.

Finally, correctly predicting a hold signal gives no penalty nor reward, since
no money is either won or lost. Table 3 shows an example of such cost-benefit
matrix that was obtained with the data from 1981-01-05 to 2000-10-13 of Apple.

Table 3. Example cost-benefit matrix for Apple shares.

                  Trues
                s       h       b
    Pred   s   0.49   -0.49   -0.82
           h  -0.24    0.00   -0.17
           b  -0.82   -0.33    0.33

We have also thoroughly tested the hypothesis that using cost-benefit matri-
ces to implement utility maximisation would improve the performance of the
models. Our tests have shown that nearly half the model variants see their per-
formance boosted with this approach.
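A hedged R sketch of this construction and of utility-maximising prediction is given below. The synthetic training signals, the variable names and the example probability vector are assumptions; the exact estimation procedure used by the authors may differ, so this only mirrors the textual description above (and is consistent with the Table 3 example).

## Synthetic training signals (assumption, for illustration only)
set.seed(4)
Y.train <- rnorm(1000, sd = 0.03)                     # daily price variations
sig     <- cut(Y.train, c(-Inf, -0.02, 0.02, Inf), labels = c("s", "h", "b"))

avg.b <- mean(Y.train[sig == "b"])      # average return of the buy signals
avg.s <- mean(-Y.train[sig == "s"])     # average return of the sell signals

##          true:   s                h        b
CB <- rbind(c( avg.s,           -avg.s,  -(avg.b + avg.s)),   # predicted s
            c(-avg.s / 2,        0,      -avg.b / 2),         # predicted h
            c(-(avg.b + avg.s), -avg.b,   avg.b))             # predicted b
dimnames(CB) <- list(pred = c("s", "h", "b"), true = c("s", "h", "b"))

## Utility-maximising decision from a matrix P of predicted class probabilities
## (columns ordered s, h, b), instead of simply taking the most probable class
best.action <- function(P, CB) rownames(CB)[max.col(P %*% t(CB))]
best.action(matrix(c(0.30, 0.55, 0.15), nrow = 1), CB)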

3.3 The Experimental Methodology


In this section we present the experimental methodology used in our compara-
tive experiments. Due to the temporal nature of the data sets used, the usual
cross-validation methodology should not be used to estimate the performance
of a certain model. Namely, this procedure assumes that the data has no intrinsic
order, and by using it we would obtain unreliable estimates. In this context, we have
decided to use a Monte Carlo simulation method for obtaining our estimates.
This methodology consists of randomly selecting a series of N points in time
within the available data set. For each of these random dates, we use a certain
consecutive past window as training set for obtaining the alternative models
that are then tested/compared in a subsequent and consecutive test window.
The Monte Carlo estimates are formed by the average scores obtained on the N
repetitions. In our experiments we have used N = 10, 50% of the data as the
size of the training window, and 25% of the data as the size of the test sets.
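The following R function is a purely illustrative sketch of this Monte Carlo scheme: N random split points are drawn, each defining a training window of 50% of the data immediately followed by a test window of 25%. The 'workflow' argument is an assumed placeholder that trains a model on 'train' and returns its score on 'test'; it is not the authors' implementation.

monte.carlo <- function(data, workflow, N = 10,
                        train.frac = 0.5, test.frac = 0.25) {
  n  <- nrow(data)
  tr <- floor(train.frac * n)
  ts <- floor(test.frac * n)
  ## split points must leave room for a full training window before them
  ## and a full test window after them
  starts <- sample((tr + 1):(n - ts + 1), N)
  scores <- sapply(starts, function(s)
    workflow(train = data[(s - tr):(s - 1), ],
             test  = data[s:(s + ts - 1), ]))
  mean(scores)      # the estimate is the average over the N repetitions
}

## e.g. monte.carlo(company.data, function(train, test) { ...fit, predict, score... })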
With respect to testing the statistical significance of the observed differences
between the estimated scores we have used the recommendations of the work by
Demsar [9]. More specifically, in situations where we are comparing k alternative
models on one specific task we have used the Wilcoxon signed rank test to test the
significance of the differences. On the experiments where k models are compared
on t tasks we use the Friedman test followed by a post-hoc Nemenyi test to check
the significance of the difference between the average ranks of the k models across
the t tasks.
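The corresponding base-R calls are shown below on synthetic scores (illustrative only; the score vectors and the tasks-by-models matrix are assumptions).

set.seed(5)
scores.A <- rnorm(10)                        # per-repetition scores of variant A
scores.B <- scores.A + rnorm(10, mean = 0.1) # per-repetition scores of variant B
wilcox.test(scores.A, scores.B, paired = TRUE)        # two variants on one task

rank.table <- matrix(rnorm(12 * 10), nrow = 12,       # 12 tasks (companies) x 10 models
                     dimnames = list(NULL, paste0("m", 1:10)))
friedman.test(rank.table)                             # k models over t tasks
## a Nemenyi post-hoc comparison is available in add-on packages (e.g. PMCMR)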

4 Experimental Results
This section presents the results of the experimental comparisons between the
two general approaches to making trading decisions based on forecasting models.

In our experiments we have considered 76 classification models. For each of these
models we have also tried the version with resampling and the version with cost-
benefit matrices, totaling 76 × 3 = 228 different classification variants. In terms
of regression we have a slightly larger set of 97 base models that were then
tried with and without resampling, for a total of 97 × 2 = 194 variants. All
these variants were compared on the data sets of the 12 companies described in
Section 3.1 using the methodology described in Section 3.3.
We have divided our experimental analysis into two main parts. In the first one,
for each company and for each metric, we have compared the best regression
and classification variant using a Wilcoxon signed rank statistical test with a
significance level of 0.05, to check if we can reject the null hypothesis that there is
no significant difference between the best classification and regression variants.
This leads to 12 statistical tests for each metric (one test for each company),
where the models compared for each company are not necessarily the same. The
motivation of this first part is to compare the best classification variant against
the best regression variant for each company and metric. Figure 1 shows the result
of this comparison for the Total Return and Sharpe Ratio financial evaluation
metrics. The results in these figures are somewhat correlated. In effect, whenever
we found a significant difference in terms of Total Return, the same also
happened in terms of Sharpe Ratio. Regarding the left graph (Total Return),
each approach obtained one significant win, while classification and regression
achieved 6 and 4 non-significant wins, respectively. With respect to the right
graph (Sharpe Ratio) we can observe a slight advantage of the classification
approach, which obtained one more significant win than regression and 8 non-significant
wins against 1. Overall, we have observed a very slight advantage of the best classification
approach over the best regression variant.
From an economic perspective we have observed contradictory results. For
instance, there is a very high level of Total Return for the Meg company (above
60% return), but the best Sharpe Ratio was very low. This means that the best
model for the first metric was taking enormous amounts of risk and that the high
level of return achieved was probably due to pure luck. On the other hand, there
are some high values for the Total Return accompanied by high levels of Sharpe
Ratio, such as for the Exas company. This strongly suggests that the models
could actually provide some profit with low risk, thus indicating that the model
actually predicted meaningful signals. Given the high variability of the results
across companies, taking conclusions solely based on the analysis of the best
variant per model and per metric may lead to wrong results. This establishes
the motivation for the second part of our experiments.
In this second part of our experiments, instead of grouping by metric and
company, we will just group by metric and study the average rank of each model
across all the companies (top 5 of each approach are considered). With the use
of the Friedman test followed by the post-hoc Nemenyi test, we check whether
there are statistically significant differences among these rankings. This way, if

Fig. 1. Best classification variant against the best regression one for the Total Return
and Sharpe Ratio metrics (asterisks denote that the respective variant is significantly
better, according to a Wilcoxon test with α = 0.05).

a model obtains a very good result for one company but poor for all the others
(meaning that it was lucky in that specific company), its average ranking will be
low allowing the top average rankings to be populated by the true top models
that perform well across most companies.
Table 4 summarises the results in terms of Total Return. Since we could not
reject the null hypothesis of the Friedman test, the post-hoc Nemenyi test was not per-
formed. This means that we can not say with 95% confidence that there is some
difference in terms of Total Return between these modelling approaches. Never-
theless, there are some observations worth remarking. The model with the best average
ranking is a classification model using cost-benefit matrices. All the remain-
ing classification variants are in their original form (without using cost-benefit
matrices) and occupy mostly the last positions. Moreover, not a single variant
obtained with Smote appears in the top 5 of either approach, which confirms that
resampling does not seem to pay off for this class of applications
due to the economic costs of making more risky decisions. Furthermore, another
very interesting remark is that all the top models use SVMs as the
base learning algorithm. Overall, we can not say that either of the two approaches
(forecasting the trading actions directly with classification models or forecasting
the price returns first with regression models) is better than the other.
Table 5 shows the results of the same experiment in terms of Sharpe Ratio,
i.e. the risk exposure of the alternatives. The conclusions are quite similar to

Table 4. The average rank of the top 5 Classification and Regression models in terms
of Total Return. The Friedman test returned a p-value of 0.3113477, meaning that there
is no statistical difference between all the 10 variants compared. Note: BC means the
model was obtained using a benefit-cost matrix, while (p)/(l) means the SVM model
was obtained using a polynomial/linear kernel. The vx labels represent the different
parameter settings that were considered within each variant.

Rank Variant Avg. Rank Rank Variant Avg. Rank


1 CLASS SVM(p) BC v3 4.54 6 CLASS SVM(l) v1 5.83
2 REG SVM(l) v1 5.12 7 CLASS SVM(l) v5 5.83
3 REG SVM(l) v11 5.12 8 CLASS SVM(l) v9 5.83
4 REG SVM(l) v2 5.25 9 REG SVM(l) v3 5.83
5 CLASS SVM(l) v6 5.67 10 REG SVM(l) v18 5.96

those for the Total Return metric. Once again, no significant differences were observed.
Still, one should note that the first 5 places are dominated by the classification
approaches. The best variant for the Total Return is also the best variant for the
Sharpe Ratio, which makes this variant unarguably the best one of our study
when considering the 12 different companies. Hence, ultimately we can state that
the most solid model belongs to the classification approach, an SVM using
cost-benefit matrices, since it obtained the highest returns with the lowest associated
risk. Finally, unlike the results for Total Return, in this case we observe other
learning algorithms appearing in the top 5 best results.

Table 5. Top 5 average rankings of the Classification and Regression models for the
Sharpe Ratio. The Friedman test returned a p-value of 0.1037471, implying there is no
statistical difference between all the 10 variants tested.

Rank Variant Avg. Rank Rank Variant Avg. Rank


1 CLASS SVM(p) BC v3 4.88 6 CLASS SVM(l) v1 5.58
2 CLASS SVM(l) BC v15 5.38 7 REG TREE v4 5.58
3 REG NNET v1 5.38 8 REG TREE v5 5.58
4 CLASS SVM(l) v14 5.42 9 REG NNET v2 5.67
5 CLASS SVM(l) v6 5.50 10 REG SVM(l) v1 6.04

In conclusion, we can not state that one approach performs definitely bet-
ter than the other in the context of financial trading decisions. The scientific
community typically puts more effort into the regression models, but this study
strongly suggests that both have at least the same potential. Actually, the most
consistent model we could obtain is a classification approach. Another interest-
ing conclusion is that, of a considerably large set of different types of models,
SVMs achieved the best results in both the classification and the regression
tasks.

5 Conclusions
This paper presents a study of two different approaches to financial trading
decisions based on forecasting models. The first, and more conventional, app-
roach uses regression tools to forecast the future evolution of prices and then
uses some decision rules to choose the “correct” trading decision based on these
predictions. The second approach tries to directly forecast the “correct” trading
decision. Our study is a specific instance of the more general problem of mak-
ing decisions based on numerical forecasts. In this paper we have focused on
financial trading decisions because this is a specific domain that requires specific
trade-offs in terms of economic results. This means that our conclusions from
this study in this area should not be generalised to other application domains.
Overall, the main conclusion of this study is that, for this specific applica-
tion domain, there does not seem to be any statistically significant difference between
these two approaches to decision making. Given the large set of classification and
regression models that were considered, as well as different approaches to the
learning task, we claim that this conclusion is supported by significant experi-
mental evidence.
The experiments carried out in this paper have also allowed us to draw some
other conclusions in terms of the applicability of resampling and cost-benefit
matrices in the context of financial forecasting. Namely, we have observed that
the application of resampling, although increasing the number of trading deci-
sions made by the models, would typically bring additional financial risks that
would make the models unattractive to traders. On the other hand the use of
cost-benefit matrices in an effort to maximise the utility of the predictions of the
models, did bring some advantages to a high percentage of modelling variants.
As future work we plan to extend our comparisons of these two forms of
addressing decision making based on numeric forecasting, to other application
domains, in an effort to provide general guidelines to the community on how to
address these relevant real world tasks.

Acknowledgments. This work is financed by the FCT Fundação para a Ciência


e a Tecnologia (Portuguese Foundation for Science and Technology) within project
UID/EEA/50014/2013.

References
1. Lu, C.J., Lee, T.S., Chiu, C.C.: Financial time series forecasting using independent
component analysis and support vector regression. Decision Support Systems 47(2),
115–125 (2009). cited By 112
2. Hellström, T.: Data snooping in the stock market. Theory of Stochastic Pro-
cesses 21, 33–50 (1999)
3. Luo, L., Chen, X.: Integrating piecewise linear representation and weighted support
vector machine for stock trading signal prediction. Applied Soft Computing 13(2),
806–816 (2013)

4. Ma, G.Z., Song, E., Hung, C.C., Su, L., Huang, D.S.: Multiple costs based decision
making with back-propagation neural networks. Decision Support Systems 52(3),
657–663 (2012)
5. Teixeira, L.A., de Oliveira, A.L.I.: A method for automatic stock trading combining
technical analysis and nearest neighbor classification. Expert Systems with Appli-
cations 37(10), 6885–6890 (2010)
6. Wilder, J.: New Concepts in Technical Trading Systems. Trend Research (1978)
7. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: synthetic
minority over-sampling technique. Journal of Artificial Intelligence Research 16(1),
321–357 (2002)
8. Torgo, L., Branco, P., Ribeiro, R.P., Pfahringer, B.: Resampling strategies for regres-
sion. Expert Systems (2014)
9. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. The Journal
of Machine Learning Research 7, 1–30 (2006)
CTCHAID: Extending the Application
of the Consolidation Methodology

Igor Ibarguren(B) , Jesús Marı́a Pérez, and Javier Muguerza

Department of Computer Architecture and Technology, University of the Basque


Country UPV/EHU, Manuel Lardizabal 1, 20018 Donostia, Spain
{igor.ibarguren,txus.perez,j.muguerza}@ehu.es
http://www.sc.ehu.es/aldapa/

Abstract. The consolidation process, originally applied to the C4.5 tree


induction algorithm, improved its discriminating capacity and stabil-
ity. Consolidation creates multiple samples and builds a simple (non-
multiple) classifier by applying the ensemble process during model
construction. A benefit of consolidation is that the understand-
ability of the base classifier is kept. The work presented aims to show
the consolidation process can improve algorithms other than C4.5 by
applying the consolidation process to another algorithm, CHAID*. The
consolidation of CHAID*, CTCHAID, required solving the handicap of
consolidating the value groupings proposed by each CHAID* tree for
discrete attributes. The experimentation is divided in three classifica-
tion contexts for a total of 96 datasets. Results show that consolidated
algorithms perform robustly, ranking competitively in all contexts, never
falling into lower positions, unlike most of the other 23 rule induction algo-
rithms considered in the study. When performing a global comparison,
consolidated algorithms rank first.

1 Introduction

In some problems that make use of classification techniques, the reason why
a decision is made is almost as important as the accuracy of the decision, thus
the classifier must be comprehensible. Decision trees are considered comprehen-
sible classifiers. The most common way of improving the discriminating capacity
of decision trees is to build ensemble classifiers. However with ensembles, the
explaining capacity individual trees possess is lost. The consolidation of algo-
rithms is an alternative that resamples the training sample multiple times and
applies the ensemble voting process while the classifier is being built, so that
the final classifier is a single classifier (with explaining capacity) built using the
knowledge of multiple samples. The well-known C4.5 tree induction algorithm
[10] has successfully been consolidated in the past [9].
With the aim of studying the benefit of the consolidation process on other algo-
rithms, maintaining the explaining capacity of the classifier, in this work we apply
this methodology on a variation of the CHAID [7,8] algorithm (CHAID* [5]), one
of the first tree induction algorithms along with C4.5 and CART. We propose the


consolidation of CHAID* and using tests for statistical significance [4] we com-
pare its results in three different classification contexts (amounting to a total of 96
datasets) against 16 genetics-based and 7 classical algorithms and also the origi-
nal CTC (Consolidated Tree Construction) algorithm.
The rest of the paper is organized as follows. Section 2 details the related
work. Section 3 explains the consolidation version of the CHAID*, CTCHAID.
Section 4 defines the experimental methodology. Section 5 lays out the obtained
results. Finally, section 6 gives this work’s conclusions.

2 Related Work on CHAID* and Consolidation


The CHAID (Chi-squared Automatic Interaction Detector) [7,8] is a tree induc-
tion algorithm that uses the chi-squared (χ2) statistic as the split function and only
works with discrete variables. CHAID* [5] is a variation of CHAID that differs
in three main aspects:
– Handling of attributes: The original CHAID algorithm lacks the ability to
handle continuous variables. Inspired by how C4.5 handles continuous vari-
ables, CHAID* uses the χ2 statistic to determine the best cutting point to divide the
variable into two sets (a small illustrative sketch of this cut-point search is given
after this list).
– Missing values on continuous variables: Three options are considered to treat
the examples with missing values: grouping them with those examples with
a value lower or equal to the cutting point, grouping them with examples
whose value is greater than the cutting point or creating a branch just for
the examples with a missing value.
– Pruning: CHAID* uses the same strategy as C4.5 by applying the Reduced-
error pruning mechanism.
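The following base-R sketch illustrates the cut-point search mentioned in the first item above: every mid-point between consecutive sorted values is a candidate threshold, and the one yielding the largest chi-squared statistic against the class variable is kept. This is only one way to realise the idea and is not the actual CHAID* code (which additionally handles missing values and value grouping); all names and the synthetic data are assumptions.

best.cut <- function(x, y) {
  v    <- sort(unique(x))
  cand <- (head(v, -1) + tail(v, -1)) / 2          # candidate thresholds
  stat <- sapply(cand, function(t)
    suppressWarnings(chisq.test(table(x <= t, y))$statistic))
  cand[which.max(stat)]                            # threshold with highest chi-squared
}

## Example with synthetic data
set.seed(6)
x <- rnorm(200)
y <- factor(ifelse(x + rnorm(200) > 0.5, "A", "B"))
best.cut(x, y)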
The consolidation approach aims at improving the discriminating capacity
and the stability while reducing the complexity of the classifier [9]. It works by
applying the ensemble voting process while building the simple classifier instead
of building multiple simple classifiers and performing the ensemble vote only
when classifying new examples. Recently the term “Inner Ensembles” has been
coined to group methodologies following this approach [1].
The first consolidated algorithm was the well-known C4.5 decision tree induc-
tion algorithm, creating the CTC (Consolidated Tree Construction) algorithm.
CTC works by first creating multiple samples from the training samples, usually
by subsampling. Then, from each sample a C4.5 tree begins to grow. However,
at each node, execution “stops”. Each tree proposes a new split based on its
own sample. A vote takes place and a common split is agreed. All trees com-
ply with the majority vote and make the split accordingly, even if it is not what
they voted for. This continues until the majority decides not to split any more.
Because of this process, the structure of the trees grown from all subsamples is
the same and the outcome is a single tree model. Then, for each leaf node, the a
posteriori probabilities for each class are computed by averaging the probabili-
ties on that particular leaf using the same samples used to build the consolidated
tree.

3 CTCHAID
As explained in section 2 the changes made to CHAID* make it very similar to
the C4.5 algorithm, which makes the implementation of CTCHAID very similar
to the implementation of CTC45 (Consolidated C4.5) described in section 2.
Aside from the split function, the other main difference between the algorithms
is how discrete variables (nominal and ordinal) are handled. By default, when
splitting using a discrete variable C4.5 creates a branch for every possible value
for the attribute. On the other hand, CHAID* considers grouping more than
one value on each branch. In each node a contingency table is created for each
variable. Each of these tables describes the relationship between the values a
variable can take and how the examples with this value are distributed among
all possible classes. CHAID* uses Kass’ algorithm [7] on all contingency tables
to find the most significant variable and value-group to make the split.
When consolidating CHAID* the behavior is different depending on the type
of variable. First the contingency tables are built from each sample and processed
with Kass’ algorithm to find the most important grouping. From each subsample
a variable is proposed and voting takes place as with CTC45. If the voted variable
is continuous the median value of the proposed cut-point values will be used. For
categorical values, the contingency tables from each tree for the chosen variable
are averaged into a single table. This averaged table is processed with Kass’
algorithm to find the most significant combination of categories.
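To make the consolidation vote concrete, the R sketch below mimics the decision just described: the split variable is chosen by majority vote, a continuous attribute is consolidated through the median of the proposed cut-points, and a discrete attribute through the averaged contingency table. The 'proposals' structure and all names are assumptions for illustration, not the actual CTCHAID implementation.

consolidate.split <- function(proposals) {
  vars   <- sapply(proposals, `[[`, "variable")
  chosen <- names(which.max(table(vars)))             # majority-voted variable
  sel    <- proposals[vars == chosen]
  if (!is.null(sel[[1]]$cut)) {                       # continuous attribute:
    list(variable = chosen,                           #   median of proposed cut-points
         cut = median(sapply(sel, `[[`, "cut")))
  } else {                                            # discrete attribute:
    avg <- Reduce(`+`, lapply(sel, `[[`, "table")) / length(sel)
    list(variable = chosen, table = avg)              #   averaged contingency table,
  }                                                   #   later re-processed with Kass' algorithm
}

## Example: three subsamples, two proposing a cut on the same continuous variable
props <- list(list(variable = "price.var",  cut = 0.010),
              list(variable = "price.var",  cut = 0.014),
              list(variable = "volatility", cut = 0.200))
consolidate.split(props)    # -> split on "price.var" with cut-point 0.012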

4 Experimental Methodology

This experiment follows a very similar structure to the works in [3] and [6], as we
compare against the results published in those works. The same three classification
contexts are analyzed: 30 standard (mostly multi-class) datasets, 33 two-class
imbalanced datasets and the same 33 imbalanced datasets preprocessed with
SMOTE (Synthetic Minority Over-sampling Technique [2]) until the two classes
were balanced by oversampling the minority class. Fernández et al. [3] proposed
a taxonomy to classify genetics-based machine learning (GBML) algorithms for
rule induction. They listed 16 algorithms and classified them in 3 categories and
5 subcategories. They compared the performance of these algorithms with a set
of classical algorithms (CART, AQ, CN2, C4.5, C4.5-Rules and Ripper). In our
work, for each of the contexts, the winner for each of the 5 GBML categories, 7
classical algorithms (including CHAID*), CTC45 and CTCHAID are compared.
Finally a global ranking is also computed. All algorithms used the same 5-run
× 5-fold cross-validation strategy and the same training/test partitions (found
in the KEEL repository1 ). The tables containing all the information have been
omitted from this article for space issues and have been moved to the website
with the additional material for this paper2 .

1
http://sci2s.ugr.es/keel/datasets.php
2
http://www.aldapa.eus/res/2015/ctchaid/

For CTC45 and CTCHAID, following the conclusions of the latest work on
consolidation [6], the subsamples used in this work are balanced and the num-
ber of examples per class is the number of examples the least populous class
has in the original training sample. The number of samples for each dataset
has been determined using a coverage value of 99% based on the results of [6].
The tables detailing the number of samples for each dataset have been moved
to the additional material. The pruning used for C4.5, CHAID*, CTC45 and
CTCHAID was C4.5’s reduced-error pruning. However, when pruning resulted
in a tree with just the root node, the tree was kept unpruned. This is due to the
fact that a root node tree results in zero for most performance measures used in
this paper. Thus, the results shown for C4.5 are not those previously published
by Fernández et al. using the KEEL platform, but those obtained with Quinlan's
implementation of the algorithm.

5 Results

As described in the Experimental Methodology section, we divide the study


into three contexts. For each context we analyze and compare the behavior of
14 algorithms: 5 GBML algorithms (the best for each subcategory proposed by
Fernández et al.), 7 classical algorithms, CTC45 and CTCHAID. The GBML
algorithms change from context to context while the classical ones stay the same
(CART, AQ, CN2, C4.5, C45-Rules, Ripper and CHAID*).
The significance of the average performance values achieved by the algorithms
has been tested using the Friedman Aligned Ranks test, as proposed by [4]. When
this test finds statistically significant differences between algorithms, Holm's post-
hoc test has been used to find which algorithms perform significantly worse than
the best ranking algorithm. The average performance values have been moved
to the website with the additional material for this article. Figure 1 offers a
visual representation of the average ranks achieved by the algorithms on different
contexts. In that figure a thick black line covers algorithms without statistically
significant differences with the best ranking algorithm. The lower the rank the
better the performance is.
Although CTCHAID does not rank first for any of the three contexts, the
differences with the best ranking algorithm for each context are never found to
be significant by the Holm test.
In a similar fashion to what was done in [6] we perform a global analysis
combining the results of the three contexts. The rankings of this global analysis
are found in Figure 2. For the standard dataset classification, only the kappa
measure is used. In this case CTC45 ranks first followed by CTCHAID. The
Friedman Aligned Ranks test computes a p-value of 4.8 × 10⁻¹² (test statistic
89.51), indicating the clear presence of statistically significant differences between
the performance of the algorithms. According to the Holm test DT-GA, SIA,
UCS, CART, OCEC, CORE, AQ and CN2 perform significantly worse than
CTC45.

Fig. 1. Visual representation of Friedman Aligned Ranks for the three contexts.

Fig. 2. Visual representation of Friedman Aligned Ranks for the global ranking.

6 Conclusions and Future Work


Results show that CTCHAID performs competitively. In summary, CTCHAID
ranks in the upper half for all three contexts and in the first quartile for two
out of three. As most algorithms fall into much lower positions for at least one
context, CTCHAID ranks competitively in global terms. This behavior shows
the robustness brought by the consolidation process in contrast to the behavior
of the base algorithms, C4.5 and CHAID*, that fall into lower positions in some
contexts, ranking worse globally. This shows that the consolidation process can
bring improvement to multiple algorithms.
As future work we would like to study the performance of the CTC45 and
CTCHAID algorithms under different pruning strategies: standard pruning, the
strategy used in this work, disabling pruning, alternatives to pruning, etc. in
order to tackle the class imbalance problem. Also in the same spirit, we would
like to consolidate other tree and rule-induction algorithms.

Acknowledgments. This work was funded by the University of the Basque Country
UPV/EHU (BAILab, grant UFI11/45); by the Department of Education, Universities
and Research and by the Department of Economic Development and Competitiveness
of the Basque Government (grant PRE-2013-1-887; BOPV/2013/128/3067, grant IT-
395-10, grant IE14-386); and by the Ministry of Economy and Competitiveness of the
Spanish Government (eGovernAbility, grant TIN2014-52665-C2-1-R).

References
1. Abbasian, H., Drummond, C., Japkowicz, N., Matwin, S.: Inner ensembles: using
ensemble methods inside the learning algorithm. In: Blockeel, H., Kersting, K.,
Nijssen, S., Železný, F. (eds.) ECML PKDD 2013, Part III. LNCS, vol. 8190,
pp. 33–48. Springer, Heidelberg (2013)
2. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: Synthetic
minority over-sampling technique. Journal of Artificial Intelligence Research 16(1),
321–357 (2002)
3. Fernández, A., Garcia, S., Luengo, J., Bernadó-Mansilla, E., Herrera, F.: Genetics-
based machine learning for rule induction: State of the art, taxonomy, and com-
parative study. IEEE Transactions on Evolutionary Computation 14(6), 913–941
(2010)
4. Garcı́a, S., Fernández, A., Luengo, J., Herrera, F.: Advanced nonparametric tests
for multiple comparisons in the design of experiments in computational intelligence
and data mining: Experimental analysis of power. Information Sciences 180(10),
2044–2064 (2010)
5. Ibarguren, I., Lasarguren, A., Pérez, J.M., Muguerza, J., Arbelaitz, O., Gurrutxaga,
I.: BFPART: Best-first PART. Submitted to Information Sciences
6. Ibarguren, I., Pérez, J.M., Muguerza, J., Gurrutxaga, I., Arbelaitz, O.: Coverage-
based resampling: Building robust consolidated decision trees. Knowledge-Based
Systems 79, 51–67 (2015)
7. Kass, G.V.: Significance testing in automatic interaction detection (a.i.d.). Journal
of the Royal Statistical Society. Series C (Applied Statistics) 24(2), 178–189 (1975)
8. Morgan, J.A., Sonquist, J.N.: Problems in the analysis of survey data, and a pro-
posal. J. Amer. Statistics Ass. 58, 415–434 (1963)
9. Pérez, J.M., Muguerza, J., Arbelaitz, O., Gurrutxaga, I., Martı́n, J.I.: Combining
multiple class distribution modified subsamples in a single tree. Pattern Recogni-
tion Letters 28(4), 414–422 (2007)
10. Quinlan, J.R.: C4.5: programs for machine learning. Morgan Kaufmann Publishers
Inc., San Francisco (1993)
Towards Interactive Visualization of Time Series Data
to Support Knowledge Discovery

Jan Géryk()

KD Lab, Faculty of Informatics, Masaryk University, Brno, Czech Republic



Abstract. Higher education institutions have a significant interest in increasing


the educational quality and effectiveness. A major challenge in modern educa-
tion is the large amount of time-dependent data, which requires efficient tools
and methods to improve decision making. Methods like motion charts (MC)
show changes over time by presenting animations in two-dimensional space and
by changing element appearances. In this paper, we present a visual analytics
tool which makes use of enhanced animated data visualization methods. The
tool is primarily designed for exploratory analysis of academic analytics (AA)
and offers several interactive visualization methods that enhance the MC de-
sign. An experiment is conducted to evaluate the efficacy of both static and
animated data visualization methods. To interpret the experiment results, we
utilized one-way repeated measures ANOVA.

Keywords: Animation · Motion charts · Visual analytics · Academic analytics ·


Experiment

1 Introduction

A key requirement of Business Intelligence (BI) is to improve the decision making
process and to enable users to get all the information they need at the right time. There
is an increasing distinction made between academic analytics (AA) and traditional BI
because of the unique type of information that university executives and administra-
tors require for decision making. In [1], hundreds of higher education executives were
surveyed on their analytic needs. The authors concluded that advanced analytics should
support better decision-making, studying enrollment trends, and measuring student
retention. They also pointed out that management commitment and staff skills are
more important in deploying AA than the technology.
Visualizations are common methods used to gain a qualitative understanding of da-
ta prior to any computational analysis. By displaying animated presentations of the
data and providing analysts with interactive tools for manipulating the data, visualiza-
tions allow human pattern recognition skills to contribute to the analytic process. The
most commonly used statistical visualization methods generally focus on univariate or
bivariate data. The methods are usually used for tasks ranging from the exploration to
the confirmation of models, including the presentation of the results. However, fewer

methods are available for visualizing data with more than two dimensions (e.g. mo-
tion charts or parallel coordinates), as the logical mapping of the data dimension to
the screen dimension cannot be directly applied.
Although a snapshot of the data can be beneficial, presenting changes over time
can provide a more sophisticated perspective. Animations allow knowledge discovery
in complex data and make it easier to see meaningful characteristics of changes over
time. The dynamic nature of Motion Charts (MC) allows a better identification of
trends in the longitudinal multivariate data and enables visualization of more element
characteristics simultaneously, as presented in [2]. The authors also conducted an
experiment whose results showed that MC excels at data presentation. MC is a
dynamic and interactive visualization method that enables analysts to display complex
and quantitative data in an intelligible way.
In this paper, we show the benefits of animated data visualizations for the successful un-
derstanding of complex and large data. In the next section, we describe our VA tool
that implements visualization methods which make use of enhanced MC design. Fur-
ther, we conduct an empirical study with 16 participants on their data comprehension
to compare the efficacy of various static data visualizations with our enhanced me-
thods. We then discuss the implications of our experiment results. Finally, we draw
the conclusion and outline future work.

2 The Visual Analytics Tool

Visualization tools represent an effective way to make statistical data understandable


to analysts, as shown in [3]. MC methods have proved to be useful for data presentation
and the approach has been successfully employed to show the story in data
[4] or to support decision making [5]. Several web-based tools allowing analysts to inte-
ractively explore associations, patterns, and trends in data with temporal characteris-
tics are available. In [6], authors presented a visualization of energy statistics using an
existing web-based data analysis tools, including IBM's Many Eyes, and Google
Motion Charts.
The motivation to develop advanced MC methods was to improve expression capa-
bilities, as well as to allow analysts to depict each student or study as a central ob-
ject of their interest. Moreover, the implementation increases the number of animations
that express the students' behavior during their studies more precisely. We partly vali-
dated the usefulness of the developed methods with a case study where we successfully
utilized the capabilities of the tool for the purpose of confirming a hypothesis concerning
student retention. Although we concluded that the methods proved to be useful for
analytic purposes, more adjustments were needed.
Two main challenges are addressed by the presented VA tool. It enables visualization
of multivariate data and the qualitative exploration of data with temporal characteristics.
The technical advantages over other implementations of MC are its flexibility and the
ability to manage many animations simultaneously. Technical aspects of the enhanced
MC methods are elaborately described in [7].

To create an effective and efficient knowledge discovery process, it is important to


support common data manipulation tasks by creating quick, responsive and intuitive
interaction methods. The tool offers several beneficial configurable interactive fea-
tures for a more convenient analytic process. User interface features are highly cus-
tomizable and allow analysts to arrange a display and variable mapping according to
his or her needs. Available features include a mouse-over data display, color and plot
size representation, traces, animated time plot, variable animation speed, changing of
axis series, changing of axis scaling, distortion, and the support of statistical methods.

3 Experiment

Any quantitative research of AA also requires a preliminary exploratory data analysis.


Though useful, advanced MC methods involve several drawbacks in comparison with
common data visualization methods. Thus, empirical data is needed to evaluate their
actual usability and efficacy. In this section, we describe the experiment for the pur-
pose of evaluating the efficacy of the MC methods implemented in our tool. We
present the results including a detailed discussion. Sixteen subjects (7 female, 9 male)
with an average age of 23.44 (SD = 2.12) participated in our experiment. The partici-
pants ranged from 21 to 26 years of age. All participants came from professions
requiring the use of data visualizations, including college students, analysts, and
administrators.
We performed a study to test the benefits of the animated methods over static me-
thods when employed to analyze study related data. The experiment used a 3 (visuali-
zation) x 2 (size) within-subjects design. The visualizations varied between the static
and the animated methods. The methods were represented by motion charts (MC),
line charts (LC), and scatter plots (SP) which were generated for each semester. The
size of datasets varied between small and large ones with the threshold of 500 ele-
ments. For the experiment, we utilized study related data about students admitted to
bachelor studies of the Faculty of Informatics Masaryk University between the years
of 2006 and 2008.

3.1 Hypotheses
We designed the experiment to address the following hypotheses:

• H1. The MC methods will be more effective than both the static methods for all
datasets. That is, the subjects will be (a) faster and (b) make fewer errors when us-
ing MC.
• H2. The subjects will be more effective with the small datasets than with the large
datasets for all methods. That is, participants will be (a) faster and (b) make fewer
errors when working with small datasets.

In each trial, the participants completed 12 tasks, each with 1 to 3 required answers.
Each task had identification numbers of students or fields of study as the answer.
Several questions have more correct answers than requested. The participants selected

answers by selecting IDs in the legend box located to the upper right of the chart
area. In order to complete the task, two buttons could be used: either the “OK” button to
confirm the participant's choice or the “Skip Question” button to proceed to the next task
without saving the answer. There was no time limit during the experiment. For each
task, the order of the datasets was fixed with the smaller ones first.
The participants were asked to proceed as quickly and accurately as possible. In
order to reduce learning effects, the participants were told to make use of as many
practice trials as they needed. It was followed by 12 tasks (6 small dataset tasks and 6
large dataset tasks in this particular order). After that, the subjects completed a survey
with questions specific to the visualization. Each block lasted about 1.5 hours. The
subjects were screened to ensure that they were not color-blind and understood com-
mon data visualization methods. To test for significant effects, we conducted repeated
measures analysis of variance (RM-ANOVA). Post-hoc analyses were performed by
using the Bonferroni technique. Only significant results are reported.
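The base-R sketch below illustrates this kind of analysis. It is an assumption on our part, not the authors' analysis script: the long-format data frame and its column names are hypothetical, the response is synthetic, and the sphericity-corrected degrees of freedom reported in the results would normally come from a dedicated package such as ez or afex rather than from aov().

set.seed(7)
d <- expand.grid(subject = factor(1:16),
                 visualization = c("MC", "LC", "SP"),
                 size = c("small", "large"))
d$time <- rnorm(nrow(d), mean = 45, sd = 10)     # synthetic completion times (seconds)

## Repeated measures ANOVA with within-subject factors
fit <- aov(time ~ visualization * size + Error(subject / (visualization * size)),
           data = d)
summary(fit)

## Bonferroni-adjusted pair-wise comparisons between the visualizations
pairwise.t.test(d$time, d$visualization, paired = TRUE,
                p.adjust.method = "bonferroni")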

3.2 Results
Accuracy. Since some of the tasks required multiple answers, accuracy was calcu-
lated as a percentage of the correct answers. Thus, when a subject selected only one
correct answer from two, we calculated the answer as 50 % accurate rather than an
incorrect answer. The analysis revealed several significant accuracy results at the .05
level. The type of visualization had a statistically significant effect on the accuracy for
large datasets (F(1.413, 21.194) = 20.700, p < 0.001). Pair-wise comparison of the
visualizations found significant differences showing that MC was significantly more
accurate than the LC (p < 0.001). MC was also more accurate than the SP (p < 0.001).
There was no statistically significant difference between the LC and the SP. For the
small datasets, visualizations were not statistically distinguishable. Second, the sub-
jects were more accurate with the small datasets (F(1, 15) = 50.668, p < 0.001). This
fact supports our hypothesis H2.b.

Task Completion Time. An answer was considered to be incorrect if none of the


correct answers was provided. In terms of time to task completion, we observed a
statistically significant effect (F(2, 30) = 107.474, p < 0.001). Post-hoc tests revealed
that MC was fastest with the large datasets. The LC was faster than the SP (p <
0.017). The mean time for MC was 48.56 seconds compared to 59.39 seconds for the
LC (about 22% slower) and 62.88 seconds for the SP (about 29% slower). For the
small datasets, static methods were faster than MC. Pair-wise comparison of the visu-
alizations found significant differences between all of them. MC was slower than the
LC (p < 0.003) and the SP (p < 0.001). The LC was slower than the SP (p < 0.016).
The mean time for MC was 42.19 seconds compared to 35.94 seconds for the LC
(about 17% faster) and 31.94 seconds for the SP (about 32% faster). This only partially
supports the hypothesis H1.a. MC is faster than both the static methods when used for
the large datasets.

Subjective Preferences. For each experiment block, the subjects completed a survey
in which they assessed their preferences regarding the analysis. The subjects rated
LC, SP, and MC on a five-point Likert scale (1 = strongly disagree, 5 = strongly
agree). Using RM-ANOVA, we revealed statistically significant effects (F(1.364,
20.453) = 4.672, p = 0.033). Post-hoc analysis found that MC was significantly more
helpful than LC (p = 0.046).

Table 1. The resulting mean values of the preferences.

SP LC MC
The visualization was helpful when solving the tasks. 3.50 3.44 3.94
I found this visualization entertaining and interesting. 2.56 2.31 4.13
I prefer visualization for the small datasets. 3.88 4.00 2.63
I prefer visualization for the large datasets. 2.38 2.69 3.69

The significant differences indicate that MC was judged to be more helpful than
the static methods. The subjects preferred the static methods to MC for the small data-
sets. However, MC was judged to be more beneficial than static methods for the large
datasets (p < 0.001). The results also showed that MC was more entertaining and
interesting than the static methods (p < 0.001).

4 Discussion

Our first hypothesis (H1) was that MC would outperform both the static methods for all
dataset sizes, but the hypothesis was only partially confirmed. Contrary to the hypothe-
sis, the static methods proved to achieve better speed than the animated methods for the
small datasets. Moreover, the methods were not statistically distinguishable in terms of
accuracy. We also hypothesized that the accuracy would increase for the smaller data-
sets (H2). Hypothesis H2.a was supported, because the subjects were faster with the
small datasets. The mean time for the large datasets was 56.94 seconds and for the small
datasets was 36.69 seconds. Hypothesis H2.b was also supported, because the subjects
made fewer errors with the small datasets when compared with the large datasets. Accu-
racy is an issue for static visualizations when the large datasets are employed.
The study supports the intuition that using animations in analysis requires conve-
nient interactive tools to support effective use. The study suggests that MC leads to
fewer errors. Also, the subjects found MC method to be more entertaining and excit-
ing. The evidence from the study indicates that the animations were more effective at
building the subjects' comprehension of large datasets. However, the simplicity of
static methods was more effective for small datasets. These observations are consis-
tent with the verbal reports in which the subjects refused to abandon the static visual
methods generally. Results supported the view that MC does not represent a re-
placement for common statistical data visualizations but a powerful addition. The over-
all accuracy was quite low in the study, with an average of about 75%. However, only one
question was skipped.

5 Conclusion and Future Work

In the tool, we enhanced the MC design and expanded it to be more suitable for AA
analysis. We also developed an intuitive, yet powerful, interactive user interface that
provides analysts with instantaneous control of MC properties and data configuration,
along with several customization options to increase the efficacy of the exploration
process. We validated the usefulness and general applicability of the tool with an
experiment assessing the efficacy of the described methods.
The study suggests that animated methods lead to fewer errors for the large data-
sets. Also, the subjects find MC to be more entertaining and interesting. The enter-
tainment value probably contributes to the efficacy of the animation, because it serves
to hold the subjects' attention. This fact can be useful for the purpose of designing
methods in academic settings.
Despite the findings of the study, further investigation is required to evaluate the
general applicability of the animated methods. We also plan to combine our animated
interactive methods with common DM methods to follow the VA principle more pre-
cisely. We already implemented a standalone method utilizing decision tree algorithm
providing interactive visual representation. We prefer decision trees because of their
clarity and simplicity to comprehend. We will also finish the integration of the tool
with our university information system to allow university executives and administra-
tors easy access when analyzing AA and to better support decision making.

Ramex-Forum: Sequential Patterns of Prices
in the Petroleum Production Chain

Pedro Tiple¹, Luís Cavique², and Nuno Cavalheiro Marques³(B)

¹ GoBusiness Finance, Lisbon, Portugal
² Universidade Aberta, Lisbon, Portugal
³ NOVA Laboratory for Computer Science and Informatics, DI-FCT,
Universidade Nova de Lisboa, Lisbon, Portugal
[email protected]

Abstract. We present a sensibility analysis and new visualizations using


an improved version of the Ramex-Forum algorithm applied to the study
of the petroleum production chain. Different combinations of parameters
and new ways to visualize data will be used. Results will highlight the
importance of Ramex-Forum and its proper parameterizations for ana-
lyzing relevant relations among price variations in petroleum and other
similar markets.

Keywords: Ramex-forum · Financial data analysis · Petroleum price ·


Petroleum production chain · Business intelligence

1 Introduction

Petroleum is one of the most important resources to the developed world and is still a major variable influencing the economy and the markets. The price of petroleum and its derivatives is not influenced simply by supply and demand; taxes, speculation, wars, and costs in refinement and transportation all contribute to setting prices. Due to its lengthy refinement process, a significant increase in the price of the source material can only be reflected in the price of its derivatives after the time it takes to refine it (usually within 3-4 weeks [1]). Moreover, due to its high economic importance and cost, the price of crude oil should always be reflected in the final price [2].
This work presents a study on a method to quantify how the price of crude oil (the raw material) can influence the price of manufactured products, by using Ramex-Forum. This paper builds on the work of [3] and on the original Ramex-Forum proposal [4]. It analyses how this proposal can be improved and then tuned for finding sequential patterns in the prices of petroleum and its derivatives. Sect. 2 presents the basic method and introduces the main concepts, and Sect. 3 presents an evaluation of how the prices of derivatives are influenced by the price of crude oil (the source material). Finally, some conclusions are presented.



2 Counting Co-occurrences in Financial Markets


We assume a crossover strategy to buy and sell financial products. Given the product price index at time t, denoted I(t), and the moving average of that price index, with a length of N_MA days, calculated by MA(t, N_MA) = Σ_{w∈{0:N_MA}} I(t−w) / N_MA, the decision is as follows:

– Buy, if I(t) · (1 + ε) ≥ MA(t, N_MA)
– Sell, if I(t) · (1 − ε) ≤ MA(t, N_MA)

For each moment t, if there is a decision of either Buy or Sell, the respective counter (Counter_B, Counter_S) is incremented by one unit. If neither of those decisions is made, both counters are reset. This way, each counter holds the number of consecutive moments where the same decision is made. See the example in Fig. 1, where the 'B' (Buy) and 'S' (Sell) characters illustrate the crossover strategy for a large enough ε and N_MA. In this paper (except when explicitly mentioned otherwise), we will use a standard 1% error, i.e. ε = 0.01. Another parameter is also used when defining an influence: parameter δ is the maximum trading period length within which a check for relations between two assets is made. Finally, we can define #Influence(A, B, δ): a cumulative influence counter of a given Buy or Sell decision from a market signal A to a market signal B (denoted A → B), which counts how many times 0 < (|Counter_A| − |Counter_B|) ≤ δ ∧ Counter_B ≠ 0.

Fig. 1. Financial product (normalized DJI index in black) and respective moving average (blue) and crossover starting Buy (green) and Sell (red) decision with ε confidence.
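As an illustration of the crossover strategy and of the influence counter defined above, the following sketch (plain Python with numpy, not the authors' implementation) computes signed streak counters and the #Influence measure; the handling of ties inside the ±ε band around the moving average, and the use of one signed counter per signal, are assumptions of this example.

    import numpy as np

    def crossover_counters(index, n_ma, eps=0.01):
        """Consecutive Buy/Sell streak counters for one price index, following the
        moving-average crossover rule above (positive = Buy streak, negative = Sell).
        The Buy and Sell conditions overlap inside the +/- eps band around the
        moving average; this sketch treats that band as 'no decision'."""
        index = np.asarray(index, dtype=float)
        counters = np.zeros(len(index), dtype=int)
        for t in range(n_ma, len(index)):
            ma = index[t - n_ma + 1:t + 1].mean()          # MA(t, N_MA)
            if index[t] * (1 - eps) > ma:                  # clearly above the band: Buy
                counters[t] = counters[t - 1] + 1 if counters[t - 1] > 0 else 1
            elif index[t] * (1 + eps) < ma:                # clearly below the band: Sell
                counters[t] = counters[t - 1] - 1 if counters[t - 1] < 0 else -1
            else:                                          # no decision: reset
                counters[t] = 0
        return counters

    def influence(counters_a, counters_b, delta):
        """#Influence(A, B, delta): counts the moments where B is inside a streak of
        the same type as A and A's streak started at most delta periods earlier."""
        count = 0
        for ca, cb in zip(counters_a, counters_b):
            if cb != 0 and np.sign(ca) == np.sign(cb) and 0 < abs(ca) - abs(cb) <= delta:
                count += 1
        return count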

3 Results on the Petroleum Production Chain


Petroleum is refined into a relatively extensive list [5], with each category hav-
ing hundreds of sub-products. Moreover this division and classification mostly
depends on its social usage. This study is based on the repository of publicly

available historical values for a wide range of petroleum related products pro-
vided by the U.S. Energy Information Administration1 . The variations in the
prices of these products are also compared with the stock market value of eleven
corporations dedicated to extracting, processing, and selling of crude and crude
related products. The prices of the 55 products are separated into retail/bulk
price and spot price (for some items the price is taken from retail sellers and
for other items it is the security price at that day). The data is separated into
four categories of known benchmarks [6] for: crude oil (West Texas Intermedi-
ate as OklahomaWTI, European Brent, and the OPEC Basket); Refinery price
for Gasoline, RBOB Gasoline, Diesel, Kerosene Jet, Propane, and Heating Oil;
National, state, and city averages for regular gasoline and diesel; Corporation
stock values.
This paper studies the influence and the best values for the δ parameter, the ε thresholds, and the moving average size. Focus will be put on the Buy comparison because, in the selected data, the increase/decrease of prices is very asymmetrical, with a strong lean towards increases. Also, for better parameter comparison, an additional measure is used in our results, the average edge weight: the sum of the output edge weights divided by the number of edges in the graph: AverageEdgeWeight(V, E) = Σ_{e∈E} weight(e) / |E|.
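For reference, this measure can be computed directly over the result graph; a minimal sketch, assuming the graph is encoded as a mapping from edges to weights (an encoding not specified in the paper):

    def average_edge_weight(edge_weights):
        """AverageEdgeWeight(V, E): sum of the edge weights divided by the number of
        edges, for a weighted graph given as a {(source, target): weight} mapping."""
        return sum(edge_weights.values()) / len(edge_weights) if edge_weights else 0.0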

Parameter δ was analyzed regarding its effect on the average edge weight
changes. The result can be seen in Fig. 2A. The chart shows the average edge
weight change for each increment in the value of δ. Each line represents the
results obtained using different moving average sizes. Several big spikes can be seen every 5 days; this is because gas and diesel prices at the pump are only registered on a weekly basis, so for each 5-day increase in δ the algorithm will pick up another change in value. This makes the analysis somewhat harder, but it is still useful, as changes in retail prices are now clearly identified. The first thing to note is that in the first week there is already a noticeable increase in the average edge weight; however, some of it is due to influences between retail prices and not only from refinery to retail prices. Second, after the fourth week the individual increases in δ barely produce a meaningful increase in value, although the cumulative increases are still significant. The parameter δ was fixed at a value of 30 working days (around six weeks: two weeks more than expected).

Parameter ε was studied by trying to find the best combination of parameters. The algorithm was run several times and the average edge weight value was recorded for each run. The best values for the threshold interval and the moving average size are represented in Fig. 2B. The graph shows the progression of the average edge weight in relation to the increase in threshold size. The parameters that lead to the highest increase in average weight can be clearly identified as the moving average size of 240 days with a threshold of around 26% of the moving average. However, things change when the influence event count is also considered: increasing the threshold rapidly decreases the number of detected events (a threshold of 26% will reduce the number of events by about 80%). In this case, the starting average is around 130 events and falls to 30 over the 3-year period analyzed: a very low average number of events. For this case study the choice was made to maximize the event count so that a broader spectrum of influences can be detected, instead of restricting the analysis to situations where the prices rise or fall sharply (which is what higher threshold values restrict the analysis to). Small increases in the ε threshold will raise the average weight while only lowering the event count by small amounts. Nevertheless, random fluctuations do not advise going for a threshold of 0%, so this trade-off seems to favor the usage of smaller values for ε.

¹ The used data was downloaded from http://www.eia.gov/ in June 2014 and ranges from January 2006 to June 2014.

Fig. 2. Graphs showing the change in: (A) average edge weight with each increment of δ using the Buy comparison; (B) average edge weight and number of nodes with each ε_{t+1} = ε_t + 1% increment in the threshold interval for δ = 30 using the Buy comparison.

Moving Average Size, N_MA. The choices for the available moving average sizes were based on [4], and the graphs show that maximizing this parameter yields the best results, even raising the question of how further increases in the size would fare. The user still needs to take into account what it means to increase the moving average size: the bigger the moving average, the smoother the curve will be, and thus it will behave like a noise filter (i.e., it becomes less and less sensitive to small changes in the behavior of the product). The values overlap for small ε values and it is hard to read the effects of the first increments on a linear scale, so Fig. 2B uses a logarithmic scale for representing ε values, showing that the 240 and 120 moving average sizes have a very similar behavior. The average weight for a moving average of 120 days has a higher starting value than the 240-day one, which means that for a buy signal the best parameters are: ε = 1% ∧ δ = 30 ∧ N_MA = 120. Fig. 3 shows that it is possible to find more than just sequential patterns with these parameters.

Fig. 3. Part of the graph showing the resulting Buy tree after applying Ramex Forum
on the data with the selected parameters.

In the complete result graph (available in [3]) colors were added to each node
according to their product type. These colors show a clear grouping of prod-
uct types, with same color nodes mostly close to each other. This was expected
for gas to gas and diesel to diesel influences. However even the stock, refinery,
and reference benchmark prices tend to group together at least in pairs. Fur-
thermore, refineries are almost exclusively related to the same type of product,
gas producing refineries are connected to retail gas prices and diesel producing
refineries are connected to diesel retail prices. The Gulf Coast GAS refinery node
(Fig. 2) does not exactly meet the previous observation as it is shown influencing
some diesel products, even so, this might be a positive thing as it will alert an
attentive analyst to the weight behind the Gulf Coast refinery gas prices. After further analysis, Gulf Coast GAS is identified as the most influential node, as it has at least one detected event for all other products and its average edge weight is the highest by a margin of 5%; probably due to the huge oil production in this area, it is mostly the start of the oil production chain. In [3], it was also observed that specificities of gas usage in the Rocky Mountain retail gas price could trigger third-level dependencies. Next, the most glaring aspect of the graph is how influential specific products are: the tree is not just an assorted web of relations, but groups of products aggregating around very influential/influenced products. There are some expected trend setters, like the OPEC Basket, which is used as a benchmark for the oil price, and the Gulf Coast refineries, and then some unexpected ones, like the Minnesota retail gas price. For other tested data and parameters, similar graphs were observed. Indeed, color coding also showed very similar results, with strong groupings of colors and a few select products influencing groups of others.

4 Conclusions

The presented case study, using real world data and deep analysis, aims to
provide an illustrative and useful example of Ramex-Forum: the signal-to-noise
ratio on the Petroleum production chain analysis already shows that sequential

patterns of prices can provide a much deeper description of product depen-


dencies based on events. Moreover, the δ and ε parameters seem to be a consistent, intuitive, and adaptable alternative for measuring long-term dependencies that are not directly captured by more instantaneous methods. So far only the connections themselves have been considered; if the influence weights are also taken into account, the analysis becomes more complex. Future work in this area should extend the number of products used and their detail. Some related studies (namely [7]) show that less classical hybrid approaches can be used to complement the crossover event detection approach and are good candidates for future experiments. Of particular interest will be to include a better characterization of the algorithm behavior during a global market crisis, namely by quantifying the drivers and consequences of the recent crisis in oil prices.

Acknowledgments. This research is supported by the GoBusiness Research project (http://www.gobusinessfinance.ch/en/research). The authors would like to thank GoBusiness Finance for partial financial support and for the data sets and financial knowledge used in the present work.

References
1. Borenstein, S., Shepard, A.: Sticky Prices, Inventories, and Market Power in Whole-
sale Gasoline Markets. NBER working paper series, vol. 5468. National Bureau of
Economic Research (1996)
2. Suviolahti, H.: The influence of volatile raw material prices on inventory valuation
and product costing. Master Thesis, Department of Business Technology, Helsinki
School of Economics (2009)
3. Tiple, P.: Tool for discovering sequential patterns in financial markets. Master Thesis
in Engenharia Informática, Faculdade de Ciências e Tecnologia da Universidade
Nova de Lisboa (2014)
4. Marques, N.C., Cavique, L.: Sequential pattern mining of price interactions. In:
Advances in Artificial Intelligence – Proceedings of the Workshop Knowledge Dis-
covery and Business Intelligence, EPIA-KDBI, Portuguese Conference on Artificial
Intelligence, pp. 314–325 (2013)
5. Gary, J., Handwerk, G.: Petroleum Refining. Institut français du pétrole publica-
tions. Taylor & Francis (2001)
6. Hammoudeh, S., Ewing, B.T., Thompson, M.A.: Threshold cointegration analysis
of crude oil benchmarks. The Energy Journal 29(4), 79–96 (2008)
7. Matos, D., Marques, N., Cardoso, M.: Stock market series analysis using self-
organizing maps. Revista de Ciências da Computação 9(9), 79–90 (2014)
Geocoding Textual Documents
Through a Hierarchy of Linear Classifiers

Fernando Melo and Bruno Martins(B)

Instituto Superior Técnico and INESC-ID, Universidade de Lisboa, Lisbon, Portugal


{fernando.melo,bruno.g.martins}@ist.utl.pt

Abstract. In this paper, we empirically evaluate an automated tech-


nique, based on a hierarchical representation for the Earth’s surface and
leveraging linear classifiers, for assigning geospatial coordinates to pre-
viously unseen documents, using only the raw text as input evidence.
We measured the results obtained with models based on Support Vector
Machines, over collections of geo-referenced Wikipedia articles in four dif-
ferent languages, namely English, German, Spanish and Portuguese. The
best performing models obtained state-of-the-art results, corresponding
to an average prediction error of 83 Kilometers, and a median error of
just 9 Kilometers, in the case of the English Wikipedia collection.

Keywords: Text mining · Document geocoding · Hierarchical text


classification

1 Introduction
Geographical Information Retrieval (GIR) has recently captured the attention
of many different researchers that work in fields related to language processing
and to the retrieval and mining of relevant information from large document
collections. For instance, the task of resolving individual place references in tex-
tual documents has been addressed in several previous works, with the aim of
supporting subsequent GIR processing tasks, such as document retrieval or the
production of cartographic visualizations from textual documents [5,6]. How-
ever, place reference resolution presents several non-trivial challenges [8,9], due
to the inherent ambiguity of natural language discourse. Moreover, there are
many vocabulary terms, besides place names, that can frequently appear in the
context of documents related to specific geographic areas [1]. Instead of resolving
individual references to places, it may be interesting to instead study methods
for assigning entire documents to geospatial locations [1,11].
In this paper, we describe a technique for assigning geospatial coordinates of
latitude and longitude to previously unseen textual documents, using only the
raw text of the documents as evidence, and relying on a hierarchy of linear models
built on the basis of a discrete hierarchical representation of the Earth's surface,
known in the literature as the HEALPix approach [4]. The regions at each level of
this hierarchical representation, corresponding to equally-distributed curvilinear


and quadrilateral areas of the Earth's surface, are initially associated with textual
contents (i.e., we use all the documents from a training set that are known to refer
to particular geospatial coordinates, associating each text to the corresponding
region). For each level in the hierarchy, we build classification models using
the textual data, relying on a vector space model representation, and using the
quadrilateral areas as the target classes. New documents are assigned to the
most likely quadrilateral area, through the usage of the classifiers inferred from
training data. We finally assign documents to their respective coordinates of
latitude and longitude, taking the centroid coordinates from the quadrilateral
areas.
The proposed document geocoding technique was evaluated with samples of
geo-referenced Wikipedia documents in four different languages. We achieved an
average prediction error of 83 Kilometers, and a median error of just 9 Kilome-
ters, in the case of documents from the English Wikipedia. These results are
slightly better than those reported in previous state-of-the-art studies [11,12].

2 Previous and Related Work


While most work on geographic information retrieval relies on specific keywords
such as place names, Adams and Janowicz proposed an approach for geocod-
ing documents that uses only non-geographic expressions, concluding that even
ordinary textual terms may be good predictors of geographic locations [1]. The
proposed technique used Latent Dirichlet Allocation (LDA) to discover latent
topics from general vocabulary terms occurring in a training collection of geo-
referenced documents, together with Kernel Density Estimation (KDE) to inter-
polate a density surface, over each LDA topic. New documents are assigned to
the geospatial areas having the highest aggregate density, computed from the
per-document topic distributions and from the KDE surfaces.
Wing and Baldridge evaluated approaches for automatically geocoding docu-
ments based on their textual contents, specifically leveraging generative language
models learned from Wikipedia [11]. The authors applied a regular geodesic grid
to divide the Earth’s surface into discrete rectangular cells. Each cell can be
seen as a virtual document that concatenates all the training documents located
within the cell’s region. Three different methods were compared in the task of
finding the most similar cell, for a new document, namely (i) the Kullback-
Leibler divergence, (ii) naïve Bayes, and (iii) a baseline method corresponding
to the average cell probability. Method (i) obtained the best results, i.e. a median
prediction error of just 11.8 Kilometers, and a mean error of 271 Kilometers, on
tests with documents taken from the English Wikipedia. More recently, Dias
et al. [2] reported on experiments with an adapted version of the method
described by Wing and Baldridge, which used language models based on char-
acter n-grams together with a discrete representation for the surface of the
Earth based on an equal-area hierarchical triangular mesh approach [3]. Another
improvement over the language modeling method was later reported by Roller
et al. [7], where the authors collapsed nearby training documents through the

Fig. 1. Orthographic views associated to the first four levels of the HEALPix sphere
tessellation.

usage of a k-d tree data structure. Moreover, Roller et al. proposed to assign the
centroid coordinates of the training documents contained in the most probable
cell, instead of just using the center point for the cell. These authors report
on a mean error of 181 Kilometers and a median error of 11 Kilometers, when
geocoding documents from the English Wikipedia.
More recently, Wing and Baldridge also reported on tests with discrimina-
tive classifiers [12]. To overcome the computational limitations of discriminative
classifiers, in terms of the maximum number of classes they can handle, the
authors proposed to leverage a hierarchical classification procedure that used
feature hashing and an efficient implementation of logistic regression. In brief,
the authors used a hierarchical approach in which the Earth's surface is divided
according to a rectangular grid (i.e., using either a regular grid or a k-d tree),
and where an independent classifier is learned for every non-leaf node of the
hierarchy. The probability of any node in the hierarchy is the product of the
probabilities of that node and all of its ancestors, up to the root. The most
probable leaf node is used to infer the final geospatial coordinates. Rather than
greedily using the most probable node from each level, or rather than comput-
ing the probability of every leaf node, the authors used a stratified beam search.
This procedure starts at the root, keeping the b highest-probability nodes at
each level, until reaching the leafs. Wing and Baldridge report on results over
English Wikipedia data corresponding to a mean error of 168.7 Kilometers and
a median error of 15.3 Kilometers.

3 The Proposed Document Geocoding Method

The proposed document geocoding approach is based on discretizing the sur-


face of the Earth into hierarchically organized sets of regions, as given by the
HEALPix procedure, where each set corresponds to a different partitioning resolution. Having documents associated with these discrete regions allows us
to predict locations with standard discriminative classification approaches (e.g.,
with linear Support Vector Machines classifiers).
HEALPix is an acronym for Hierarchical Equal Area isoLatitude Pixelization
of a sphere, and the procedure results on a multi level recursive subdivision

Table 1. Number of regions and approximate area for HEALPix grids of different
resolutions.

Resolution (Nside)                            4        64       256         1024
Total number of regions                     192    49,152   786,432   12,582,912
Approximate area of each region (km²) 2,656,625    10,377       649           41

for a spherical approximation to the Earth’s surface, according to curvilinear


quadrilateral regions, in which each resulting subdivision covers an equal surface
area. Figure 1, adapted from an original illustration provided on the HEALPix website¹, shows, from left to right, the resolution increasing by three steps from the base level with 12 different regions [4].
The HEALPix representation scheme contains a parameter Nside that con-
trols the resolution, i.e. the number of divisions along the side of a base-resolution
region that is needed to reach a desired high-resolution partition, and which
naturally will also define the area of the curvilinear quadrilaterals. In our exper-
iments, we used a hierarchy of 4 different representations with different resolu-
tions, setting the Nside parameter to the values of 4, 64, 256 and 1024. Table 1 presents the total number of regions at each of the considered resolution levels (i.e., n = 12 × Nside²), together with the approximate area, in square kilometers, corresponding to each region.
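As an illustration of how geospatial coordinates can be mapped to HEALPix regions and back to centre coordinates, the sketch below uses the healpy package, which is an assumption of this example (the paper only names the HEALPix procedure); the nested ordering is chosen so that regions refine hierarchically.

    import healpy as hp  # healpy is an assumption of this sketch

    def healpix_region(lat, lon, nside):
        """Index of the HEALPix region containing (lat, lon), given in degrees,
        using the nested ordering so that regions refine hierarchically."""
        return hp.ang2pix(nside, lon, lat, nest=True, lonlat=True)

    def region_centroid(region, nside):
        """Centre coordinates (lat, lon) of a HEALPix region, used here as the
        geospatial prediction for documents assigned to that region."""
        lon, lat = hp.pix2ang(nside, region, nest=True, lonlat=True)
        return lat, lon

    # e.g. hp.nside2npix(1024) == 12 * 1024**2 == 12,582,912 regions, as in Table 1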
Another important question relates to the choice of how to represent the
textual documents. We used a vector space model representation, where each
document is seen as a vector of features. The feature weights in the vectors
that represent each document are given according to the term frequency times
inverse document frequency (TF-IDF) scheme, where the weight for a term i on
a document j can be computed as:

    TF–IDFi,j = log2(1 + TFi,j) × log2(N / ni)        (1)

In the formula, TFi,j is the term frequency for term i on document j, N is the
total number of documents in the collection, and ni is the number of documents
containing the term i. The TF-IDF weight is 0 if TFi,j = 0.
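A minimal sketch of this weighting scheme, for documents given as lists of tokens (the tokenization itself is not specified here and is an assumption of this example):

    import math
    from collections import Counter

    def tfidf_vectors(documents):
        """TF-IDF weights following Eq. (1): log2(1 + TF_ij) * log2(N / n_i),
        for documents given as lists of tokens."""
        doc_freq = Counter()
        for doc in documents:
            doc_freq.update(set(doc))            # n_i: documents containing term i
        n_docs = len(documents)                  # N: total number of documents
        vectors = []
        for doc in documents:
            term_freq = Counter(doc)             # TF_ij: frequency of term i in doc j
            vectors.append({term: math.log2(1 + tf) * math.log2(n_docs / doc_freq[term])
                            for term, tf in term_freq.items()})
        return vectors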
With the hierarchy of discrete representations given by the HEALPix
method, together with the document representations based on TF-IDF, we then
used linear classification algorithms to address the document geocoding task.
We trained a separate classification model for each node in the hierarchy of
discrete representations, taking all documents whose coordinates lay within the
region corresponding to each node, as the training data for each classifier. When
geocoding a test document, we first apply the root-level classifier to decide the
most likely region, and then proceed greedily by applying the classifier for each
of the most likely nodes, up to the leaves. After reaching a leaf region from the

¹ http://healpix.jpl.nasa.gov

hierarchical representation, the geospatial coordinates of latitude and longitude


are assigned by taking the centroid coordinates of the leaf region.
Support Vector Machines (SVMs) are one of the most popular approaches
for learning classifiers from training data. In our experiments, we used the multi-
class linear SVM implementation from scikit-learn2 with default parameters
(e.g., with the default regularization constant), which in turn is a wrapper over
the LIBLINEAR3 package.
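The following sketch illustrates the per-node training and the greedy descent described above, using scikit-learn's LinearSVC; the helper names region_of and centroid_of (e.g. the HEALPix mapping and centroid functions sketched earlier), the handling of single-child nodes, and the exact feature encoding are assumptions of this example, not the authors' implementation.

    from sklearn.svm import LinearSVC

    RESOLUTIONS = (4, 64, 256, 1024)      # the four Nside values used in this paper

    def train_hierarchy(features, coords, region_of):
        """One linear SVM per populated non-leaf node, plus a root-level classifier.
        `features` are document vectors (dense or sparse arrays), `coords` the
        (lat, lon) pairs, and region_of(coord, nside) is assumed to return the id
        of the HEALPix region containing the coordinates."""
        root = LinearSVC().fit(features, [region_of(c, RESOLUTIONS[0]) for c in coords])
        node_clfs, single_child = {}, {}
        for level in range(len(RESOLUTIONS) - 1):
            parents = [region_of(c, RESOLUTIONS[level]) for c in coords]
            children = [region_of(c, RESOLUTIONS[level + 1]) for c in coords]
            for parent in set(parents):
                idx = [i for i, p in enumerate(parents) if p == parent]
                labels = [children[i] for i in idx]
                if len(set(labels)) == 1:          # nothing to discriminate here
                    single_child[(level, parent)] = labels[0]
                else:
                    node_clfs[(level, parent)] = LinearSVC().fit(
                        [features[i] for i in idx], labels)
        return root, node_clfs, single_child

    def geocode(doc, root, node_clfs, single_child, centroid_of):
        """Greedy descent through the hierarchy; the final prediction is the
        centroid of the reached leaf region (centroid_of is assumed)."""
        region = root.predict([doc])[0]
        for level in range(len(RESOLUTIONS) - 1):
            clf = node_clfs.get((level, region))
            region = clf.predict([doc])[0] if clf is not None else single_child[(level, region)]
        return centroid_of(region, RESOLUTIONS[-1])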

4 Experimental Validation
In our experiments, we used samples with geocoded articles from the English
(i.e., 847,783 articles), German (i.e., 307,859 articles), Spanish (i.e., 180,720 arti-
cles) and Portuguese (i.e., 131,085 articles) Wikipedias, taken from database
dumps produced in 2014. Separate experiments evaluated the quality of the doc-
ument geocoders built for each of the four languages, in terms of the distances
from the predictions towards the correct geospatial coordinates. We processed the
Wikipedia dumps to extract the raw text from the articles, and for extracting the
geospatial coordinates of latitude and longitude from the corresponding infoboxes.
We used 90% of the geocoded articles of each Wikipedia for model training, and
the other 10% for model validation.
Regarding the geospatial distribution of documents, some regions (e.g., North America or Europe) are considerably more dense in terms of document associations than others (e.g., Africa), and oceans and other large masses of water have very few associations to Wikipedia documents. This implies
that the number of classes that has to be considered by our model is much smaller
than the theoretical number of classes given by the HEALPix procedure. In our
English dataset, there are a total of 286,966 regions containing associations to
documents at a resolution level of Nside = 1024, and a total of 82,574, 15,065,
and 190 regions, respectively at resolutions 256, 64, and 4. These numbers are
even smaller in the collections for the other languages.
Table 2 presents the obtained results for the different Wikipedia collections.
The prediction errors shown in Table 2 correspond to the distance in Kilome-
ters, computed through Vincenty’s geodetic formulae [10], from the predicted
locations to the true locations given in Wikipedia. The accuracy values corre-
spond to the relative number of times that we could assign documents to the
correct region (i.e., the HEALPix region where the document’s true geospatial
coordinates of latitude and longitude are contained), for each level of hierarchical
classification. Table 2 also presents upper and lower bounds for the average and
median errors, according to a 95% confidence interval and as measured through
a sampling procedure.
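For reference, the prediction error can be computed as a geodesic distance on the WGS84 ellipsoid; the sketch below uses pyproj, which is an assumption of this example (the paper relies on Vincenty's geodetic formulae [10] for the same computation).

    from pyproj import Geod   # pyproj is an assumption of this sketch

    _wgs84 = Geod(ellps="WGS84")

    def error_km(pred_lat, pred_lon, true_lat, true_lon):
        """Geodesic distance, in kilometers, between a predicted and a true location."""
        _, _, meters = _wgs84.inv(pred_lon, pred_lat, true_lon, true_lat)
        return meters / 1000.0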
The results attest to the effectiveness of the proposed method, as we measured slightly lower errors than those reported in previous studies [2,7,11,12], which, besides different classifiers, also used simpler procedures for representing
² http://scikit-learn.org/
³ http://www.csie.ntu.edu.tw/~cjlin/liblinear/

Table 2. The results obtained for each different language.

            Classifier accuracy                 Errors in terms of distance (km)
            1st     2nd     3rd     4th         Average               Median
English     0.966   0.785   0.540   0.262       82.501  (±4.340)      8.874  [5.303 - 15.142]
German      0.972   0.832   0.648   0.396       62.995  (±5.753)      4.974  [3.615 - 8.199]
Spanish     0.950   0.720   0.436   0.157       165.887 (±16.675)     13.410 [8.392 - 22.691]
Portuguese  0.951   0.667   0.336   0.104       105.238 (±10.059)     21.872 [13.611 - 33.264]

textual contents and for representing the geographical space. It should nonethe-
less be noted that the datasets used in our tests may be slightly different from
those used in previous studies (e.g., they were taken from different Wikipedia
dumps), despite their similar origin.

5 Conclusions and Future Work


Through this work, we empirically evaluated a simple method for geo-referencing
textual documents, relying on a hierarchy of linear classifiers for assigning doc-
uments to their corresponding geospatial coordinates. We have shown that the
automatic identification of the geospatial location of a document, based only on
its text, can be performed with high accuracy by using out-of-the-box imple-
mentations of well-known supervised classification methods, and leveraging a
hierarchical procedure based on HEALPix [4].
Despite the interesting results, there are also many ideas for future work. The
geospatial coordinates estimated from our document geocoding procedure can
for instance be used as prior evidence (i.e., as document-level priors) to support
the resolution of individual place references in text [8]. In terms of future work,
we would also like to experiment with other types of classification approaches
and with different text representation and feature weighting schemes.

Acknowledgments. This work was supported by Fundação para a Ciência e a Tec-


nologia (FCT), through project grants with references EXCL/EEI-ESS/0257/2012
(DataStorm research line of excellency), EXPL/EEI-ESS/0427/2013 (KD-LBSN), and
also UID/CEC/50021/2013 (INESC-ID’s associate laboratory multi-annual funding).

References
1. Adams, B., Janowicz, K.: On the geo-indicativeness of non-georeferenced text. In:
Proceedings of the International AAAI Conference on Weblogs and Social Media
(2012)
2. Dias, D., Anastácio, I., Martins, B.: A language modeling approach for georefer-
encing textual documents. Actas del Congreso Español de Recuperación de Infor-
mación (2012)
3. Dutton, G.: Encoding and handling geospatial data with hierarchical triangular
meshes. In: Kraak, M.J., Molenaar, M., (eds.) Advances in GIS Research II. CRC
Press (1996)

4. Górski, K.M., Hivon, E., Banday, A.J., Wandelt, B.D., Hansen, F.K., Reinecke, M.,
Bartelmann, M.: HEALPIX - a framework for high resolution discretization, and
fast analysis of data distributed on the sphere. The Astrophysical Journal 622(2)
(2005)
5. Lieberman, M.D., Samet, H.: Multifaceted toponym recognition for streaming
news. In: Proceedings of the International ACM SIGIR Conference on Research
and Development in Information Retrieval (2011)
6. Mehler, A., Bao, Y., Li, X., Wang, Y., Skiena, S.: Spatial analysis of news sources.
IEEE Transactions on Visualization and Computer Graphics 12(5) (2006)
7. Roller, S., Speriosu, M., Rallapalli, S., Wing, B., Baldridge, J.: Supervised text-
based geolocation using language models on an adaptive grid. In: Proceedings of
the Conference on Empirical Methods on Natural Language Processing (2012)
8. Santos, J., Anastácio, I., Martins, B.: Using machine learning methods for disam-
biguating place references in textual documents. GeoJournal 80(3) (2015)
9. Speriosu, M., Baldridge, J.: Text-driven toponym resolution using indirect supervi-
sion. In: Proceedings of the Annual Meeting of the Association for Computational
Linguistics (2013)
10. Vincenty, T.: Direct and inverse solutions of geodesics on the ellipsoid with appli-
cation of nested equations. Survey Review XXIII(176) (1975)
11. Wing, B., Baldridge, J.: Simple supervised document geolocation with geodesic
grids. In: Proceedings of the Annual Meeting of the Association for Computational
Linguistics (2011)
12. Wing, B., Baldridge, J.: Hierarchical discriminative classification for text-based
geolocation. In: Proceedings of the Conference on Empirical Methods on Natural
Language Processing (2014)
A Domain-Specific Language for ETL Patterns
Specification in Data Warehousing Systems

Bruno Oliveira() and Orlando Belo

Algoritmi R&D Centre, Department of Informatics, School of Engineering,


University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
[email protected]

Abstract. During the last few years many research efforts have been done to
improve the design of ETL (Extract-Transform-Load) systems. ETL systems
are considered very time-consuming, error-prone and complex, involving several participants from different knowledge domains. ETL processes are one of the most important components of a data warehousing system, and they are strongly influenced by the complexity of business requirements and by their change and evolution. These aspects influence not only the structure of a data warehouse but also the structures of the data sources involved.
pact of such variables, we propose the use of ETL patterns to build specific
ETL packages. In this paper, we formalize this approach using BPMN (Business Process Modelling Notation) for modelling more conceptual ETL
workflows, mapping them to real execution primitives through the use of a do-
main-specific language that allows for the generation of specific instances that
can be executed in an ETL commercial tool.

Keywords: Data warehousing systems · ETL conceptual modelling · ETL pat-


terns · BPMN specification models · Domain-Specific languages

1 Introduction

Commercial tools that support ETL (Extract, Transform, and Load) processes development and implementation have a crucial impact on the implementation of any data warehousing system populating process. They provide the generation of very detailed models under a specific methodology and notation. Usually, such documentation follows a proprietary format, which is intrinsically related to architectural issues of the development tool. For that reason, ETL teams must have the skills and experience on such tools that allow them to use and explore the tools appropriately. In the case of a migration to another ETL tool environment, the ETL development team will need to understand all the specificities of the new tool and often start a new project almost from scratch. We believe that ETL systems development requires a simple and reliable approach. A more abstract view of the processes and data structures is very convenient, as well as a more effective mapping to some kind of execution primitives allowing for their execution inside the environments of commercial tools. Using a palette of specific ETL patterns representing some of the most used ETL tasks in real world application
scenarios – e.g. Surrogate Key Pipelining (SKP), Slowly Changing Dimensions (SCD) or Change Data Capture (CDC) –, we designed a new ETL development layer on top of a traditional method, making it possible to use ETL tools from the very beginning of the project, in order to plan and implement more appropriate ETL processes. To do that we used the Business Process Modelling Notation (BPMN) [1] for ETL processes representation, extending its original meta-model to include the ETL pattern specification we designed. The inclusion of these patterns clearly distinguishes two very relevant aspects in ETL design and implementation: process flow coordination and data processing tasks. BPMN is very suitable for this kind of processes, simply because it provides some very convenient features, like expressiveness and flexibility in the specification of processes. Thus, after a brief exposure of some related work (section 2), we present and briefly discuss a demonstration scenario using one of the most useful (and crucial) ETL processes: Data Quality Enhancement (DQE) (section 3). Next, in section 4, we present a DQE specification skeleton, its internal behaviour, and how we can configure it using a Domain-Specific Language (DSL) to enable its execution. Finally, we discuss the experiments done so far, analysing results and presenting some conclusions and future work (section 5).

2 Related Work

With the exception of some low-level methods for ETL development [2], most ap-
proaches presented so far use conceptual or logical models as the basis for ETL mod-
elling. Such models reduce complexity, produce detailed documentation and provide
the ability to easily communicate with business users. Some of the proposals pre-
sented by Vassiliadis and Simitsis cover several aspects of ETL conceptual specifica-
tion [3], its representation using logical views [4, 5], and its implementation using a
specific ETL tool [6]. Later, Trujillo [7] and Muñoz [8] provided an UML extension
for ETL conceptual modelling, reducing some of the communication issues that the
proposal of Vassiliadis et al. revealed previously. However, the translation to execu-
tion primitives they made was not very natural, since UML is essentially used to describe system requirements and not to support their execution. The integration of existing organizational tasks with ETL processes was addressed by Wilkinson et al. [9], who exposed a practical approach for the specification of ETL conceptual models using BPMN. BPMN was first applied to ETL systems specification by Akkaoui and Zimanyi [10]. Subsequently, Akkaoui et al. [11] provided a BPMN-based meta-model for an independent ETL modelling approach. They explored and discussed the bridges to a model-to-text translation, providing its execution in some ETL commercial tools. Still using the BPMN notation, Akkaoui et al. [12] provided a BPMN meta-model covering two important architectural layers related to the specification of ETL processes. More recently, and following the same guidelines of previous works, Akkaoui et al. [13] proposed a framework that allows for the translation of abstract BPMN models to their concrete execution in a target ETL tool using model-to-text transformations.

3   Pattern-Based ETL Modelling

We designed a high-level approach for ETL conceptual modelling using patterns. An ETL pattern is a task class that is characterized using a set of pre-established activities (internal composition) and their correspondent input and output interfaces, i.e. how the pattern interacts within a workflow context. Patterns avoid rewriting some of the most repetitive tasks that are used regularly in an ETL system. The use of ETL patterns produces simpler conceptual models, because finer grain tasks will be omitted from the global ETL schema. The descriptive models provided by BPMN supported all this. Using the ETL terminology, we defined three categories of ETL patterns: (1) Gathers, which represent typical data extraction processes; (2) Transformers, which are used for data transformations such as cleaning or conforming tasks; and (3) Loaders, which are used to load data into the data warehouse repository. With these categories we can group all the most frequent ETL patterns and we can identify in a conventional ETL system all its operational stages.

Fig. 1. A BPMN pattern specification for an ETL process

To demonstrate the application of ETL patterns in an ETL system solution we selected a simple application case. We represent the whole process with the ETL patterns we proposed using the BPMN notation, both in terms of the data orchestration and the representation of composite tasks. The process begins with a 'Start Event' having a 'Timer' representing an automatic execution of the process according to a specific time frame. This process starts with a parallel invocation of two data flows using a parallel gateway (Fig. 1). For each data source, a CDC pattern was used for data extraction. For the first and second flow, two other CDC patterns acting over log files were used. In the third flow we selected another CDC pattern for working with XML data structures. These tasks have the ability to identify new, updated, or deleted records, extracting and putting them into a Data Staging Area (DSA). We used BPMN 'Data input' and 'Data Store' objects to represent the input/output interfaces for each pattern. On the first two flows, we used a DQE pattern configured with some specific data cleaning and data conforming procedures. Next, the remaining two flows were synchronized using a parallel gateway, being data integrated using a data conciliation pattern (DCI) to identify common records based on a specific set of conciliation rules. Posteriorly, a SKP pattern

assigns the surrogate keys to the data extracted, maintaining the correspondence meta-
data stored in specific mapping tables located in the DSA. After the SKP process
appears an IDL pattern that loads data into the data warehouse, establishing all the nec-
essary correspondences to the data warehouse schema.

4 Specification of a DQE Pattern


Usually, operational data are stored in specific data schemas built to serve particular
business needs. Independently from the sources involved (single or multiple), many prob-
lems can occur at schema and instance level when executing a loading data process [14].
Several procedures can be applied on transformation and cleaning tasks to eliminate
problems like these to avoid schema ambiguities, data inconsistences, data entry errors,
missing information or invalid data. To instantiate patterns a generator should know how
they must be created following a specific template. In particular, for ETL processes the
description of the structure of a pattern was studied already [15]. We do not intend to
describe patterns in a natural language but essentially using some descriptive primitives
to support their instantiation. For that, we divide the ETL pattern specification into three parts: 1) input meta-data for pattern execution, which is composed of the source schema(s), attribute(s) and the procedures that should be applied to the source data; 2) output meta-data, which describes the output schemas in which output data will be stored; and 3) exception handling, which represents the identification of unexpected scenarios and the correspondent policies to perform. To receive these configuration aspects, we developed a DSL for configuring the components of ETL patterns. The use of a DSL facilitates pattern configuration and provides the necessary means to make it suitable for computer interpretation. One of the most common DQE procedures is the decomposition of a string into its meaningful parts. The decomposition of an attribute typically involves several database attributes, such as a name or an address. Usually, these attributes need to be decomposed into n parts according to a certain condition.

(left):
    Use Decomposition[
      Header:
        [Name: 'DecTask',
         Description: 'Decomposition …']
      Content:
        [Id: AUTO,
         Input[Schema='Sale',
               Attribute='FullName']
         Output[Schema= 'SaleT',
                Attributes:[FirstName String;
                            LastName String;]]
         Rule[Regex= '\\s',
              Limit[FIRST, LAST]]]

(right):
    Use Decomposition[
      Header: [...],
      Content: [...]
      Exception:
        [Event= EmptyAttribute,
         Action: Compensation:
           Quarantine[Table='CustomerQ'],
         Log:[Name= 'CustomerLog',
              Type= RelationalTable,
              Details[DATE AS 'mydate',
                      SOURCETABLE AS 'mytable',
                      SOURCEATTRIBUTE AS 'myAttr']]]
        [Event= InvalidDecompositionRule,
         Action: Error= Terminate […]]

Fig. 2. A decomposition procedure (left) and its exception handling description (right)

Fig. 2 shows an example of the DSL proposed for a typical decomposition procedure, used as a sub-part of a DQE pattern, which splits the customer full name into two new attributes: 'FirstName' and 'LastName'. The decomposition rule is performed based
on a regular expression: ‘\\s’, which means that an original string must be split using a

space as delimiter. The pattern configuration starts with the description of the pattern,
using the ‘Name’ and the ‘Description’ keywords inside of a ‘Header’ block. Blocks are
identified using square brackets delimiters. Next, a decomposition block is used in order
to describe the internal components of a decomposition pattern. This block must have an
internal ‘ID’, which can be manually or automatically assigned (‘AUTO’ keyword).
Next, three blocks are used: ‘Input’ block for input metadata, ‘Output’ block for output
metadata, and ‘Rule’ for a decomposition rule specification. Both input and output
blocks use a target data schema name storing the original/resulted data, and a collection
of attributes (and data types) used for each block. We distinguished single assignments
(a singular value) and composite assignments (composite data structures) using an equal
operator for atomic attributions and square brackets for composite attributions. The
‘Rule’ block describes the regular expression that should be applied to the original
string using the ‘Regex’ keyword. The ‘Limit’ keyword is used to control the number of
times that a pattern is applied affecting the length of the returned result set. Two special
identifiers (‘FIRST’ and ‘LAST’) are used to extract the first and last occurrence match-
ing the regular expression. The results of the output rule (‘FirstName’ and ‘LastName’)
are mapped to the ‘Output’ attributes. The DSL also includes some compensation and
error exception statements associated to each pattern. The compensation events provide
an alternative approach to handle a specific exception event, e.g. storing the non-
conform records in quarantine tables for later evaluation or applying automatic error.
The error events block or end the process execution. Fig. 2 (right) presents an exception
block with compensation and error policies associated. The ‘Exception’ block is formed
by three mandatory constructs: 1) Event that specifies an exception that may occur, e.g.
‘EmptyAttribute’ or ‘InvalidDecompositionRule’; 2) Action that identifies the action
that should be performed, e.g. the record that triggered the exception can be stored in a specific quarantine table or the workflow can be aborted; and 3) Log, an activity that stores the exception occurrences in a specific log file structure. With these domain-level instructions, it is possible to dynamically generate instances following the language rules. For that, we can implement code generators to translate the DSL into specific formats supported by commercial ETL tools.
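To make the decomposition rule of Fig. 2 concrete, the following sketch (plain Python, not the generated code the authors target) splits a full name on the '\s' regular expression and maps the FIRST and LAST occurrences to the output attributes; the exception-handling behaviour is only indicated in comments.

    import re

    def decompose_full_name(full_name):
        """Decomposition rule of Fig. 2: split on the regular expression '\\s' and map
        the FIRST and LAST parts to the output attributes FirstName/LastName."""
        parts = [p for p in re.split(r"\s", full_name or "") if p]
        if not parts:
            # in the DSL this corresponds to the EmptyAttribute event: the record
            # would be routed to the 'CustomerQ' quarantine table and logged
            raise ValueError("EmptyAttribute")
        return {"FirstName": parts[0], "LastName": parts[-1]}

    # e.g. decompose_full_name("Maria J. Silva") -> {'FirstName': 'Maria', 'LastName': 'Silva'}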

5 Conclusions and Future Work

In this paper we showed how a typical ETL process can be represented exclusively
using ETL patterns on BPMN models, and how these patterns can be integrated in a
single ETL system package. To demonstrate their practical application, we selected and
discussed a DQE pattern, describing its internal composition and providing a specific
DSL for its configuration. From a conceptual modelling point of view, we consider that
ETL models should not include any kind of implementation infrastructure specification
or any criteria associated with their execution. All infrastructures that support the implementation of conceptual models are related to specific classes of users, therefore involving the application of specific constructors. BPMN provides this kind of abstraction, focusing essentially on the coordination of ETL patterns, promoting the reusability of patterns across several systems, and making the system more robust to process changes. Additionally, the proposed DSL provides an effective way to formalize each pattern configuration, allowing for its posterior mapping to a programming language such

as Java. Using a domain-level DSL it is possible to describe more naturally each part of
an ETL process without having the need to program each component. In the short term,
we intend to have an extended family of ETL patterns that will allows for building a
complete ETL system from scratch, covering all coordination and communication as-
pects as well as the description of all the tasks required to materialize it. Additionally, a
generic transformation plug-in for generating ETL physical schemas for data integration
tools is also planned.

References
1. OMG, Documents Associated With Business Process Model And Notation (BPMN) Ver-
sion 2.0 (2011)
2. Thomsen, C., Pedersen, T.B.: Pygrametl: a powerful programming framework for extract-
transform-load programmers. In: Proceeding of the ACM Twelfth International Workshop
on Data Warehousing and OLAP, DOLAP 2009, pp. 49–56 (2009)
3. Vassiliadis, P., Simitsis, A., Skiadopoulos, S.: Conceptual modeling for ETL processes. In:
Proceedings of the 5th ACM International Workshop on Data Warehousing and OLAP,
DOLAP 2002, pp. 14–21 (2002)
4. Vassiliadis, P., Simitsis, A., Skiadopoulos, S.: On the logical modeling of ETL processes.
In: Pidduck, A., Mylopoulos, J., Woo, C.C., Ozsu, M. (eds.) CAiSE 2002. LNCS,
vol. 2348, pp. 782–786. Springer, Heidelberg (2002)
5. Simitsis, A., Vassiliadis, P.: A method for the mapping of conceptual designs to logical
blueprints for ETL processes. Decis. Support Syst. 45, 22–40 (2008)
6. Vassiliadis, P., Vagena, Z., Skiadopoulos, S., Karayannidis, N., Sellis, T.: Arktos: A Tool
for Data Cleaning and Transformation in Data Warehouse Environments. IEEE Data Eng.
Bull. 23(4), 42–47 (2000)
7. Luján-Mora, S., Trujillo, J., Song, I.-Y.: A UML profile for multidimensional modeling in
data warehouses. Data Knowl. Eng. 59, 725–769 (2006)
8. Trujillo, J., Luján-Mora, S.: A UML based approach for modeling ETL processes in data
warehouses. Concept. Model. 2813, 307–320 (2003)
9. Wilkinson, K., Simitsis, A., Castellanos, M., Dayal, U.: Leveraging business process mod-
els for ETL design. In: Parsons, J., Saeki, M., Shoval, P., Woo, C., Wand, Y. (eds.) ER
2010. LNCS, vol. 6412, pp. 15–30. Springer, Heidelberg (2010)
10. El Akkaoui, Z., Zimanyi, E.: Defining ETL worfklows using BPMN and BPEL. In: Pro-
ceedings of the ACM Twelfth International Workshop on Data Warehousing and OLAP,
DOLAP 2009, pp. 41–48 (2009)
11. El Akkaoui, Z., Zimànyi, E., Mazón, J.-N., Trujillo, J.: A model-driven framework for
ETL process development. In: Proceedings of the ACM 14th International Workshop on
Data Warehousing and OLAP, DOLAP 2011, pp. 45–52 (2011)
12. El Akkaoui, Z., Mazón, J.-N., Vaisman, A., Zimányi, E.: BPMN-based conceptual model-
ing of ETL processes. In: Cuzzocrea, A., Dayal, U. (eds.) DaWaK 2012. LNCS, vol. 7448,
pp. 1–14. Springer, Heidelberg (2012)
13. El Akkaoui, Z., Zimanyi, E., Mazon, J.-N., Trujillo, J.: A BPMN-based design and main-
tenance framework for ETL processes. Int. J. Data Warehous. Min. 9, 46 (2013)
14. Rahm, E., Do, H.: Data cleaning: Problems and current approaches. IEEE Data Eng. Bull.
23, 3–13 (2000)
15. Köppen, V., Brüggemann, B., Berendt, B.: Designing Data Integration: The ETL Pattern
Approach. Eur. J. Informatics Prof. XII (2011)
Optimized Multi-resolution Indexing and Retrieval
Scheme of Time Series

Muhammad Marwan Muhammad Fuad()

Forskningsparken 3, Institutt for Kjemi, NorStruct,


The University of Tromsø – The Arctic University of Norway, 9037 Tromsø, Norway
[email protected]

Abstract. Multi-resolution representation has been successfully used for index-


ing and retrieval of time series. In a previous work we presented Tight-MIR, a
multi-resolution representation method which speeds up the similarity search by
using distances pre-computed at indexing time. At query time Tight-MIR ap-
plies two pruning conditions to filter out non-qualifying time series. Tight-MIR
has the disadvantage of storing all the distances corresponding to all resolution
levels, even those whose pruning power is low. At query time Tight-MIR also
processes all stored resolution levels. In this paper we optimize the Tight-MIR
algorithm by enabling it to store and process only the resolution levels with the
maximum pruning power. The experiments we conducted on the new optimized
version show that it does not only require less storage space, but it is also faster
than the original algorithm.

Keywords: Multi-resolution indexing and retrieval · Optimization · Tight-MIR ·


Time series

1 Introduction

A time series is a chronological collection of observations. The particular nature of


these data makes them more appropriate to be handled as whole entities rather than
separate numeric observations. In the last decade a great deal of research was devoted
to the development of time series data mining because of its various applications in
finance, medicine, engineering, and other domains.
Time series are usually represented by Dimensionality Reduction Techniques which
map the time series onto low-dimension spaces where the query is processed.
Several dimensionality reduction techniques exist in the literature, of those we men-
tion: Piecewise Linear Approximation (PLA) [1], and Adaptive Piecewise Constant
Approximation (APCA) [2].
Multi-resolution dimensionality reduction techniques map the time series to several
spaces instead of one. In a previous work [3] we presented a multi-resolution indexing
and retrieval method of time series called Weak-MIR. Weak-MIR uses pre-computed
distances and two filters to speed up the similarity search. In [4] we presented another
multi-resolution indexing and retrieval method, MIR-X, which associates our multi-
resolution approach with another dimensionality reduction technique. In a third work

[5] we introduced Tight-MIR which has the advantages of the two previously men-
tioned methods. Tight-MIR, however, stores distances corresponding to all resolution
levels, even though some of them might have a low pruning power. In this paper we
present an optimized version of Tight-MIR which stores and processes only the reso-
lution levels with the maximum pruning power.
The rest of the paper is organized as follows: Section 2 is a background section. The
optimized version is presented in Section 3 and tested in Section 4. We conclude this
paper with Section 5.

2 Background

In [3] we presented Multi-resolution Indexing and Retrieval Algorithm (Weak-MIR).


The motivation behind this method is that traditional dimensionality reduction tech-
niques use a “one-resolution” approach to indexing and retrieval, where the dimen-
sion of the low-dimension space is selected at indexing time, so the performance of
the algorithm at query time depends completely on the choice made at indexing time.
But in practice, we do not necessarily know a priori the optimal dimension of the low-
dimension space.
Weak-MIR uses a multi-resolution representation of time series. During indexing
time the algorithm computes and stores distances corresponding to a number of reso-
lution levels, with lower resolution levels having lower dimensions. The algorithm
uses these pre-computed distances to speed up the retrieval process. The basis of
Weak-MIR is as follows: let the original space be n-dimensional and let R be a 2m-dimensional space, where 2m ≤ n. At indexing time each time series is divided into m segments, each of which is approximated by a function (we used a first degree polynomial in [3]) so that the approximation error between each segment and the corresponding polynomial is minimal. The n-dimensional vector whose components are the images of all the points of all the segments of a time series s on that approximating function is called the image vector and is denoted by ŝ. The images of the two end points of each segment are called the main image of that segment. The 2m main images of each time series form its projection vector p_s.
Weak-MIR uses two distances. The first is d, which is defined on an n-dimensional space, so it is the distance between two time series in the original space, i.e. d(s, t), or the distance between the original time series and its image vector, i.e. d(s, ŝ). The second distance is d_R, which is defined on a 2m-dimensional space, so it is the distance between two projection vectors, i.e. d_R(p_s, p_t). We proved in [3] that d_R is a lower bound of d when the Minkowski distance is used.
The resolution level k is an integer related to the dimensionality of the reduced space R, so the above definitions of the projection vector and the image vector can be extended to further segmentations of the time series using different values m_k. The image vector and the projection vector at level k are denoted by ŝ^k and p_s^k, respectively.

Given a query (q, r), let p_q and p_s be the projection vectors of q and s, respectively, on their approximating functions, where s is a time series in the database D. By applying the triangle inequality we get:

|d(q, q̂) − d(s, ŝ)| > r    (1)

This relation represents a pruning condition which is the first filter of Weak-MIR. By applying the triangle inequality again we get:

d_R(p_q, p_s) − d(q, q̂) − d(s, ŝ) > r    (2)

This relation is the second filter of Weak-MIR.


In [4] we introduced MIR-X, which combines a time series representation method with our multi-resolution approach. MIR-X uses one of the two filters of Weak-MIR together with the low-dimension distance of a time series dimensionality reduction technique. We showed how MIR-X can boost the performance of Weak-MIR.
In [5] we presented Tight-MIR, which has the advantages of both Weak-MIR and MIR-X in that it is a standalone method, like Weak-MIR, yet it has the same competitive performance as MIR-X. In Tight-MIR, instead of using the projection vector to construct the second filter, we access the raw data in the original space directly, using a number of points that corresponds to the dimensionality of the reduced space at that resolution level. In other words, we use 2m raw points, instead of the 2m main images, to compute d_R. There are several advantages to this modification; the first is that the new d_R is obviously tighter than d_R as computed in [4]. The second is that when using a Minkowski distance d_R is a lower bound of the original distance in the original space. The direct consequence of this is that the two distances d(q, q̂), d(s, ŝ) become redundant, so the second filter is overwritten by the usual lower bounding condition d_R(p_q, p_s) > r.
At indexing time the distances d(s, ŝ^k) are computed and stored for every time series s and every resolution level k. At query time the algorithm starts at the lowest level and applies (1) to the first time series in the database D. If the time series is filtered out the algorithm moves to the next time series; if not, the algorithm applies equation (2). If all the time series in the database have been pruned the algorithm terminates; if not, the algorithm moves to a higher level.
Finally, after all levels have been exploited, we get a candidate answer set which
we then scan sequentially to filter out all the non-qualifying time series and return the
final answer set.
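To make the retrieval scheme concrete, the following sketch (ours, not code from [3-5]) outlines how the two filters can be combined over increasing resolution levels; `d`, `d_R`, `approx`, and the `precomputed` table of indexing-time distances d(s, ŝ^k) are assumed helpers, and the bookkeeping of the original implementation may differ.

```python
def range_query(q, r, database, levels, d, d_R, approx, precomputed):
    """Illustrative sketch of the Tight-MIR similarity search with precomputed distances."""
    candidates = set(range(len(database)))
    for k in levels:                               # start at the lowest resolution level
        q_err = d(q, approx(q, k))                 # d(q, q_hat^k), computed once per level
        for s in list(candidates):
            s_err = precomputed[k][s]              # d(s, s_hat^k), stored at indexing time
            if abs(q_err - s_err) > r:             # first filter, condition (1)
                candidates.discard(s)
            elif d_R(q, database[s], k) > r:       # second filter, Tight-MIR lower-bounding condition
                candidates.discard(s)
        if not candidates:
            return set()                           # everything pruned: empty answer set
    # post-processing: sequential scan of the surviving candidates with the true distance
    return {s for s in candidates if d(q, database[s]) <= r}
```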

3 An Optimized Multi-resolution Indexing and Retrieval Scheme

The disadvantage of the indexing scheme presented in the previous section is that it is
“deterministic”, meaning that at indexing time the time series are indexed using a top-
down approach, and the algorithm behaves in a like manner at query time. If some
resolution levels have low utility in terms of pruning power, the algorithm will still

use the pre-computed distances related to these levels, and at query time these levels
will also be examined. Whereas the use of the first filter does not require any query
time distance evaluation, applying the second does include calculating distances and
thus we might be storing and calculating distances for little pruning benefit.
We propose in this paper an optimized multi-resolution indexing and retrieval
scheme. Taking into account that the time series to which we apply equations (1) and
(2) are those which have not been filtered out at lower resolution levels, this opti-
mized scheme should determine the optimal combination of resolution levels the algo-
rithm should keep at indexing time and consequently use at query time.
The optimization algorithm we use to solve this problem is the Genetic Algorithm.
The Genetic Algorithm (GA) is a well-known evolutionary algorithm that has been ap-
plied to solve a variety of optimization problems. GA is a population-based global
optimization algorithm which mimics the rules of Darwinian selection in that weaker
individuals have less chance of surviving the evolution process than stronger ones.
GA captures this concept by adopting a mechanism that preserves the “good” features
during the optimization process.
In GA a population of candidate solutions (chromosomes) explores the search space
and exploits promising regions by sharing information. These chromosomes evolve using genetic
operations (selection, crossover, mutation, and replacement).
GA starts by randomly initializing a population of chromosomes inside the search
space. The fitness function of these chromosomes is evaluated. According to the val-
ues of the fitness function new offspring chromosomes are generated through the
aforementioned genetic operations. The above steps repeat for a number of genera-
tions or until a predefined stopping condition terminates the GA.
The new algorithm, which we call Optimized Multi-Resolution Indexing and Re-
trieval – O-MIR, works as follows: we proceed in the same manner described for Tight-MIR to produce the candidate resolution levels. The next step is handled by the optimizer, which selects the k resolution levels, out of all the resolution levels produced, that provide the maximum pruning power. For the current version of our algorithm the number of resolution levels to be kept, k, is chosen by the user according to the storage and processing capacity of the system. In other words, our algorithm decides which are the optimal k resolution levels to be kept out of the resolution levels produced by the indexing step.
Notice that when k = 1 we have one resolution level, which is the case with traditional dimensionality reduction techniques.
The optimization stage of O-MIR starts by randomly initializing a population of chromosomes of the form ⟨l_1, l_2, …, l_k⟩, where each gene l_i is one of the candidate resolution levels and the genes of a chromosome are pairwise distinct. Each chromosome represents a possible configuration of the resolution levels to be kept. The fitness function of our optimization problem is the pruning power of this configuration. As in [5], the performance criterion is based on the latency time concept presented in [6]. The latency time is calculated from the number of cycles the processor takes to perform the different arithmetic operations (>, +, −, *, abs, sqrt) required to execute the similarity search query. This number for each operation is multiplied by the latency time of that operation to get the total latency time of the similarity search query. The latency time is 5 cycles for (>, +, −), 1 cycle for (abs), 24 cycles for (*), and 209 cycles for (sqrt) [6]. The latency time for each chromosome is the average of the latency times of a number of random queries.
As with other GAs, our algorithm selects a percentage of the chromosomes for mating and mutates a percentage of the genes. The above steps are repeated for a given number of generations.
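As an illustration of how the optimization stage could be realised (the chromosome encoding and operators below are our own simplifications, not the authors' implementation), the following sketch uses the cycle counts quoted above for the latency-time cost model; `avg_latency` stands for an assumed helper that runs the similarity search restricted to the levels of a chromosome over the sample queries and returns the average latency time, typically built on top of `latency_time`.

```python
import random

CYCLES = {">": 5, "+": 5, "-": 5, "abs": 1, "*": 24, "sqrt": 209}   # cycle costs per operation [6]

def latency_time(op_counts):
    """Total latency time of one query, given how often each arithmetic operation was executed."""
    return sum(CYCLES[op] * n for op, n in op_counts.items())

def optimise_levels(all_levels, k, avg_latency, pop_size=16, generations=100,
                    selection_rate=0.5, mutation_rate=0.2):
    """Select k of the candidate resolution levels minimising average query latency (sketch)."""
    def random_chromosome():
        return sorted(random.sample(all_levels, k))

    def crossover(a, b):
        # child draws k distinct levels from the union of both parents
        return sorted(random.sample(list(set(a) | set(b)), k))

    def mutate(chrom):
        genes = set(chrom)
        for gene in list(genes):
            if random.random() < mutation_rate:
                unused = [l for l in all_levels if l not in genes]
                if unused:
                    genes.discard(gene)
                    genes.add(random.choice(unused))
        return sorted(genes)

    population = [random_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=avg_latency)                    # lower average latency = fitter
        parents = population[: max(2, int(selection_rate * pop_size))]
        offspring = [mutate(crossover(*random.sample(parents, 2)))
                     for _ in range(pop_size - len(parents))]
        population = parents + offspring
    return min(population, key=avg_latency)
```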

4 Experiments
We compared O-MIR with Tight-MIR on similarity search experiments on different time series datasets from different time series repositories [7] and [8], using different threshold values and different values of k. Since the value of k is related to the total number of resolution levels, which in turn depends on the length of the time series tested, we denote by p_k the percentage of the resolution levels kept relative to the total number of resolution levels.
As for the parameters of the algorithm used in the experiments: the population size was 16, the number of generations was 100, the mutation rate was 0.2, the selection rate was 0.5, and the number of queries was set to 10.
We show in Fig. 1 the results of our experiments. (CBF), (Wafer), and (GunPoint) each have 5 resolution levels; for these datasets we chose p_k = 40%, 60%, 80%. (motoCurrent) has 8 resolution levels, and we chose p_k = 25%, 50%, 75%. As we can see, the results are promising in terms of latency time and storage space. For the first three datasets O-MIR is faster than Tight-MIR and, in addition, it requires less storage space. This is also the case with (motoCurrent) except for p_k = 20%, where the latency time of O-MIR was longer than that of Tight-MIR. However, for this value of p_k the gain in storage space is substantial without much increase in latency time.

Fig. 1. Comparison of the latency time between Tight-MIR and O-MIR on datasets (CBF), (Wafer), (GunPoint), and (motoCurrent) for different values of p_k

5 Conclusion

In this paper we presented an optimized version of our previous Tight-MIR multi-


resolution indexing and retrieval method of time series. Whereas the original method
stores and processes all the resolution levels the indexing step produces, the new algo-
rithm, O-MIR, optimizes this process by applying a genetic algorithm to choose
the resolution levels with the maximum pruning power. The experiments we con-
ducted show that O-MIR is faster than Tight-MIR, and it also has the advantage of
requiring less storage space.
We believe that the main advantage of the new method is that it reduces the storage
space requirement of the original method which can be a burden when applying such
multi-resolution methods to large datasets.

References
1. Morinaka, Y., Yoshikawa, M., Amagasa, T., Uemura, S.: The L-index: an indexing struc-
ture for efficient subsequence matching in time sequence databases. In: Proc. 5th Pacific
Asia Conf. on Knowledge Discovery and Data Mining (2001)
2. Keogh, E., Chakrabarti, K., Pazzani, M., Mehrotra, S.: Locally Adaptive Dimensionality
Reduction for Similarity Search in Large Time Series Databases. SIGMOD (2001)
3. Muhammad Fuad, M.M., Marteau P.F.: Multi-resolution approach to time series retrieval.
In: Fourteenth International Database Engineering & Applications Symposium– IDEAS
2010, Montreal, QC, Canada (2010)
4. Muhammad Fuad, M.M., Marteau P.F.: Speeding-up the similarity search in time series da-
tabases by coupling dimensionality reduction techniques with a fast-and-dirty filter. In:
Fourth IEEE International Conference on Semantic Computing– ICSC 2010, Carnegie Mel-
lon University, Pittsburgh, PA, USA (2010)
5. Muhammad Fuad, M.M., Marteau, P.F.: Fast retrieval of time series by combining a multi-
resolution filter with a representation technique. In: The International Conference on Ad-
vanced Data Mining and Applications–ADMA 2010, ChongQing, China, November 21,
2010
6. Schulte, M.J., Lindberg, M., Laxminarain, A.: Performance evaluation of decimal floating-
point arithmetic. In: IBM Austin Center for Advanced Studies Conference, February 2005
7. http://povinelli.eece.mu.edu/
8. Keogh, E., Zhu, Q., Hu, B., Hao, Y., Xi, X., Wei, L., Ratanamahatana, C.A.: The UCR
Time Series Classification/Clustering (2011). www.cs.ucr.edu/~eamonn/time_series_data/
Multi-agent Systems:
Theory and Applications
Minimal Change in Evolving Multi-Context
Systems

Ricardo Gonçalves, Matthias Knorr, and João Leite

NOVA LINCS, Departamento de Informática, Faculdade Ciências e Tecnologia,


Universidade Nova de Lisboa, Caparica, Portugal
[email protected]

Abstract. In open environments, agents need to reason with knowledge


from various sources, possibly represented in different languages. The
framework of Multi-Context Systems (MCSs) offers an expressive, yet
flexible, solution since it allows for the integration of knowledge from dif-
ferent heterogeneous sources in an effective and modular way. However,
MCSs are essentially static as they were not designed for dynamic sce-
narios. The recently introduced evolving Multi-Context Systems (eMCSs)
extend MCSs by also allowing the system to both react to, and reason in
the presence of dynamic observations, and evolve by incorporating new
knowledge, thus making it even more adequate in Multi-Agent Systems
characterised by their dynamic and open nature.
In dynamic scenarios which admit several possible alternative evolutions,
the notion of minimal change has always played a crucial role in deter-
mining the most plausible choice. However, different KR formalisms –
as combined within eMCSs – may require different notions of minimal
change, making their study and their interplay a relevant highly non-
trivial problem. In this paper, we study the notion of minimal change in
eMCSs, by presenting and discussing alternative minimal change criteria.

1 Introduction
Open and dynamic environments create new challenges for knowledge repre-
sentation languages for agent systems. Instead of having to deal with a single
static knowledge base, each agent has to deal with multiple sources of distributed
knowledge possibly written in different languages. These sources of knowledge
include the large number of available ontologies and rule sets, as well as the
norms and policies published by institutions, the information communicated by
other agents, to name only a few.
The need to incorporate in agent-oriented programming languages the ability
to represent and reason with heterogeneous distributed knowledge sources, and
the flow of information between them, has been pointed out in [1–4], although
a general adequate practical solution is still not available.
Recent literature in knowledge representation and reasoning contains several
proposals to combine heterogeneous knowledge bases, one of which – Multi-
Context Systems (MCSs) [5–7] – has attracted particular attention because it


provides an elegant solution by considering each source of knowledge as a mod-


ule and then providing means to model the interaction between these modules.
More specifically, an MCS consists of a set of contexts, each of which is a knowl-
edge base in some KR formalism, such that each context can access information
from the other contexts using so-called bridge rules. Such non-monotonic bridge
rules add their heads to the context’s knowledge base provided the queries (to
other contexts) in their bodies are successful. Managed Multi-Context Systems
(mMCSs) [8] extend MCSs by allowing operations, other than simple addition,
to be expressed in the heads of bridge rules, thus allowing mMCSs to properly
deal with the problem of consistency management within contexts. MCSs have
gained some attention by agent developers [9–11].
Whereas mMCSs are quite general and flexible to address the problem of
integration of different KR formalisms, they are essentially static in the sense
that the contexts do not evolve to incorporate the changes in dynamic scenarios.
In such scenarios, new knowledge and information is dynamically produced, often
from several different sources – e.g., a stream of raw data produced by some
sensors, new ontological axioms written by some user, newly found exceptions
to some general rule, observations, etc.
Evolving Multi-Context Systems (eMCSs) [12] inherit from mMCSs the abil-
ity to integrate and manage knowledge represented in heterogeneous KR for-
malisms, and at the same time are able to react to dynamic observations, and
evolve by incorporating knowledge. The semantics of eMCSs is based on the sta-
ble model semantics, and allows alternative models for a given evolution, in the
same way as answer sets represent alternative solutions to a given ASP program.
One of the main principles of belief revision is minimal change, which in
case of eMCSs means that information should be maintained by inertia unless it
is required to change. In dynamic scenarios where systems can have alternative
evolutions, it is thus desirable to have some minimal change criteria to be able to
compare possible alternatives. This problem is particularly interesting and non-
trivial in dynamic frameworks based on MCSs, because of the heterogeneity of
KR frameworks that may exist in an MCS – each of which may require different
notions of minimal change –, and also because the evolution of such systems is
based not only on the semantics, but also on the evolution of the knowledge base
of each context.
In this paper, we study minimal change in eMCSs, by presenting different
minimal change criteria to be applied to the possible evolving equilibria of an
eMCS, and by discussing the relation between them.
The remainder of this paper is organized as follows. We introduce the framework of
eMCSs in Sect. 2. Then, we present and study some minimal change criteria in
eMCSs in Sect. 3, and conclude with a discussion of related work and possible
future directions in Sect. 4.

2 Evolving Multi-Context Systems


In this section, we revisit evolving Multi-Context Systems as introduced in [12],
which generalize mMCSs [8] to dynamic scenarios in which contexts are enabled
to react to external observations and evolve.
An evolving multi-context system (eMCS) consists of a collection of compo-
nents, each of which contains knowledge represented in some logic, defined as
a triple L = ⟨KB, BS, ACC⟩ where KB is the set of well-formed knowledge
bases of L, BS the set of possible belief sets, and ACC : KB → 2^BS a func-
tion describing the semantics of L by assigning to each knowledge base a set of
acceptable belief sets. We assume that each element of KB and BS is a set, and
define F = {s : s ∈ kb ∧ kb ∈ KB}.
In addition to the knowledge base in each component, bridge rules are used
to interconnect the components, specifying what operations to perform on its
knowledge base given certain beliefs held in the components of the eMCS. For
that purpose, each component of an eMCS is associated with a management
base, which is a set of operations that can be applied to the possible knowledge
bases of that component. Given a management base OP and a logic L, let
OF = {op(s) : op ∈ OP ∧ s ∈ F} be the set of operational formulas over OP and
L. Each component of an eMCS gives semantics to operations in its management
base using a management function over a logic L and a management base OP ,
mng : 2OF × KB → (2KB \ {∅}), i.e., mng(op, kb) is the (non-empty) set of
knowledge bases that result from applying the operations in op to the knowledge
base kb. We assume that mng(∅, kb) = {kb}.
In an eMCS some contexts are assumed to be observation contexts whose
knowledge bases will be constantly changing over time according to the observa-
tions made, similar, e.g., to streams of data from sensors.1 The changing obser-
vations will then affect the other contexts by means of the bridge rules. As
we will see, such effect can either be instantaneous and temporary, i.e., lim-
ited to the current time instant, similar to (static) mMCSs, where the body
of a bridge rule is evaluated in a state that already includes the effects of the
operation in its head, or persistent, but only affecting the next time instant.
To achieve the latter, the operational language is extended with a unary meta-
operation next that can only be applied on top of operations. Given a manage-
ment base OP and a logic L, we define eOF , the evolving operational language,
as eOF = OF ∪ {next(op(s)) : op(s) ∈ OF }.
The idea of observation contexts is that each such context has a language
describing the set of possible observations of that context, along with its cur-
rent observation. The elements of the language of the observation contexts can
then be used in the body of bridge rules to allow contexts to access the observa-
tions. Formally, an observation context is a tuple O = ⟨ΠO , π⟩ where ΠO is the
observation language of O and π ⊆ ΠO is its current observation.
We can now define evolving Multi-Context Systems (eMCS).

1
For simplicity of presentation, we consider discrete steps in time here.

Definition 1. An eMCS is a sequence Me = ⟨C1 , . . . , Cn , O1 , . . . , Oℓ⟩, such
that, for each i ∈ {1, . . . , ℓ}, Oi = ⟨ΠOi , πi⟩ is an observation context,
and, for each i ∈ {1, . . . , n}, Ci is an evolving context defined as Ci =
⟨Li , kbi , bri , OPi , mngi⟩ where
– Li = ⟨KBi , BSi , ACCi⟩ is a logic
– kbi ∈ KBi
– bri is a set of bridge rules of the form
H(σ) ← a1 , . . . , ak , not ak+1 , . . . , not an (1)
such that H(σ) ∈ eOFi , and each ai , i ∈ {1, . . . , n}, is either of the form
(r : b) with r ∈ {1, . . . , n} and b a belief formula of Lr , or of the form (r@b)
with r ∈ {1, . . . , ℓ} and b ∈ ΠOr
– OPi is a management base
– mngi is a management function over Li and OPi .
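The following illustrative data structures (our own rendering, with invented class and field names) may help fix the ingredients of Definition 1; formulas and operations are kept as opaque strings, and nothing here is part of the eMCS formalisation itself.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Set, Tuple

Formula = str                                  # belief and observation formulas kept abstract
Operation = Tuple[str, Formula]                # e.g. ("add", "p"); next(...) is tracked separately

@dataclass
class Logic:
    """A logic L = <KB, BS, ACC>: ACC maps a knowledge base to its acceptable belief sets."""
    acc: Callable[[FrozenSet[Formula]], Set[FrozenSet[Formula]]]

@dataclass
class BridgeRule:
    head: Operation                            # operational formula in the head
    persistent: bool                           # True if the head is wrapped by next(...)
    positive: List[Tuple[str, int, Formula]]   # ("ctx", r, b) for (r : b), ("obs", r, b) for (r@b)
    negative: List[Tuple[str, int, Formula]]

@dataclass
class ObservationContext:
    language: Set[Formula]                     # the observation language of the context
    current: Set[Formula]                      # its current observation, a subset of the language

@dataclass
class EvolvingContext:
    logic: Logic
    kb: FrozenSet[Formula]                     # kb_i
    bridge_rules: List[BridgeRule]             # br_i
    mng: Callable[[Set[Operation], FrozenSet[Formula]], List[FrozenSet[Formula]]]  # mng_i
```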

Given an eMCS Me = ⟨C1 , . . . , Cn , O1 , . . . , Oℓ⟩ we denote by KBMe the set of
knowledge base configurations for Me , i.e., KBMe = {⟨k1 , . . . , kn⟩ : ki ∈ KBi for
each 1 ≤ i ≤ n}. A belief state for Me = ⟨C1 , . . . , Cn , O1 , . . . , Oℓ⟩ is a sequence
S = ⟨S1 , . . . , Sn⟩ such that, for each 1 ≤ i ≤ n, we have Si ∈ BSi . We denote
by BSMe the set of belief states for Me .
An instant observation for Me is a sequence O = ⟨o1 , . . . , oℓ⟩ such that, for
each 1 ≤ i ≤ ℓ, we have that oi ⊆ ΠOi .
Given a belief state S = ⟨S1 , . . . , Sn⟩ for Me and an instant observation
O = ⟨o1 , . . . , oℓ⟩ for Me , we define the satisfaction of bridge literals of the form
(r : b) as S, O |= (r : b) if b ∈ Sr and S, O |= not (r : b) if b ∉ Sr . The satisfaction
of bridge literals of the form (r@b) depends on the current observations, i.e., we
have that S, O |= (r@b) if b ∈ or and S, O |= not (r@b) if b ∉ or . For a set B of
bridge literals, we have that S, O |= B if S, O |= L for every L ∈ B.
We say that a bridge rule σ of a context Ci is applicable given a belief state
S and an instant observation O if its body is satisfied by S and O, i.e., S, O |=
B(σ). We denote by appi (S, O) the set of heads of bridges rules of the context
Ci which are applicable given the belief state S and the instant observation O.
Recall that the heads of bridge rules in an eMCS may be of two types: those
that contain next and those that do not. The former are to be applied to the
current knowledge base and not persist, whereas the latter are to be applied in
the next time instant and persist. Therefore, we distinguish these two subsets.

Definition 2. Let Me = ⟨C1 , . . . , Cn , O1 , . . . , Oℓ⟩ be an eMCS, S a belief state
for Me , and O an instant observation for Me . Then, for each 1 ≤ i ≤ n, consider
the following sets:
– appnext i (S, O) = {op(s) : next(op(s)) ∈ appi (S, O)}
– appnow i (S, O) = {op(s) : op(s) ∈ appi (S, O)}
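A small sketch of how app_i(S, O) and its split into app_i^now and app_i^next could be computed over the illustrative data structures above (again, names and encodings are ours, not part of the eMCS formalisation):

```python
def applicable_heads(context, S, O):
    """app_i(S, O): heads of the bridge rules of `context` whose bodies are satisfied by S and O."""
    def holds(kind, r, b):
        # (r : b) is checked against the belief state, (r @ b) against the instant observation
        return b in (S[r - 1] if kind == "ctx" else O[r - 1])   # indices are 1-based in the paper

    return {(rule.head, rule.persistent)
            for rule in context.bridge_rules
            if all(holds(*lit) for lit in rule.positive)
            and not any(holds(*lit) for lit in rule.negative)}

def split_app(context, S, O):
    """Definition 2: split app_i(S, O) into app_i^now and app_i^next."""
    heads = applicable_heads(context, S, O)
    app_now = {op for op, persistent in heads if not persistent}
    app_next = {op for op, persistent in heads if persistent}
    return app_now, app_next
```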

If we want an effect to be instantaneous and persistent, this can be achieved


using two bridge rules with identical body, one with and one without next.
Similar to equilibria in mMCS, the (static) equilibrium is defined to incor-
porate instantaneous effects based on appnow i (S, O) alone.

Definition 3. Let Me = ⟨C1 , . . . , Cn , O1 , . . . , Oℓ⟩ be an eMCS, and O an
instant observation for Me . A belief state S = ⟨S1 , . . . , Sn⟩ for Me is an equi-
librium of Me given O iff for each 1 ≤ i ≤ n, Si ∈ ACCi (kb) for some
kb ∈ mngi (appnow i (S, O), kbi ).
To be able to assign meaning to an eMCS evolving over time, we introduce
evolving belief states, which are sequences of belief states, each referring to a
subsequent time instant.
Definition 4. Let Me = ⟨C1 , . . . , Cn , O1 , . . . , Oℓ⟩ be an eMCS. An evolving
belief state of size s for Me is a sequence Se = ⟨S 1 , . . . , S s⟩ where each S j ,
1 ≤ j ≤ s, is a belief state for Me .
To enable eMCSs to react to incoming observations and evolve, a sequence of
observations (defined below) has to be processed. The idea is that the knowledge
bases of the observation contexts Oi change according to that sequence.
Definition 5. Let Me = ⟨C1 , . . . , Cn , O1 , . . . , Oℓ⟩ be an eMCS. A sequence of
observations for Me is a sequence Obs = ⟨O1 , . . . , Om⟩, such that, for each
1 ≤ j ≤ m, Oj = ⟨oj1 , . . . , ojℓ⟩ is an instant observation for Me , i.e., oji ⊆ ΠOi
for each 1 ≤ i ≤ ℓ.
To be able to update the knowledge bases and the sets of bridge rules of the
evolving contexts, we need the following notation. Given an evolving context
Ci , and a knowledge base k ∈ KBi , we denote by Ci [k] the evolving context in
which kbi is replaced by k, i.e., Ci [k] = ⟨Li , k, bri , OPi , mngi⟩. For an observation
context Oi , given a set π ⊆ ΠOi of observations for Oi , we denote by Oi [π] the
observation context in which its current observation is replaced by π, i.e., Oi [π] =
⟨ΠOi , π⟩. Given K = ⟨k1 , . . . , kn⟩ ∈ KBMe a knowledge base configuration for
Me , we denote by Me [K] the eMCS ⟨C1 [k1 ], . . . , Cn [kn ], O1 , . . . , Oℓ⟩.
We now define when certain evolving belief states are evolving equilibria of
an eMCS Me given a sequence of observations Obs = ⟨O1 , . . . , Om⟩ for Me . The
intuitive idea is that, given an evolving belief state Se = ⟨S 1 , . . . , S s⟩ for Me , in
order to check if Se is an evolving equilibrium, we need to consider a sequence of
eMCSs, M 1 , . . . , M s (each with ℓ observation contexts), representing a possible
evolution of Me according to the observations in Obs, such that each S j is a
(static) equilibrium of M j . The current observation of each observation context
Oi in M j is exactly its corresponding element oji in Oj . For each evolving context
Ci , its knowledge base in M j is obtained from the one in M j−1 by applying the
operations in appnext i (S j−1 , Oj−1 ).
Definition 6. Let Me = ⟨C1 , . . . , Cn , O1 , . . . , Oℓ⟩ be an eMCS, Se =
⟨S 1 , . . . , S s⟩ an evolving belief state of size s for Me , and Obs = ⟨O1 , . . . , Om⟩
an observation sequence for Me such that m ≥ s. Then, Se is an evolving equi-
librium of size s of Me given Obs iff, for each 1 ≤ j ≤ s, the belief state S j is
an equilibrium of M j = ⟨C1 [k1j ], . . . , Cn [knj ], O1 [oj1 ], . . . , Oℓ [ojℓ ]⟩ where, for each
1 ≤ i ≤ n, kij is defined inductively as follows:
– ki1 = kbi
– kij+1 ∈ mngi (appnext i (S j , Oj ), kij ).
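The inductive step of Definition 6 can be pictured as enumerating, for each context, the knowledge bases returned by its management function and combining the choices. The sketch below is ours; it reuses the field names of the earlier illustrative data structures and takes the sets app_i^next(S^j, O^j) as precomputed input.

```python
from itertools import product

def next_kb_configurations(contexts, app_next):
    """Knowledge base configurations reachable in one step (cf. Definition 6 and NextKB below).

    contexts: the evolving contexts C_1, ..., C_n (with fields kb and mng)
    app_next: per context, the set app_i^next(S^j, O^j) of persistent operations
    """
    per_context = [C.mng(ops, C.kb) for C, ops in zip(contexts, app_next)]
    return [list(choice) for choice in product(*per_context)]
```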

3 Minimal Change
In this section, we discuss some alternatives for the notion of minimal change in
eMCSs. What makes this problem interesting is that there are different param-
eters that we may want to minimize in a transition from one time instant to the
next one. In the following discussion we focus on two we deem most relevant: the
operations that can be applied to the knowledge bases, and the distance between
consecutive belief states.
We start by studying minimal change at the level of the operations. In the
following discussion we consider fixed an eMCS Me = ⟨C1 , . . . , Cn , O1 , . . . , Oℓ⟩.
Recall from the definition of evolving equilibrium that, in the transition
between consecutive time instants, the knowledge base of each context Ci of
Me changes according to the operations in appnext i (S, O), and these depend on
the belief state S and the instant observation O. The first idea to compare ele-
ments of this set of operations is to, for a fixed instant observation O, distinguish
those equilibria of Me which generate a minimal set of operations to be applied
to the current knowledge bases to obtain the knowledge bases of the next time
instant. Formally, given a knowledge base configuration K ∈ KBMe and an
instant observation O for Me , we can define the set:
MinEq(K, O) = {S : S is an equilibrium of Me [K] given O and there is no
equilibrium S′ of Me [K] given O such that, for all 1 ≤ i ≤ n,
appnext i (S′, O) ⊂ appnext i (S, O)}
This first idea of comparing equilibria based on inclusion of the sets of oper-
ations can, however, be too strict in most cases. Moreover, different operations
usually have different costs,2 and it may well be that, instead of minimizing
based on set inclusion, we want to minimize the total cost of the operations to
be applied. For that, we need to assume that each context has a cost function
over the set of operations, i.e., costi : OPi → N, where costi (op) represents the
cost of performing operation op.
Let S be a belief state for Me and O an instant observation for Me . Then,
for each 1 ≤ i ≤ n, we define the cost of the operations to be applied to obtain
the knowledge base of the next time instant as:

Costi (S, O) = Σ_{op(s) ∈ appnext i (S, O)} costi (op)
Summing for all evolving contexts, we obtain the global cost of S given O:

Cost(S, O) = Σ_{i=1}^{n} Costi (S, O)
Now that we have defined a cost function over belief states, we can define a
minimization function over possible equilibria of eMCS Me [K] for a fixed knowl-
edge base configuration K ∈ KBMe . Formally, given O an instant observation for
Me , we define the set of equilibria of Me [K] given O which minimize the global
2
We use the notion of cost in an abstract sense, i.e., depending on the context, it may
refer to, e.g., the computational cost of the operation, or its economic cost.

cost of the operations to be applied to obtain the knowledge base configuration


of the next time instant as:
MinCost(K, O) = {S : S is an equilibrium of Me [K] given O and
there is no equilibrium S′ of Me [K] given O
such that Cost(S′, O) < Cost(S, O)}
Note that, instead of using a global cost, we could have also considered a
more fine-grained criterion by comparing costs for each context individually,
and define some order based on these comparisons. Also note that the particular
case of taking costi (op) = 1 for every i ∈ {1, . . . , n} and every op ∈ OPi , captures
the scenario of minimizing the total number of operations to be applied.
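A minimal sketch of these cost-based notions follows; it assumes that the sets app_i^next(S, O), per-context cost tables, and an enumeration of the equilibria are available from elsewhere, and the function names are ours.

```python
def cost_i(cost_table, app_next_i):
    """Cost_i(S, O): total cost of the persistent operations triggered in context i."""
    return sum(cost_table[op] for (op, _arg) in app_next_i)

def global_cost(cost_tables, app_next):
    """Cost(S, O): sum of Cost_i(S, O) over all evolving contexts."""
    return sum(cost_i(ct, ops) for ct, ops in zip(cost_tables, app_next))

def min_cost(equilibria, cost_of):
    """MinCost(K, O): the equilibria of Me[K] given O whose global cost is minimal.

    `cost_of` is an assumed closure mapping a belief state S to Cost(S, O).
    """
    if not equilibria:
        return []
    best = min(cost_of(S) for S in equilibria)
    return [S for S in equilibria if cost_of(S) == best]
```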
The function M inCost allows for the choice of those equilibria that are min-
imal with respect to the operations to be performed to the current knowledge
base configuration in order to obtain the knowledge base configuration of the
next time instant. Still, for each choice of an equilibrium S, we have to deal with
the existence of several alternatives in the set mngi (appnext i (S, O), kbi ). Our aim
now is to discuss how we can apply some notion of minimal change that allows
us to compare the elements in mngi (appnext i (S, O), kbi ). The intuitive idea is to
compare the distance between the current equilibria and the possible equilibria
resulting from the elements in mngi (appnext i (S, O), kbi ). Of course, given the pos-
sible heterogeneity of contexts in an eMCS, we cannot assume a global notion of
distance between belief sets. Therefore, we assume that each evolving context has
its own distance function between its beliefs sets. Formally, for each 1 ≤ i ≤ n,
we assume the existence of a distance function di , i.e., di : BSi × BSi → R
satisfying for all S1 , S2 , S3 ∈ BSi :
1. di (S1 , S2 ) ≥ 0
2. di (S1 , S2 ) = 0 iff S1 = S2
3. di (S1 , S2 ) = di (S2 , S1 )
4. di (S1 , S3 ) ≤ di (S1 , S2 ) + di (S2 , S3 )
There are some alternatives to extend the distance function of each context
to a distance function between belief states. In the following we present two
natural choices. One is to consider the maximal distance between belief sets of
each context. The other is to consider the average of distances between belief
sets of each context. Formally, given S 1 and S 2 belief states of Me we define two
functions dmax : BSMe × BSMe → R and davg : BSMe × BSMe → R as follows:

dmax (S 1 , S 2 ) = max{di (Si1 , Si2 ) | 1 ≤ i ≤ n}

davg (S 1 , S 2 ) = (Σ_{i=1}^{n} di (Si1 , Si2 )) / n
We can prove that dmax and davg are distance functions between belief states.

Proposition 1. The functions dmax and davg defined above are both distance
functions, i.e., satisfy the axioms 1) - 4).
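Both distance functions are straightforward to compute once the per-context distances d_i are available; a small sketch (ours, assuming belief states are represented as sequences indexed by context):

```python
def d_max(d_i, S1, S2):
    """Maximal per-context distance between two belief states."""
    return max(d_i[i](S1[i], S2[i]) for i in range(len(S1)))

def d_avg(d_i, S1, S2):
    """Average per-context distance between two belief states."""
    return sum(d_i[i](S1[i], S2[i]) for i in range(len(S1))) / len(S1)
```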

We now study how we can use one of these distance functions between belief
states to compare the possible alternatives in the sets mngi (appnext i (S, O), kbi ),
for each 1 ≤ i ≤ n. Recall that the intuitive idea is to minimize the distance
between the current belief state S and the possible equilibria that each element
of mngi (appnext i (S, O), kbi ) can give rise to. We explore here two options, which
differ on whether the minimization is global or local. The idea of global minimiza-
tion is to choose only those knowledge base configurations ⟨k1 , . . . , kn⟩ ∈ KBMe
with ki ∈ mngi (appnext i (S, O), kbi ), which guarantee minimal distance between
the original belief state S and the possible equilibria of the obtained eMCS.
The idea of local minimization is to consider all possible tuples ⟨k1 , . . . , kn⟩
with ki ∈ mngi (appnext i (S, O), kbi ), and only apply minimization for each such
choice, i.e., for each such knowledge base configuration we only allow equilibria
with minimal distance from the original belief state.
We first consider the case of pruning those tuples ⟨k1 , . . . , kn⟩ such that
ki ∈ mngi (appnext i (S, O), kbi ), which do not guarantee minimal change with
Let S be a belief state for Me , K = k1 , . . . , kn  ∈ KBMe a knowledge base
configuration for Me , and O = o1 , . . . , o  an instant observation for Me . Then
we define the set of knowledge base configurations that are obtained from K
given the belief state S and the instant observation O as:
NextKB(S, O, ⟨k1 , . . . , kn⟩) = {⟨k′1 , . . . , k′n⟩ ∈ KBMe : for each 1 ≤ i ≤ n
we have that k′i ∈ mngi (appnext i (S, O), ki )}
For each choice d of a distance function between belief states, we define the set of
knowledge base configurations that minimize the distance to the original belief
state. Let S be a belief state for Me , K = ⟨k1 , . . . , kn⟩ ∈ KBMe a knowledge
base configuration for Me , and Oj and Oj+1 instant observations for Me .
MinNext(S, Oj , Oj+1 , K) = {(S′, K′) : K′ ∈ NextKB(S, Oj , K) and
S′ ∈ MinCost(Me [K′], Oj+1 ) s.t. there is no
K′′ ∈ NextKB(S, Oj , K) and no
S′′ ∈ MinCost(Me [K′′], Oj+1 ) with
d(S, S′′) < d(S, S′)}.
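Operationally, MinNext can be read as a search over candidate pairs; the sketch below is ours and assumes oracles for NextKB(S, O^j, K) and for MinCost(Me[K′], O^{j+1}), which hide the equilibrium computation.

```python
def min_next(S, next_kbs, min_cost_equilibria, d):
    """MinNext as a search over candidate pairs (sketch).

    next_kbs:            the configurations in NextKB(S, O^j, K)
    min_cost_equilibria: assumed oracle K' -> MinCost(Me[K'], O^{j+1})
    d:                   a distance between belief states, e.g. d_max or d_avg
    """
    pairs = [(Sp, Kp) for Kp in next_kbs for Sp in min_cost_equilibria(Kp)]
    if not pairs:
        return []
    best = min(d(S, Sp) for Sp, _ in pairs)
    return [(Sp, Kp) for (Sp, Kp) in pairs if d(S, Sp) == best]
```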
Note that M inN ext applies minimization over all possible equilibria resulting
from every element of N extKB(S, Oj , K). Using M inN ext, we can now define
a minimal change criterion to be applied to evolving equilibria of Me .

Definition 7. Let Me = ⟨C1 , . . . , Cn , O1 , . . . , Oℓ⟩ be an eMCS, Obs =
⟨O1 , . . . , Om⟩ an observation sequence for Me , and let Se = ⟨S 1 , . . . , S s⟩ be
an evolving equilibrium of Me given Obs. We assume that K 1 , . . . , K s , with
K j = ⟨k1j , . . . , knj⟩, is the sequence of knowledge base configurations associated
with Se as in Definition 6. Then, Se satisfies the strong minimal change criterion
for Me given Obs if, for each 1 ≤ j ≤ s, the following conditions are satisfied:
– S j ∈ MinCost(Me [K j ], Oj )
– (S j+1 , K j+1 ) ∈ MinNext(S j , Oj , Oj+1 , K j )

We call this minimal change criterion the strong minimal change criterion
because it applies minimization over all possible equilibria resulting from every
possible knowledge base configuration in N extKB(S, Oj , K).
The following proposition states the desirable property that the existence of
an equilibrium guarantees the existence of an equilibrium satisfying the strong
minimal change criterion. We should note that this is not a trivial statement
since we are combining minimization of two different elements: the cost of the
operations and the distance between belief states. This proposition in fact follows
from their careful combination in the definition of M inN ext.
Proposition 2. Let Obs = ⟨O1 , . . . , Om⟩ be an observation sequence for Me . If
Me has an evolving equilibrium of size s given Obs, then at least one evolving
equilibrium of size s given Obs satisfies the strong minimal change criterion.
Note that in the definition of the strong minimal change criterion, the knowl-
edge base configurations K ∈ N extKB(S j , Oj , K j ), for which the corresponding
possible equilibria are not at a minimal distance from S j , are not considered. How-
ever, there could be situations in which this minimization criterion is too strong.
For example, it may well be that all possible knowledge base configurations in
N extKB(S j , Oj , K j ) are important, and we do not want to disregard any of them.
In that case, we can relax the minimization condition by applying minimization
individually for each knowledge base configuration in N extKB(S j , Oj , K j ). The
idea is that, for each fixed K ∈ N extKB(S j , Oj , K j ) we choose only those equi-
libria of Me [K] which minimize the distance to S j .
Formally, let S be a belief state for Me , K ∈ KBMe a knowledge base
configuration for Me , and O an instant observation for Me . For each distance
function d between belief states, we can define the following set:
MinDist(S, O, K) = {S′ : S′ ∈ MinCost(Me [K], O) and
there is no S′′ ∈ MinCost(Me [K], O)
such that d(S, S′′) < d(S, S′)}
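Computationally, MinDist is a simple filter over the minimal-cost equilibria; a sketch under the same assumptions as before (names ours):

```python
def min_dist(S, min_cost_equilibria_of_K, d):
    """MinDist(S, O, K): among MinCost(Me[K], O), keep the belief states closest to S."""
    candidates = list(min_cost_equilibria_of_K)
    if not candidates:
        return []
    best = min(d(S, Sp) for Sp in candidates)
    return [Sp for Sp in candidates if d(S, Sp) == best]
```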
Using this more relaxed notion of minimization we can define an alternative
weaker minimal change criterion to be applied to evolving equilibria of an eMCS.
Definition 8. Let Me = ⟨C1 , . . . , Cn , O1 , . . . , Oℓ⟩ be an eMCS, Obs =
⟨O1 , . . . , Om⟩ an observation sequence for Me , and Se = ⟨S 1 , . . . , S s⟩ an evolv-
ing equilibrium of Me given Obs. We assume that K 1 , . . . , K s , with K j =
⟨k1j , . . . , knj⟩, is the sequence of knowledge base configurations associated with Se
as in Definition 6. Then, Se satisfies the weak minimal change criterion of Me
given Obs, if for each 1 ≤ j ≤ s the following conditions are satisfied:
– S j ∈ MinCost(Me [K j ], Oj )
– S j+1 ∈ MinDist(S j , Oj+1 , K j+1 )
We can now prove that the existence of an evolving equilibrium implies the
existence of an equilibrium satisfying the weak minimal change criterion. Again
note that the careful combination of the two minimizations – cost and distance
– in the definition of M inDist is fundamental to obtain the following result.

Proposition 3. Let Obs = ⟨O1 , . . . , Om⟩ be an observation sequence for Me . If


Me has an evolving equilibrium of size s given Obs, then at least one evolving
equilibrium of size s of Me given Obs satisfies the weak minimal change criterion.

We can now prove that the strong minimal change criterion is, in fact,
stronger than the weak minimal change criterion.

Proposition 4. Let Me be an eMCS, Obs = ⟨O1 , . . . , Om⟩ an observation
sequence for Me , and Se = ⟨S 1 , . . . , S s⟩ an evolving equilibrium of Me given
Obs. If Se satisfies the strong minimal change criterion of Me given Obs, then
Se satisfies the weak minimal change criterion of Me given Obs.

4 Related and Future Work


In this paper we have studied the notion of minimal change in the context of
the dynamic framework of eMCSs [12]. We have presented and discussed some
alternative definitions of minimal change criteria for evolving equilibria of an
eMCS.
Closely related to eMCSs is the framework of reactive Multi-Context Sys-
tems (rMCSs) [13–15] inasmuch as both aim at extending mMCSs to cope with
dynamic observations. The key difference between them is that the operator next
of eMCSs allows for a clear separation between persistent and non-persistent
effects, and the specification of transitions based on the current state.
Another framework closely related to eMCSs is that of evolving logic pro-
grams EVOLP [16] which deals with updates of generalized logic programs, and
the two frameworks of reactive ASP, one implemented as a solver oclingo [17]
and one described in [13]. Whereas EVOLP employs an update predicate that
is similar in spirit to the next predicate of eMCSs, it does not deal with het-
erogeneous knowledge, neither do both versions of Reactive ASP. Moreover, no
notion of minimal change is studied for these frameworks.
This work raises several interesting paths for future research. Immediate
future work includes the study of more global approaches to the minimization
of costs of operations, namely by considering the global cost of an evolving equi-
librium, instead of minimizing costs at each time instant. A topic worth investi-
gating is how to perform AGM-style belief revision at the (semantic) level of the
equilibria, as in Wang et al [18], though necessarily different since knowledge is
not incorporated in the contexts. Also interesting is to study a paraconsistent
version of eMCSs, grounded on the work in [19] on paraconsistent semantics
for hybrid knowledge bases. Another important issue open for future work is
a more fine-grained characterization of updating bridge rules (and knowledge
bases) as studied in [20] in light of the encountered difficulties when updating
rules [21–23] and the combination of updates over various formalisms [22,24].
Also, as already outlined in [25,26], we can consider generalized notions of min-
imal and grounded equilibria [5] for eMCSs to avoid, e.g., self-supporting cycles
introduced by bridge rules, or the use of preferences to deal with several evolving
equilibria an eMCS can have for the same observation sequence. Also interesting

is to apply the ideas in this paper to study the dynamics of frameworks closely
related to MCSs, such as those in [27–30].
Finally, and in line with the very motivation set out in the introduction, we
believe that the research in MCSs – including eMCSs with the different notions
of minimal change – provides a blue-print on how to represent and reason with
heterogeneous dynamic knowledge bases which could (should) be used by devel-
opers of practical agent-oriented programming languages, such as JASON [31],
2APL [32], or GOAL [33], in their quest for providing users and programmers
with greater expressiveness and flexibility in terms of the knowledge representa-
tion and reasoning facilities provided by such languages. To this end, an appli-
cation scenario that could provide interesting and rich examples would be that
of norm-aware multi-agent systems [34–39].

Acknowledgments. We would like to thank the referees for their comments, which
helped improve this paper. R. Gonçalves, M. Knorr and J. Leite were partially sup-
ported by FCT under project ERRO (PTDC/EIA-CCO/121823/2010) and under
strategic project NOVA LINCS (PEst/UID/CEC/04516/2013). R. Gonçalves was par-
tially supported by FCT grant SFRH/BPD/100906/2014 and M. Knorr was partially
supported by FCT grant SFRH/BPD/86970/2012.

References
1. Dastani, M., Hindriks, K.V., Novák, P., Tinnemeier, N.A.M.: Combining multi-
ple knowledge representation technologies into agent programming languages. In:
Baldoni, M., Son, T.C., van Riemsdijk, M.B., Winikoff, M. (eds.) DALT 2008.
LNCS (LNAI), vol. 5397, pp. 60–74. Springer, Heidelberg (2009)
2. Klapiscak, T., Bordini, R.H.: JASDL: A practical programming approach com-
bining agent and semantic web technologies. In: Baldoni, M., Son, T.C.,
van Riemsdijk, M.B., Winikoff, M. (eds.) DALT 2008. LNCS (LNAI), vol. 5397,
pp. 91–110. Springer, Heidelberg (2009)
3. Moreira, Á.F., Vieira, R., Bordini, R.H., Hübner, J.F.: Agent-oriented programming
with underlying ontological reasoning. In: Baldoni, M., Endriss, U., Omicini, A.,
Torroni, P. (eds.) DALT 2005. LNCS (LNAI), vol. 3904, pp. 155–170. Springer,
Heidelberg (2006)
4. Alberti, M., Knorr, M., Gomes, A.S., Leite, J., Gonçalves, R., Slota, M.: Norma-
tive systems require hybrid knowledge bases. In: van der Hoek, W., Padgham, L.,
Conitzer, V., Winikoff, M. (eds.) Procs. of AAMAS, pp. 1425–1426. IFAAMAS (2012)
5. Brewka, G., Eiter, T.: Equilibria in heterogeneous nonmonotonic multi-context
systems. In: Procs. of AAAI, pp. 385–390. AAAI Press (2007)
6. Giunchiglia, F., Serafini, L.: Multilanguage hierarchical logics or: How we can do
without modal logics. Artif. Intell. 65(1), 29–70 (1994)
7. Roelofsen, F., Serafini, L.: Minimal and absent information in contexts. In:
Kaelbling, L., Saffiotti, A. (eds.) Procs. of IJCAI, pp. 558–563. Professional Book
Center (2005)
8. Brewka, G., Eiter, T., Fink, M., Weinzierl, A.: Managed multi-context systems. In:
Walsh, T. (ed.) Procs. of IJCAI, pp. 786–791. IJCAI/AAAI (2011)
9. Benerecetti, M., Giunchiglia, F., Serafini, L.: Model checking multiagent systems.
J. Log. Comput. 8(3), 401–423 (1998)

10. Dragoni, A., Giorgini, P., Serafini, L.: Mental states recognition from communica-
tion. J. Log. Comput. 12(1), 119–136 (2002)
11. Sabater, J., Sierra, C., Parsons, S., Jennings, N.R.: Engineering executable agents
using multi-context systems. J. Log. Comput. 12(3), 413–442 (2002)
12. Gonçalves, R., Knorr, M., Leite, J.: Evolving multi-context systems. In: Schaub, T.,
Friedrich, G., O’Sullivan, B. (eds.) Procs. of ECAI. Frontiers in Artificial Intelligence
and Applications, vol. 263, pp. 375–380. IOS Press (2014)
13. Brewka, G.: Towards reactive multi-context systems. In: Cabalar, P., Son, T.C.
(eds.) LPNMR 2013. LNCS, vol. 8148, pp. 1–10. Springer, Heidelberg (2013)
14. Ellmauthaler, S.: Generalizing multi-context systems for reactive stream reasoning
applications. In: Procs. of ICCSW. OASICS, vol. 35, pp. 19–26. Schloss Dagstuhl
- Leibniz-Zentrum fuer Informatik, Germany (2013)
15. Brewka, G., Ellmauthaler, S., Pührer, J.: Multi-context systems for reactive rea-
soning in dynamic environments. In: Schaub, T., Friedrich, G., O’Sullivan, B.,
(eds.) Procs. of ECAI. Frontiers in Artificial Intelligence and Applications, vol. 263,
pp. 159–164. IOS Press (2014)
16. Alferes, J.J., Brogi, A., Leite, J., Moniz Pereira, L.: Evolving logic programs. In:
Flesca, S., Greco, S., Leone, N., Ianni, G. (eds.) JELIA 2002. LNCS (LNAI), vol.
2424, p. 50. Springer, Heidelberg (2002)
17. Gebser, M., Grote, T., Kaminski, R., Schaub, T.: Reactive answer set program-
ming. In: Delgrande, J.P., Faber, W. (eds.) LPNMR 2011. LNCS, vol. 6645,
pp. 54–66. Springer, Heidelberg (2011)
18. Wang, Y., Zhuang, Z., Wang, K.: Belief change in nonmonotonic multi-context
systems. In: Cabalar, P., Son, T.C. (eds.) LPNMR 2013. LNCS, vol. 8148,
pp. 543–555. Springer, Heidelberg (2013)
19. Kaminski, T., Knorr, M., Leite, J.: Efficient paraconsistent reasoning with ontolo-
gies and rules. In: Procs. of IJCAI. IJCAI/AAAI (2015)
20. Gonçalves, R., Knorr, M., Leite, J.: Evolving bridge rules in evolving multi-context
systems. In: Bulling, N., van der Torre, L., Villata, S., Jamroga, W., Vasconcelos, W.
(eds.) CLIMA 2014. LNCS, vol. 8624, pp. 52–69. Springer, Heidelberg (2014)
21. Slota, M., Leite, J.: On semantic update operators for answer-set programs. In
Coelho, H., Studer, R., Wooldridge, M., (eds.) Procs. of ECAI. Frontiers in Arti-
ficial Intelligence and Applications, vol. 215, pp. 957–962. IOS Press (2010)
22. Slota, M., Leite, J.: Robust equivalence models for semantic updates of answer-set
programs. In: Brewka, G., Eiter, T., McIlraith, S.A. (eds.) Procs. of KR. AAAI
Press (2012)
23. Slota, M., Leite, J.: The rise and fall of semantic rule updates based on se-models.
TPLP 14(6), 869–907 (2014)
24. Slota, M., Leite, J.: A unifying perspective on knowledge updates. In: del Cerro, L.F.,
Herzig, A., Mengin, J. (eds.) JELIA 2012. LNCS, vol. 7519, pp. 372–384. Springer,
Heidelberg (2012)
25. Gonçalves, R., Knorr, M., Leite, J.: Towards efficient evolving multi-context sys-
tems (preliminary report). In: Ellmauthaler, S., Pührer, J. (eds.) Procs. of React-
Know (2014)
26. Knorr, M., Gonçalves, R., Leite, J.: On efficient evolving multi-context systems.
In: Pham, D.-N., Park, S.-B. (eds.) PRICAI 2014. LNCS, vol. 8862, pp. 284–296.
Springer, Heidelberg (2014)
27. Knorr, M., Slota, M., Leite, J., Homola, M.: What if no hybrid reasoner is available?
hybrid MKNF in multi-context systems. J. Log. Comput. 24(6), 1279–1311 (2014)

28. Gonçalves, R., Alferes, J.J.: Parametrized logic programming. In: Janhunen, T.,
Niemelä, I. (eds.) JELIA 2010. LNCS, vol. 6341, pp. 182–194. Springer, Heidelberg
(2010)
29. Knorr, M., Alferes, J., Hitzler, P.: Local closed world reasoning with description
logics under the well-founded semantics. Artif. Intell. 175(9–10), 1528–1554 (2011)
30. Ivanov, V., Knorr, M., Leite, J.: A query tool for EL with non-monotonic rules.
In: Alani, H., Kagal, L., Fokoue, A., Groth, P., Biemann, C., Parreira, J.X.,
Aroyo, L., Noy, N., Welty, C., Janowicz, K. (eds.) ISWC 2013, Part I. LNCS,
vol. 8218, pp. 216–231. Springer, Heidelberg (2013)
31. Bordini, R.H., Hübner, J.F., Wooldridge, M.: Programming Multi-Agent Systems
in AgentSpeak Using Jason (Wiley Series in Agent Technology). John Wiley &
Sons (2007)
32. Dastani, M.: 2APL: a practical agent programming language. Journal of
Autonomous Agents and Multi-Agent Systems 16(3), 214–248 (2008)
33. Hindriks, K.V.: Programming rational agents in GOAL. In: El Fallah
Seghrouchni, A., Dix, J., Dastani, M., Bordini, R.H. (eds.) Multi-Agent Pro-
gramming, pp. 119–157. Springer, US (2009)
34. Criado, N., Argente, E., Botti, V.J.: THOMAS: an agent platform for supporting
normative multi-agent systems. J. Log. Comput. 23(2), 309–333 (2013)
35. Meneguzzi, F., Rodrigues, O., Oren, N., Vasconcelos, W.W., Luck, M.: BDI rea-
soning with normative considerations. Eng. Appl. of AI 43, 127–146 (2015)
36. Cardoso, H.L., Oliveira, E.: A context-based institutional normative environment.
In: Hübner, J.F., Matson, E., Boissier, O., Dignum, V. (eds.) COIN 2008. LNCS,
vol. 5428, pp. 140–155. Springer, Heidelberg (2009)
37. Gerard, S.N., Singh, M.P.: Evolving protocols and agents in multiagent systems.
In: Gini, M.L., Shehory, O., Ito, T., Jonker, C.M. (eds.) Procs. of AAMAS,
pp. 997–1004. IFAAMAS (2013)
38. Vasconcelos, W.W., Kollingbaum, M.J., Norman, T.J.: Normative conflict resolu-
tion in multi-agent systems. Autonomous Agents and Multi-Agent Systems 19(2),
124–152 (2009)
39. Panagiotidi, S., Alvarez-Napagao, S., Vázquez-Salceda, J.: Towards the norm-
aware agent: bridging the gap between deontic specifications and practical mech-
anisms for norm monitoring and norm-aware planning. In: Balke, T., Dignum, F.,
van Riemsdijk, M.B., Chopra, A.K. (eds.) COIN 2013. LNCS, vol. 8386, pp. 346–363.
Springer, Heidelberg (2014)
Bringing Constitutive Dynamics
to Situated Artificial Institutions

Maiquel de Brito1 , Jomi F. Hübner1 , and Olivier Boissier2


1
Federal University of Santa Catarina, Florianópolis, SC, Brazil
[email protected], [email protected]
2
Laboratoire Hubert Curien UMR CNRS 5516, Institut Henri Fayol,
MINES Saint-Etienne, Saint-Etienne, France
[email protected]

Abstract. The Situated Artificial Institution (SAI) model, as proposed


in the literature, conceives the regulation of Multi-Agent Systems as
based on a constitutive state that is consequence of the institutional
interpretation of facts issued by the environment. The different nature
of these facts (e.g. past sequence of events, states holding) implies var-
ious dynamic behaviours that need to be considered to properly define
the life cycle of the constitutive state. This paper aims to bring such a
dynamic to SAI. It defines a formal apparatus (i) for the institutional
interpretation of environmental facts based on constitutive rules and (ii)
for the management of the resulting constitutive state.

1 Introduction
Among the different works related to artificial institutions, [1,2] are concerned
with the grounding of norms in the environment where the agents act, keeping
a clear separation among regulative, constitutive, and environmental elements
involved in the regulation of Multi-Agent Systems (MAS). In this paper we con-
sider and extend the Situated Artificial Institution (SAI) model [2]. The choice
of SAI is motivated by its available specification language, that is interesting to
specify norms decoupled but still grounded in the environment as shown in [3].
For example, the norm stating that “the winner of the auction is obliged to
pay its offer” is specified on top of a constitutive level that defines who, in the
environment, is the winner that must pay its offer and what must be done, in
the environment, to comply with that expectation. Norms abstracting from the
environment are more stable and flexible but must be connected to the environ-
ment [1], as the regulation of the system (realised in what we call institutions)
is, in fact, the regulation of what happens in the environment.
The notion of constitution proposed by John Searle [4] has inspired different
works addressing the relation between the environment and the regulative ele-
ments in MAS. Among them, SAI goes in a particular direction, considering that
constitutive rules specify how agents acting, events occurring, and states holding
in the environment compose (or constitute) the constitutive level of the institu-
tion. In the previous example, a constitutive rule could state that the agent that


acts in the environment placing the best bid counts, in the constitutive level, as
the winner of the auction (Figure 1).
While the notion of constitution in SAI is well defined, a precise and formal
definition of the dynamics of the constitutive level, resulting of the interpretation
of constitutive rules, is still lacking. Interpreting the constitutive rules and man-
aging the SAI constitutive state require to consider (i ) how to tackle with the
different natures of the environmental elements that may constitute the relevant
elements to the institutional regulation (i.e. agents, events, states) and (ii ) how
to base the dynamic of the constitution both on the occurrences of these ele-
ments in the environment and on the production of new constitutive elements in
the institution itself. Taking for granted that the institutional regulation depends
on the constitutive state, this paper departs from the SAI conceptual model to
propose clearly defined semantics answering these two challenges.
The paper begins with a global overview of the SAI model (Section 2), on
which we base our contributions, which are presented in Sections 3 and 4.
While Section 3 introduces the necessary representations to support the
interpretation of the constitutive rules, Section 4 focuses on the dynamic
aspects of this interpretation. Before concluding and pointing some perspectives
for future work, Section 5 discusses the contributions of this paper with respect
to related work.

Fig. 1. SAI overview. (a) Abstract overview; (b) Scenario overview.

2 Background
Before presenting our contributions in the next sections, this section briefly
describes the SAI model proposed in [2]. In SAI, norms define the expected
behaviour from the agents in an abstract level that is not directly related to
the environment. For example, the norm “the winner of an auction is obliged
to pay its offer ” does not specify neither who is the winner that is obliged to
fulfil the norm nor what the winner must concretely do to fulfil it. The effective-
ness of a norm depends on its connection to the environment as its dynamics
(activation, fulfilment, etc) results of facts occurring there. Such a connection

is established when the components of the norms – the status functions – are
constituted, according to constitutive rules, from the environmental elements
(Figure 1). These elements are described below:
– The environmental elements, represented by X = AX ∪ EX ∪ SX , are organized
in the set AX of agents possibly acting in the system, the set EX of events that
may happen in the environment, and the set SX of properties used to describe
the possible states of the environment.
– The status functions of a SAI, represented by F = AF ∪ EF ∪ SF , are the
set AF of agent-status functions (i.e. status functions assignable to agents), the
set EF of event-status functions (i.e. status functions assignable to events), and
the set SF of state-status functions (i.e. status functions assignable to states).
Status functions are functions that the environmental elements (agents, events,
and states) perform from the institutional perspective [4]. For example, in an
auction, an agent may have the function of winner, the utterance “I offer $100”
may have the function of bid, and the state “more than 20 people placed in a
room on Friday at 10am” may mean the minimum quorum for its realization.
– The constitutive rules defined in C specify the constitution of the status functions
of F from the environmental elements of X . A constitutive rule c ∈ C is a
tuple ⟨x, y, t, m⟩ where x ∈ F ∪ X ∪ {ε}, y ∈ F, t ∈ EF ∪ EX ∪ {⊤}, m ∈ W , and
W = WF ∪ WX . WF is the set of status-functions-formulae (sf-formulae) and
WX is the set of environment-formulae (e-formulae), defined later. A constitutive
rule ⟨x, y, t, m⟩ specifies that x counts as y when t has happened while m holds.
If x = ε, then there is a freestanding assignment of the status function y, i.e. an
assignment where there is no concrete environmental element carrying y [2,4].
When x actually counts as y (i.e. when the conditions t and m declared in the
constitutive rule are true), we say that there is a status function assignment
(SFA) of the status function y to the element x. The establishment of a SFA of
y to some x is the constitution of y. The set of all SFAs of a SAI composes its
constitutive state (see Def. 4).
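To make the structure of constitutive rules and SFAs concrete, the following sketch represents them as plain data structures. It is only an illustration written for this text: the Python names (ConstitutiveRule, SFA, the None marker standing for ε) are hypothetical and not part of the SAI specification language of [2].

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class ConstitutiveRule:
    """A constitutive rule <x, y, t, m>: x counts as y when t has happened while m holds."""
    x: Optional[str]        # environmental element or status function (None stands for ε)
    y: str                  # status function being constituted
    t: Optional[str]        # triggering event (None = no 'when' clause)
    m: Tuple[str, ...]      # condition formula, here flattened into a tuple of literals

@dataclass(frozen=True)
class SFA:
    """A status function assignment of status function `sf` to element `element`."""
    element: Optional[str]  # None encodes a freestanding assignment
    sf: str

# Rule 1 of Fig. 2, roughly: the agent proposing an auction counts as auctioneer.
rule1 = ConstitutiveRule(x="Agent", y="auctioneer",
                         t="(propose(auction),Agent)", m=("not auction_finished",))
bob_is_auctioneer = SFA("bob", "auctioneer")
```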
The sf-formulae wF ∈ WF are logical formulae, based on status functions (see
the Expression 1 below). The e-formulae wX ∈ WX are logical formulae, based
on environmental elements (see the Expression 2 below). Section 3 defines the
proper semantics of these formulae, based on SFA and on the actual environment.
wF ::= eF | sF | ¬wF | wF ∨ wF | wF ∧ wF | x is y | ⊥ | ⊤        (1)

wX ::= eX | sX | ¬wX | wX ∨ wX | wX ∧ wX | ⊥ | ⊤        (2)

Considering these definitions of SAI, the challenges stated in the introduction
are addressed in the next sections by (i) defining a uniform constitutive dynamics
considering the proper life cycles of agents, states, and events, and (ii) enriching this
uniform dynamics with the life cycle of the SFAs themselves, since constitutions
may be stated by already constituted status functions. The first sub-objective
requires considering both instantaneous and fluent¹ dynamics coming from events
or states in the environment. Addressing the second sub-objective requires
considering the constitutive state both as a condition for constitution (i.e. constitutions
may take place under specific constitutive states) and as the container of
elements to which status functions can be assigned.

¹ Fluent refers to the possibility of holding along many states; instantaneous refers
to holding during a single state.

3 Constitutive Dynamics - Preliminaries

The semantics of the constitutive rules requires, as presented in this section, a


formal representation of the elements related to the SAI constitutive dynamics.2

Definition 1 (SAI state). The SAI state is composed of an environmental
state X, a constitutive state F , and a normative state N . It is represented by
SAIDyn = ⟨X, F, N⟩.
The formal representation of X and F is introduced below. As the normative
state dynamics is beyond the scope of this paper, N is introduced as part of the
SAI state but it is not detailed here.

Definition 2 (SAI history). The history of a SAI is the sequence of its states
indexed by i ∈ ℕ (where ℕ is the set of the natural numbers).
The SAI state at the i-th step of its history is represented by SAIDyn^i =
⟨X^i, F^i, N^i⟩. The set of all states between the first step and the i-th step is
represented by SAIDyn^[i] = ⟨X^[i], F^[i], N^[i]⟩.

Definition 3 (Environmental state). The environmental state is represented
by X = ⟨AX , EX , SX⟩ where (i) AX is the set of agents participating in the
system, (ii) EX is the set of events occurring in the environment and (iii) SX
is the set of environmental properties describing the environmental state.
Agents in AX are represented by their names. States in SX are represented
by first order logic atomic formulae. Events in EX are represented by pairs
(e, a) where e is the event, represented by a first order logic atomic formula,
triggered by the agent a. Events can be triggered by actions of the agents (e.g.
the utterance of a bid in an auction, the handling of an environmental artifact,
etc.) but can also be produced by the environment itself (e.g. a clock tick). In this
case, events are represented by pairs (e, ε). We use X = ⟨AX , EX , SX⟩ to denote
the current state of the environment. When it is necessary to explicitly refer
to the state of X at the step i of the SAI history, we use X^i = ⟨AX^i , EX^i , SX^i⟩.
The environmental state X is used to evaluate e-formulae (see Expression 2 for
syntax)³:

SX |= wX iff ∃θ : wX ∈ SX ∧ wX θ ∈ SX        (3)
EX |= wX iff ∃θ : wX ∈ EX ∧ wX θ ∈ EX        (4)

² Similarly to the SAI specification, the SAI dynamics can be divided into two parts:
(i) the constitutive dynamics, consisting of the status function assignments and revocations,
and (ii) the normative dynamics, consisting of the norm activations, fulfilments,
violations, etc. The normative dynamics is beyond the scope of this paper.
Definition 4 (Constitutive state). The constitutive state of a SAI is represented
by F = ⟨AF , EF , SF⟩ where (i) AF ⊆ AX × AF is the set of agent-status
function assignments, (ii) EF ⊆ EX × EF × AX is the set of event-status function
assignments and (iii) SF ⊆ SX × SF is the set of state-status function
assignments.
As introduced in the previous section, SFA are relations between environmental
elements and status functions. Elements of AF are pairs ⟨aX , aF⟩ meaning that
the agent aX ∈ AX has the status function aF ∈ AF . Elements of EF are triples
⟨eX , eF , aX⟩ meaning that the event-status function eF ∈ EF is assigned to the
event eX ∈ EX produced by the agent aX ∈ AX . As events are supposed to be
considered at the individual agent level in normative systems [6], it is important
to record the agent that causes an event-status function assignment. Elements
of SF are pairs ⟨sX , sF⟩ meaning that the state sX ∈ SX carries the status
function sF ∈ SF . In the following, we use F = ⟨AF , EF , SF⟩ to denote the
current constitutive state and F^i = ⟨AF^i , EF^i , SF^i⟩ to refer to the constitutive
state at the step i of the SAI history.
constitutive state F at the step i of the SAI history.
The constitutive state F is used to evaluate the sf-formulae (see Expression 1
for syntax). If an agent x participates in the system (i.e. x ∈ AX ) and carries the
status function y (i.e. if x, y ∈ AF ), then the formula x is y is true in current
state F :
AF |=x is y iff x ∈ AX ∧ y ∈ AF ∧ x, y ∈ AF (5)
In the same way, event-status function semantics is defined in the Expression 6.
In addition, if an event-status function is assigned to some environmental event,
then this event-status function follows from the current constitutive state F
(Expression 7):
EF |=x is y iff x ∈ EX ∧ x = e, a ∧ y ∈ EF ∧ e, y, a ∈ EF (6)
EF |=wF iff wF ∈ EF ∧ ∃eX : eX is wF (7)
State-status function semantics is similarly defined in the Expression 8. In addi-
tion, if there is some assignment involving a state-status function, then this state-
status function follows from the current constitutive state F (Expression 9):
SF |=x is y iff x ∈ SX ∧ y ∈ SF ∧ x, y ∈ SF (8)
SF |=wF iff wF ∈ SF ∧ ∃sX : sX is wF (9)
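As an illustration of how a constitutive state supports the evaluation of sf-formulae such as x is y, the sketch below encodes the three sets of assignments of Def. 4 and checks Expressions 5 and 7 on a toy state. The dictionary-based encoding and the helper names are assumptions made for this example only.

```python
# Constitutive state as three sets of assignments, mirroring Def. 4:
# agent SFAs as (agent, sf), event SFAs as (event, sf, agent), state SFAs as (state, sf).
constitutive_state = {
    "A_F": {("bob", "auctioneer"), ("tom", "bidder")},
    "E_F": {("offer(100)", "to_bid", "tom")},
    "S_F": {(None, "auction_running")},   # None plays the role of ε
}

def holds_is(x, y, state, agents):
    """Rough check of 'x is y' for agent-status functions (Expression 5)."""
    return x in agents and (x, y) in state["A_F"]

def holds_event_sf(sf, state):
    """Rough check of Expression 7: some event carries the event-status function sf."""
    return any(assigned_sf == sf for (_, assigned_sf, _) in state["E_F"])

agents = {"bob", "tom"}
print(holds_is("bob", "auctioneer", constitutive_state, agents))  # True
print(holds_event_sf("to_bid", constitutive_state))               # True
```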
The constitutive state defines how the institution is situated. The next section
defines how this constitutive state is deduced from the environmental state and
from the constitutive state itself.
³ In this paper, a substitution is always represented by θ. A substitution is a finite
set of pairs {α1/β1, · · · , αn/βn} where αi is a variable and βi is a term. If θ is a
substitution and ρ is a literal, then ρθ is the literal resulting from the replacement
of each αi in ρ by the corresponding βi [5].
status functions:
agents: auctioneer, bidder, current winner, winner.
events: to bid(Value), to pay(Value), to fine winner, commercial transaction.
states: auction running, auction finished, current value(Value).
norms:
1:auction finished: winner obliged to pay(current value).
constitutive rules:
/* The agent that proposes an auction is the auctioneer */
1: Agent count-as auctioneer when (propose(auction),Agent) while not auction finished.
/* While the auction is running, any agent other than the auctioneer is a bidder */
2: Agent count-as bidder while not(Agent is auctioneer)& auction running.
/* Auctioneer and bidders are auction participants */
3: auctioneer count-as auction participant
4: bidder count-as auction participant
/* The agent that performs the best bid is the current winner */
5: Agent count-as current winner when (to bid(Value),Agent)
while (not(current value(Current)) & Current>Value)& (auction running|auction finished).
/* The current winner is the (final) winner if the auction is finished */
6: current winner count-as winner while auction finished.
/* An auction is running while there is an agent being the auctioneer */
7: count-as auction running while is auctioneer.
/* Auctioneer hitting the hammer means that the auction is finished */
8: count-as auction finished when (hit hammer, Agent) while Agent is auctioneer.
/* An offer done by a bidder while the auction is running is a bid */
9: (offer(Value),Agent) count-as to bid(Value) while auction running & Agent is bidder.
/* An offered value is the current value if it is greater than the last one */
10: count-as current value(Value) when (to bid(Value),Agent)
while Agent is bidder & (not(Current is current value) & Current>Value)& (auction running|auction finished).
/* A bid is a commercial transaction */
11: to bid count-as commercial transaction.
/* A bank deposit from the winner to the auctioneer is a payment */
12: (bank deposit(Creditor,Value),Agent) count-as to pay(Value)
while Creditor is auctioneer & Agent is winner & auction finished & current value(Value).

Fig. 2. SAI Specification

4 Constitutive Dynamics

The interpretation of the constitutive rules produces the SFA composing the
SAI constitutive state. Constitutive rules can specify two kinds of constitution
of status functions: first-order constitution (Section 4.1) and second-order con-
stitution (Section 4.2). From these two definitions, Section 4.3 defines the con-
stitutive dynamics of SAI. This is all illustrated considering an auction scenario
whose regulation is specified in the Figure 2, according to the SAI specification
language proposed in [2].

4.1 First-Order Constitution

The first-order constitution, explained in Defs. 5 to 7, explicitly assigns a status
function to agents, events, and states from the environment, stating, for example,
that the agent bob counts as a bidder.

Definition 5 (First-order constitution of agent-status-functions). The
set of agent-status function assignments due to first-order constitution in the i-th
step of the SAI history is given by the function f−const_a defined as follows:

f−const_a(F, C, X^[i−1], F^[i−1]) = {⟨xθ, y⟩ | ∃θ ∃⟨x, y, t, m⟩ ∈ C ∃s ∈ ℕ ∀k ∈ [s, i−1] :
    (y ∈ AF) ∧ (EX^s ∪ EF^s |= tθ) ∧ (X^k ∪ F^k |= mθ) ∧ xθ ∈ AX^{i−1}}
Informally, (i) if there exists a constitutive rule ⟨x, y, t, m⟩ whose element t, under a
substitution θ, represents an event that occurred at the step s and (ii) if along all the
steps k from s to i − 1 the formula m, under θ, is entailed by the environmental
and constitutive states, then the agent identified by the element x under θ
carries the agent-status function y in the step i. Note that the function defines
that an SFA to an agent only holds while the agent participates in the system.
If it leaves the system, all its SFAs are dropped. The function also makes explicit our
proposed approach to deal with combined instantaneous events and fluent states
as conditions to constitution, when it defines that an SFA belongs to the constitutive
state if m holds in all steps k from the occurrence of t (at the step s) until
the step i − 1. Some points to observe in this definition are: (i) the repetition of
the event t does not affect the SFA and (ii) a SFA is dropped if m ceases to hold
and is not re-established if m turns to hold again (unless the event t happens again
while m is holding).
The rule 1 in the Figure 2 defines a first-order constitution of an agent-status
function. If (propose(auction), bob) ∈ EX^1, meaning that the agent bob
has proposed an auction at the step 1, then bob carries the status function
auctioneer (i.e. ⟨bob, auctioneer⟩ ∈ f−const_a(F, C, X^[i−1], F^[i−1])) for all steps i,
starting from the 2nd one, while the property auction finished does not hold
(considering θ = {Agent/bob}).
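The following sketch gives a simplified, runnable reading of Def. 5: it scans a toy history for a triggering event and checks that the condition m holds from the trigger step up to the current one. Conditions are modelled as sets of atoms that must hold (the rule's "not auction finished" is flattened into a positive atom auction_open), so this is an assumption-laden toy, not the formal function f−const_a.

```python
def f_const_a(rules, env_events, env_props, agents_now, i):
    """Toy sketch of f-const_a (Def. 5). env_events[k] is the set of (event, agent) pairs
    occurring at step k; env_props[k] is the set of atoms holding at step k; a condition m
    is a set of atoms that must all hold from the trigger step s up to step i-1."""
    sfas = set()
    for (trigger, status_function, condition) in rules:
        for s in range(i):
            for (event, agent) in env_events[s]:
                if event != trigger or agent not in agents_now:
                    continue
                if all(condition <= env_props[k] for k in range(s, i)):
                    sfas.add((agent, status_function))
    return sfas

# Rule 1 of Fig. 2, flattened: proposing an auction makes the proposer the auctioneer
# while the auction is not finished (modelled here as the atom 'auction_open').
rules = [("propose_auction", "auctioneer", {"auction_open"})]
env_events = {0: {("propose_auction", "bob")}, 1: set(), 2: set()}
env_props = {0: {"auction_open"}, 1: {"auction_open"}, 2: {"auction_open"}}
print(f_const_a(rules, env_events, env_props, {"bob", "tom"}, 3))
# -> {('bob', 'auctioneer')}
```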

Definition 6 (First-order constitution of state-status-functions). The
set of state-status-function assignments due to first-order constitution in the i-th
step of the SAI history is given by the function f−const_s defined as follows:

f−const_s(F, C, X^[i−1], F^[i−1]) = {⟨xθ, y⟩ | ∃θ ∃⟨x, y, t, m⟩ ∈ C ∃s ∈ ℕ ∀k ∈ [s, i−1] :
    (y ∈ SF) ∧ (EX^s ∪ EF^s |= tθ) ∧ (X^k ∪ F^k |= mθ) ∧ ((x = ε) ∨ (xθ ∈ SX^{i−1}))}

Similarly to the constitution of agent-status functions, (i) a SFA to a state x ∈ SX
only holds while x holds in the environment and (ii) the constitution of state-status
functions is conditioned on the holding of m in all steps from the occurrence
of the event t. Besides, the function f−const_s makes explicit our conception that
the constitution of state-status functions may result in freestanding assignments.
The rule 8 in the Figure 2 defines a first-order constitution of a state-status
function. If (hit hammer, bob) ∈ EX^3, meaning that bob has hit the hammer at
the step 3, then the assignment ⟨ε, auction finished⟩ is active from the step 4
while bob has the status function of auctioneer (considering θ = {Agent/bob}).

Definition 7 (First-order constitution of event-status-functions). The
set of event-status-function assignments due to first-order constitution in the i-th
step of the SAI history is given by the function f−const_e defined as follows:

f−const_e(F, C, X^[i−1], F^[i−1]) = {⟨eθ, y, aθ⟩ | ∃θ ∃⟨x, y, t, m⟩ ∈ C : (y ∈ EF) ∧
    (EX^{i−1} ∪ EF^{i−1} |= tθ) ∧ (X^{i−1} ∪ F^{i−1} |= mθ) ∧ x = (e, a) ∧ (eθ, aθ) ∈ EX^{i−1}}
Compared to agent- and state-status functions, the constitution of event-status
functions is differently related to the SAI history. Event-status function assignments
are assumed to hold only in the step after which the conditions t and m
hold, thus mimicking, in the constitutive level, the atomic nature of the environmental
events [7]. Thus, the holding of m during many steps of the SAI history
does not imply the holding of an event-status function assignment.
The rule 9 in the Figure 2 defines a first-order constitution of an event-status
function. If (offer(100), tom) ∈ EX^2, meaning that tom has uttered an offer of
$100 at the step 2, then the assignment ⟨offer(100), to bid, tom⟩ holds in the
step 3, i.e. ⟨offer(100), to bid, tom⟩ ∈ f−const_e(F, C, X^[2], F^[2]) (considering θ =
{Value/100, Agent/tom}). When t is itself an event, the event-status function is
assigned to the event x conditioned on the occurrence of two events at the same
step: the event x itself and the event t.

4.2 Second-Order Constitution

Constitutive rules specifying second-order constitution define that a status function
counts as another status function. But even though they specify a relation between
two status functions, the assignments resulting from the second-order constitution
are also relations between status functions and environmental elements. That is
to say, whenever a status function s1 counts as a status function s2, all the elements
constituting s1 also constitute s2. For example, even though the rule 3 in the Figure 2
states that the auctioneer counts as an auction participant, the status function
of auction participant is actually assigned to all the concrete agents carrying the
status function of auctioneer.
Defining the set of SFAs due to second-order constitution is an iterative process,
as each change in the constitutive state may produce new SFAs. To deal
with this, the functions defined below have the index n (e.g. s−const_a^n), representing
the n-th iteration in the evaluation of second-order constitution in the
same step of the SAI history. Each iteration n takes into account the assignments
produced in the iteration n − 1. The whole set of SFAs due to second-order
constitution in a step i of the SAI history is found when the SFAs produced in
the iterations n and n − 1 are the same.
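The iterative computation can be pictured as a fixpoint loop over the set of agent SFAs, as in the sketch below. For readability the conditions t and m are omitted and second-order rules are reduced to pairs (sf_x counts as sf_y); the chained rule "auction_participant counts as person_in_room" is invented for this example only.

```python
def second_order_agent_sfas(second_order_rules, first_order_sfas):
    """Toy fixpoint sketch of the iterative second-order constitution for agent-status
    functions: the iteration stops when no new assignment is produced, i.e. when
    s-const^n equals s-const^(n-1)."""
    sfas = set(first_order_sfas)
    while True:
        new_sfas = set(sfas)
        for (sf_x, sf_y) in second_order_rules:
            for (agent, sf) in sfas:
                if sf == sf_x:
                    new_sfas.add((agent, sf_y))
        if new_sfas == sfas:       # fixpoint reached
            return sfas
        sfas = new_sfas

# Rules 3 and 4 of Fig. 2 plus a hypothetical chained rule for illustration.
rules = [("auctioneer", "auction_participant"),
         ("bidder", "auction_participant"),
         ("auction_participant", "person_in_room")]
print(second_order_agent_sfas(rules, {("bob", "auctioneer"), ("tom", "bidder")}))
```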

Definition 8 (Second-order constitution of agent-status-functions).
Given the function s−const_a^n (n ≥ 0) defined below, the set of agent-status function
assignments due to second-order constitution in the i-th step of the SAI history
is given by s−const_a = s−const_a^n for the lowest n s.t. s−const_a^n = s−const_a^{n−1}:

s−const_a^n(F, C, X^[i], F^[i]) = {⟨aX, y⟩ | ∃θ ∃⟨x, y, t, m⟩ ∈ C ∃s ∈ ℕ ∀k ∈ [s, i−1] :
    (y ∈ AF) ∧ (EX^s ∪ EF^s |= tθ) ∧ (X^k ∪ F^k |= mθ) ∧ (xθ ∈ AF ∧ ⟨aX, xθ⟩ ∈ A)}

where A = AF^[i] if n = 0, and A = AF^[i] ∪ s−const_a^{n−1}(F, C, X^[i], F^[i]) otherwise.
Informally, if there is a constitutive rule ⟨x, y, t, m⟩ whose element x, under a
substitution θ, corresponds to a status function already assigned to an agent aX ,
then this agent also carries the status function y ∈ AF (subject to the conditions
t and m, as in the first-order constitution (Def. 5)). When the agent aX ceases
to carry the status function xθ, it also ceases to carry the status function y.
The rule 3 of the Figure 2 defines a second-order constitution of an
agent-status function. Considering θ = {Agent/bob}, if bob is auctioneer at
the i-th step (i.e. ⟨bob, auctioneer⟩ ∈ AF^i), then ⟨bob, auction participant⟩ ∈
s−const_a^n(F, C, X^[i], F^[i]), for n ≥ 0, and, eventually, ⟨bob, auction participant⟩ ∈ AF^i. Informally, the
rule states that an agent having the status function of auctioneer counts as an
auction participant and, as bob has the status function of auctioneer, he also has
the status function of auction participant.

Definition 9 (Second-order constitution of state-status-functions).
Given the function s−const_s^n (n ≥ 0) defined below, the set of state-status function
assignments due to second-order constitution in the i-th step of the SAI history
is given by s−const_s = s−const_s^n for the lowest n s.t. s−const_s^n = s−const_s^{n−1}:

s−const_s^n(F, C, X^[i], F^[i]) = {⟨sX, y⟩ | ∃θ ∃⟨x, y, t, m⟩ ∈ C ∃s ∈ ℕ ∀k ∈ [s, i−1] :
    (y ∈ SF) ∧ (EX^s ∪ EF^s |= tθ) ∧ (X^k ∪ F^k |= mθ) ∧ (xθ ∈ SF ∧ ⟨sX, xθ⟩ ∈ S)}

where S = SF^[i] if n = 0, and S = SF^[i] ∪ s−const_s^{n−1}(F, C, X^[i], F^[i]) otherwise.

If there is a constitutive rule ⟨x, y, t, m⟩ whose element x, under a substitution
θ, corresponds to a status function already assigned to a state sX , then this
state also carries the status function y ∈ SF (subject to the conditions t and m,
as in the first-order constitution (Def. 6)). When sX ceases to carry the status
function xθ, it also ceases to carry the status function y.
Let us consider the status function payment phase in an auction scenario and
a constitutive rule stating that auction finished count-as payment phase.
Thus, if ⟨ε, auction finished⟩ ∈ SF^i, then ⟨ε, payment phase⟩ ∈ s−const_s^n(F,
C, X^[i], F^[i]), for n ≥ 0 and, eventually, ⟨ε, payment phase⟩ ∈ SF^i.

Definition 10 (Second-order constitution of event-status-functions).
Given the function s−const_e^n (n ≥ 0) defined below, the set of event-status function
assignments due to second-order constitution in the i-th step of the SAI history
is given by s−const_e = s−const_e^n for the lowest n s.t. s−const_e^n = s−const_e^{n−1}:

s−const_e^n(F, C, X^[i], F^[i]) = {⟨eX, y, aX⟩ | ∃θ ∃⟨x, y, t, m⟩ ∈ C : (y ∈ EF) ∧
    (EX^{i−1} ∪ EF^{i−1} |= tθ) ∧ (X^{i−1} ∪ F^{i−1} |= mθ) ∧ xθ ∈ EF ∧ ⟨eX, xθ, aX⟩ ∈ E}

where E = EF^[i] if n = 0, and E = EF^[i] ∪ s−const_e^{n−1}(F, C, X^[i], F^[i]) otherwise.

If there is a constitutive rule ⟨x, y, t, m⟩ whose element x, under a substitution
θ, corresponds to a status function already assigned to the event eX , then eX
also carries the status function y ∈ EF (subject to the conditions t and m, as in
the first-order constitution (Def. 7)). The assignment of y to eX holds while the
assignment of x to eX holds.
The constitutive rule 11 in the Figure 2 states that bidding in an auction is
a commercial transaction. Supposing that the agent tom has uttered an offer at
the step i − 1, then by the rule 9, ⟨offer(100), to bid, tom⟩ ∈ EF^i and, by the
rule 11, ⟨offer(100), commercial transaction, tom⟩ ∈ s−const_e^n(F, C, X^[i], F^[i]),
for n ≥ 0, and eventually ⟨offer(100), commercial transaction, tom⟩ ∈ EF^i,
because (i) the term x of the rule is an event-status function that (ii) is already
assigned to the event offer(100).
From the definitions 8 to 10 we can see how assigning status functions to
status functions allows grounding the institution in the environment
while enabling different kinds of manipulations inside the constitutive level,
such as the definition of multiple levels of abstraction (defining, for example,
that the status function y1 counts as y2, which, in its turn, counts as y3), as well
as the definition of relations inside the constitutive level such as generalization
(e.g. y1 and y2 count as y3), etc.

4.3 SAI Constitutive State

The previously presented functions make it possible to formally define the constitutive
state in the i-th step of the SAI history as F^i = ⟨AF^i , EF^i , SF^i⟩ where:

AF^i = f−const_a(F, C, X^[i−1], F^[i−1]) ∪ s−const_a(F, C, X^[i], F^[i])
EF^i = f−const_e(F, C, X^[i−1], F^[i−1]) ∪ s−const_e(F, C, X^[i], F^[i])
SF^i = f−const_s(F, C, X^[i−1], F^[i−1]) ∪ s−const_s(F, C, X^[i], F^[i])

4.4 Illustration of the Constitutive Dynamic within SAI

Following the semantics proposed in the previous sections, the interpretation
of the constitutive rules produces the assignments and revocations of status
functions, i.e. the constitutive dynamics of SAI. Such a constitutive dynamics is
illustrated here with a running example related to the auction scenario previously
explored.
The Table 1 shows 8 steps of the SAI history, focusing on the environmental
state X and on the constitutive one F . The environmental state evolves as
follows: at step 1, the agents bob and tom act in the system; at the step 2, bob
utters a proposal for an auction; at the step 5, tom utters an offer and the agent
ana enters the system; at the step 6, bob hits the hammer. The constitutive
rules, interpreted as described in Section 4, build the constitutive state F . The
column C.Rule shows the constitutive rule from the Figure 2 that has produced
each SFA. For example, in the step 4, ⟨bob, auctioneer⟩ is produced by the constitutive
rule 1, ⟨bob, auction participant⟩ by the rule 3, and ⟨ε, auction running⟩
by the rule 7.
Table 1. Running example

Step 1: Xa = {bob, tom}.
Step 2: Xa = {bob, tom}, Xe = {(propose(auction), bob)}.
Step 3: Xa = {bob, tom}; AF = {⟨bob, auctioneer⟩, ⟨bob, auction participant⟩} (rules 1, 3).
Step 4: Xa = {bob, tom}; AF = AF^3 (rules 1, 3); SF = {⟨ε, auction running⟩} (rule 7).
Step 5: Xa = {bob, tom, ana}, Xe = {(offer(100), tom)}; AF = AF^4 ∪ {⟨tom, bidder⟩, ⟨tom, auction participant⟩} (rules 1, 3, 2, 4); SF = SF^4 (rule 7).
Step 6: Xa = {bob, tom, ana}, Xe = {(hit hammer, bob)}; AF = AF^5 ∪ {⟨ana, bidder⟩, ⟨ana, auction participant⟩} (rules 1, 3, 2, 4, 2, 4); EF = {⟨offer(100), to bid, tom⟩} (rule 9); SF = SF^5 (rule 7).
Step 7: Xa = {bob, tom, ana}; AF = AF^6 ∪ {⟨tom, current winner⟩} (rules 1, 3, 2, 4, 2, 4, 5); SF = SF^6 ∪ {⟨ε, current value(100)⟩, ⟨ε, auction finished⟩} (rules 7, 10, 8).
Step 8: Xa = {bob, tom, ana}; AF = {⟨tom, current winner⟩, ⟨tom, winner⟩} (rules 5, 6); SF = {⟨ε, current value(100)⟩, ⟨ε, auction finished⟩} (rules 10, 8).

Note that the proposed semantics does not define just the establishment of
the SFAs but also their revocations. For example, the constitutive rule 1
in the Figure 2 defines that the agent that proposes an auction is the auctioneer
while the auction is not finished. In the example, this condition ceases to hold in
the step 7, leading to a new state (step 8) where the assignment of the status function
auctioneer to the agent bob is revoked.

5 Related Work
Different approaches in the literature investigate how environmental facts affect
artificial institutions. Some, contrary to us, do not consider the environment as producing
some kind of dynamics inside the institution: in [1,8], the environmental
elements are related to the concepts appearing in the norm specification but they
do not produce facts related to the dynamics of norms (violations, fulfilments,
etc.); in [9], environmental facts determine properties that should hold in the
institution but the institution is in charge of taking such information and producing
some dynamics where appropriate.
Some approaches, as we do, consider that environmental facts produce some
kind of dynamics in the institution: in [10] they affect the dynamics of organisations,
producing role assignments, goal achievements, etc.; they produce institutional
events in [11]; they affect the normative dynamics in [12,13], producing
norm fulfilments, violations, etc. Compared to these related works, this paper
deals with the definition of how the environment determines another kind of fact in the
institution, namely the constitution of status functions, defining (and not
just affecting) the constitutive dynamics that is the base of the regulation in
SAI.
When the constitution of each kind of status functions is considered in iso-
lation, some relations can be made, for example, between the constitution of
state-status functions and the constitution of states of affairs proposed in [14],


or between the constitution of event-status functions and the generation of insti-
tutional events proposed in [11]. But we deal with the constitution of agent-,
event-, and state-status function as, together, determining the dynamics of the
constitutive state of SAI. This constitutive state is not viewed just as a con-
tainer of constituted status functions but as a system having particular – and
well defined by this paper – dynamics taking into account the different nature
of their components. We deal with the constitution considering the particular
nature of the three different kinds of status function, but we also consider the
constitution of the three different status functions affecting each other.
Works such as those in line with [8] are concerned with the ontological
aspects of the count-as, i.e. with the constitution defining and providing meaning
to the institutional vocabulary. These aspects are also part of the SAI conceptual
model and, with regard to them, this paper contributes by providing clear representations
and semantics to actually ground the institutional vocabulary in the environment.
In addition, by dealing with the second-order constitution, we clearly
define how the manipulation of concepts of the institutional vocabulary – not
explicitly related to the environment – is grounded in the environment.

6 Discussion and Perspectives

To be compliant with the SAI definitions [2], the dynamics of the constitutive
state must consider that status functions are assigned to (and only to) agents,
events, and states under a uniform definition of constitution. Thus, our first sub-objective
was to define a uniform constitutive dynamics considering that SFAs
may have specific life cycles according to their nature. To achieve it, we first
defined the life cycles of the SFAs which, even though they are produced by similar
definitions of constitutive rules, may be distinguished into: (i) agent-status function
assignments, holding only while the agent that carries the status function participates
in the system, (ii) state-status function assignments, holding while the state carrying
the status function holds in the environment, and (iii) event-status function
assignments, holding only during a single step of the SAI history. These definitions
have then been complemented by making explicit the instantaneous and
fluent expressions conditioning these constitutions. We captured important properties
of this dynamics such as: the proper dynamics of status function assignments
for events, states, or agents; the stability of constituted status functions w.r.t. the
repetition of events; the dropping of a constituted status function as soon as its state
condition no longer holds; etc.
The second sub-objective was to enrich the proposed dynamics issued from the
environmental elements with the dynamics of the constituted status functions
themselves. The approach that we took concerned first the conditions of constitutive
rules where constituted status functions may appear (Defs. 5 to 10),
and then the definition of the second-order constitution dynamics that highlights an
important property of the SAI conceptual model: the production of new constitutive
states based on facts that are indirectly related to the environment. This property is
important in the sense that it makes it possible to situate the institution in the
environment while also making it possible to consider the definition and dynamics of
constitutive abstractions, generalisations, etc.
Future work includes investigations on the normative state affecting
the SAI constitutive state, on normative dynamics on top of the constitutive
dynamics, and on manipulations inside the constitutive level through second-order
constitution.

Acknowledgments. The authors thank CAPES (PDSE 4926-14-5) and CNPq
(grants 448462/2014-1 and 306301/2012-1) for the financial support.

References
1. Aldewereld, H., Álvarez Napagao, S., Dignum, F., Vázquez-Salceda, J.: Making
norms concrete. In: van der Hoek, W., Kaminka, G.A., Lespérance, Y., Luck, M.,
Sen, S. (eds) AAMAS 2010, pp. 807–814 (2010)
2. de Brito, M., Hübner, J.F., Boissier, O.: A conceptual model for situated artificial
institutions. In: Bulling, N., van der Torre, L., Villata, S., Jamroga, W., Vasconcelos,
W. (eds.) CLIMA XV 2014. LNCS (LNAI), vol. 8624, pp. 35–51. Springer, Heidelberg
(2014)
3. De Brito, M., Thevin, L., Garbay, C., Boissier, O., Hübner, J.F.: Situated artificial
institution to support advanced regulation in the field of crisis management. In:
Demazeau, Y., Decker, K.S., Bajo Pérez, J., De la Prieta, F. (eds.) PAAMS 2015.
LNCS (LNAI), vol. 9086, pp. 66–79. Springer, Heidelberg (2015)
4. Searle, J.: Making the Social World. The Structure of Human Civilization. Oxford
University Press (2009)
5. Brachman, R., Levesque, H.: Knowledge Representation and Reasoning. Morgan
Kaufmann Publishers Inc., San Francisco (2004)
6. Vos, M.D., Balke, T., Satoh, K.: Combining event-and state-based norms. In:
AAMAS 2013, pp. 1157–1158 (2013)
7. Cassandras, C.G., Lafortune, S.: Introduction to Discrete Event Systems. Springer-
Verlag New York Inc., Secaucus (2006)
8. Grossi, D., Meyer, J.-J.C., Dignum, F.P.M.: Counts-as: classification or constitu-
tion? an answer using modal logic. In: Goble, L., Meyer, J.-J.C. (eds.) DEON 2006.
LNCS (LNAI), vol. 4048, pp. 115–130. Springer, Heidelberg (2006)
9. de Brito, M., Hübner, J.F., Bordini, R.H.: Programming institutional facts in multi-
agent systems. In: Aldewereld, H., Sichman, J.S. (eds.) COIN 2012. LNCS (LNAI),
vol. 7756, pp. 158–173. Springer, Heidelberg (2013)
10. Piunti, M., Boissier, O., Hübner, J.F., Ricci, A.: Embodied organizations: a uni-
fying perspective in programming agents, organizations and environments. In:
MALLOW 2010. CEUR, vol. 627 (2010)
11. Cliffe, O., De Vos, M., Padget, J.: Answer set programming for representing and
reasoning about virtual institutions. In: Inoue, K., Satoh, K., Toni, F. (eds.)
CLIMA 2006. LNCS (LNAI), vol. 4371, pp. 60–79. Springer, Heidelberg (2007)
12. Dastani, M., Grossi, D., Meyer, J.-J.C., Tinnemeier, N.: Normative multi-agent
programs and their logics. In: Meyer, J.-J.C., Broersen, J. (eds.) KRAMAS 2008.
LNCS (LNAI), vol. 5605, pp. 16–31. Springer, Heidelberg (2009)
13. Campos, J., López-Sánchez, M., Rodrı́guez-Aguilar, J.A., Esteva, M.: Formalising
situatedness and adaptation in electronic institutions. In: Hübner, J.F., Matson, E.,
Boissier, O., Dignum, V. (eds.) COIN 2008. LNCS (LNAI), vol. 5428, pp. 126–139.
Springer, Heidelberg (2009)
14. Jones, A., Sergot, M.: A formal characterisation of institutionalised power. Logic
Journal of IGPL 4(3), 427–443 (1996)
Checking WECTLK Properties of Timed
Real-Weighted Interpreted Systems
via SMT-Based Bounded Model Checking

Agnieszka M. Zbrzezny(B) and Andrzej Zbrzezny

IMCS, Jan Dlugosz University, Al. Armii Krajowej 13/15,


42-200 Czȩstochowa, Poland
{agnieszka.zbrzezny,a.zbrzezny}@ajd.czest.pl

Abstract. In this paper, we present an SMT-based bounded model
checking (BMC) method for Timed Real-Weighted Interpreted Systems
and for the existential fragment of the Weighted Epistemic Computation
Tree Logic. We evaluated the BMC algorithm on the Timed Weighted
Generic Pipeline Paradigm benchmark. We have implemented the SMT-based
BMC method and obtained preliminary experimental results, which demonstrate
the efficiency of the method. To perform the experiments we used
the state-of-the-art SMT solver Z3.

1 Introduction
The formalism of interpreted systems (ISs) was introduced in [2] to model multi-
agent systems (MASs) [7], which are intended for reasoning about the agents’
epistemic and temporal properties. Timed interpreted systems (TIS) was pro-
posed in [9] to extend interpreted systems in order to make possible reasoning
about real-time aspects of MASs. The formalism of weighted interpreted systems
(WISs) [10] extends ISs to make the reasoning possible about not only temporal
and epistemic properties, but also about agents’s quantitative properties.
Multi-agent systems (MASs) are composed of many intelligent agents that
interact with each other. The agents can share a common goal or they can
pursue their own interests. Also, the agents may have deadline or other tim-
ing constraints to achieve intended targets. As it was shown in [2], knowledge
is a useful concept for analyzing the information state and the behaviour of
agents in multi-agent systems. Another different extensions of temporal logics [1]
with doxastic [4], and deontic [5] modalities have been proposed. In this paper,
we consider the existential fragment of a weighted epistemic computation tree
logic (WECTLK) interpreted over Timed Real-Weighted Interpreted Systems
(TRWISs).
SMT-based bounded model checking (BMC) consists in translating the exis-
tential model checking problem for a modal logic and for a model to the satis-
fiability modulo theory problem (SMT-problem) of a quantifier-free first-order
formula.

The original contributions of the paper are as follows. First, we define TRWISs
as a model of MASs in which the agents have real-time deadlines to achieve
intended goals and each transition carries a weight, which can be any non-negative
real value. Second, we introduce the language WECTLK. Third, we propose an
SMT-based BMC technique for TRWISs and for WECTLK.
To the best of our knowledge, there is no work that considers SMT-based
BMC methods to check multi-agent systems modelled by means of timed real-
weighted interpreted systems. Thus, in this paper we offer such a method. In
particular, we make the following contributions. Firstly, we define and imple-
ment an SMT-based BMC method for WECTLK and for TRWISs. Secondly, we
report on the initial experimental evaluation of our SMT-based BMC method.
To this aim we use a scalable benchmark: the timed weighted generic pipeline
paradigm [8,10].
The structure of the paper is as follows. In Section 2 we shortly introduce
the theory of timed real-weighted interpreted systems and the WECTLK lan-
guage. In Section 3 we present our SMT-based BMC method. In Section 4 we
experimentally evaluate the performance of our SMT-based BMC encoding. We
conclude the paper in Section 5.

2 Preliminaries
In this section we first explain some notations used throughout the paper, then
we define timed real-weighted interpreted systems, and finally we introduce the syntax
and semantics of WECTLK.
Let IN be the set of natural numbers, IN+ = IN \ {0}, IR be the set of non-negative
real numbers, and X be a finite set of variables, called clocks, ranging over the
non-negative natural numbers. A clock valuation is a function v : X → IN that
assigns to each clock x ∈ X a non-negative natural value v(x). The set of all the
clock valuations is denoted by IN^|X|. The valuation v′ = v[X′ := 0], for X′ ⊆ X ,
is defined as: ∀x∈X′ v′(x) = 0 and ∀x∈X\X′ v′(x) = v(x). For δ ∈ IN, v + δ denotes
the valuation that assigns the value v(x) + δ to each clock x.
The grammar
ϕ := true | x < c | x ≤ c | x = c | x ≥ c | x > c | ϕ ∧ ϕ
generates the set C(X ) of clock constraints over X , where x ∈ X and c ∈ IN. A
clock valuation v satisfies a clock constraint ϕ, written v |= ϕ, iff ϕ evaluates to
true using the clock values given by v.
Let cmax be a constant and v, v′ ∈ IN^|X| two clock valuations. We say that
v ≈ v′ iff the following condition holds for each x ∈ X : either v(x) > cmax and
v′(x) > cmax, or v(x) ≤ cmax and v′(x) ≤ cmax and v(x) = v′(x).
The clock valuation v′ such that for each clock x ∈ X , v′(x) = v(x) + 1 if
v(x) ≤ cmax, and v′(x) = cmax + 1 otherwise, is called the time successor of v
(written succ(v)).
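The clock operations used above (reset, time successor, and the equivalence up to cmax) can be pictured with the small sketch below, where valuations are plain dictionaries. This is an illustration of the definitions only, not part of the verified implementation described later.

```python
def reset(v, clocks_to_reset):
    """Clock reset v[X' := 0]: zero the clocks in X', keep the others."""
    return {x: (0 if x in clocks_to_reset else val) for x, val in v.items()}

def time_successor(v, cmax):
    """succ(v): advance every clock by one, saturating at cmax + 1."""
    return {x: (val + 1 if val <= cmax else cmax + 1) for x, val in v.items()}

def equivalent(v1, v2, cmax):
    """Equivalence on valuations: clocks agree exactly, or both exceed cmax."""
    return all((v1[x] > cmax and v2[x] > cmax) or
               (v1[x] <= cmax and v2[x] <= cmax and v1[x] == v2[x]) for x in v1)

v = {"x0": 3, "x1": 7}
print(time_successor(v, cmax=5))                  # {'x0': 4, 'x1': 6}
print(reset(v, {"x0"}))                           # {'x0': 0, 'x1': 7}
print(equivalent({"x0": 9}, {"x0": 7}, cmax=5))   # True: both above cmax
```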
TRWISs. Let Ag = {1, . . . , n} denote a non-empty and finite set of agents,
let E be a special agent that is used to model the environment in which the
agents operate, and let PV = ⋃_{c∈Ag∪{E}} PV_c be a set of propositional variables
such that PV_{c1} ∩ PV_{c2} = ∅ for all c1, c2 ∈ Ag ∪ {E} with c1 ≠ c2. A timed real-weighted
interpreted system (TRWIS) is a tuple

({Lc , Actc , Xc , Pc , tc , Vc , Ic , dc }c∈Ag∪{E} , ι),

where Lc is a non-empty set of local states of the agent c, S = L1 × . . . × Ln × LE
is the set of all global states, ι ⊆ S is a non-empty set of initial states, Actc is a
non-empty set of possible actions of the agent c, Act = Act1 × . . . × Actn × ActE
is the set of joint actions, Xc is a non-empty set of clocks, Pc : Lc → 2^{Actc} is
a protocol function, tc : Lc × C(Xc) × 2^{Xc} × Act → Lc is a (partial) evolution
function, Vc : Lc → 2^{PV} is a valuation function assigning to each local state
a set of propositional variables that are assumed to be true at that state, Ic :
Lc → C(Xc) is an invariant function that specifies the amount of time the agent
c may spend in a given local state, and dc : Actc → IR is a weight function.
For a given TRWIS we define a timed real-weighted model (or a model) as a
tuple M = (Act, S, ι, T, V, d), where:

– Act = Act1 × . . . × Actn × ActE is the set of all the joint actions,
– S = (L1 × IN^|X1|) × . . . × (Ln × IN^|Xn|) × (LE × IN^|XE|) is the set of all the global
states,
– ι = (ι1 × {0}^|X1|) × . . . × (ιn × {0}^|Xn|) × (ιE × {0}^|XE|) is the set of all the
initial global states,
– V : S → 2^PV is the valuation function defined as V(s) = ⋃_{c∈Ag∪{E}} Vc(lc(s)),
– T ⊆ S × (Act ∪ IN) × S is a transition relation defined by action and time
transitions. For a ∈ Act and δ ∈ IN:
1. action transition: (s, a, s′) ∈ T (or s −a→ s′) iff for all c ∈ Ag ∪ {E}, there exists
a local transition tc(lc(s), ϕc, X′, a) = lc(s′) such that vc(s) |= ϕc ∧ I(lc(s))
and vc(s′) = vc(s)[X′ := 0] and vc(s′) |= I(lc(s′));
2. time transition: (s, δ, s′) ∈ T iff for all c ∈ Ag ∪ {E}, lc(s) = lc(s′) and vc(s′) =
vc(s) + δ and vc(s′) |= I(lc(s′));
– d : Act → IR is the “joint” weight function defined as follows: d((a1, . . . ,
an, aE)) = d1(a1) + . . . + dn(an) + dE(aE).

Given a TRWIS we can define the indistinguishability relation ∼c ⊆ S × S
for any agent c as follows: s ∼c s′ iff lc(s′) = lc(s) and vc(s′) ≈ vc(s). A run of
a TRWIS is an infinite sequence ρ = s0 −δ0,a0→ s1 −δ1,a1→ s2 −δ2,a2→ . . . of global states
such that the following conditions hold for all i ∈ IN: si ∈ S, ai ∈ Act, δi ∈ IN+,
and there exists s′i ∈ S such that (si, δi, s′i) ∈ T and (s′i, ai, si+1) ∈ T . Notice
that the definition of a run does not permit two consecutive joint actions to be
performed one after the other, i.e., between each two joint actions some time
must pass; such a run is called strongly monotonic.
WECTLK. WECTLK has been defined in [8] as the existential fragment of the
weighted CTLK with integer cost constraints on all temporal modalities. We
extend WECTLK logic by adding non-negative real cost constraints. In the


syntax of WECTLK we assume the following: p ∈ PV is an atomic proposition,
c ∈ Ag, Γ ⊆ Ag, and I is an interval in IR of the form [a, ∞) or [a, b), for
a, b ∈ IN and a ≠ b. Moreover, hereafter, by right(I) we denote the right end of
the interval I. The WECTLK formulae are defined by the following grammar:
ϕ ::= true | false | p | ¬p | ϕ ∨ ϕ | ϕ ∧ ϕ | EXI ϕ | E(ϕUI ϕ) |
EGI ϕ | Kc ϕ |DΓ ϕ | EΓ ϕ | CΓ ϕ.
In the semantics we assume the following definitions of epistemic relations:
∼^E_Γ =def ⋃_{c∈Γ} ∼c , ∼^C_Γ =def (∼^E_Γ)^+ (the transitive closure of ∼^E_Γ), and
∼^D_Γ =def ⋂_{c∈Γ} ∼c , where Γ ⊆ Ag.
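The three group relations can be computed from the individual relations exactly as the definitions read: a union for ∼E, an intersection for ∼D, and the transitive closure of the union for ∼C. The sketch below does this on a toy relation; the state names and relations are made up for this illustration only.

```python
def transitive_closure(rel):
    """Naive transitive closure of a binary relation given as a set of pairs."""
    closure = set(rel)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def group_relations(individual, group):
    """Group epistemic relations for Γ: ~E = union of the agents' relations,
    ~D = their intersection, ~C = transitive closure of ~E."""
    rel_E = set().union(*(individual[c] for c in group))
    rel_D = set.intersection(*(individual[c] for c in group))
    rel_C = transitive_closure(rel_E)
    return rel_E, rel_D, rel_C

individual = {1: {("s0", "s1")}, 2: {("s0", "s1"), ("s1", "s2")}}
print(group_relations(individual, {1, 2}))  # ~C also contains ('s0', 's2')
```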
A WECTLK formula ϕ is true in a model M (in symbols M |= ϕ) iff
M, s0 |= ϕ for some s0 ∈ ι (i.e., ϕ is true at some initial state of the model M).
For every s ∈ S the relation |= is defined inductively as follows:

– M, s |= true, M, s ⊭ false, M, s |= p iff p ∈ V(s), M, s |= ¬p iff p ∉ V(s),
– M, s |= α ∧ β iff M, s |= α and M, s |= β,
– M, s |= α ∨ β iff M, s |= α or M, s |= β,
– M, s |= EXI α iff (∃π ∈ Π(s))(Dπ[0..1] ∈ I and M, π(1) |= α),
– M, s |= EGI α iff (∃π ∈ Π(s))(∀i ≥ 0)(Dπ[0..i] ∈ I implies M, π(i) |= α),
– M, s |= E(αUI β) iff (∃π ∈ Π(s))(∃i ≥ 0)(Dπ[0..i] ∈ I and M, π(i) |= β and
(∀j < i) M, π(j) |= α),
– M, s |= Kc α iff (∃π ∈ Π)(∃i ≥ 0)(s ∼c π(i) and M, π(i) |= α),
– M, s |= Y α iff (∃π ∈ Π)(∃i ≥ 0)(s ∼ π(i) and M, π(i) |= α), where Y ∈
{DΓ , EΓ , CΓ} and ∼ ∈ {∼^D_Γ , ∼^E_Γ , ∼^C_Γ}.

Abstract Model. Let IDc = {0, . . . , cc + 1}, where cc is the largest constant
appearing in any enabling condition or state invariant of agent c, and ID =
∏_{c∈Ag∪{E}} IDc^|Xc|. A tuple M = (Act, S, ι, T, V, d) is an abstract model, where ι =
∏_{c∈Ag∪{E}} ιc × {0}^|Xc| is the set of all initial global states, S = ∏_{c∈Ag∪{E}} Lc × IDc^|Xc|
is the set of all abstract global states, V : S → 2^PV is the valuation function such
that p ∈ V(s) iff p ∈ ⋃_{c∈Ag∪{E}} Vc(lc(s)) for all p ∈ PV, and T ⊆ S × (Act ∪ {τ}) × S.
Let a ∈ Act. Then,

1. Action transition: (s, a, s′) ∈ T iff ∀c∈Ag ∃φc∈C(Xc) ∃X′c⊆Xc (tc(lc(s),
φc, X′c, a) = lc(s′) and vc(s) |= φc ∧ I(lc(s)) and vc(s′) = vc(s)[X′c := 0] and
vc(s′) |= I(lc(s′)));
2. Time transition: (s, τ, s′) ∈ T iff ∀c∈Ag∪{E} (lc(s) = lc(s′) and vc(s) |= I(lc(s))
and succ(vc(s)) |= I(lc(s))) and ∀c∈Ag (vc(s′) = succ(vc(s))) and (vE(s′) =
succ(vE(s))).
succ(vE (s))).
A path π in an abstract model is a sequence s0 −b1→ s1 −b2→ s2 −b3→ . . . of
transitions such that for each i ≥ 1, bi ∈ Act ∪ {τ}, b1 = τ, and for each two
consecutive transitions at least one of them is a time transition.
Given an abstract model one can define the indistinguishability relation ∼c ⊆
S × S for agent c as follows: s ∼c s′ iff lc(s′) = lc(s) and vc(s′) = vc(s).
3 SMT-Based Bounded Model Checking


In this section, we present an outline of the bounded semantics for WECTLK and
define an SMT-based BMC method for WECTLK, which is based on the BMC
encoding presented in [8]. As usual, we start by defining k-paths and (k, l)−loops.
Next we define a bounded semantics, which is used for the translation to SMT.
Bounded Semantics. Let M be a model, and k ∈ IN a bound. A k-path πk
is a finite sequence s0 −b1→ s1 −b2→ . . . −bk→ sk of transitions such that for each
1 ≤ i ≤ k, bi ∈ Act ∪ {τ}, b1 = τ, and for each two consecutive transitions
at least one is a time transition. A k-path πk is a loop if πk(k) = πk(l) for some l < k.
Note that if a k-path πk is a loop, then it represents the infinite path of the form
uv^ω, where u = (s0 −b1→ s1 −b2→ . . . −bl→ sl) and v = (sl+1 −bl+2→ . . . −bk→ sk). Πk(s)
denotes the set of all the k-paths of M that start at s, and Πk = ⋃_{s0∈ι} Πk(s0).
denotes the set of all the k-paths of M that start at s, and Πk = s0 ∈ι Πk (s0 ).
The bounded satisfiability relation |=k which indicates k-truth of a WECTLK
formula in the model M at some state s of M is also defined in [8]. A WECTLK
formula ϕ is k-true in the model M (in symbols M |=k ϕ) iff ϕ is k-true at some
initial state of the model M.
The model checking problem asks whether M |= ϕ, but the bounded model
checking problem asks whether there exists k ∈ IN such that M |=k ϕ. The
following theorem states that for a given model and a WECTLK formula there
exists a bound k such that the model checking problem (M |= ϕ) can be reduced
to the bounded model checking problem (M |=k ϕ).
Theorem 1. Let M be the abstract model and ϕ a WECTLK formula. Then,
the following equivalence holds: M |= ϕ iff there exists k ≥ 0 such that M |=k ϕ.

Proof. The theorem can be proved by induction on the length of the formula ϕ
(for details one can see [8]).

Translation to SMT. Let M be an abstract model, ϕ a WECTLK formula, and


k ≥ 0 a bound. The presented SMT encoding of the BMC problem for WECTLK
and for TRWIS is based on the SAT encoding of the same problem [10,12], and
it relies on defining the quantifier-free first-order formula:

[M, ϕ]k := [Mϕ,ι ]k ∧ [ϕ]M,k

that is satisfiable if and only if M |=k ϕ holds.


Let c ∈ Ag ∪ {E}. The definition of the formula [M, ϕ]k assumes that
– each global state s ∈ S is represented by a valuation of a symbolic state
w = ((w1 , v1 ), . . . , (wn , vn ), (wE , vE )) that consists of symbolic local states and
each symbolic local state wc is a pair (wc , vc ) of individual variables ranging
over the natural numbers, in which the first element represents a local state of
the agent c, and the second represents a clock valuation;
– each joint action a ∈ Act is represented by a valuation of a symbolic action
a = (a1 , . . . , an , aE ) that consists of symbolic local actions and each symbolic
local action ac is an individual variable ranging over the natural numbers;
– each sequence of weights associated with a joint action is represented by
a valuation of symbolic weights d = (d1, . . . , dn+1) that consists of symbolic
local weights, where each symbolic local weight dc is an individual variable ranging
over the natural numbers.
The formula [Mϕ,ι ]k encodes a rooted tree of k−paths of the model M. The
number of branches of the tree depends on the value of fk : WECTLK → IN
which is the auxiliary function defined in [8]:

–fk (true) = fk (false) = 0;


–fk (p) = fk (¬p) = 0, where p ∈ PV;
–fk (α ∧ β) = fk (α) + fk (β);
–fk (α ∨ β) = max{fk (α), fk (β)};
–fk (EXI α) = fk (α) + 1;
–fk (EGI α) = (k + 1) · fk (α) + 1;
–fk (E(αUI β)) = k · fk (α) + fk (β) + 1;
–fk (CΓ α) = fk (α) + k;
–fk (Y α) = fk (α) + 1 for Y ∈ {Kc , DΓ , EΓ }.
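The function fk is a straightforward structural recursion over the formula, as the toy sketch below shows; the tuple-based formula representation and the operator names are assumptions made for this example.

```python
def f_k(formula, k):
    """Toy version of the auxiliary function f_k: number of symbolic k-paths needed.
    Formulas are nested tuples such as ("p",), ("EX", phi), ("and", phi, psi)."""
    op = formula[0]
    if op in ("true", "false", "p", "not_p"):
        return 0
    if op == "and":
        return f_k(formula[1], k) + f_k(formula[2], k)
    if op == "or":
        return max(f_k(formula[1], k), f_k(formula[2], k))
    if op == "EX":
        return f_k(formula[1], k) + 1
    if op == "EG":
        return (k + 1) * f_k(formula[1], k) + 1
    if op == "EU":
        return k * f_k(formula[1], k) + f_k(formula[2], k) + 1
    if op == "C":
        return f_k(formula[1], k) + k
    if op in ("K", "D", "E"):
        return f_k(formula[1], k) + 1
    raise ValueError(f"unknown operator: {op}")

# f_k(EG(p ∧ EX p)) for k = 5: the inner EX contributes 1, so EG gives (5+1)*1 + 1 = 7.
print(f_k(("EG", ("and", ("p",), ("EX", ("p",)))), 5))  # 7
```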

The formula [Mϕ,ι ]k is defined over (k + 1) · fk (ϕ) different symbolic states,


k · fk (ϕ) different symbolic actions, and k · fk (ϕ) different symbolic weights.
Moreover, it uses the following auxiliary quantifier-free first-order formulae:

– Is(w) – it encodes the state s of the model M; c ∈ Ag ∪ {E};
– Hc(wc, w′c) – it encodes the equality of two local states, i.e. that wc = w′c, for
c ∈ Ag ∪ {E};
– Tc(wc, ((a, d), δ), w′c) – it encodes the local evolution function of agent c;
– A(a) – it encodes that each symbolic local action ac of a has to be executed by
each agent in which it appears;
– T(w, ((a, d), δ), w′) := A(a) ∧ ⋀_{c∈Ag∪{E}} Tc(wc, ((a, d), δ), w′c);
– Let πj denote the j-th symbolic k-path, i.e. the sequence of symbolic transitions:
w0,j −(a1,j ,d1,j ),δ1,j→ w1,j −(a2,j ,d2,j ),δ2,j→ . . . −(ak,j ,dk,j ),δk,j→ wk,j . Then, D^I_{a,b;c,d}(πn) for
a ≤ b and c ≤ d is a formula that:
•for a < b and c < d encodes that the weight represented by the sequences
da+1,n , . . . , db,n and dc+1,n , . . ., dd,n belongs to the interval I,
•for a = b and c < d encodes that the weight represented by the sequence
dc+1,n , . . . , dd,n belongs to the interval I,
•for a < b and c = d encodes that the weight represented by the sequence
da+1,n , . . . , db,n belongs to the interval I,
I
•for a = b and c = d, the formula Da,b;c,d (πn ) is true iff 0 ∈ I.

Thus, given the above, we can define the formula [Mϕ,ι]k as follows:

[Mϕ,ι]k := ⋁_{s∈ι} Is(w0,0) ∧ ⋀_{j=1}^{fk(ϕ)} w0,0 = w0,j ∧ ⋀_{j=1}^{fk(ϕ)} ⋀_{i=0}^{k−1} T(wi,j , ((ai,j , di,j ), δi,j ), wi+1,j )
where wi,j , ai,j , and di,j are, respectively, symbolic states, symbolic actions, and
symbolic weights for 0 ≤ i ≤ k and 1 ≤ j ≤ fk (ϕ). Hereafter, by πj we denote
the j-th symbolic k-path of the above unfolding, i.e., the sequence of transitions:
w0,j −(a1,j ,d1,j ),δ1,j→ w1,j −(a2,j ,d2,j ),δ2,j→ . . . −(ak,j ,dk,j ),δk,j→ wk,j .
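To make the overall shape of such an encoding concrete, the sketch below unrolls a toy weighted transition system for a fixed bound k with the Z3 Python API (the z3-solver package) and asks for a path reaching a target state within a weight bound. It is only an illustration of the unrolling idea: the toy transition relation, the target state, and the bound are made up here and do not reproduce the paper's encoding of TRWISs and WECTLK.

```python
from z3 import Ints, Reals, Solver, And, Or, sat

k = 3                                        # bound: unroll k transitions
states = Ints(" ".join(f"w{i}" for i in range(k + 1)))
weights = Reals(" ".join(f"d{i}" for i in range(1, k + 1)))

s = Solver()
s.add(states[0] == 0)                        # initial state, playing the role of I_s(w_0)
for i in range(k):                           # toy transition relation T(w_i, d_{i+1}, w_{i+1})
    s.add(Or(And(states[i + 1] == states[i] + 1, weights[i] == 2.5),
             And(states[i + 1] == states[i], weights[i] == 1.0)))

# Property: some prefix of the unrolled path reaches state 2 with accumulated weight < 6.
s.add(Or(*[And(states[i] == 2, sum(weights[:i]) < 6) for i in range(1, k + 1)]))

if s.check() == sat:
    m = s.model()
    print([m[w] for w in states], [m[d] for d in weights])
```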
The formula [ϕ]M,k encodes the bounded semantics of a WECTLK for-
mula ϕ, and it is defined on the same sets of individual variables as the for-
mula [Mϕ,ι ]k . Moreover, it uses the auxiliary quantifier-free first-order formulae
defined in [8].
Furthermore, following [8], our formula [ϕ]M,k uses the auxiliary
functions gl , gr , gμ , hU , hG that were introduced in [11], and which allow
dividing the set A ⊆ Fk(ϕ) = {j ∈ IN | 1 ≤ j ≤ fk(ϕ)} into the subsets needed for
translating the subformulae of ϕ. Let 0 ≤ n ≤ fk(ϕ), m ≤ k, and n′ = min(A).
The rest of the translation is defined in the same way as in [8].
– [true]_k^{[m,n,A]} := true, [false]_k^{[m,n,A]} := false,
– [p]_k^{[m,n,A]} := p(wm,n),
– [¬p]_k^{[m,n,A]} := ¬p(wm,n),
– [α ∧ β]_k^{[m,n,A]} := [α]_k^{[m,n,gl(A,fk(α))]} ∧ [β]_k^{[m,n,gr(A,fk(β))]},
– [α ∨ β]_k^{[m,n,A]} := [α]_k^{[m,n,gl(A,fk(α))]} ∨ [β]_k^{[m,n,gl(A,fk(β))]},
– [EXI α]_k^{[m,n,A]} := wm,n = w0,n′ ∧ (d1,n′ ∈ I) ∧ [α]_k^{[1,n′,gμ(A)]}, if k > 0; false,
otherwise,
– [E(αUI β)]_k^{[m,n,A]} := wm,n = w0,n′ ∧ ⋁_{i=0}^{k} ([β]_k^{[i,n′,hU(A,k,fk(β))(i)]} ∧
(∑_{j=1}^{i} dj,n′ ∈ I) ∧ ⋀_{j=0}^{i−1} [α]_k^{[j,n′,hU(A,k,fk(β))(j)]}),
– [E(GI α)]_k^{[m,n,A]} := wm,n = w0,n′ ∧ ((∑_{j=1}^{k} dj,n′ ≥ right(I) ∧ ⋀_{i=0}^{k} (∑_{j=1}^{i} dj,n′ ∉
I ∨ [α]_k^{[i,n′,hG(A,k)(i)]})) ∨ (∑_{j=1}^{k} dj,n′ < right(I) ∧ ⋀_{i=0}^{k} (∑_{j=1}^{i} dj,n′ ∉ I ∨
[α]_k^{[i,n′,hG(A,k)(i)]}) ∧ ⋁_{l=0}^{k−1} (wk,n′ = wl,n′ ∧ ⋀_{i=l}^{k−1} (¬D^I_{0,k;l,i+1}(πn′) ∨
[α]_k^{[i,n′,hG(A,k)(i)]})))),
– [Kc α]_k^{[m,n,A]} := (⋁_{s∈ι} Is(w0,n′)) ∧ ⋁_{j=0}^{k} ([α]_k^{[j,n′,gμ(A)]} ∧ Hc(wm,n , wj,n′)),
– [DΓ α]_k^{[m,n,A]} := (⋁_{s∈ι} Is(w0,n′)) ∧ ⋁_{j=0}^{k} ([α]_k^{[j,n′,gμ(A)]} ∧ ⋀_{c∈Γ} Hc(wm,n , wj,n′)),
– [EΓ α]_k^{[m,n,A]} := (⋁_{s∈ι} Is(w0,n′)) ∧ ⋁_{j=0}^{k} ([α]_k^{[j,n′,gμ(A)]} ∧ ⋁_{c∈Γ} Hc(wm,n , wj,n′)),
– [CΓ α]_k^{[m,n,A]} := [⋁_{j=1}^{k} (EΓ)^j α]_k^{[m,n,A]}.

The theorem below states the correctness and the completeness of the presented
translation. It can be proved in a standard way by induction on the complexity
of the given WECTLK formula.

Theorem 2. Let M be a model, and ϕ a WECTLK formula. For every k ∈


IN, M |=k ϕ if, and only if, the quantifier-free first-order formula [M, ϕ]k is
satisfiable.
4 Experimental Results
In this section we experimentally evaluate the performance of our SMT-based
BMC encoding for WECTLK over the TRWIS semantics.
The benchmark we consider is the timed weighted generic pipeline paradigm
(TWGPP) TRWIS model [10]. The model of TWGPP involves n + 2 agents:
– Producer, producing data within a certain time interval ([a, b]) or being inactive,
– Consumer, receiving data within a certain time interval ([c, d]) or being inactive
within a certain time interval ([g, h]),
– a chain of n intermediate Nodes, which can be ready for receiving data
within a certain time interval ([c, d]), processing data within a certain time interval
([e, f ]), or sending data.
The weights are used to adjust the cost properties of Producer, Consumer, and
of the intermediate Nodes.

Fig. 1. The TWGPP system

Each agent of the scenario can be modelled by considering its local states,
local actions, local protocol, local evolution function, local weight function, the
local clocks, the clock constraints, invariants, and local valuation function. Fig. 1
shows the local states, the possible actions, the protocol, the clock constraints,
the invariants, and the weights for each agent. Null actions are omitted in the
figure.
Given Fig. 1, the local evolution functions of TWGPP are straightforward
to infer. Moreover, we assume the following set of propositional variables: PV =
{ProdReady, ProdSend, ConsReady, ConsFree} with the following definitions
of the local valuation functions:
– VP(ProdReady-0) = {ProdReady}, VP(ProdSend-1) = {ProdSend},
– VC(ConsReady-0) = {ConsReady}, VC(ConsFree-1) = {ConsFree}.
Let Act = ActP × ∏_{i=1}^{n} ActNi × ActC , with ActP = {Produce, Send1}, ActC
= {Startn+1, Consume, Sendn+1}, and ActNi = {Starti, Sendi, Sendi+1,
Proci}, define the set of joint actions for the scenario. For a ∈ Act, let actP(a)
denote an action of Producer, actC(a) denote an action of Consumer, and
actNi(a) denote an action of Node i. We assume the following local evolution
functions:

– tP(ProdReady, x0 ≥ a, ∅, a) = ProdSend, if actP(a) = Produce
– tP(ProdSend, true, {x0}, a) = ProdReady, if actP(a) = Send1 and actNi(a) =
Send1
– tC(ConsStart, true, {xn+1}, a) = ConsReady, if actC(a) = Startn+1
– tC(ConsReady, xn+1 ≥ c, {xn+1}, a) = ConsFree, if actC(a) = Sendn+1 and
actNn(a) = Sendn+1
– tC(ConsFree, xn+1 ≥ g, {xn+1}, a) = ConsReady, if actC(a) = Consume

Finally, we assume the following two local weight functions for each agent:

– dP(Produce) = 4, dP(send1) = 2, dC(Consume) = 4, dC(sendn+1) = 2,
dNi(sendi) = dNi(sendi+1) = dNi(Proci) = 2.
– dP(Produce) = 4000, dP(send1) = 2000, dC(Consume) = 4000,
dC(sendn+1) = 2000, dNi(sendi) = dNi(sendi+1) = dNi(Proci) = 2000.

The set of all the global states S for the scenario is defined as the product
(LP × IN) × ∏_{i=1}^{n} (Li × IN) × (LC × IN). The set of the initial states is defined as
ι = {s0}, where
s0 = ((ProdReady-0, 0), (Node1Ready-0, 0), . . . , (NodenReady-0, 0), (ConsReady-0, 0)).

The system is scaled according to the number of its Nodes (agents), i.e., the
problem parameter n is the number of Nodes. For any natural number n ≥ 0, let
D(n) = {1, 3, . . . , n − 1, n + 1} for an even n, and D(n) = {2, 4, . . . , n − 1, n + 1}
for an odd n. Moreover, let
r(j) = dP(Produce) + 2 · ∑_{i=1}^{j} dNi(Sendi) + ∑_{i=1}^{j−1} dNi(Proci).
Then we define Right as follows:

Right = j∈D(n) r(j).
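For a rough feel of the bound, the sketch below computes D(n) and r(j) for the basic weights. The summation bounds of r(j) and the aggregation of the r(j) values into Right (a plain sum here) are assumptions made for this illustration, since they are only sketched in the text above.

```python
def D(n):
    """The index set D(n) from the benchmark description."""
    start = 1 if n % 2 == 0 else 2
    return list(range(start, n, 2)) + [n + 1]

def r(j, d_produce=4, d_send=2, d_proc=2):
    """r(j) with the basic weights (dP(Produce)=4, dNi(Sendi)=dNi(Proci)=2)."""
    return d_produce + 2 * sum(d_send for _ in range(1, j + 1)) + sum(d_proc for _ in range(1, j))

n = 4
print(D(n))                    # [1, 3, 5]
print([r(j) for j in D(n)])    # [8, 20, 32]
# Right aggregates these r(j) values over D(n); with a plain sum it would be 60 here.
```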
We consider the following formulae as specifications:
ϕ1 = EF[0,Right) (ConsFree) – it states that there exists a path on which
Consumer receives data and the cost of receiving the data is less
than Right.
ϕ2 = EF[0,Right) (ConsFree ∧ EG(ProdSend ∨ ConsFree)) – it states that
there exists a path on which Consumer receives data and the cost of
receiving the data is less than Right, and from that point there exists a
path on which always either the Producer has sent data or the
Consumer has received data.
ϕ3 = KP (EF[0,Right) (ConsFree ∧ EG(ProdSend ∨ ConsFree))) – it states
that it is not true that Producer knows that there exists a path on which
Consumer receives data and the cost of receiving the data is less than
Right, and from that point there exists a path on which always either
the Producer has sent data or the Consumer has received data.
ϕ4 = KP (EF[0,Right) (ConsFree ∧ KC KP (EG(ProdSend ∨ ConsFree))))
– it states that it is not true that Producer knows that there exists a path
on which Consumer receives data and the cost of receiving the data is
less than Right and at that point it is not true that Consumer knows
that it is not true that Producer knows that there exists a path on
which always either the Producer has sent data or Consumer has
received data.
The number of the considered k-paths is equal to 1 for ϕ1, 2 for ϕ2, 3 for ϕ3, and 5 for ϕ4, respectively. The length of the witness is (n + 1) · 4 for the formula ϕ1; 9 if n = 1 and (n + 1) · 4 if n > 1 for the formula ϕ2; 2 · n + 4 if n ∈ {1, 2} and 2 · n + 2 if n > 2 for the formula ϕ3; and 2 · n + 2 for the formula ϕ4.

Performance Evaluation. We have performed our experiments on a computer equipped with an I7-3770 processor, 32 GB of RAM, and the operating system Arch Linux with the kernel 3.19.2. We set the CPU time limit to 3600 seconds. Our SMT-based BMC algorithm is implemented as a standalone program written in C++. We used the state-of-the-art SMT-solver Z3 [6] (http://z3.codeplex.com/).
For properties ϕ1, ϕ2, ϕ3, and ϕ4 we have scaled up both the number of nodes and the weight parameters. The results are summarised in the charts in Fig. 2, Fig. 3, Fig. 4, and Fig. 5. One can observe that our SMT-based BMC is not sensitive (Fig. 2, Fig. 4, Fig. 5) to scaling up the weights, but it is sensitive to scaling up the size of the benchmark. More precisely, in order to calculate the results for ϕ1 and for TWGPP with 1 node and the basic weights (bw for short) multiplied by 1,000, our method uses 13.4 MB and the test lasts less than 0.1 seconds. In order to calculate the results for ϕ1 and for TWGPP with 23 nodes and the bw multiplied by 1,000, our method uses 236.1 MB and the test lasts 4510.0 seconds. The most interesting result which can be observed is for the formula ϕ2. In this case the time usage for the bw is greater (9013.1 seconds) than for the bw multiplied by 1,000 (570.9 seconds) for 11 nodes. In particular, in the time limit set for the benchmark, the SMT-based BMC is able to verify the formula ϕ2 for the bw only for 11 nodes, while for the bw multiplied by 1,000 it can handle 15 nodes.

Fig. 2. Formula ϕ1: Scaling up both the number of nodes and weights. (Two charts: total time usage in seconds and memory usage in MB for a TWGPP, plotted against the number of nodes, for t = 1 and t = 1000.)

Fig. 3. Formula ϕ2: Scaling up both the number of nodes and weights. (Two charts: total time usage in seconds and memory usage in MB for a TWGPP, plotted against the number of nodes, for t = 1 and t = 1000.)

Fig. 4. Formula ϕ3: Scaling up both the number of nodes and weights. (Two charts: total time usage in seconds and memory usage in MB for a TWGPP, plotted against the number of nodes, for t = 1 and t = 1000.)



Fig. 5. Formula ϕ4: Scaling up both the number of nodes and weights. (Two charts: total time usage in seconds and memory usage in MB for a TWGPP, plotted against the number of nodes, for t = 1 and t = 1000.)

In the case of properties ϕ3 and ϕ4 we obtained similar results. Namely, in order to calculate the results for ϕ3 and for TWGPP with 16 nodes and the bw (the bw multiplied by 1,000, respectively), our method uses 13074.2 MB (13072.0 MB) and the test lasts 1864.4 (2624.5) seconds. Next, in order to calculate the results for ϕ4 and for TWGPP with 6 nodes and the bw (the bw multiplied by 1,000), our method uses 17904.5 MB (19240.9 MB) and the test lasts 1536.9 (1424.4) seconds.
In Tables 1 and 2 we present time usage, memory usage and the length of
the witness for the formula ϕ2 .

Table 1. Formula ϕ2 for basic weights

 n   time (s)   memory (MB)   witness length
 1      0.1        15.1          9
 2      0.5        17.6         12
 3      1.6        21.4         16
 4      4.2        26.5         20
 5     10.2        43.7         24
 6     21.0        84.7         28
 7    112.8       176.7         32
 8    275.8       292.4         36
 9    879.6       451.3         40
10   2284.4       707.9         44
11   9013.1      1056.5         48

Table 2. Formula ϕ2 for basic weights multiplied by 1,000

 n   time (s)   memory (MB)   witness length
 1      0.2        14.7          8
 2      0.8        17.5         12
 3      1.8        21.6         16
 4      5.7        27.1         20
 5     12.5        43.3         24
 6     25.6        86.5         28
 7     47.6       177.1         32
 8    105.1       290.8         36
 9    172.5       452.1         40
10    290.2       705.3         44
11    570.9      1060.0         48
12    879.9      1557.1         52
13   1399.8      2198.9         56
14   2892.7      3078.3         60
15   3660.4      4222.0         64
5 Conclusions
We have proposed an SMT-based BMC verification method for model checking WECTLK properties interpreted over timed real-weighted interpreted systems. We have provided preliminary experimental results showing that our method is worth attention. In the future we are going to provide a comparison of our new method with the SAT- and BDD-based BMC methods. The module will be added to the model checker VerICS [3].

Acknowledgments. Partly supported by National Science Centre under the grant


No. 2014/15/N/ST6/05079. The study is co-funded by the European Union, European
Social Fund. Project PO KL “Information technologies: Research and their interdisci-
plinary applications”, Agreement UDA-POKL.04.01.01-00-051/10-00.

References
1. Emerson, E.A.: Temporal and modal logic. In: van Leeuwen, J. (eds.) Handbook of
Theoretical Computer Science, vol. B, chapter 16, pp. 996–1071. Elsevier Science
Publishers (1999)
2. Fagin, R., Halpern, J.Y., Moses, Y., Vardi, M.Y.: Reasoning about Knowledge.
MIT Press, Cambridge (1995)
3. Kacprzak, M., Nabialek, W., Niewiadomski, A., Penczek, W., Pólrola, A., Szreter,
M., Woźna, B., Zbrzezny, A.: VerICS 2007 - a model checker for knowledge and
real-time. Fundamenta Informaticae 85(1–4), 313–328 (2008)
4. Levesque, H.: A logic of implicit and explicit belief. In: Proceedings of the 6th
National Conference of the AAAI, pp. 198–202. Morgan Kaufman, Palo Alto (1984)
5. Lomuscio, A., Sergot, M.: Deontic interpreted systems. Studia Logica 75(1), 63–92
(2003)
6. de Moura, L., Bjørner, N.S.: Z3: An efficient SMT solver. In: Ramakrishnan, C.R.,
Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg
(2008)
7. Wooldridge, M.: An introduction to multi-agent systems, 2nd edn. John Wiley &
Sons (2009)
8. Woźna-Szcześniak, B.: SAT-Based bounded model checking for weighted deontic
interpreted systems. In: Reis, L.P., Correia, L., Cascalho, J. (eds.) EPIA 2013.
LNCS, vol. 8154, pp. 444–455. Springer, Heidelberg (2013)
9. Woźna-Szcześniak, B.: Checking EMTLK properties of timed interpreted systems
via bounded model checking. In: Bazzan, A.L.C., Huhns, M.N., Lomuscio, A.,
Scerri, P. (eds.) International Conference on Autonomous Agents and Multi-Agent
Systems, AAMAS 2014, Paris, France, May 5–9, pp. 1477–1478. IFAAMAS/ACM
(2014)
10. Woźna-Szcześniak, B., Zbrzezny, A.M., Zbrzezny, A.: SAT-Based bounded model
checking for weighted interpreted systems and weighted linear temporal logic. In:
Boella, G., Elkind, E., Savarimuthu, B.T.R., Dignum, F., Purvis, M.K. (eds.)
PRIMA 2013. LNCS, vol. 8291, pp. 355–371. Springer, Heidelberg (2013)
11. Zbrzezny, A.: Improving the translation from ECTL to SAT. Fundamenta Infor-
maticae 85(1–4), 513–531 (2008)
12. Zbrzezny, A.: A new translation from ECTL∗ to SAT. Fundamenta Informaticae
120(3–4), 377–397 (2012)
SMT-Based Bounded Model Checking
for Weighted Epistemic ECTL

Agnieszka M. Zbrzezny(B) , Bożena Woźna-Szcześniak, and Andrzej Zbrzezny

IMCS, Jan Dlugosz University,


Al. Armii Krajowej 13/15, 42-200 Czȩstochowa, Poland
{agnieszka.zbrzezny,b.wozna,a.zbrzezny}@ajd.czest.pl

Abstract. We define the SMT-based bounded model checking (BMC)


method for weighted interpreted systems and for the existential fragment
of the weighted epistemic computation tree logic. We implemented the
new BMC algorithm and compared it with the SAT-based BMC method
for the same systems and the same property language on several bench-
marks for multi-agent systems.

1 Introduction
The past ten years have seen significant research in verification procedures for multi-agent systems (MASs), which automatically evaluate whether a MAS meets its intended specifications. One of the main techniques here is symbolic model checking [2]. Unfortunately, because of the agents' intricate nature, the practical applicability of model checking is severely limited by the "state-space explosion problem" (i.e., an exponential growth of the system state space with the number of agents). To mitigate this issue, various techniques, including SAT- and BDD-based bounded model checking (BMC) [3,4], have been proposed. These have been effective in allowing users to handle bigger MASs; however, it is still hard to check MASs with numerous agents and cost requirements on agents' actions. The aim of this paper is to help overcome this limitation by employing SMT-solvers (i.e., satisfiability modulo theories tools for deciding the satisfiability of formulae in a number of theories) [1,5].
The fundamental idea behind bounded model checking (BMC) is, given a system, a property, and an integer bound k ≥ 0, to define a formula (in the case of SAT-based BMC, this is a propositional logic formula; in the case of SMT-based BMC, this can be a quantifier-free first-order formula) such that the formula is satisfiable if and only if the system has a counterexample of length at most k violating the property. The bound is incremented until a satisfiable formula is discovered (i.e., the specification does not hold for the system) or a completeness threshold is reached without discovering any satisfiable formulae.
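To make the bound-incrementing loop explicit, the following Python sketch (our illustration, not part of any paper in this volume) shows the generic BMC procedure; `encode`, `is_satisfiable`, and `max_k` are placeholder callables and parameters standing in for the encoding, the SAT or SMT solver call, and a completeness threshold.

```python
# Generic BMC loop (illustrative sketch with hypothetical helper callables).

def bounded_model_check(encode, is_satisfiable, max_k):
    """encode(k) builds the formula for bound k; is_satisfiable decides it;
    max_k stands in for a completeness threshold."""
    for k in range(max_k + 1):
        if is_satisfiable(encode(k)):
            return ("counterexample found", k)   # the specification does not hold
    return ("no counterexample up to the threshold", max_k)
```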
To model check the requirements of MASs, different extensions of temporal logics have been proposed. In this paper, we consider the existential fragment of a weighted epistemic computation tree logic (WECTLK) interpreted over WISs [6].
(Partly supported by National Science Centre under the grant No. 2014/15/N/ST6/05079.)
To the best of our knowledge, there is no work that considers SMT-based
BMC methods to check multi-agent systems modelled by means of interpreted
systems. Thus, in this paper we offer such a method. In particular, we make
the following contributions. First of all, we define and implement an SMT-based
BMC method for WECTLK and for weighted interpreted systems (WISs) [6,7].
Next, we report on the initial experimental evaluation of our SMT-based BMC
methods. To this aim we use two scalable benchmarks: the weighted generic
pipeline paradigm [7] and the weighted bits transmission problem [6]. Finally, we
compare our prototype implementation of the SMT-based BMC method against
the SAT-based BMC engine of [6,8], the only existing technique that is suitable with respect to the input formalism and checked properties. The results show that the SMT-based BMC performs very well and is, in fact, sometimes significantly faster than the tested SAT-based BMC method.
The rest of the paper is organised as follows. In the next section we briefly
present the theory of weighted interpreted systems and the WECTLK language.
In Section 3 we present our SMT-based BMC method. In Section 4 we experimen-
tally evaluate the performance of our SMT-based BMC encoding. In Section 5,
we conclude the paper.

2 Preliminaries
WIS. Let Ag = {1, . . . , n} be the non-empty and finite set of agents, let E be a special agent that is used to model the environment in which the agents operate, and let PV = ⋃_{c∈Ag∪{E}} PVc be a set of propositional variables such that PVc1 ∩ PVc2 = ∅ for all c1, c2 ∈ Ag ∪ {E}. The weighted interpreted system (WIS) [6,7] is a tuple ({Lc, Actc, Pc, Vc, dc}c∈Ag∪{E}, {tc}c∈Ag, tE, ι), where Lc is a non-empty and finite set of local states (S = L1 × . . . × Ln × LE denotes the non-empty set of all global states), Actc is a non-empty and finite set of possible actions (Act = Act1 × . . . × Actn × ActE denotes the non-empty set of joint actions), Pc : Lc → 2^{Actc} is a protocol function, Vc : Lc → 2^{PVc} is a valuation function, dc : Actc → IN is a weight function, tc : Lc × LE × Act → Lc is a (partial) evolution function for agents, tE : LE × Act → LE is a (partial) evolution function for the environment, and ι ⊆ S is a set of initial global states.
Assume that lc(s) denotes the local component of agent c ∈ Ag ∪ {E} in the global state s ∈ S. For a given WIS we define a model as a tuple M = (Act, S, ι, T, V, d), where the sets Act and S are defined as above, V : S → 2^{PV} is the valuation function defined as V(s) = ⋃_{c∈Ag∪{E}} Vc(lc(s)), d : Act → IN is a "joint" weight function defined as d((a1, . . . , an, aE)) = d1(a1) + . . . + dn(an) + dE(aE), and T ⊆ S × Act × S is a transition relation defined as: (s, a, s′) ∈ T (written s −a→ s′) iff tc(lc(s), lE(s), a) = lc(s′) for all c ∈ Ag and tE(lE(s), a) = lE(s′); we assume that the relation T is total, i.e. for any s ∈ S there exist s′ ∈ S and a non-empty joint action a ∈ Act such that s −a→ s′.
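As a concrete (and deliberately simplified) reading of the definition above, the following Python sketch shows one possible representation of the per-agent ingredients of a WIS and of the joint weight function; the class and field names are ours and are not part of the formal definition.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set

# Illustrative data-structure sketch of a weighted interpreted system.
@dataclass
class AgentSpec:
    local_states: Set[str]                  # L_c
    actions: Set[str]                       # Act_c
    protocol: Dict[str, Set[str]]           # P_c : L_c -> 2^{Act_c}
    valuation: Dict[str, FrozenSet[str]]    # V_c : L_c -> 2^{PV_c}
    weight: Dict[str, int]                  # d_c : Act_c -> IN

def joint_weight(agents: Dict[str, AgentSpec], joint_action: Dict[str, str]) -> int:
    # d((a_1, ..., a_n, a_E)) = d_1(a_1) + ... + d_n(a_n) + d_E(a_E)
    return sum(agents[c].weight[joint_action[c]] for c in agents)
```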
For each agent c ∈ Ag, the definition of the standard indistinguishability relation ∼c ⊆ S × S is the following: s ∼c s′ iff lc(s′) = lc(s). Finally, the following definitions of epistemic relations are assumed, for Γ ⊆ Ag: ∼_Γ^E := ⋃_{c∈Γ} ∼_c, ∼_Γ^C := (∼_Γ^E)^+ (the transitive closure of ∼_Γ^E), and ∼_Γ^D := ⋂_{c∈Γ} ∼_c.
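For illustration only (not the authors' code), the epistemic relations can be computed over an explicit set of global states as follows; `local(c, s)` stands for the local component l_c(s), and the group Γ is assumed non-empty.

```python
# Illustrative computation of the epistemic relations over an explicit state set.

def sim_c(states, local, c):
    # ~_c : pairs of global states with the same local component for agent c
    return {(s, t) for s in states for t in states if local(c, s) == local(c, t)}

def sim_E(states, local, group):
    # ~_Gamma^E : union of the ~_c for c in Gamma
    return set().union(*(sim_c(states, local, c) for c in group))

def sim_D(states, local, group):
    # ~_Gamma^D : intersection of the ~_c (assumes a non-empty group)
    rels = [sim_c(states, local, c) for c in group]
    return set.intersection(*rels)

def sim_C(states, local, group):
    # ~_Gamma^C : transitive closure of ~_Gamma^E
    closure = set(sim_E(states, local, group))
    changed = True
    while changed:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        changed = not new.issubset(closure)
        closure |= new
    return closure
```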

Syntax of WECTLK. The WECTLK logic has been defined in [6] as the existential fragment of the weighted CTLK with cost constraints on all temporal modalities.
For convenience, the symbol I denotes an interval in IN = {0, 1, 2, . . .} of the form [a, ∞) or [a, b), for a, b ∈ IN and a ≠ b. Moreover, the symbol right(I) denotes the right end of the interval I. Given an atomic proposition p ∈ PV, an agent c ∈ Ag, a set of agents Γ ⊆ Ag and an interval I, the WECTLK formulae are defined by the following grammar: ϕ ::= true | false | p | ¬p | ϕ ∨ ϕ | ϕ ∧ ϕ | EX_I ϕ | E(ϕ U_I ϕ) | EG_I ϕ | K̄_c ϕ | D̄_Γ ϕ | Ē_Γ ϕ | C̄_Γ ϕ.
E (for some path) is the path quantifier. X_I (weighted neXt time), U_I (weighted until) and G_I (weighted always) are the weighted temporal modalities. Note that the formula "weighted eventually" is defined as usual: EF_I ϕ := E(true U_I ϕ) (meaning that it is possible to reach a state satisfying ϕ via a finite path whose cumulative weight is in I). K̄_c is the modality dual to K_c. D̄_Γ, Ē_Γ, and C̄_Γ are the duals of the standard group epistemic modalities representing, respectively, distributed knowledge in the group Γ, everyone in Γ knows, and common knowledge among agents in Γ.
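As an illustration of the grammar (a sketch of ours, not the authors' data structures), WECTLK formulae can be represented by a small tagged AST; the operator tags and field names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Illustrative AST node for WECTLK formulae.
# op is one of: 'true', 'false', 'p', 'not-p', 'or', 'and',
# 'EX', 'EU', 'EG', 'Kbar', 'Dbar', 'Ebar', 'Cbar'.
@dataclass
class WECTLK:
    op: str
    interval: Optional[Tuple[int, float]] = None   # I = [a, b) for the temporal operators
    agent: Optional[str] = None                    # c for Kbar
    group: Optional[frozenset] = None              # Gamma for Dbar/Ebar/Cbar
    args: tuple = ()                               # subformulae, or the proposition name

# Example: EF_{[0,4)} p, defined as E(true U_{[0,4)} p):
phi = WECTLK('EU', interval=(0, 4),
             args=(WECTLK('true'), WECTLK('p', args=('p',))))
```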
We omit here the definition of the bounded (i.e., the relation |=k) and unbounded (i.e., the relation |=) semantics of the logic, since they can be found in [6]. We only recall the notions of k-paths and loops, since we need them to explain the SMT-based BMC. Namely, given a model M and a bound k ∈ IN, a k-path πk is a finite sequence of transitions s0 −a1→ s1 −a2→ · · · −ak→ sk. A k-path πk is a loop if πk(k) = πk(l) for some l < k. Furthermore, let M be a model, and ϕ a WECTLK formula. The bounded model checking problem asks whether there exists k ∈ IN such that M |=k ϕ, i.e., whether there exists k ∈ IN such that the formula ϕ is k-true in the model M.

3 SMT-Based BMC
In order to encode the BMC problem for WECTLK by means of SMT, we
consider a quantifier-free logic with individual variables ranging over the natural
numbers. Formally, let M be the model, ϕ a WECTLK formula and k ≥ 0 a
bound. We define the quantifier-free first-order formula: [M, ϕ]k := [M ϕ,ι ]k ∧
[ϕ]M,k that is satisfiable if and only if M |=k ϕ holds.
The definition of the formula [M, ϕ]k is based on the SAT encoding of [6], and
it assumes that each state, each joint action, and each sequence of weights
associated with a joint action are represented by a valuation of, respectively,
a symbolic state w = (w1 , . . . , wn , wE ) consisting of symbolic local states wc ,
a symbolic action a = (a1 , . . . , an , aE ) consisting of symbolic local actions ac ,
and a symbolic weight d = (d1, . . . , dn+1) consisting of symbolic local weights
dc , where each wc , ac , and dc are individual variables ranging over the natu-
ral numbers, for c ∈ Ag ∪ {E}. Next, the definition of [M, ϕ]k uses the auxil-
iary function fk :WECTLK → IN of [6] which returns the number of k-paths
that are required for proving the k-truth of ϕ in M . Finally, the definition of
[M, ϕ]k uses the following auxiliary quantifier-free first-order formulae: I_s(w) - it encodes the state s of the model M; p(w) - it encodes the set of states of M in which p ∈ PV holds; H_c(w, w′) := w_c = w′_c for c ∈ Ag; T_c(w_c, (a, d), w′_c) - it encodes the local evolution function of agent c ∈ Ag ∪ {E}; A(a) - it encodes that each symbolic local action a_c of a has to be executed by each agent in which it appears; T(w, (a, d), w′) := A(a) ∧ ⋀_{c∈Ag∪{E}} T_c(w_c, (a, d), w′_c). Let π_j denote the j-th symbolic k-path, i.e. the sequence of symbolic transitions w_{0,j} −(a_{1,j},d_{1,j})→ w_{1,j} −(a_{2,j},d_{2,j})→ . . . −(a_{k,j},d_{k,j})→ w_{k,j}, and let d_{i,j,m} denote the m-th component of the symbolic joint weight d_{i,j}. Then,
– B_k^I(π_j) := Σ_{i=1}^{k} Σ_{m=1}^{n+1} d_{i,j,m} < right(I) - it encodes that the weight represented by the sequence d_{1,j}, . . . , d_{k,j} is less than right(I);
– D_{a,b}^I(π_j) for a ≤ b - if a < b, then it encodes that the weight represented by the sequence d_{a+1,j}, . . . , d_{b,j} belongs to the interval I; otherwise, i.e. if a = b, then D_{a,b}^I(π_j) is true iff 0 ∈ I;
– D_{a,b;c,d}^I(π_j) for a ≤ b and c ≤ d - it encodes that the weight represented by the sequences d_{a+1,j}, . . . , d_{b,j} and d_{c+1,j}, . . . , d_{d,j} belongs to the interval I.
Given symbolic states wi,j , symbolic actions ai,j and symbolic weights di,j
for 0 ≤ i ≤ k and 0 ≤ j < fk (ϕ), the formula [M ϕ,ι ]k , which encodes a rooted
tree of k-paths of the model M , is defined as follows:
[M^{ϕ,ι}]_k := ⋁_{s∈ι} I_s(w_{0,0}) ∧ ⋀_{j=0}^{f_k(ϕ)−1} ⋀_{i=0}^{k−1} T(w_{i,j}, (a_{i+1,j}, d_{i+1,j}), w_{i+1,j}).
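As a concrete, heavily simplified illustration of such a quantifier-free encoding (our sketch in z3py, not the authors' C++ encoder), the fragment below declares integer-valued symbolic states and weights for a single symbolic k-path, unrolls a toy transition relation, and adds a weight constraint in the spirit of B_k^I. The toy transition (w_{i+1} = w_i + 1 with weight 2) and the bound are assumptions made only for the example.

```python
from z3 import Int, Solver, And, Sum, sat

k, right_I = 4, 10
w = [Int('w_%d' % i) for i in range(k + 1)]      # symbolic states w_0, ..., w_k
d = [Int('d_%d' % i) for i in range(1, k + 1)]   # symbolic weights d_1, ..., d_k

s = Solver()
s.add(w[0] == 0)                                        # I_s(w_0): initial state
s.add(And([And(w[i + 1] == w[i] + 1, d[i] == 2)         # toy T(w_i, d_{i+1}, w_{i+1})
           for i in range(k)]))
s.add(Sum(d) < right_I)                                 # weight stays below right(I)
print(s.check() == sat)                                 # True: a witness of length k exists
```

A real encoder would, of course, assert the actual transition relation T of the model and one copy of the path variables for each of the f_k(ϕ) symbolic k-paths.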

The formula [ϕ]_{M,k} encodes the bounded semantics of the WECTLK formula ϕ; it is defined on the same sets of individual variables as the formula [M^{ϕ,ι}]_k, and it uses the auxiliary functions g_μ, h_k^U, h_k^G of [9] that allow us to divide the
set A ⊆ Fk (ϕ) = {j ∈ IN | 0 ≤ j < fk (ϕ)} into subsets necessary for translating
the subformulae of ϕ.
Let [ϕ]_k^{[m,n,A]} denote the translation of ϕ at the symbolic state w_{m,n} by using the set A ⊆ F_k(ϕ). The formula [ϕ]_{M,k} := [ϕ]_k^{[0,0,F_k(ϕ)]} is defined inductively with the classical rules for the propositional fragment of WECTLK and with the following rules for the weighted temporal and epistemic modalities. Let 0 ≤ n ≤ f_k(ϕ), m ≤ k, n′ = min(A), h_k^U = h_k^U(A, f_k(β)), and h_k^G = h_k^G(A). Then,
– [EX_I α]_k^{[m,n,A]} := w_{m,n} = w_{0,n′} ∧ D_{0,1}^I(π_{n′}) ∧ [α]_k^{[1,n′,g_μ(A)]}, if k > 0, and false otherwise,
– [E(α U_I β)]_k^{[m,n,A]} := w_{m,n} = w_{0,n′} ∧ ⋁_{i=0}^{k} ([β]_k^{[i,n′,h_k^U(k)]} ∧ D_{0,i}^I(π_{n′}) ∧ ⋀_{j=0}^{i−1} [α]_k^{[j,n′,h_k^U(j)]}),
– [E(G_I α)]_k^{[m,n,A]} := w_{m,n} = w_{0,n′} ∧ ((¬B_k^I(π_{n′}) ∧ ⋀_{i=0}^{k} (¬D_{0,i}^I(π_{n′}) ∨ [α]_k^{[i,n′,h_k^G(k)]})) ∨ (B_k^I(π_{n′}) ∧ ⋀_{i=0}^{k} (¬D_{0,i}^I(π_{n′}) ∨ [α]_k^{[i,n′,h_k^G(k)]}) ∧ ⋁_{l=0}^{k−1} (w_{k,n′} = w_{l,n′} ∧ ⋀_{i=l}^{k−1} (¬D_{0,k;l,i+1}^I(π_{n′}) ∨ [α]_k^{[i,n′,h_k^G(k)]})))),
– [K̄_c α]_k^{[m,n,A]} := (⋁_{s∈ι} I_s(w_{0,n′})) ∧ ⋁_{j=0}^{k} ([α]_k^{[j,n′,g_μ(A)]} ∧ H_c(w_{m,n}, w_{j,n′})),
– [D̄_Γ α]_k^{[m,n,A]} := (⋁_{s∈ι} I_s(w_{0,n′})) ∧ ⋁_{j=0}^{k} ([α]_k^{[j,n′,g_μ(A)]} ∧ ⋀_{c∈Γ} H_c(w_{m,n}, w_{j,n′})),
– [Ē_Γ α]_k^{[m,n,A]} := (⋁_{s∈ι} I_s(w_{0,n′})) ∧ ⋁_{j=0}^{k} ([α]_k^{[j,n′,g_μ(A)]} ∧ ⋁_{c∈Γ} H_c(w_{m,n}, w_{j,n′})),
– [C̄_Γ α]_k^{[m,n,A]} := [⋁_{j=1}^{k} (Ē_Γ)^j α]_k^{[m,n,A]}.
The theorem below states the correctness and the completeness of the pre-
sented translation. It can be proven by induction on the length of the given
WECTLK formula.
Theorem 1. Let M be a model, and ϕ a WECTLK formula. For every k ∈
IN, M |=k ϕ if, and only if, the quantifier-free first-order formula [M, ϕ]k is
satisfiable.
The proposed SMT-based BMC is based on the SAT-based BMC defined in
[6]. The main difference between these two methods is in the representation of
symbolic states, symbolic actions, and symbolic weights. Thus, the main result
is the generalisation of the propositional encoding of [6] into the quantifier-free
first-order encoding.

4 Experimental Results
Here we experimentally evaluate the performance of our SMT-based BMC
method for WECTLK over the WIS semantics. We compare our method with
the SAT-based BMC [6,8], the only existing method that is suitable with respect
to the input formalism (i.e., weighted interpreted systems) and checked proper-
ties (i.e., WECTLK). We have obtained our experimental results on a computer equipped with an I7-3770 processor, 32 GB of RAM, and the operating system Arch
Linux with the kernel 3.15.3. We set the CPU time limit to 3600 seconds. For
the SAT-based BMC we used the PicoSAT solver and for the SMT-based BMC
we used the Z3 solver.
The first benchmark we consider is the weighted generic pipeline
paradigm (WGPP) WIS model [6]. The problem parameter n is the number of
Nodes. Let M in be the minimum cost incurred by Consumer to receive the data
produced by Producer, and p denote the cost of producing data by Producer.
The specifications we consider are as follows:
ϕ1 = K̄P EF[Min,Min+1) ConsReady - it expresses that it is not true that Producer knows that always the cost incurred by Consumer to receive data is Min.
ϕ2 = K̄P EF(ProdSend ∧ K̄C K̄P EG[0,Min−p) ConsReady) - it states that it is not true that Producer knows that always if it produces data, then Consumer knows that Producer knows that Consumer has received data and the cost is less than Min − p.
The size of the reachable state space of the WGPP system is 4 · 3^n, for n ≥ 1. The number of the considered k-paths is equal to 2 for ϕ1 and 5 for
ϕ2 , respectively. The lengths of the discovered witnesses for formulae ϕ1 and ϕ2
vary, respectively, from 3 for 1 node to 23 for 130 nodes, and from 3 for 1 node
to 10 for 27 nodes.
The second benchmark of our interest is the weighted bits transmission problem (WBTP) WIS model [7]. We have adapted the local weight functions of [7]. This system is scaled according to the number of bits that S wants to communicate to R. Let a ∈ IN and b ∈ IN be the costs of sending, respectively, the bits by Sender and an acknowledgement by Receiver. The specifications we consider are as follows:
φ1 = EF[a+b,a+b+1)(recack ∧ K̄S(K̄R(⋀_{i=0}^{2^n−2}(¬i)))) - it expresses that it is not true that if an ack is received by S, then S knows that R knows at least one value of the n-bit numbers except the maximal value, and the cost is a + b.
φ2 = EF[a+b,a+b+1)(K̄S(⋀_{i=0}^{2^n−1} K̄R(¬i))) - it expresses that it is not true that S knows that R knows the value of the n-bit number and the cost is a + b.
The size of the reachable state space of the WBTP system is 3 · 2^n for n ≥ 1. The number of the considered k-paths is equal to 3 for φ1 and 2^n + 2 for φ2, respectively. The length of the witnesses for both formulae is equal to 2 for any n > 0.
Performance Evaluation. The experimental results show that both BMC methods, SAT- and SMT-based, are complementary. We have noticed that for the WGPP system and both considered formulae the SMT-based BMC is faster than the SAT-based BMC; however, the SAT-based BMC consumes less memory. Moreover, the SMT-based method is able to verify more nodes for both tested formulae. In particular, in the time limit set for the benchmarks, the SMT-based BMC is able to verify the formula ϕ1 for 120 nodes while the SAT-based BMC can handle 115 nodes. For ϕ2 the SMT-based BMC is still more efficient - it is able to verify 27 nodes, whereas the SAT-based BMC verifies only 25 nodes.
In the case of the WBTP system the SAT-based BMC performs much better in terms of the total time and the memory consumption for both tested formulae. In the case of the formula φ2 both methods are able to verify the same number of bits. For the WBTP the reason for the higher efficiency of the SAT-based BMC is, probably, that the lengths of the witnesses for both formulae are constant and very short, and that there are no nested temporal modalities in the scope of epistemic operators. For formulae like φ1 and φ2 the number of arithmetic operations is small, so the SMT-solvers cannot show their strength.
Furthermore, we have noticed that the total time and the memory consumption for both benchmarks and all the tested formulae are independent of the values of the considered weights.
5 Conclusions
We have proposed, implemented, and experimentally evaluated an SMT-based bounded model checking approach for WECTLK interpreted over weighted interpreted systems. We have compared our method with the corresponding
SAT-based technique. The experimental results show that the approaches are
complementary, and that the SMT-based BMC approach appears to be superior
for the WGPP system, while the SAT-based approach appears to be superior
for the WBTP system. This is a novel and interesting result, which shows that
the choice of the BMC method should depend on the considered system.

References
1. Barrett, C., Sebastiani, R., Seshia, S.A., Tinelli, C.: Satisfiability modulo theories. In: Handbook of Satisfiability. Frontiers in Artificial Intelligence and Applications, vol. 185, chapter 26, pp. 825–885. IOS Press (2009)
2. Clarke, E.M., Grumberg, O., Peled, D.A.: Model Checking. The MIT Press (1999)
3. Jones, A.V., Lomuscio, A.: Distributed BDD-based BMC for the verification of
multi-agent systems. In: Proc. AAMAS 2010, pp. 675–682. IFAAMAS (2010)
4. Mȩski, A., Penczek, W., Szreter, M., Woźna-Szcześniak, B., Zbrzezny, A.: BDD- versus SAT-based bounded model checking for the existential fragment of linear tem-
poral logic with knowledge: algorithms and their performance. Autonomous Agents
and Multi-Agent Systems 28(4), 558–604 (2014)
5. de Moura, L., Bjørner, N.S.: Z3: An efficient SMT solver. In: Ramakrishnan, C.R.,
Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg
(2008)
6. Woźna-Szcześniak, B.: SAT-based bounded model checking for weighted deontic
interpreted systems. In: Reis, L.P., Correia, L., Cascalho, J. (eds.) EPIA 2013.
LNCS, vol. 8154, pp. 444–455. Springer, Heidelberg (2013)
7. Woźna-Szcześniak, B., Zbrzezny, A.M., Zbrzezny, A.: SAT-based bounded model
checking for weighted interpreted systems and weighted linear temporal logic. In:
Boella, G., Elkind, E., Savarimuthu, B.T.R., Dignum, F., Purvis, M.K. (eds.)
PRIMA 2013. LNCS, vol. 8291, pp. 355–371. Springer, Heidelberg (2013)
8. Woźna-Szcześniak, B., Szcześniak, I., Zbrzezny, A.M., Zbrzezny, A.: Bounded model
checking for weighted interpreted systems and for flat weighted epistemic compu-
tation tree logic. In: Dam, H.K., Pitt, J., Xu, Y., Governatori, G., Ito, T. (eds.)
PRIMA 2014. LNCS, vol. 8861, pp. 107–115. Springer, Heidelberg (2014)
9. Zbrzezny, A.: Improving the translation from ECTL to SAT. Fundamenta Informat-
icae 85(1–4), 513–531 (2008)
Dynamic Selection of Learning Objects
Based on SCORM Communication

João de Amorim Junior() and Ricardo Azambuja Silveira

PPGCC – UFSC, Florianópolis, Brazil


[email protected], [email protected]

Abstract. This paper presents a model to select Learning Objects in e-learning


courses, based on multi-agent paradigm, aiming to facilitate the learning ma-
terial reuse and adaptability on Learning Management Systems. The proposed
model has a BDI multi-agent architecture, as an improvement of the Intelligent
Learning Objects approach, allowing the dynamic selection of Learning Ob-
jects. As the first steps of our research, we implement a prototype to validate
the proposed model using the JADEX BDI V3 platform. Thereafter, we extend
the framework to allow the communication of the agents with SCORM standard
resources, making possible to build enhanced dynamic learning experiences.

Keywords: Dynamic learning experience · Intelligent learning objects ·


SCORM

1 Introduction and Related Works

Adaptability and reuse are important aspects that contribute to improving the learning process in virtual learning environments [1]. The former relates to different students' profiles and needs. An adaptable system increases the student's understanding, taking into account their knowledge level and preferences [2,3,4]. The latter means that it is
unnecessary to develop new resources if there are others related to the same learning
purpose [4,5]. Some computational tools improve the teaching-learning process, namely: (1) Intelligent Tutoring Systems (ITS) - applications created for a specific domain, generally with little adaptability and interoperability [6]; (2) Learning Management
Systems (LMS) - environments used to build online courses (or publish material), allowing teachers to manage educational data [1,7,8]; and (3) Learning Objects (LO) - digital artifacts that promote reuse and adaptability of resources [9]. LO and LMS
provide reusability, but they usually are not dynamically adaptable [8,9]. This article
presents our research that seeks the convergence of these different paradigms for the
development of intelligent learning environments and describes the mechanisms of an
Intelligent Learning Objects’ dynamic presentation model, based on communication
with SCORM (Sharable Content Object Reference Model) resources [19, 20].
There are analogous studies that provide adaptability to learning systems. Some examples extend the LMS with distinct adaptive strategies, such as conditional jumps [8], Bayesian networks [3] or data mining [7]. Other studies are not integrated with


an LMS, and use diversified ways to adapt the learning to the students' style, e.g., ITS [6], recommender systems [2], genetic algorithms [10] and swarm intelligence [11].
Moreover, there are some similar works based on the Multi-Agent System (MAS) approach, resulting in smarter applications [12,13]. Some of them combine LMS and MAS to make the former more adaptive [14], and another is a dynamically adaptive environment, based on agents that are able to identify the student's cognitive profile [14]. These related works identify the student's profile by applying questionnaires at the beginning of the course or by clustering the students according to their assessment
performance. Additionally, we observe in these papers that the attachment of a new LO to the system is not possible without teacher intervention. The educator needs to configure in advance all the possible course paths for each student style, which can be difficult and time-consuming [3]. Further, attaching a new LO to the course involves modifying its structure, resulting in limited adaptability and reuse.
In order to produce more intelligent LO, we have proposed in previous research the convergence between the LO and MAS technologies, called Intelligent Learning Objects (ILO) [15]. This approach makes it possible to offer more adaptive, reusable and complete learning experiences, following the learner's cognitive characteristics and performance. According to this approach, an ILO is an agent capable of playing the role of a LO, which can acquire new knowledge through the interaction with students and other ILO (agent information exchange), raising the potential of the student's understanding. The LO metadata permits the identification of the educational topic related to the LO [9]. Hence, the ILO (agents) are able to find out the subject associated with the learning experience shown to the student, and then show complementary information (another ILO) to address the student's lack of knowledge of that subject.
2 ILOMAS

The proposed model integrates MAS and LMS into an intelligent behavior system, resulting in an improvement over the related works and leading to dynamic LO inclusion. The objective of the new model, called Intelligent Learning Object Multi-Agent System (ILOMAS), is to enhance the framework developed to create ILO based on MAS
with BDI architecture [16], extending this model to allow the production of adaptive
and reusable learning experiences taking advantage of the SCORM data model ele-
ments. The idea is to select dynamically ILO in the LMS according to the student
performance, without previous specific configuration on the course structure. The
proposed model achieves reuse by the combination of pre-existed and validated LO
whose concept is the same of that the student needs to learn about, avoiding the build-
ing of new materials. Moreover, the course structure becomes more flexible, since it
is unnecessary to configure all the possible learning paths for each student profile.
The solution’s adaptability is based on the ability to attach new LO to the LMS
(that was not explicitly added before) as soon as the system finds out that the student
needs to reinforce its understanding on a specific concept. This is automatically iden-
tified through the verification of the student assessment performance (i.e.: grade), on
each instructional unit, or by student choice, when the learner interacting with the LO.

It is important to clarify that the approach does not use the student's learning profile (e.g., textual, interactive [2]) as information to select LOs. The scope of this research is to consider only the learner's performance results (grades, time of interaction, sequencing and navigation). The ILOMAS is composed of agents with specific goals, capable of communicating and offering learning experiences to students in an LMS course, according to the interaction with these students, taking advantage of the SCORM standard's features [19]. The ILOMAS architecture needs two kinds of agents:

• LMSAgent – Finds out the subject that the student must learn about, and passes the control of the interaction with the student to a new ILOAgent. Its beliefs are data provided by the LMS database, e.g., the topic that the student must learn about.
• ILOAgent – Searches for a LO in the repository (related to the topic obtained from the LMSAgent), and exhibits it to the student. Besides, it monitors the interaction between the student and the LO, which means analysing the data received from the SCORM communication. Depending on the analyzed data (beliefs), the agent will deliberate the exhibition of another LO (course with dynamic content).

The JADEX BDI V3 (V3) platform was chosen to implement the agents based on the BDI architecture [12,13]. The design of ILOMAS takes into account the characteristics of e-learning courses deployed on an LMS (such as MOODLE [7]), which means an environment accessed mostly from Web browsers. The Java Servlets and JSP technologies form the basis of the interface between the client side (student) and the server side (agents' environment), taking advantage of the V3 service communication structure [13]. A non-agent class based on the Facade design pattern [17] keeps the coupling low between the MAS layer and the external components (front-end and servlets).
A first prototype was developed and tested with emphasis on the MAS development, rather than on visualization issues (such as LO formats or graphical user interfaces) [18]. The simulation of a learning situation resulted in a different LO retrieved from the repository. This new LO had the same subject as the previous LO shown. It was not explicitly defined in the database that the student should have watched this new LO (only the topic was defined, no specific LO), so the MAS obtained the related LO dynamically, taking into account the metadata elements declared in IEEE-LOM [9].

2.1 ILOMAS and SCORM Integration


The extension of ILOMAS to use the SCORM standard [19,20] increases reuse, dynamic sequencing, and interoperability in learning environments. The SCORM specification defines a set of API functions, which allows the communication among the student, the LO and the LMS. This API allows ILOMAS to use the data model elements to determine the student's knowledge level and to evaluate the status of the current experience. Some available elements are the learner's answers to quizzes (result), the elapsed time since the beginning of the interaction (latency), the weighting of the interaction status relative to others, and a description of the LO's objectives [19]. If the learner demonstrates difficulty with some subject (e.g., wrong answers in sequence on the SCORM quiz, or taking a long time to interact with the LO without any progress), it is possible to make decisions based on the received historical data.
Fig. 1. ILOMAS SCORM Web architecture

The main desire defined for the ILOAgent is to solve the student's lack of understanding about the subject. Thus, when the data received from SCORM points to a learner difficulty (error), the ILOMAS deliberation process (based on the JADEX engine [13]) dispatches the goal related with this objective. The ILOAgent's belief base stores the received data, and the deliberation process defines that the student needs to view a new (different) LO when the student selects an incorrect answer. This is the moment when the system achieves a dynamic learning situation, because a new LO not previously defined becomes part of the course structure. From the student's point of view (and even the teacher's point of view) the accessed object was just one, but with several contents (a larger LO composed dynamically of other, smaller ones).
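The decision rule just described can be summarised by the following Python sketch (ours, for illustration only; the actual ILOMAS agents are implemented in Java on JADEX BDI V3, and the field and function names below are hypothetical):

```python
# Illustrative sketch of the ILOAgent decision rule (hypothetical names).

def on_scorm_interaction(belief_base, interaction, lo_repository):
    """Update beliefs with a reported SCORM interaction and decide whether a
    new LO on the same subject should be shown."""
    belief_base.setdefault('history', []).append(interaction)
    if interaction.get('result') == 'wrong':           # learner difficulty detected
        subject = belief_base['current_subject']
        shown = {lo['id'] for lo in belief_base.get('shown_los', [])}
        # pick a different LO whose metadata (e.g. IEEE-LOM keywords) matches the subject
        for lo in lo_repository:
            if subject in lo['keywords'] and lo['id'] not in shown:
                belief_base.setdefault('shown_los', []).append(lo)
                return lo                               # new LO to exhibit (dynamic content)
    return None                                         # keep the current LO
```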
To validate this new version of the platform (SCORM integrated), we used some SCORM objects (version SCORM 1.2) about Social Security Laws (Public Law course). The learning interaction takes place in a custom LMS developed with limited features, only for test purposes. The implemented SCORM integration with ILOMAS was tested to reproduce distinct learning situations (Table 1): a student that selects all the correct answers (Student 1), another that misses all questions (Student 2), and one who increases understanding of the subject during the interaction (Student 3). Each time a student makes a mistake, the ILOMAS identifies the understanding problem and suggests another related LO to fill the learning gap (Fig. 2).

Table 1. ILOMAS SCORM preliminary evaluation tests

Student      Quiz Errors   Previously Configured LO   New LO Offered   Dynamic Behavior
Student 1         0                   1                      0                No
Student 2         4                   1                      4                Yes
Student 3         1                   1                      1                Yes
Fig. 2. The ILOMAS SCORM Web application execution: (1) the moment of the identification that the student needs another LO (wrong answer); (2) new LO exhibition.

3 Conclusions and Future Work

This research resulted in a prototype implementation to verify the proposed model and its feasibility, followed by the execution of some evaluation tests. The SCORM API implementation gives ILOMAS the ability to monitor the overall communication between the LO and the learner, taking advantage of the SCORM data model elements (e.g., interaction status and time of the current learning session).
Future work will enhance the analysis of the received SCORM elements, taking into consideration the history of the student's experiences, and explore all the SCORM data model elements in the process of determining whether the learner needs to view a new LO. Another improvement would be the integration of ILOMAS with some MAS-based recommender system for indexing and retrieving the related LO within the repository [21]. Finally, future work involves building a new plugin to integrate ILOMAS into the MOODLE LMS, and testing the application with different learning situations inside an LMS production instance, with real students.

References
1. Allison, C., Miller, A., Oliver, I., Michaelson, R., Tiropanis, T.: The Web in education. Computer Networks 56, 3811–3824 (2012)
2. Vesin, B., Klasnja-Milicevic, A., Ivanovic, M., Budimac, Z.: Applying recommender systems and adaptive hypermedia for e-learning personalization. Computing and Informatics 32, 629–659 (2013). Institute of Informatics
3. Bachari, E., Abelwahed, E., Adnani, M.: E-Learning personalization based on dynamic learners' preference. International Journal of Computer Science & Information Technology (IJCSIT) 3(3) (2011)

4. Mahkameh, Y., Bahreininejad, A.: A context-aware adaptive learning system using agents.
Expert Systems with Applications 38, 3280–3286 (2011)
5. Caeiro, M., Llamas, M., Anido, L.: PoEML: Modeling learning units through perspectives.
Computer Standards & Interfaces 36, 380–396 (2014)
6. Santos, G., Jorge, J.: Interoperable Intelligent Tutoring Systems as Open Educational
Resources. IEEE Transactions on Learning Technologies 6(3), 271–282 (2013). IEEE
CS & ES
7. Despotovic-Zrakic, M., Markovic, A., Bogdanovic, Z., Barac, D., Krco, S.: Providing
Adaptivity in Moodle LMS Courses. Educational Technology & Society 15(1), 326–338
(2012). International Forum of Educational Technology & Society
8. Komlenov, Z., Budimac, Z., Ivanovic, M.: Introducing Adaptivity Features to a Regular
Learning Management System to Support Creation of Advanced eLessons. Informatics in
Education 9(1), 63–80 (2010). Institute of Mathematics and Informatics
9. Barak, M., Ziv, S.: Wandering: A Web-based platform for the creation of location-based
interactive learning objects. Computers & Education 62, 159–170 (2013)
10. Chen, C.: Intelligent web-based learning system with personalized learning path guidance.
Computers & Education 51, 787–814 (2008)
11. Kurilovas, E., Zilinskiene, I., Dagiene, V.: Recommending suitable scenarios according to
learners’ preferences: An improved swarm based approach. Computers in Human Behavior 30, 550–557 (2014)
12. Wooldridge, M.: An Introduction to MultiAgent Systems, 2nd edn. John Wiley & Sons
(2009)
13. Pokahr, A., Braubach, L., Haubeck, C., Ladiges, J.: Programming BDI Agents with Pure
Java. University of Hamburg (2014)
14. Giuffra, P., Silveira, R.: A multi-agent system model to integrate Virtual Learning Envi-
ronments and Intelligent Tutoring Systems. International Journal of Interactive Multimedia
and Artificial Intelligence 2(1), 51–58 (2013)
15. Silveira, R., Gomes, E., Vicari, R.: Intelligent Learning Objects: An Agent-Based
Approach of Learning Objects. IFIP – International Federation For Information
Processing, vol. 182, pp. 103–110. Springer-Verlag (2006)
16. Bavaresco, N., Silveira, R.: Proposal of an architecture to build intelligent learning objects
based on BDI agents. In: XX Informatics in Education Brazilian Symposium (2009)
17. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable
Object-Oriented Software. Addison-Wesley (1995)
18. de Amorim Jr., J., Gelaim, T.Â., Silveira, R.A.: Dynamic e-Learning content selection
with BDI agents. In: Bajo, J., Hallenborg, K., Pawlewski, P., Botti, V., Sánchez-Pi, N.,
Duque Méndez, N.D., Lopes, F., Vicente, J. (eds.) PAAMS 2015 Workshops. CCIS, vol.
524, pp. 299–308. Springer, Heidelberg (2015)
19. SCORM 2004. Advanced Distributed Learning. http://www.adlnet.org/scorm
20. Gonzalez-Barbone, V., Anido-Rifon, L.: Creating the first SCORM object. Computers &
Education 51, 1634–1647 (2008)
21. Campos, R.L.R., Comarella, R.L., Silveira, R.A.: Multiagent based recommendation
system model for indexing and retrieving learning objects. In: Corchado, J.M., Bajo, J.,
Kozlak, J., Pawlewski, P., Molina, J.M., Julian, V., Silveira, R.A., Unland, R., Giroux, S.
(eds.) PAAMS 2013. CCIS, vol. 365, pp. 328–339. Springer, Heidelberg (2013)
Sound Visualization Through a Swarm
of Fireflies

Ana Rodrigues(B), Penousal Machado, Pedro Martins, and Amílcar Cardoso

CISUC, Department of Informatics Engineering, University of Coimbra,


Coimbra, Portugal
{anatr,machado,pjmm,amilcar}@dei.uc.pt

Abstract. An environment to visually express sound is proposed. It


is based on a multi-agent system of swarms and inspired by the visual
nature of fireflies. Sound beats are represented by light sources, which
attract the virtual fireflies. When fireflies are close to light they gain
energy and, as such, their bioluminescence is emphasized. Although real
world fireflies do not behave as a swarm, our virtual ones follow a typical
swarm behavior. This departure from biological plausibility is justified
by aesthetic reasons: the desire to promote fluid visualizations and the
need to convey the perturbations caused by sound events. The analysis of
the experimental results highlights how the system reacts to a variety of
sounds, or sequence of events, producing a visual outcome with distinct
animations and artifacts for different musical pieces and genres.

Keywords: Swarm intelligence · Computer art · Multi-agent systems ·


Sound visualization

1 Introduction

Although sound visualization has been an object of study for a long time, the
emergence of the computer, with graphic capabilities, allowed the creation of new
paradigms and creative processes in the area of sound visualization. Most of the
initial experiments were done through analogue processes. Since the advent
of computer science, art has taken significant interest in the use of computers
for the generation of automated images. In section 2, we present some of the
main inspirations to our work including sound visualization, generative artworks,
computer art and multi-agent systems.
Our research question concerns the possibility of developing a multi-agent model for sound visualization. We explore the intersection between computer
art and nature-inspired multi-agent systems. In the context of this work, swarm
simulations are particularly interesting because they allow the expression of a
large variety of different types of behaviors and tend to be intuitive and natural
forms of interaction.
In section 3 we present the developed project, which is based on a multi-agent
system of swarms and inspired by the visual nature of fireflies. In the scope of our


work, visualization of music is understood as the mapping of a specific musical


composition or sound into a visual language.
Our environment contains sources of light representing sound beats, which
attract the fireflies. The closer a firefly is to the light, the more emphasized is
its bioluminescence and the higher its chance of collecting energy (life). Using
Reynolds’ boids algorithm [6], fireflies interact with the surrounding environ-
ment by means of sensors. They use them to find and react to energy sources
as well as to other fireflies. In section 4 we present an analysis and correspond-
ing experimental results of the systems behavior to 5 different songs. Lastly, in
section 5 we present our conclusions and further work to be done.

2 Related Work
Ernst Chladni studied thoroughly the relation between sound and image. One
of his best-known achievements was the invention of Cymatics. It geometrically
showed the various types of vibration on a rigid surface [5]. In the 1940s Oskar
Fischinger made cinematographic works exploring the images of sound by means
of traditional animation [4]. His series of 16 studies was his major success [4].
Another geometric approach was made by Larry Cuba in 1978, but this time with digital tools. “3/78” consisted of 16 objects performing a series of precisely choreographed rhythmic transformations [2].
Complex and self-organized systems have a great appeal for the artistic prac-
tice since they can continuously change, adapt and evolve. Over the years, com-
puter artifacts promoting emergent system behaviors have been explored [1,7]. Artists became fascinated with the possibility of an unpredictable but satisfying outcome. Examples of this include the work of Ben F. Laposky, Frieder Nake, Manfred Mohr, among many others [3].

3 The Environment
In this section we present a swarm-based system of fireflies and all of its interac-
tions. In this environment, fireflies are fed by the energy of sound beats (rhythmic
onsets). While responding to the surrounding elements of the environment, they
search for these energies (see Fig. 1). The colors were chosen according to the

Fig. 1. System’s behavior and appearance example. Best viewed in color.



real nature of fireflies. Since they are visible at night, we opted for a dark blue in
the background and a brighter one for the sound beats. As for bioluminescence,
we used yellow.
The environment rules and behaviors, as well as the visualization, were implemented with Processing. The mechanism for extracting typical audio informa-
tion was made with the aid of the Minim library, mainly because it contains a
function for sound beat detection.

3.1 Sound (Energy Sources)

Sound Analysis. To visualize sound, a preliminary analysis is necessary. A


sound is characterized by 3 main parameters: frequency, amplitude and duration.
Frequency determines the pitch of the sound. Amplitude determines how loud
the sound is. Duration can define the rhythm of music and also the instant in
the music where sound beats happen.
We perform sound analysis prior to the visualization, in order to promote
a fluid animation and convey the perturbations caused by sound events. We
compute the main sound characteristics (pitch, volume, sound beats) and export
them to a text file. Sound beats are detected note onsets. They are related to
the temporal/horizontal position of a sound event.
Although the mechanism used to extract audio is not novel and remains simple, we think this approach is adequate to the goals of our system. It fits the amount of expressiveness that we intend to represent in our visualization, as visual simplicity characterizes the fireflies’ natural environment.

Fig. 2. Graphical representation of sound objects. a - Sound beat instants, b - Amplitude, c - Frequency, d - Collision.

Sound’s Graphic Representation. After the sound analysis, all the properties of sound are mapped into graphical representations. Sound beats are mapped into instants (t1, t2, t3, . . .) which define the object’s horizontal position, as shown in Fig. 2a. Each sound object has a pre-defined duration, meaning that it is removed from the environment at the end of its duration. Amplitude is translated into the object’s size, i.e., the size is directly proportional to the amplitude (Fig. 2b). Lastly, frequency is mapped into the object’s vertical position in the environment (Fig. 2c). High frequencies (HIF) are positioned at the top of the screen and low frequencies (LOF) emerge in lower positions of the vertical axis. A fourth characteristic presented in the graphical representation of sound
objects is collision (Fig. 2d). This last one is not directly related to sound, only to the sound object’s physics. When an object collides with another one, an opposing force is applied between the two, separating them from each other.
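The mapping just described can be summarised by a small sketch (ours, in Python for illustration; the actual system is written in Processing, and the screen dimensions, ranges and default values below are assumptions made only for the example):

```python
# Illustrative mapping of an analysed sound beat to a light object.

def beat_to_light(instant_ms, amplitude, frequency,
                  width=800, height=600, track_ms=60000,
                  min_size=10, max_size=60, max_freq=5000):
    x = width * instant_ms / track_ms                        # beat instant -> horizontal position
    y = height * (1 - min(frequency, max_freq) / max_freq)   # high frequencies near the top
    size = min_size + (max_size - min_size) * amplitude      # amplitude in [0,1] -> object size
    return {'x': x, 'y': y, 'size': size}

print(beat_to_light(15000, 0.8, 2500))
```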

3.2 Agents (Fireflies)


Agent Behavior. Because the sound beats are presented from left to right, fireflies are initially born on the left side of the screen, vertically centered. Agents are provided with a specific vision of the surrounding environment. A vision angle of 30◦ and a depth of 150 pixels were considered optimum values (Fig. 3), because with them the agents have a high amount of independence and resemble their natural behavior. The agents’ motion is based on the “Boids” algorithm. They walk randomly until they find something that may affect their behavior, such as a source of light or other agents.

Fig. 3. Agent field of view: angle (A) and depth (D).

The closer they are to a source of light, the more attracted they are to it, meaning that there is a force of attraction towards it. Along with that, agents have a swarming behavior, meaning that neighbouring agents can see each other and follow one another through flocking behavior rules [6].
These rules were presented by Reynolds with a computational model of
swarms exhibiting natural flocking behavior. He demonstrated how a particular
computer simulation of boids could produce complex phenomena from simple
mechanisms. These behaviors define how each creature behaves in relation to its
neighbors: separation, alignment or cohesion [6].

Fig. 4. Left image: separation. Right image: cohesion.

The swarming behaviors present in this system are: separation and cohesion
(Fig. 4). Separation gives the agents the ability to maintain a certain distance

from others nearby in order to prevent agents from crowding together. Cohe-
sion gives agents the ability to approach and form a group with other nearby
agents [6]. No alignment force was applied. Alignment is usually associated with flocking behavior, as in birds and fish. Swarm behavior – like the one found
in bees, flies and our fireflies – does not imply alignment.
Additionally, the life and death of each agent is also determined by the way it
interacts with the environment. The agent begins with an initial lifespan, losing
part of its energy at each cycle. If the agent gets close to an energy source, it
gains more energy and a longer lifespan; otherwise, it keeps losing its energy
until it dies. There are no mechanisms for the rebirth of agents, as we intend to
keep a clear visualization and understanding of interactions among agents.
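The combination of attraction to light, separation, cohesion and energy decay can be summarised by the following simplified per-frame update (our Python sketch, not the Processing implementation; for brevity only the 150-pixel viewing distance is checked, not the 30° viewing angle, and all constants are illustrative):

```python
# Simplified per-frame update of one firefly; returns (pos, vel, energy, alive).

def update_firefly(pos, vel, energy, neighbours, lights,
                   sep_w=1.5, coh_w=1.0, att_w=2.0, decay=0.5, gain=5.0, view=150):
    fx = fy = 0.0
    if neighbours:
        cx = sum(p[0] for p in neighbours) / len(neighbours)
        cy = sum(p[1] for p in neighbours) / len(neighbours)
        fx += coh_w * (cx - pos[0]); fy += coh_w * (cy - pos[1])        # cohesion
        for nx, ny in neighbours:                                       # separation
            fx += sep_w * (pos[0] - nx); fy += sep_w * (pos[1] - ny)
    visible = [l for l in lights
               if (l[0] - pos[0]) ** 2 + (l[1] - pos[1]) ** 2 <= view ** 2]
    for lx, ly in visible:                                              # attraction to light
        fx += att_w * (lx - pos[0]); fy += att_w * (ly - pos[1])
        energy += gain                                                  # collect energy
    energy -= decay                                                     # constant energy loss
    vel = (vel[0] + 0.01 * fx, vel[1] + 0.01 * fy)
    pos = (pos[0] + vel[0], pos[1] + vel[1])
    return pos, vel, max(energy, 0.0), energy > 0
```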

Agent’s Graphic Representation. Fireflies use bioluminescence to communicate and attract other fireflies. The closer an agent gets to the light emitted by a sound object within its field of view, the more excited it gets and the more it emphasizes its bioluminescence (Fig. 5, left image). This temporarily influences the agent’s size because it becomes intermittent. The real agent size will be as big as the energy (Fig. 5, right image) that it has at a certain instant. When an agent dies, it disappears from the environment.

Fig. 5. Left image: agent approximation to an object (AG→OB). Right image: agent
growth (E).

4 Results and Discussion


This section presents an analysis of the system’s behavior in response to 4 different songs or melodic sequences (from track 1 to track 4). These tracks vary in rhythm, intensity and frequencies, allowing us to illustrate and highlight how the system reacts to different sound stimuli.
Unfortunately, conveying the overall feel of an animation1 in a paper has its difficulties. To circumvent this issue and to ease our analysis, first we analyze a

Fig. 6. The music that generated this response is characterized by a variety of intensities and a high density of beats.
1 A demonstration video can be found at http://tinyurl.com/ky7yaql.

complete visualization of the track so we can perceive the differences inside each
one. Secondly, we present the trajectory made by the agents of the corresponding
music to better analyze their behavior in the different tracks. We present only
one example of those figures due to space constraints.
Track 1 corresponds to a piece with high density of beats and low contrast of
intensities. This promotes a higher chance of having a longer lifespan. However,
the low contrast of the intensities implies that they do not gather so much energy
at once. Track 2 (Fig. 6) is also characterized by a high density of beats, but in this case the contrast in intensities makes the swarms gain more energy. Track 3 has a low contrast of frequencies and a balanced density of beats. Lastly, Track 4, as opposed to almost all of the other examples described so far, has a strong contrast between high and low frequencies. Adding to this, the low density of beats results in a reduced lifespan for the swarms, as they have a short field of view.
From the observation of the patterns created by our system, we can conclude:
(i) fireflies have a tendency to follow the pattern created by the sound
beats, as we could see in the example depicted in Fig. 6; (ii) there is a bigger
concentration of fireflies at the sources that contain more energy; (iii) tracks
with a lower contrast between frequencies promote a more balanced spread of
the fireflies in the environment; (iv) tracks with a high density of beats give
fireflies a longer lifespan, because the agents have a narrow vision field and thus
can collect more energy, even if it comes in small amounts.

5 Conclusions and Future Work

We presented an environment to visualize audio signals. It was inspired by the


visual nature of fireflies and based on a multi-agent system of swarms proposed
by Reynolds. In this environment, sound is mapped into light objects with energy,
which attract the virtual fireflies. When fireflies are close to light they gain energy
and, as such, their bioluminescence is emphasized. The flocking behavior of the
group emerges based on simple rules of interaction.
In real life, the presented technique may help people with little musical background
take part in musical events. In future work we will expand
our system by introducing more sophisticated mechanisms for the sound analysis,
allowing the representation of higher-level concepts and musical events.
On the other hand, we also wish to explore alternative visual representations to
offer the user a wider array of choices. Finally, a user study should be performed
to assess the strengths and weaknesses of the different visualization variants and
to evaluate the system.

Acknowledgments. This research is partially funded by project ConCreTe. Project


ConCreTe acknowledges the financial support of the Future and Emerging Technologies
(FET) programme within the Seventh Framework Programme for Research of the
European Commission, under FET grant number 611733.

References
1. Barszczewski, P., Cybulski, K., Goliski, K., Koniewski, J.: Constellaction (2013).
http://pangenerator.com
2. Compart: Larry Cuba, 3/78 (nd). http://tinyurl.com/k2y3vef
3. Dietrich, F.: Visual intelligence: The first decade of computer art (1965–1975).
Leonardo 19(2), 159–169 (1986)
4. Evans, B.: Foundations of a visual music. Computer Music Journal 29(4), 11–24
(2005)
5. Monoskop: Ernst Chladni (nd). http://monoskop.org/Ernst Chladni
6. Reynolds, C.W.: Steering behaviors for autonomous characters. In: Game Develop-
ers Conference, vol. 1999, pp. 763–782 (1999)
7. Uozumi, Y., Yonago, T., Nakagaito, I., Otani, S., Asada, W., Kanda, R.: Sjq++ ars
electronica (2013). https://vimeo.com/66297512
Social Simulation and Modelling
Analysing the Influence of the Cultural
Aspect in the Self-Regulation of Social
Exchanges in MAS Societies: An Evolutionary
Game-Based Approach

Andressa Von Laer, Graçaliz P. Dimuro(B) , and Diana Francisca Adamatti

PPGCOMP, C3, Universidade Federal Do Rio Grande (FURG), Rio Grande, Brazil
{andressavonlaer,gracaliz,dianaada}@gmail.com

Abstract. Social relationships are often described as social exchanges,


understood as service exchanges between pairs of individuals with the
evaluation of those exchanges by the individuals themselves. Social
exchanges have been frequently used for defining interactions in MAS.
An important problem that arises in the context of social simulation and
other MAS applications is the self-regulation of the social exchange pro-
cesses, so that the agents can achieve/maintain the equilibrium of the
exchanges by themselves, guaranteeing the continuation of the interac-
tions in time. Recently, this problem was tackled by defining the spa-
tial and evolutionary Game of Self-Regulation of Social Exchange Pro-
cesses (GSREP), implemented in NetLogo, where the agents evolve their
exchange strategies by themselves over time, performing more equili-
brated and fair interactions. The objective of this paper is to analyse the
problem of the self-regulation of social exchange processes in the con-
text of a BDI-based MAS, adapting the GSREP game to Jason agents
and introducing a cultural aspect, where the society culture, aggregating
the agents’ reputation as group beliefs, influences directly the evolution
of the agents’ exchange strategies, increasing the number of successful
interactions and improving the agents’ outcomes in interactions.

1 Introduction
As is well known in the social sciences, the acts, actions and practices that involve
more than two agents and affect or take account of other agents' activities, experiences
or knowledge states are called social interactions. Social interactions and,
mainly, the quality of these interactions, are crucial for the proper functioning
of the system, since, e.g., communication failure, lack of trust, selfish attitudes,
or unfair behaviors can leave the system far from a solution. The application of
the social interaction concept to enhancements of MAS functionality is a natural
step towards designing and implementing more intelligent and human-like
populations of artificial autonomous systems [13].
Social relationships are often described as social exchanges [1], understood
as service exchanges between pairs of individuals with the evaluation of those


exchanges by the individuals themselves [16]. Social exchanges have been fre-
quently used for defining social interactions in MAS [10,15,21]. A fundamental
problem discussed in the literature is the regulation of such exchanges, in order
to allow the emergence of equilibrated exchange processes over time, promot-
ing the continuity of the interactions [12,21], social equilibrium [15,16] and/or
fairness behaviour.1 In particular, this is a difficult problem when the agents,
adopting different social exchange strategies, have incomplete information on the
other agents’ exchange strategies, as in open societies [9].
In the literature (e.g, [9,15,21]), different models were developed (e.g., cen-
tralized/decentralized control, closed/open societies) for the social exchange
regulation problem. Recently, this problem was tackled by Macedo et al. [12],
by introducing the spatial and evolutionary Game of Self-Regulation of Social
Exchange Processes (GSREP), where the agents, adopting different social
exchange strategies (e.g., selfishness, altruism), considering both the short and
long-term aspects of the interactions, evolve their exchange strategies along the
time by themselves, in order to promote more equilibrated and fair interactions.
This approach was implemented in NetLogo.
However, certain characteristics involved in social exchanges may be more
appropriately modeled with cognitive agents2 , such as BDI Agents (Belief,
Desire, Intention) [4]. Also, taking into account the observations made by a
society on the behavior of an agent ag in its past interactions, it is possible to
qualify ag’s reputation [5,6], which can be made available to the other agents
who themselves have not interacted with that agent. These indirect observations
can be aggregated to define any agent past behaviour based on the experiences of
participants in the system [11]. Reputation can assist agents in choosing partners
where there are other agents that can act so as to promote the disequilibrium of
the social exchange processes in the society. Given the importance of this kind
of analysis in many real-world applications, a large number of computational
models of reputation have been developed (e.g., [23,24]).
Then, this paper introduces an evolutionary and cultural approach of GSREP
game for the JaCaMo [2] framework, considering also the influence of the agent
society culture, so defining the Cultural-GSREP game. Observe that here are at
least five basic categories of cultural knowledge that are important in the belief
space of any cultural evolution model: situational, normative, topographic, his-
torical or temporal, and domain knowledge [18]. In this paper, we explore just
the normative category, and let the combination of other cultural aspects for
further work. We consider a specific society’s culture where the agents’ repu-
tations are aggregated as group beliefs [23], using the concept of artifacts [20].
Based on the idea that “the culture of a society evolves too, and its evolution
may be faster than genetics, enabling a better adaptation of the agent to the
environment”[19], we analyse the influence of the culture in the evolution of the

1 We adopted the concept of fairness behaviour/equilibrium as in [17, 25].
2 For discussions on the role of BDI agents in social agent-based simulation, see [14].

Fig. 1. An overview of the proposed model

agents’ exchange strategies, the increase of the number of successful interactions


and the improvement of the agents’ outcomes in their interactions.3

2 The Cultural-GSREP Game

The Cultural-GSREP game is built on the spatial and evolutionary game


of incomplete information presented in [12], for the self-regulation of social
exchange processes, where agents evolve their strategies in order to maximize
a fitness function by an evolutionary approach. In each simulation cycle, the fit-
ness function evaluates the material result of the exchanges of an agent with its
neighboring agents, getting influence by factors that characterize the exchange
strategies and attitudes adopted by the agents.
For the agent belief learning process, the agent society's culture (consisting
of a belief space common to all agents) may influence the decision making in
each single game (two-agent exchange). The addition of a belief space common
to all agents involved in the system works as a focal point (Schelling Point) [22],
serving as reference for agents, since the agents do not have knowledge about the
other agents’ exchange strategies. The belief space used in this paper is based
on the work developed in [23].
Figure 1 shows an overview of the Cultural-GSREP game model, which is
organized in two parts: the first is the social exchange game [12], where each
single exchange occurs in two sequential stages, with their respective evaluations
by the agents themselves, and a fitness function helps to evolve the agents'
exchange strategies; the second part is the creation of group beliefs (GBs) using
artifacts, which forms the cultural level based on the agents' reputations constructed
over the exchanges experienced by the agents in the society. The model was
implemented in Jason [3], using the concept of Agents & Artifacts [20] for the
implementation of group beliefs in the CArtAgO framework [2].4

Fig. 2. Two stages of a single social exchange game (for selfishness/altruism exchange
strategies)

3 We remark that the aim of the paper is not to introduce a novel method to evaluate
reputation. On the contrary, we adopt a very simple method for analysing the reputation,
in order to show how this cultural aspect may influence the evolution of the
self-regulation of social exchanges. For discussions on the perspectives of culture in
a more general context see [8].

3 The Game of Exchange Processes


Figure 2 shows the basic and simplified two-stage sequence of the first part of the
model, where the exchanges between each two agents occur in a single exchange
cycle.5 In the first stage of the game, denoted by I, the agent a offers a service
with some value of investment (R) to the agent b, such that R ≤ Rmax , where
Rmax is the maximal investment value a is willing to have for a service performed
for another agent. This yields a value of b’s satisfaction (S) and debt (T ) for
the service provided by a, which are directly related to a’s investment value.
4 Note that shared artifacts are common in normative and organizational systems (see, e.g., [7]).
5 The two-stage social exchange was inspired by Piaget's Theory of Social Exchanges [16].

However, if b believes that the service offered by a provides less satisfaction than
the minimal satisfaction (Smin) it is willing to accept, then b refuses a's offer
and this exchange stage does not occur. Supposing that b accepts the service
provided by a, then, at the end of this stage, the agent a has a credit value (V ),
that is, a credit related to the service it has previously performed to agent b. R
and S are called material values, related to the performed exchanges. T and V
are virtual values, related to future interactions, since they help the continuation
of the exchanges.
The second stage, denoted by II, is similar to the first, but referring to a
possible debt collection by agent a, when a charges b for a service in payment for
its virtual value (V ) (the credit a has obtained from b in the stage I). The agent
b has on its belief base a debt value (T ) and it then performs a service offer with
an investment value (R) to a (with R ≤ Rmax ), which in turn generates a value of
satisfaction (S) for b’s offer, in case that it accepts such satisfaction value (i.e.,
S ≥ Smin ), otherwise this exchange stage does not occur. After each 2-stage
exchange between a and b, they calculate the material reward they received,
using the payoff function pab : [0, 1]4 → [0, 1]:
p_ab(R_Iab, R_IIba, S_Iba, S_IIab) =                                                   (1)

  (1 − R_Iab + S_IIab) / 2   if (R_Iab ≤ R_a^max ∧ S_Iba ≥ S_b^min) ∧ (R_IIba ≤ R_b^max ∧ S_IIab ≥ S_a^min)
  (1 − R_Iab) / 2            if (R_Iab ≤ R_a^max ∧ S_Iba ≥ S_b^min) ∧ (R_IIba > R_b^max ∨ S_IIab < S_a^min)
  0                          if (R_Iab > R_a^max ∨ S_Iba < S_b^min) ∧ (R_IIba > R_b^max ∨ S_IIab < S_a^min)

Observe that, according to Eq. (1), if both exchange stages I and II are suc-
cessfully performed, then the agents’ rewards are greater. On the contrary, if no
stage occurs, i.e., the agent b refuses the service of agent a in the first stage I,
the payoff is null.
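As a cross-check of Eq. (1), the following Python sketch computes the payoff for agent a in an exchange with b. It is an illustrative transcription, not the authors' Jason code, with the strategy thresholds passed in explicitly.

```python
def payoff_a(r_I_ab, r_II_ba, s_I_ba, s_II_ab,
             r_max_a, s_min_a, r_max_b, s_min_b):
    """Material reward of agent a after a two-stage exchange with b (Eq. 1)."""
    stage_I_ok = (r_I_ab <= r_max_a) and (s_I_ba >= s_min_b)
    stage_II_ok = (r_II_ba <= r_max_b) and (s_II_ab >= s_min_a)
    if stage_I_ok and stage_II_ok:
        return (1 - r_I_ab + s_II_ab) / 2   # both stages succeeded
    if stage_I_ok and not stage_II_ok:
        return (1 - r_I_ab) / 2             # only the first stage succeeded
    return 0.0                              # b refused a's offer: null payoff
```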

3.1 Social Exchange Strategies in Beliefs


The agents evaluate the services according to their exchange strategies, e.g.,
an agent with selfishness strategy is more likely to devalue the received service
and overvalue an offered service, which impacts on debt and credit values. The
calculations of debts T and credits V are made using the debt and credit depre-
ciation (ρ = d) or overestimation (ρ = o) factors k ρt , k ρv ∈ [0, 1], respectively,
characterizing each strategy, as follows:
Depreciation:     T = (1 − k^dt) S,        V = (1 − k^dv) R                            (2)
Overestimation:   T = S + (1 − S) k^ot,    V = R + (1 − R) k^ov                        (3)
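A small sketch of Eqs. (2) and (3), for illustration only; the factor names mirror the k^dt, k^dv, k^ot and k^ov parameters of the chromosome belief.

```python
def depreciation(value, k_d):
    """Depreciation applied to a debt (T) or credit (V) value (Eq. 2)."""
    return (1 - k_d) * value

def overestimation(value, k_o):
    """Overestimation applied to a debt (T) or credit (V) value (Eq. 3)."""
    return value + (1 - value) * k_o

# Example: a selfish agent depreciates its debt T and overestimates its credit V,
# while an altruist does the opposite (cf. the chromosome beliefs in the text).
debt_selfish = depreciation(0.7, k_d=0.2)       # T = (1 - k^dt) * S
credit_selfish = overestimation(0.5, k_o=0.2)   # V = R + (1 - R) * k^ov
```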
Each agent has in its belief base an exchange strategy-based belief, called
chromosome belief, which evolves along the time, by mutation plans. For exam-
ple, the initial chromosome belief of an altruist agent is defined as:
chromosome([r(0), s(0), rmax(0.8), smin(0.2), e(0.1), g(0.9), kot(0.2), kdv(0.2)]),

where r(0) and s(0) are, respectively, the initial investment and satisfaction
values, and the current parameters that represent its exchange strategy are R,
R^max, S^min, k^t and k^v, where R ∈ [0, 1] is the value of investment, R^max ∈ [0, 1]
is the maximum value that the agent will invest, S^min ∈ [0, 1] is the minimum
value of satisfaction that the agent accepts, k^ot ∈ [0, 1] and k^dv ∈ [0, 1] are,
respectively, the debt overestimation and credit depreciation factors, as shown
in Eqs. (2) and (3), e ∈ [0, 1] is the weight that represents the agent's tolerance
degree when its payoff is less than that of its neighboring agents (envy degree), and
g ∈ [0, 1] represents the agent's tolerance degree when its payoff is higher than
its neighboring agents' payoffs (guilt degree) [12, 25].
Analogously, the initial chromosome belief of a selfish agent is defined as:
chromosome([r(0), s(0), rmax(0.2), smin(0.8), e(0.9), g(0.1), kdt(0.2), kov(0.2)]).
To implement/evaluate the model, we consider five agents that perform the
exchanges, each agent with a different initial exchange strategy, namely: altru-
ism, weak altruism, selfishness, weak selfishness and rationality. The rational
agent plays just for the Nash Equilibrium6, and then S^min = e = g = k^t = k^v = 0.

3.2 The Fitness Evaluation


Given a neighborhood A = {1, . . . , m} of m agents, each agent i ∈ A plays the
exchange game with the other m − 1 neighboring agents j ∈ A, with j ≠ i. In
each simulation cycle, each agent i evaluates its local social exchange material
results with each other neighboring agent j, using the local payoff function given
in Eq. (1). The total payoff received by each agent is calculated after each agent
has performed the two exchange stages with its entire neighborhood. For pij
calculated by Eq. (1), the total payoff allocation of a neighborhood of m agents
is given by

    X = {x_1, . . . , x_m},   where   x_i = Σ_{j∈A, j≠i} p_ij.                         (4)

The agent i calculates its adaptation degree through its fitness function Fi :
[0, 1]m → [0, 1], whose definition, encompassing all types of exchange strategies, is:
    F_i(X) = x_i − (e_i / (m − 1)) Σ_{j≠i} max(x_j − x_i, 0) − (g_i / (m − 1)) Σ_{j≠i} max(x_i − x_j, 0),   (5)

where X is the total payoff allocation of agent i (Eq. (4)), ei and gi are i’s
envy and guilt degrees, respectively. To evaluate its fitness, the agent compares
its current fitness with the previous one: if it exceeds the value of the previous
fitness, then the current strategy is better than the previous one, and the agent
makes an adjustment in the vector of probabilities, increasing the probability of
the current strategy to be chosen again, increasing/decreasing the parameters of
the chromosome belief defining its strategy.7
The probability vector of adjustments is shown in Table 1. There are 27 possible
adjustments, e.g., p_i^0 is the probability of increasing R_i, R_i^max and S_i^min (by
a certain exogenously specified adjustment step), and p_i^5 is the probability of
increasing the value of R_i, keeping the value of R_i^max and decreasing S_i^min. The
probability and strategy adjustment steps f_p and f_s determine, respectively, to what
extent the probabilities of the probability vector and the values r_i, r_i^max and
s_i^min are increased or decreased (a small sketch of how an adjustment is applied
is given after Table 1).
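For illustration, a direct transcription of Eq. (5) in Python (not the authors' implementation); `payoffs` is the allocation X of Eq. (4) and `i` indexes the agent being evaluated.

```python
def fitness(payoffs, i, envy, guilt):
    """Adaptation degree of agent i (Eq. 5) over the payoff allocation X."""
    m = len(payoffs)
    xi = payoffs[i]
    envy_term = sum(max(xj - xi, 0) for j, xj in enumerate(payoffs) if j != i)
    guilt_term = sum(max(xi - xj, 0) for j, xj in enumerate(payoffs) if j != i)
    return xi - (envy / (m - 1)) * envy_term - (guilt / (m - 1)) * guilt_term

# Example: five agents, evaluating agent 0 with envy 0.1 and guilt 0.9 (altruist).
print(fitness([0.6, 0.5, 0.7, 0.4, 0.6], 0, envy=0.1, guilt=0.9))
```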

6 See [12] for a discussion on the Nash Equilibrium of the Game of Social Exchange Processes.
7 The fitness function was based on [12, 25].

Table 1. The probability vector adjustment

         R_i  R_i^max  S_i^min           R_i  R_i^max  S_i^min           R_i  R_i^max  S_i^min
p_i^0    ↑    ↑        ↑         p_i^9   =    ↑        ↑         p_i^18  ↓    ↑        ↑
p_i^1    ↑    ↑        =         p_i^10  =    ↑        =         p_i^19  ↓    ↑        =
p_i^2    ↑    ↑        ↓         p_i^11  =    ↑        ↓         p_i^20  ↓    ↑        ↓
p_i^3    ↑    =        ↑         p_i^12  =    =        ↑         p_i^21  ↓    =        ↑
p_i^4    ↑    =        =         p_i^13  =    =        =         p_i^22  ↓    =        =
p_i^5    ↑    =        ↓         p_i^14  =    =        ↓         p_i^23  ↓    =        ↓
p_i^6    ↑    ↓        ↑         p_i^15  =    ↓        ↑         p_i^24  ↓    ↓        ↑
p_i^7    ↑    ↓        =         p_i^16  =    ↓        =         p_i^25  ↓    ↓        =
p_i^8    ↑    ↓        ↓         p_i^17  =    ↓        ↓         p_i^26  ↓    ↓        ↓

4 The Culture: Group Belief and Reputation Artifacts

The culture of the agent society consists of the group belief (GB) and the
reputation artifacts. For the implementation in CArtAgO, these artifacts are
first created by the mediator agent, which is also responsible for initiating the
exchanges by sending a message to all agents to start the sequence of exchanges.
The GB artifact stores the beliefs sent by agents after obtaining experience in
exchanges and the reputation artifact creates the reputation of agents.
The beliefs that compose the artifacts are observable properties. The
announcements are treated as interface operations, where some parameters are
provided: the announcement predicate, the degree of certainty of a belief and the
strength of this certainty. The composition of a GB works as follows.
The formation rules of individual beliefs lie within the agents' minds. The rules
that form the group beliefs (synthesis rules) are in an entity external to the agents,
and the communication for the formation of a GB is made through announcements
sent to a component that aggregates them, forming a GB (see Fig. 3). The set A of
all announcements is defined by

    A =def { ⟨p, c, s⟩ | p ∈ P, c ∈ [0, 1], s ∈ ℕ },                                    (6)

where P is the set of all the predicates, and p, c and s are, respectively, the
predicate, the certainty degree and the strength degree of an announce. For
example, in the announce personality("selfish",bob), with certainty degree 0.8
and strength 6, the advertiser is quite sure that agent bob adopts a selfishness
exchange strategy, based on 6 experiences it had in past exchanges with bob. See
the method announce in Fig. 4.
Figure 5 shows the architecture of the artifact ArtCG of group beliefs, including
the classes AgentAnnounce, Belief and the announce method, which corresponds
to the announce operation of beliefs (Eq. (6)). When receiving an announce, the
artifact adds it to a list of announces, and whenever there exists at least one
equal announce from each agent present in the system, this announce becomes
a reputation (see Fig. 4).
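As an illustration of that rule (our sketch in Python, not the artifact's Java code): a predicate becomes a reputation only once every agent present in the system has announced it at least once.

```python
def becomes_reputation(announcements, agents, predicate):
    """True when at least one announcement of `predicate` exists from every agent."""
    announcers = {adv for adv, pred, _, _ in announcements if pred == predicate}
    return set(agents) <= announcers

# Example: with agents a..e, "selfish(bob)" becomes a reputation only after all five announce it.
anns = [("a", "selfish(bob)", 0.8, 6), ("b", "selfish(bob)", 0.7, 3),
        ("c", "selfish(bob)", 0.9, 5), ("d", "selfish(bob)", 0.6, 2),
        ("e", "selfish(bob)", 0.8, 4)]
print(becomes_reputation(anns, ["a", "b", "c", "d", "e"], "selfish(bob)"))
```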
The function of the Belief class is to represent the group belief composed of
the tuple predicate, certainty degree and strength, and it implements a
ToProlog interface, which allows its description in the form of a predicate.

Fig. 3. Group belief model

Fig. 4. Predicate announcement

The AgentAnnounce class represents the announces made by the agents and inher-
its the Belief class, also adding the advertiser attribute that represents the agent
that made the announce.
To create a reputation, the certainty and strength values are calculated by
the synthesis process, and the artifact Reputation is notified of the new group
belief by the method update. If there is already a group belief with the same
predicate in the artifact, then it updates such values, otherwise a new group
belief is added.
In this paper, we consider a mixed society (composed of agents with five
different exchange strategies), and, due to this fact, the adopted aggregation
method is the weighted synthesis [23], where announcements are synthesized in
order to seek a middle term between them, thus not benefiting only an optimistic
or a pessimistic society. The weighted synthesis function sinpon_p, which gives the
certainty degree c and the strength s, where C_p is the subset containing all the
announcements of a predicate p, is given by:

    sinpon_p = ⟨p, c, s⟩,   c = (Σ_{a∈C_p} c_a s_a) / (Σ_{a∈C_p} s_a),   s = (Σ_{a∈C_p} s_a) / |C_p|     (7)
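A minimal Python transcription of Eq. (7), for illustration; each announcement is a (predicate, certainty, strength) tuple as in Eq. (6), and the list is assumed to contain at least one announcement of the predicate.

```python
def weighted_synthesis(predicate, announcements):
    """Aggregate announcements of one predicate into a group belief (Eq. 7)."""
    c_p = [(c, s) for p, c, s in announcements if p == predicate]
    total_strength = sum(s for _, s in c_p)
    certainty = sum(c * s for c, s in c_p) / total_strength   # strength-weighted mean
    strength = total_strength / len(c_p)                      # average strength
    return predicate, certainty, strength

# Example: three agents announce that bob is selfish with different confidence.
anns = [("selfish(bob)", 0.8, 6), ("selfish(bob)", 0.6, 2), ("selfish(bob)", 0.9, 4)]
print(weighted_synthesis("selfish(bob)", anns))
```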

Fig. 5. Reputation diagram

Then, in Fig. 2, to begin the second exchange stage between two agents a
and b, the agent a charges the agent b for the service performed in the first stage,
and then it sends b the credit value V that it considers itself worthy of. Through a
comparison between a's credit value and the value R that a has invested in the
first stage, b is able to draw a conclusion about the exchange strategy adopted
by a (a minimal sketch of this inference is given after the list):

– Ra > Va : if the value of investment used by a in the first stage is higher than
the credit value it attributed to itself, b concludes that a is altruist;
– Ra < Va : if the value of investment used by a in the first stage is lower than
the credit value it attributed to itself, b concludes that a is selfish;
– Ra = Va : if the value of a's investment is equal to a's credit value, b concludes
that a is rational.
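A minimal sketch of that inference rule, written in Python for illustration (the actual system encodes it as agent plans):

```python
def infer_strategy(r_a, v_a):
    """b's conclusion about a, from a's stage-I investment R and claimed credit V."""
    if r_a > v_a:
        return "altruist"    # invested more than the credit it claims
    if r_a < v_a:
        return "selfish"     # claims more credit than it invested
    return "rational"        # credit exactly matches the investment
```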

The agent b sends its conclusion about the strategy adopted by agent a to
the group belief artifact ArtCG, using the announce method (Fig. 4), to form a
reputation of the agent a. Once the reputation is formed in the Reputation
artifact, it is added to the agents’ beliefs, thus becoming a common group belief
to all participants of the game.
Whenever there is a reputation that an agent i is selfish, the agents send a
message informing the mediator agent, which sends a message to agent i saying
that i cannot participate in the next play. So, i fails to improve its fitness
value, unless it modifies its strategy to enter into the game again, increasing its
investment value R and the maximum investment value Rmax , and decreasing
its minimum satisfaction value Smin .

5 Simulation Analysis
A social exchange strategy is determined by how an agent behaves towards
the exchanges proposed by other agents, by the way this agent determines the
amount of investment it wants to accomplish, and also by the guilt/envy degree
felt when comparing its results with those of the other agents. As the overall results
emerge over time, the agents become self-regulators of their exchange processes. The
evaluated characteristics that define each strategy (which are critical in the
evolution of the exchanges) are the maximum value that the agent intends to invest,
the minimum value of satisfaction accepted when an agent receives a service
proposal, and the amount of investment it wants to accomplish. We adopted the
initial parameters of the social exchange strategies shown in Table 2. The guilt
and envy values related to the gain are null for the rational agent; therefore, the
values g_rac and e_rac are defined as 0 (zero).

Table 2. Initial Parameters of Exchange Strategies

Strategy           r^max   s^min   g     e     k^{ρt}        k^{ρv}
Altruism           0.8     0.2     0.9   0.1   0.2, ρ = o    0.2, ρ = d
Weak altruism      0.6     0.4     0.7   0.3   0.1, ρ = o    0.1, ρ = d
Selfishness        0.2     0.8     0.1   0.9   0.2, ρ = d    0.2, ρ = o
Weak selfishness   0.4     0.6     0.3   0.7   0.1, ρ = d    0.1, ρ = o
Rationality        0.2     0.2     0     0     0             0
Two different scenarios were defined, one without considering the culture of
the society, and the other with the group beliefs as a “culture” common to all
agents, as explained in Section 4. In each scenario there are five agents, each
with a different strategy, and each simulation was performed using 300 cycles,
for a total of 20 simulations by scenario. In both scenarios, the system stabilizes
before 300 cycles. For lack of space, we present the detailed analysis just for
the second scenario.
Considering the two exchange stages in Fig. 2, given m agents, each playing
with m − 1 agents, with zero, one, or two successful exchange stages in each
two-agent interaction, a cycle of a simulation is composed of m(m − 1) +
w_1 + w_2 + · · · + w_m plays of stages of type I and/or II (successfully performed
or not), where w_1 is the number of agents that agent 1 has credit with after the
first stage with all other agents (i.e., the number of successful exchanges for
agent 1), and analogously one defines w_2, . . . , w_m. In a single cycle, the number of
exchanges of type I (successfully performed or not) is m(m − 1), and the number
of exchanges of type II (successfully performed or not) is w_1 + w_2 + · · · + w_m.
Note that if all the exchanges of type I have been successful for all agents,
then one cycle of a simulation presents 2m(m − 1) exchanges of type I or II
(successfully performed or not).
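As a worked example (ours, not from the paper): with m = 5 agents there are m(m − 1) = 20 stage-I plays per cycle; if every stage I succeeds, then each w_i = 4, so the cycle contains 20 + (4 × 5) = 40 = 2m(m − 1) plays in total.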
Figure 6 shows the simulation results over a range of 300 cycles.8 Whenever
the society culture is taken into account, the evolution of the agents' strategies
provided an increase in the number of successful two-stage exchanges, which
starts at 8 and reaches 20 by the 50th cycle, with a decrease in the
number of unsuccessful interactions to zero in a short period of time. In
comparison with the first scenario, this time was reduced by 44.45%. The average
and standard deviation of the number of exchanges are shown in Table 3. In Fig. 7

8 Each mark on the X-axis of Fig. 6 represents 10 cycles.

Fig. 6. Evolution of the number of exchanges, considering the culture

Fig. 7. Evolution of the Fitness value, considering the culture

we show the simulation of the evolution of the agents’ fitness values in a period
of 300 cycles.
Table 3 shows the number of two-stage exchanges, which was increased by
385.71 %. Table 5 shows the average value and standard deviation of the fitness
values in the initial and final cycles of the simulations, considering the different
exchange strategies. The increase in the fitness value of the altruist agent was
252.49 %, while for the weak altruist agent it was 258.20 %, for the rational
agent it was 188.94 %, for the selfish agent it was 385.77 % and, finally, for the
weak selfish agent it was 258.58 %. The strategy that showed the lowest growth was
the rationality strategy, while the selfishness strategy had the highest evolution. In
Table 4, we present the values of the overall average and the standard deviation
of the global fitness value, showing an increase of 584.26%.
In the case with culture, the increase in the number of two-stage exchanges
was higher (385.71 %) than in the scenario without culture (171.73 %). Regarding
the fitness values, in the scenario with culture only the weak selfish strategy
did not show a larger increase in its average fitness value (343.95 % without culture
vs. 258.58 % with culture). The other strategies showed larger increases in
their fitness values, as shown in Table 6. Observe that, in both scenarios, the

Table 3. Number of exchanges

Average                  Begin      End
One-stage exchange       3.65       0.1
Two-stage exchanges      2.8        13.6
No exchange              8.65       0.1

Standard deviation       Begin      End
One-stage exchange       2.36809    0.44721
Two-stage exchanges      3.17224    4.96726
No exchange              3.32890    0.44721

Table 4. Global fitness value

                            Initial    Final
Global average              0.16701    0.97577
Global standard deviation   0.28005    0.01673

Table 5. Fitness value

Average              Initial fitness   Final fitness
Altruist             0.27849           0.98170
Weak altruist        0.27411           0.98188
Rational             0.33743           0.975
Selfish              -0.33172          0.94797
Weak selfish         0.27673           0.99231

Standard deviation   Initial fitness   Final fitness
Altruism             0.02539           0.08101
Weak altruism        0.01128           0.07812
Rationality          0.00747           0.11180
Selfishness          0.02938           0.20056
Weak selfishness     0.12844           0.03437

Table 6. Increase of the fitness value


Strategy Without the culture With the culture
Altruism 164.32% 252.49%
Weak altruism 144.89% 258.20%
Rationality 71.79% 188.94%
Selfishness 297.05% 385.77%
Weak selfishness 343.95% 258.58%

rationality strategy was the one that showed the lowest growth in relation to the
others, while the selfishness strategies showed the highest evolution.

6 Conclusion

In this paper, the GSREP game was adapted to a BDI-MAS society, using the
Jason language, with the addition of group beliefs as the society “culture” com-
mon to all agents involved in the system, implemented as a CArtAgO artifact.
We consider that the society culture is composed of the agents' reputations.
This BDI version of the game was called the Cultural-GSREP game. Then, we
analysed and compared the simulation results considering two scenarios, just
taking into account or not the culture.
The equilibrium of Piaget's Social Exchange Theory is reached when reciprocity
occurs in the exchanges during the interactions. Our approach showed that, with
the evolution of the strategies, the agents were able to maximize their adaptation
values, becoming self-regulators of the exchange processes and thereby contributing
to an increase in the number of successful interactions. All agents evolved and
contributed to the evolution of the society. The fairer (more balanced) the offered
services are, the greater the number of successful interactions. Comparing the two
scenarios, we conclude that the addition of the culture (the reputation as a focal
point) to social exchanges had the expected influence on the evolution of the
agents' strategies and exchange processes, increasing the number of exchanges
successfully performed and the fitness value in a shorter time.

Future work will consider the analysis of the final parameters of the strate-
gies that emerged in the evolution process, and other categories of the cultural
knowledge in the belief space, using belief artifacts in different scopes beyond
the reputation, and creating different ways for the agents to reason about the
group beliefs.

Acknowledgments. Supported by CNPq (Proc. No. 481283/2013-7, 306970/2013-9


and 232827/2014-1).

References
1. Blau, P.: Exchange & Power in Social Life. Trans. Publish., New Brunswick (2005)
2. Boissier, O., Bordini, R.H., Hübner, J.F., Ricci, A., Santi, A.: Multi-agent oriented
programming with JaCaMo. Science of Computer Programming 78(6), 747–761
(2013)
3. Bordini, R.H., Hübner, J.F., Wooldrige, M.: Programming Multi-agent Systems in
AgentSpeak Using Jason. Wiley Series in Agent Technology. John Wiley & Sons,
Chichester (2007)
4. Bratman, M.E.: Intention, plans, and practical reason. Cambridge University Press
(1999)
5. Castelfranchi, C., Falcone, R.: Principles of trust for MAS: cognitive anatomy,
social importance and quantification. In: Intl. Conf. of Multi-agent Systems
(ICMAS), pp. 72–79 (1998)
6. Castelfranchi, C., Falcone, R., Firozabadi, B., Tan, Y.: Special issue on trust,
deception and fraud in agent societies. Applied Artificial Intelligence Journal 1,
763–768 (2000)
7. Criado, N., Argente, E., Botti, V.: Open issues for normative multi-agent systems.
AI Communications 24(3), 233–264 (2011)
8. Dignum, V., Dignum, F. (eds.): Perspectives on Culture and Agent-based Simula-
tions. Springer, Berlin (2014)
9. Dimuro, G.P., Costa, A.R.C., Gonçalves, L.V., Pereira, D.: Recognizing and learn-
ing models of social exchange strategies for the regulation of social interactions
in open agent societies. Journal of the Brazilian Computer Society 17, 143–161
(2011)
10. Grimaldo, F., Lozano, M.A., Barber, F.: Coordination and sociability for intelligent
virtual agents. In: Sichman, J.S., Padget, J., Ossowski, S., Noriega, P. (eds.) COIN
2007. LNCS (LNAI), vol. 4870, pp. 58–70. Springer, Heidelberg (2008)
11. Huynh, T.D., Jennings, N.R., Shadbolt, N.R.: An integrated trust and reputation
model for open multi-agent systems. JAAMAS 13(2), 119–154 (2006)
12. Macedo, L.F.K., Dimuro, G.P., Aguiar, M.S., Coelho, H.: An evolutionary spatial
game-based approach for the self-regulation of social exchanges in mas. In: Schaub,
et al. (eds.) Proc. of ECAI 2014–21st European Conf. on Artificial Intelligence.
Frontier in Artificial Intelligence and Applications, no. 263, pp. 573–578. IOS Press,
Netherlands (2014)
13. Nguyen, N.T., Katarzyniak, R.P.: Actions and social interactions in multi-agent
systems. Knowledge and Information Systems 18(2), 133–136 (2009)
14. Padgham, L., Scerri, D., Jayatilleke, G., Hickmott, S.: Integrating BDI reasoning
into agent based modeling and simulation. In: Proc. WSC 2011, pp. 345–356. IEEE
(2011)

15. Pereira, D.R., Gonçalves, L.V., Dimuro, G.P., Costa, A.C.R.: Towards the self-
regulation of personality-based social exchange processes in multiagent systems.
In: Zaverucha, G., da Costa, A.L. (eds.) SBIA 2008. LNCS (LNAI), vol. 5249, pp.
113–123. Springer, Heidelberg (2008)
16. Piaget, J.: Sociological Studies. Routlege, London (1995)
17. Rabin, M.: Incorporating fairness into game theory and economics. The American
Economic Review 86(5), 1281–1302 (1993)
18. Reynolds, R., Kobti, Z.: The effect of environmental variability on the resilience of
social networks: an example using the mesa verde pueblo culture. In: Proc. 68th
Annual Meeting of Society for American Archeology, vol. 97, pp. 224–244 (2003)
19. Reynolds, R., Zanoni, E.: Why cultural evolution can proceed faster than biological
evolution. In: Proc. Intl. Symp. on Simulating Societies, pp. 81–93 (1992)
20. Ricci, A., Viroli, M., Omicini, A.: The A&A programming model and technology
for developing agent environments in MAS. In: Dastani, M., Seghrouchni, A.E.F.,
Ricci, A., Winikoff, M. (eds.) ProMAS 2007, vol. 4908, pp. 89–106. Springer-Verlag,
Heidelberg (2008)
21. Rodrigues, M.R., Luck, M.: Effective multiagent interactions for open cooperative
systems rich in services. In: Proc. AAMAS 2009, Budapest, pp. 1273–1274 (2009)
22. Schelling, T.C.: The strategy of conflict. Harvard University Press, Cambridge
(1960)
23. Schmitz, T.L., Hübner, J.F., Webber, C.G.: Group beliefs as a tool for the forma-
tion of the reputation: an approach of agents and artifacts. In: Proc. ENIA 2012,
Curitiba (2012)
24. Serrano, E., Rovatsos, M., Botía, J.A.: A qualitative reputation system for multia-
gent systems with protocol-based communication. In: AAMAS 2012, Valencia, pp.
307–314 (2012)
25. Xianyu, B.: Social preference, incomplete information, and the evolution of ulti-
matum game in the small world networks: An agent-based approach. JASSS 13, 2
(2010)
Modelling Agents’ Perception: Issues and Challenges
in Multi-agents Based Systems

Nuno Trindade Magessi and Luís Antunes

GUESS/BioISI, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal


[email protected], [email protected]

Abstract. In virtual agent modelling, perception has been one of the main focuses
of cognitive modelling and multi-agent-based simulation. Research has been
guided by the representation of the operations of the human senses. In this sense,
the focus of perception remains on the absorption of changes that occur in the environment.
Unfortunately, the scientific literature has not covered the representation of most of
the perception mechanisms that are supposed to exist in an agent's brain, such as
ambiguity. In multi-agent based systems, perception is reduced
to a parameter, forgetting the complex mechanisms behind it. The goal of
this article is to point out that the challenge of modelling perception ought to be
centred on the internal mechanisms of perception that occur in our brains,
which increases the heterogeneity among agents.

1 Introduction

During the last decade, studies simulating virtual agents (VA) in multi-agent-based
simulation (MABS) systems have tried to bring more realism into modelling the
perception of the environment. Researchers have focused their efforts on improving
perception models and the corresponding techniques.
Nevertheless, recent work [1, 25] has produced sophisticated theoretical models for
reproducing human senses like sight and hearing. The models were then integrated
into a sustainable multi-sense perception system, in order to put together a perceptual
system capable of approximately replicating the human sensory system. In fact, this is
a keystone for using VAs to simulate how people's senses work in order to capture a
dynamic and nondeterministic environment [1]. The major problem of these proposals
is that they reduce perception to the operations of the human senses.
Many of the psychological activities involved in perception, as well as the inherent
mechanisms of the brain subsystems associated with it, have been overlooked. Is a VA
sure about what it is capturing under these multiple-sense frameworks? Is it reality?
Clearly, the answer is no to both questions. The VA perception described in the literature
misses, and does not represent, the principal target of human perception: recognition.
This article proposes to discuss the challenges involved in perception, including the
reproduction of all the mechanisms behind this cognitive process. The multi-sense
framework represents only a small component of the huge and complex process that
is perception. Perception goes beyond the faithful representation of input sensors. As an
example, the article will focus on risk perception to demonstrate the challenge and
what is involved in it.
Section 2 reviews the literature on modelling perception for virtual agents in
multi-agent-based systems. Section 3 revisits the concepts behind perception. Section 4
discusses the main issues in the literature and presents the crucial challenges
that modelling perception will face in the near future. Section 5 describes our
vision for implementing perception, and section 6 puts some conclusions forward.

2 Related Literature

The literature presents different standpoints about endowing VAs with perception for
MABS. The most frequently used approach is to ensure that VAs have generalised
knowledge about the environment [2]. This approach does not allow us to use
perception correctly in order to simulate realistic scenarios, because the VA is not certain
about the veracity of what it is capturing. The opposite case is the one in which agents
take their decisions based on the collected data received through their multiple
senses, having no knowledge about the environment, not even generic knowledge.
In between these extreme cases, we have agents that can perceive some information,
have a conception of the environment around them and “act on their perception”
[3]. In the case described by [4], agents have graphical access to their environment.
According to the described concept of perception, agents choose which path on the
graph is more feasible for achieving their target. There are several problems associated
with this perspective. The main one is the assumption that environments are static, which
raises difficulties in simulating complex scenarios. In this perspective, perception is
incomplete and conceived in a very restricted form. Clearly, it is not adequate
for modelling realistic situations. Rymill and Dodgson developed a method to
simulate the vision and attention of individuals in a crowd [5]. The simulation was done
for open and closed spaces. Independently of the problems identified in the techniques
to filter information from a highly dynamic environment, the issue remained that
perception was incomplete and conceived in a very restricted form. Vision was modelled
only as an input sensor, with attention as its precedent in the cognitive process.
Pelechano et al. [6] discussed a simulator system for an evacuation
scenario, but this system was inaccurate in representing real vision and, consequently,
perception. Brooks [7] developed what he called creatures: a series of mobile robots
operating without supervision in standard office environments. The intelligent system
behind them was decomposed into independent and parallel activity producers, all of
which interfaced directly to the world through perception and action, rather than
interfacing to each other.
Other proposals, like [8], had built-in simulators to describe hearing. However,
olfactory perception is limited to a few published articles with no consistent simulator,
and no study is known for simulating tactile senses. Steel et al. [9] proposed and
developed a cohesive framework to integrate, under a modular and extensible
architecture, many virtual agent perception algorithms, with multiple senses available.

Their architecture allows the assimilation, in the sense of integration, of dynamic and
distributed environments. They conceive perception according to an environment
module, where information is extracted and transposed to the agent's memory.
Clearly, they identify the brain as being outside the scope of perception and more
related to memories. Kuiper et al. [10] associated the vision process with perception and
presented more efficient algorithms to process visual input, which were entirely
implemented under the DIVAs (Dynamic Information Visualization of Agents systems)
framework. Recently, Magessi et al. [11] presented an architecture for risk perception.
This architecture puts the main focus on the representativeness of perception as it is
performed in reality by individuals. Vision and the other senses are designated as input
sensors and considered one component of this cognitive process.

3 Perception

3.1 Definition
Perception is one of the cognitive processes in the brain that precede decision making.
Perception is the extraction, selection, organisation and interpretation of sensory
information in order to recognise and understand the environment [12]. Perception is
not restricted to passivity upon reception of input signals. Perception can suffer the
influence of psychological, social and cultural dimensions [13]. Psychology influences
perception through capabilities and cognitive factors. For example, an individual
who suffers from a psychological disorder may have the notion that his/her perception
is being affected. Concerning the social dimension, the influence comes
from the interaction among individuals in society, through imitation or persuasion, for
example. Learning, memory, and expectations can shape the way in which we perceive
things [13, 14]. Perception involves these “top-down” effects as well as “bottom-up”
methods for processing sensory input [14]. “Bottom-up” processing is
basically low-level information that is used to build up higher-level information (e.g.,
shapes for object recognition). “Top-down” processing refers to the recognition task
in terms of what was expected in a specific situation. It is a vital factor in determining
what entities look like and the knowledge that influences perception [23]. Perception
depends on the complex functions of the nervous system, but subjectively it seems mostly
effortless, because processing happens outside conscious awareness [10].
However, it is important to realise that if we want to have a more complete attention
mechanism related to vision, the work must be conducted by the interaction of
bottom-up factors based on image features and top-down guidance based on scene
knowledge and goals. The top-down component can be understood as the epicentre
of attention allocation when a task is at hand. Meanwhile, the bottom-up component
acts as a reactive alert mechanism. It allows the system to discover potential opportunities
or risks in order to stop threatening events. While the top-down process establishes
coherence between the environment observed by the agent and its goals or tasks, the
bottom-up component has the intent of reproducing the alert mechanisms, warning
about objects or places relevant to the agent.
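One simple way to make this interaction concrete is to combine a bottom-up saliency score with a top-down, goal-driven weighting into a single attention value per object. The sketch below is our own illustration of that idea, not a mechanism proposed in the cited works; all weights and the normalisation of the inputs are assumptions.

```python
def attention_score(saliency, goal_relevance, w_bottom_up=0.5, w_top_down=0.5):
    """Blend bottom-up saliency (image features) with top-down goal relevance.

    Both inputs are assumed to be normalised to [0, 1]; the weights express
    how much the task at hand dominates over reactive, stimulus-driven alerts.
    """
    return w_bottom_up * saliency + w_top_down * goal_relevance

# Example: a bright, flickering object (high saliency) that is irrelevant to the
# current task still gets some attention, acting as the "alert" channel.
print(attention_score(saliency=0.9, goal_relevance=0.1))
```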

3.2 The Perception Process


The perception process starts with a stimulus on the body's sensory organs [12]. These
sensory organs transform the input energy into neural activity through transduction
[12]. Then, neural signals are transmitted to the brain and therein processed [12]. The
resulting mental re-creation of the distal stimulus is the percept. Perception is some-
times described as the process of constructing mental representations of distal stimuli
using the information available in proximal stimuli.
People typically go through the following steps to form judgements: (i) first, when
they face an unknown target, they ignite an interest in this target. This means that
they activate their attention; (ii) second, people start to extract and select more
information about the target. Incrementally, people find hints that they associate with
similar experiences, and those hints help them to interpret and categorise the target; (iii) in
the third step, the hints become less efficient and selective. Thus, people try to look
for more hints with the intent of confirming the categorisation of the target. Unfortunately,
people also actively ignore and even distort hints that go against their initial
perceptions. Perception becomes gradually selective and people finally reach a
judgement about the target.
Causal perception is one of the fields with huge development nowadays [24]. It consists
of “the relatively automatic, relatively irresistible perception of certain sequences
of events as involving causation”. Normally, causal perception does not use conscious
thought or reasoning. It is a kind of “launching effect” in which people perceive
spontaneously.

3.3 Affordances
In [16], Gibson developed an interaction approach to perception and action, grounded
in the information available in the environment. He rejected the framing assumption of
factoring external-physical and internal-mental processes. The interaction alternative
is centred on processes of agent-situation interaction that come from ecological psychology
and philosophy, namely situation theory [26, 27]. The concept of affordance
for an agent can be defined as the conditions or constraints in the environment to
which the agent is attuned. This broad view of affordances includes affordances that
are recognised as well as affordances that are perceived directly.
Norman used the term affordances to refer just to the possibilities of action that are
perceivable by an individual [17]. He made the concept dependent on the physical
capabilities of an agent and his/her individual goals, plans, values, beliefs, or past
experiences. This means that he characterised the concept of affordance as relational,
rather than subjective or intrinsic. In 2002, Anderson et al. [18] argued that directed
visual attention, and not affordance, is the key factor responsible for the fast generation
of many motor signals associated with the spatial characteristics of perceived objects.
They discovered this by examining how the properties of an object affect an
observer's reaction time for judging its orientation.

3.4 Perception vs. Reality


For some individuals, the perceived environment, event or object can differ from what
it is in reality. Their perception could put themselves far from what is in reality. An
object could be perceived differently by each person. This phenomenon is commonly
designated by perception gap [19]. This finding is patent in many psychological stud-
ies. For example, in the case of visual perception, there are individuals able to ac-
knowledge the perception gap in their minds. Others may not recognise the shape
shifting when the object changes. This happens when objects are ambiguous and mul-
tiple interpretations can be made on the perceptual level. So to reproduce human per-
ception in VAs, the perception gap must be taken into account and reflected in the
models.

4 Issues and Challenges

The agents' perception plays a critical role in defining their decisions,
which are then reflected in their behaviours as social actors. Perception delivers
information from the surrounding environment, which assists agents in their activities of
planning and decision making [20]. In most approaches, the sensor system architecture
is represented, instead of the complete perception process. Most of the works are
confined to describing the upstream part of the process. In some cases, for example, the
vision sense is not designed as part of the brain [8], which goes against the usual
accounts from most of the relevant scientific areas involved. In most cases, the
interpretation-of-information component, which is critical to the success of the process, is
not described. The majority of the studies assume that everything which
was picked up by the senses is reflected in the VA's knowledge [8]. This is a strong
assumption, which has no correspondence with reality. Interpretation is the goal activity
of perception [14]. Grasping an object and successfully recognising it depends
on the effectiveness and efficiency of interpretation. We must not forget that the main
goal of perception is to produce a judgement about what was socially analysed. This
judgement may or may not be stored in memory, depending on its relevance. This
aspect is never mentioned in the VA literature. For example, Brooks [7], with his intelligent
“Creatures”, argues that perception is not necessary as a central interface. This is
not correct if we want to have robots acting like humans. The proof is in his own
work, where he decomposed sensor data into many different sorts of processing, which
proceed independently and in parallel, each affecting the “overall system”. The “overall
system” is in fact an example of spatial perception, ensured by the right extraction
and selection of data input. The success of “Creatures” with multiple processes comes
from the fact that perception is deeply rooted in his algorithm. However, the robots
could not output the result that took them into action. Unfortunately, Brooks traces a
direct and linear relation between input sensor and action. He omits representations
and implicitly considers an action as a decision, which is wrong because a decision may
or may not result in action [16, 21]. This approach is overly simplistic, similar to the
common confusion between judgement (perception) and decision. People can only

perceive something if they have a representation of that thing, or of the parts that
compose it, even if inchoate.
Other researchers assume that every perception event culminates in storage. However,
memory should not be seen as passive, a simple storage of data collected by
sensors. It must be seen under a dualistic perspective. VA memory should also have
an influence on perception, because to perceive something we need to have the
semantic knowledge, in our semantic memory, for that object or event. Otherwise, VAs
have to learn it first, besides accelerating recognition.
Clearly, the first challenge is to systematise the whole perception process, including
the missing activities and dimensions that are responsible for shaping an agent's
perception. The second challenge is to bring to multi-agent based systems the capacity
to represent the interconnections among the psychological, social and cultural dimensions
involved in perception [11]. These dimensions and the subsequent factors are the
keystones for the dynamics of perception.
The third challenge, which is both ambitious and complex, is to establish the
macro-micro link between a specific judgement and the neuro-physiological dimension
of perception. Modelling the perception of VAs cannot stay trapped in the upstream
stage; it must move on to the downstream stage of the process.

5 Vision: Paths to Achieve Our Goals

Taking into account the issues and challenges described above, it is important to
figure out what the consequences would be if we improved perception modelling. The
major consequence is to separate perception from decision in VAs, similarly to what
happens in reality. This is crucial to understanding many issues related to decision
science, where in fact the relation between what we assimilate and the decision is not
linear. If we want to understand why a decision-maker took an incorrect decision, we
need to have clearly modelled his decision and perception processes. If the problem
came from perception, it is relevant to pin down in which part of the process it
occurred. This brings more heterogeneity to agents in multi-agent based systems.
Another important consequence is to understand whether an agent perceived the reality
surrounding him, or instead perceived something different from reality when
he took the decision.
In terms of improving perception modelling in robots, the strategy is to use
very simple cases, like the perception of a common figure that has associated
ambiguity. For example, the Rubin Vase has two interpretations, either as a
vase or as two faces. This can be done with pixel-based techniques, where the captured
images allow robots to perceive some figure formats when a connection is established
with their own semantic memory.
One of the common mistakes is to insist on capturing some kind of standard perception,
common to all individuals. Of course, people have mechanisms in common for
perception. However, perception is highly subjective, since it depends on the past
experiences of each individual. These experiences and the associated acquisition define
his/her representation of an object, figure, event or environment. So, instead of

searching for (or defining) standard mechanisms of perception, we could replicate the
perception of one individual. More specifically, to try to clone a specific person per-
ception by using its own description of what this person is perceiving. In the Rubin
Vase example, this means that one robot could recognise a vase and another could
recognise faces. Everything depends on the forms (vases or faces) that were collected
in the past by each robot and stored on their own semantic memory. In a case of
multi-agent based system, the ambiguity of perception is present on the way that
agents interpret the variations occurred in some parameters.
Another key point about modelling perception is to build multidisciplinary teams to
work on it. In this sense, the operational strategy must continue, refresh and
consolidate the idea of [7], where Brooks developed multiple algorithms working in
parallel to pursue perception. This strategy is needed because a stimulus might not be
transformed into a percept. Our claim is that an ambiguous stimulus may be transformed
into multiple perceptions, experienced randomly, one at a time, in what is called
“multi-stable perception” [22]. Moreover, the same stimuli, or their absence, may
induce different perceptions depending on the person's culture and previous
experiences. As we integrate fundamental psychological insights into perception
modelling, and as the advance of neuroscience continuously brings us new inputs,
modellers will be able to substitute the developed algorithms with new ones that
replicate what happens in real physiology. This vision therefore defends that it is
possible to build robots with perception similar to that of human beings if the
modelling focuses on a specific target and/or individual.
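To make this idea concrete, the following minimal sketch (ours, not part of the paper;
the class, figure names and weights are hypothetical) illustrates how an agent could
resolve an ambiguous stimulus against its own semantic memory:

# Minimal sketch (not from the paper): resolving an ambiguous stimulus
# against an agent's individual semantic memory. All names are hypothetical.
import random

class PerceivingAgent:
    def __init__(self, name, semantic_memory):
        # semantic_memory: forms stored in the past, mapped to how strongly
        # each was reinforced by the agent's own experience.
        self.name = name
        self.semantic_memory = semantic_memory

    def perceive(self, stimulus_interpretations):
        # Keep only the interpretations this agent has ever stored.
        known = {f: w for f, w in self.semantic_memory.items()
                 if f in stimulus_interpretations}
        if not known:
            return None  # the stimulus is not transformed into a percept
        # Multi-stable perception: one interpretation at a time, sampled
        # in proportion to past experience.
        forms, weights = zip(*known.items())
        return random.choices(forms, weights=weights, k=1)[0]

# The Rubin Vase offers two interpretations; each agent resolves it
# according to the forms collected in its own past.
rubin_vase = {"vase", "faces"}
robot_a = PerceivingAgent("A", {"vase": 5, "faces": 1})
robot_b = PerceivingAgent("B", {"faces": 4})
print(robot_a.perceive(rubin_vase))  # most often "vase"
print(robot_b.perceive(rubin_vase))  # always "faces"

Two robots with different semantic memories thus report different percepts for the
same stimulus, which is exactly the kind of heterogeneity the text argues for.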

6 Conclusion

Part of the VAs literature on perception is focused on building multi-agent
simulators with many features related to input sensors, instead of addressing the
complete perception process. This article claims that this is not perception and that
the major challenge is to move towards a complete and holistic account of this
cognitive process. Introducing psychological and physiological insights can ensure
that virtual agents better replicate what happens in reality. In this sense, the
challenge is to establish the macro-micro link for perception, from the physiological
dimension to the final judgement.

References
1. Ray, A.: Autonomous perception and decision-making in cyberspace. In: The 8th Interna-
tional Conference on Computer Science & Education (ICCSE), Colombo, Sri Lanka, April
26–28, 2013
2. Uno, K., Kashiyama, K.: Development of simulation system for the disaster evacuation
based on multi-agent model using GIS. Tsinghua Science and Technology 13(1), 348–353
(2008)

3. Shi, J., Ren, A., Chen, C.: Agent-based evacuation model of large public buildings under
fire conditions. Automation in Construction 18(3), 338–347 (2009)
4. Sharma, S.: Simulation and modelling of group behaviour during emergency evacuation.
In: Proceedings of the IEEE Symposium on Intelligent Agents, Nashville, Tennessee,
pp. 122–127, March 30–April 2, 2009
5. Rymill, S.J., Dodgson, N.A.: Psychologically-based vision and attention for the simulation
of human behaviour. In: Proceedings of Computer Graphics and Interactive Techniques,
Dunedin, New Zealand, pp. 229–236, November 29–December 2, 2005
6. Pelechano, N., Allbeck, J., Badler, N.: Controlling individual agents in high-density crowd
simulation. In: 2007 ACM SIGGRAPH/Eurographics Symposium on Computer Anima-
tion, San Diego, California, pp. 99–108, August 2–4, 2007
7. Brooks, R.A.: Intelligence without representation. Artificial Intelligence 47, 139–159
(1991)
8. Piza, H., Ramos, F., Zuniga, F.: Virtual sensors for dynamic virtual environments. In: Pro-
ceedings of 1st IEEE International Workshop on Computational Advances in Multi-Sensor
Adaptive Processing (2005)
9. Steel, T., Kuiper, D., Wenkstern, R.: Virtual agent perception in multi-agent based simula-
tion systems. In: IEEE/WIC/ACM International Conference on Web Intelligence and Intel-
ligent Agent Technology (2010)
10. Kuiper, D., Wenkstern, R.Z.: Virtual agent perception in large scale multi-agent based si-
mulation systems (Extended Abstract). In: Tumer, K., Yolum, P., Sonenberg, L., Stone, P.
(eds.) Proc. of 10th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS
2011), Taipei, Taiwan, pp. 1235–1236, May 2–6, 2011
11. Magessi, N., Antunes, L.: An Architecture for Agent’s Risk Perception. Advances in
Distributed Computing and Artificial Intelligence Journal 1(5), 75–85 (2013)
12. Schacter, D.L., Gilbert, D.T., Wagner, D.M.: Psychology, 2nd edn. Worth, New York
(2011)
13. Magessi, N., Antunes, L.: Modelling agents’ risk perception. In: Omatu, S., Neves, J.,
Corchado Rodriguez, J.M., Paz Santana, J.F., Gonzalez, S.R. (eds.) Distributed Computing
and Artificial Intelligence. AISC, vol. 217, pp. 275–282. Springer, Heidelberg (2013)
14. Bernstein, D.A.: Essentials of Psychology. Cengage Learning, pp. 123–124 (2011).
ISBN 978-0-495-90693-3
15. Gregory, R.L.: Perception. In: Gregory, R.L., Zangwill, O.L. (eds.) The Oxford Companion to the Mind, pp. 598–601 (1987)
16. Gibson, J.: The theory of affordances. In: Shaw, R., Bransford, J. (eds.) Perceiving,
Acting, and Knowing (1977). ISBN 0-470-99014-7
17. Norman, D.: Affordance, Conventions and Design. Interactions 6(3), 38–43 (1999)
18. Anderson, S.J., Yamagishi, N., Karavia, V.: Attentional processes link perception and ac-
tion. Proceedings of the Royal Society B: Biological Sciences 269(1497), 1225 (2002).
doi:10.1098/rspb.2002.1998
19. Ropeik, D.: How Risky Is It, Really? Why Our Fears Don’t Always Match the Facts.
McGraw Hill, March 2010
20. Steel, T., Kuiper, D., Wenkstern, R.Z.: Context-aware virtual agents in open environments.
In: Proceedings of the Sixth International Conference on Autonomic and Autonomous Sys-
tems (ICAS 2010), Cancun, Mexico, March 7–13, 2010
21. Fine, K., Rescher, N.: The Logic of Decision and Action. Philosophical Quarterly
20(80), 287 (1970)

22. Eagleman, D.: Visual Illusions and Neurobiology. Nature Reviews Neuroscience 2(12),
920–926 (2001). doi:10.1038/35104092. PMID: 11733799
23. Yarbus, A.L.: Eye Movements and Vision, chapter “Eye movements during perception of
complex objects”. Plenum Press, New York (1967)
24. Danks, D.: The Psychology of Causal Perception and Reasoning. In: Beebee, H.,
Hitchcock, C., Menzies, P. (eds.) Oxford Handbook of Causation. Oxford University
Press, Oxford (2009)
25. Kurzweil, R.: How to Create a Mind: The Secret of Human Thought Revealed. Viking
Books, New York (2012). ISBN 978-0-670-02529-9
26. Barwise, J., Perry, J.: Situations and attitudes. MIT Press/Bradford, Cambridge (1983)
27. Devlin, K.: Logic and information, pp. 49–51. Cambridge University Press (1991)
Agent-Based Modelling for a Resource
Management Problem in a Role-Playing Game

José Cascalho(B) and Pinto Mabunda

Universidade dos Azores, Pólo de Angra do Heroísmo, Azores, Portugal


[email protected], [email protected]

Abstract. In this paper we present a prototype of a model created in the context of a
resource management problem in Gaza, Mozambique. This model is part of a participatory
approach to deal with a conflict over water supply. Farmers and cattle producers are
added to a stylized environment, and a conflict is modelled when cattle need to access
water and destroy the farmers' harvests. To address the different behaviours of
farmers and cattle producers, a BDI architecture is used to support the conflict
simulation, using a simple argument-based negotiation between proactive agents. This
model is intended to be used as a support to a Role Playing Game (RPG) in the context
of an interactive design assembled under the Netlogo tool environment.

Keywords: Agent-based modelling · BDI · Conflict management

1 Introduction
In Gaza, a province of Mozambique, a scenario of conflict exists between farmers and
cattle producers. Both stakeholders need water and, although the resource exists in
abundance, the lack of planning to circumscribe an area for the cattle and a specific
place for agriculture has been responsible for the increase in the number of conflicts
between these two activities. The cattle are usually abandoned in the fields near the
river and, alone, these animals follow an erratic trajectory to the water, destroying
cultivated fields near the river. Local authorities have difficulty dealing with the
problem because cattle producers argue that they have the right to keep the cattle on
lands that belong to the community. Although cattle producers pay fines for the
farmers' harvest losses, their behaviour does not seem to change. Ancient practices
are difficult to modify.
To address this problem, it was decided to follow the steps identified in the
companion modelling approach [4]. With this approach we expect to promote an open
debate inside the community and to help to find a participatory solution that will
overcome the problem. Although some solutions have been discussed between local
authorities and the population in general, the lack of investment and the difficulty
of bringing stakeholders together to discuss the problem have delayed the
implementation of a definite solution.
Role Playing Games (RPG) have been used for different purposes, one of which is
social learning. In fact, with RPG it is possible to “reveal some
aspects of social relationships, allowing the direct observation of interactions
among players” [1]. Players are the stakeholders in a problematic context. The use of
RPG in these cases is useful in certain phases of negotiation in different contexts,
such as water, land-use and other resources (e.g. [8][6]).
In this project, the participatory approach is an essential feature for its success.
An RPG will be implemented, supported by an agent-based model [5]. As described in the
companion modelling approach, we intend to add a board game in which the stakeholders
can show, along the game, how they decide in case of conflict. On the other hand, we
are looking for some degree of autonomy in the agents created in the context of an
agent-based modelling (ABM) approach. In fact, we want to add proactive agents,
implementing an architecture that relates agents' behaviour to the personality traits
that can be identified in the cattle producers and farmers, because this gives us the
possibility to generate unexpected events, which are important in the context of a
simulation [5]. The modelling of a real case-study led us to choose a BDI architecture
[7][10] to support the modelling of some of the decisions of the agents in conflict.
This paper describes the first steps towards this implementation.
In the following section we describe some of the agent-based models used in
simulations concerning resource management. Then, it is explained how conflicts are
modelled in the context of the RPG and how the BDI agents' architecture is
implemented. In the last section the conclusions are presented.

2 Modelling Resource Management in ABM and Role Playing Games
The use of ABM to investigate environmental management must focus on social
interaction and ecological dynamics. Agents represent stakeholders at an aggregated or
individual level. The environment, which holds renewable resources, is defined as the
landscape. These resources are typically modified by agents in the environment. One
recurrent characteristic of these systems is the representation of space, which
usually contributes to the structure of interactions among agents. Different
approaches can be used to create the model. In our case-study, the RPG was
intentionally selected because it fits the desire to have a participatory approach
that contributes to “social learning”. The RPG is part of the modelling process, in
which stakeholders participate actively. On the other hand, ABM supports the
interaction between stakeholders and leverages the knowledge about the context that is
being modelled.

3 Modelling the Scenario for Farmers and Producers

In the following subsections we present the agents modelled, as well as the main
interactions between them, representing the different stakeholders in the conflict
described. In the scenario implemented, two conflicting agents are provided with a
simplified dialogue, and different types of behaviour are identified.

It was assumed that the scenario could leverage our understanding of how the
different stakeholders see the conflict. As already pointed out, a BDI architecture is
used to support agent modelling in the ABM implementation. It is expected that, as the
RPG is implemented, the agents' model will be improved, adding beliefs, desires, rules
and filters which underlie the decisions observed in the real negotiation context.

3.1 The BDI Architecture and Conflict Resolution


A BDI architecture is used to support agents' interactions during conflict
resolution. We are only interested in modelling the negotiation between a farmer and a
producer about the price to pay for the loss in the farmer's production.

Fig. 1. BDI architecture supporting the negotiation between farmers and producers.

Figure 1 shows a model for the beliefs, desires and intentions of the agents,
inspired by [7]. The personality traits are related to the decision about which
intention is to be executed. This mechanism is implemented in the architecture using a
filter (F1) that defines the cases for which an agent selects one option. Notice that
this decision process depends upon the value of uncertainty of the beliefs. The rules
are used to define which desires and beliefs activate which intentions. These rules
are also part of the agent's trait. We adopt the model proposed by [9] for the
dialogue protocol. Two protocols are used: one for information-seeking (info-seeking)
and the other for negotiation (negotiation). They are defined as simple
request-response message sequences between two agents. While the former is used to ask
for some information, the latter is used to exchange resources.
In the case modelled, farmers ask producers about their commitment to pay. Then they
negotiate the value to be paid in different contexts, as a result of successive
agreements. The producers may exhibit two different behaviours: they either agree with
the farmer's point of view or, otherwise, they will have to negotiate. Although this
is a very simple protocol, our goal was to test the model in order to understand
whether it could provide an initial setting in which farmers with different traits
could be modelled, and to assign different behaviours to each one of these traits.

Fig. 2. The interaction environment in Netlogo representing the conflict between the
farmer and the cattle producer, represented by a link (blue line connecting the two
agents).
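As an illustration of this filtering mechanism, the sketch below (our own assumption,
not the authors' implementation; all names and thresholds are hypothetical) shows how
a personality trait and belief uncertainty could select between the “agree” and
“negotiate” intentions inside a simple request-response exchange:

# Minimal sketch (not the paper's code): a trait-driven filter (F1) choosing
# between agreeing and negotiating, inside a request-response protocol.
class Producer:
    def __init__(self, trait_agreeableness, belief_fair_value, belief_certainty):
        self.trait = trait_agreeableness      # personality trait in [0, 1]
        self.fair_value = belief_fair_value   # what the producer believes is fair to pay
        self.certainty = belief_certainty     # how certain the producer is of that belief

    def filter_f1(self):
        # F1: the selected intention depends on the trait and on the
        # uncertainty attached to the producer's own beliefs.
        if self.trait > 0.7 or self.certainty < 0.5:
            return "agree"
        return "negotiate"

    def respond(self, message):
        if message["type"] == "info-seeking":        # "are you committed to pay?"
            return {"type": "inform", "content": "committed-to-pay"}
        if message["type"] == "negotiation":         # "pay this value"
            if self.filter_f1() == "agree":
                return {"type": "accept", "value": message["value"]}
            counter = (message["value"] + self.fair_value) / 2
            return {"type": "counter-offer", "value": counter}

producer = Producer(trait_agreeableness=0.4, belief_fair_value=60, belief_certainty=0.9)
print(producer.respond({"type": "info-seeking"}))
print(producer.respond({"type": "negotiation", "value": 100}))   # counter-offer of 80

A maximum number of rounds and the agents' belief thresholds, as described in the
text, would then dictate when such an exchange ends.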

3.2 Experiment in Netlogo

Netlogo [11] has been used as a simulation platform aimed at supporting the
realisation of (multi)agent-based simulations. It is a platform that provides a
general-purpose framework. A typical simulation consists of a cycle where agents,
dubbed turtles, are chosen to perform an action, considering their situation and
state. The stylised scenario and the easy and versatile interaction with the user
(e.g. setting up different initial conditions) were the main reasons to select
Netlogo.

The Interface for the Participatory Approach. The interface has two distinct
goals. The first is to provide a stylized environment in which stakeholders can
identify the narrative space of the events described, where the farmers are the red
human-shaped agents and the producers the blue human-shaped agents. The second is to
foster the autonomy needed to generate a simulation of the events that create the
conflictual situations which are intended to be studied. In the case-study, the river
is identified as a blue area, and the village where farmers and producers live
corresponds to the yellow area. The green area along the right margin of the river is
where the farmers have their cultivated fields. The interface is prepared to interact
with users. As the game is played, some of the actions are autonomous (e.g. the motion
of the cattle to drink water). The red leaf-shaped agents are the cultivated areas
damaged by the cattle. The link between a farmer and a producer shows a conflict
between them. The producer owns the cattle identified with the number 4, which damaged
a large area of plantation that belongs to the farmer.

Fig. 3. The information-seeking messages sent by the farmer and the response of the
producer.

Interaction through the Conflict Resolution. Figure 3 presents the sequences of
messages exchanged by the farmers and the livestock producers. When a conflict is
detected (i.e. a cultivated area of the farmer is damaged), the farmer invites the
producer to negotiate. The messages sent by the farmer are information-seeking
messages. The example shows that the producer wishes to negotiate the value to be paid
for the damage. Along the negotiation, the producer and the farmer will try to agree
on a final value. A maximum number of interactions, along with thresholds concerning
the limit values to be reached in the negotiation, i.e. beliefs in the context of
negotiation (not shown in Figure 3), dictates the end of negotiations. The game, in
the context of the negotiation, has uncertainties that can be traced to the degree of
certainty concerning beliefs and also to the traits of the agents. Whether or not the
producer accepts the conditions imposed by the farmer determines whether it will have
to negotiate.

4 Conclusions
In this paper we presented a prototype of an agent-based model in the context of a
resource management conflict in Gaza, Mozambique. To address this problem we chose the
RPG approach. The following steps were taken:
1. To create a stylized scenario where stakeholders could interact and to identify
the situation of conflict;
2. To define a protocol of communication between the agents in conflict. The purpose
of this protocol is to support the interaction of the simulation with the different
stakeholders;
3. To provide a BDI architecture to the agents in conflict. This architecture
contributes to the definition of traits and improves the autonomy of the agents
defined in the scenario.
This model will be used in the context of an RPG. A board game will be created, and
the stylized world in the simulated environment will support the game. The
stakeholders' decisions will be partly the result of game interaction and also based
on the knowledge acquired along the steps of implementation of the model. The agents'
traits allow the RPG to interact with agents with different degrees of autonomy. In
scenarios where social learning is a target (e.g. the Sylvopast model [5]), the
generation of contexts where unexpected events occur is important to improve learning
and to foster the interaction between the stakeholders.

References
1. Adamatti, D.F., et al.: A prototype using multi-agent-based simulation and role-
playing games in water management. In: CABM-HEMA-SMAGET, 2005, Bourg-
Saint-Maurice, Les Arcs. CABM-HEMA-SMAGET 2005, pp. 1–20. CDROM
(2005)
2. Bandini, S., Manzoni, S., Vizzari, G.: Agent Based Modeling and Simulation: An
Informatics Perspective. Journal of Artificial Societies and Social Simulation 12(4),
4 (2009)
3. Barreteau, O., Bousquet, F., Attonaty, J.M.: Role-playing games for opening the
black box of multi-agent systems: method and lessons of its application to Senegal
River Valley irrigated systems. Journal of Artificial Societies and Social Simulation
4(3) (2001)
4. Barreteau, O., et al.: Participatory approaches. In: Edmonds, B., Meyer, R. (eds.)
Simulating Social Complexity, pp. 197–234. Springer, Heidelberg (2013)
5. Bousquet, F., et al.: Multi-agent systems and role games: collective learning pro-
cesses for ecosystem management. In: Janssen, M.A. (ed.) Complexity and Ecosys-
tem Management: The Theory and Practice of Multi-agent Systems, pp. 249–285.
E. Elgar, Cheltenham (2002)
6. Briot, J.-P., et al.: A computer-based role-playing game for participatory manage-
ment of protected areas: the SimParc project. In: Anais do XXVIII Congresso da
SBC, Belém do Pará (2008)
7. Cascalho, J., Antunes, L., Corrêa, M., Coelho, H.: Characterising agents’ behaviours:
selecting goal strategies based on attributes. In: Klusch, M., Rovatsos, M., Payne,
T.R. (eds.) CIA 2006. LNCS (LNAI), vol. 4149, pp. 402–415. Springer, Heidelberg
(2006)
8. Cleland, D., et al.: REEFGAME as a Learning and Data-Gathering Computer-
Assisted Role-Play Game. Simulation Gaming 43, 102 (2012)
9. Hussain, A., Toni, F.: Bilateral agent negotiation with information-seeking. In:
Proc. of the 5th European Workshop on Multi-Agent Systems (2007)
10. Luong, B.V., et al.: A BDI game master agent for computer role-playing games.
In: Proceedings of the 2013 International Conference on Autonomous Agents and
Multi-Agent Systems (AAMAS 2013), pp. 1187–1188 (2013)
11. Wilensky, U., Stroup, W.: Learning through participatory simulations: network-
based design for systems learning in classrooms. In: Proceedings of Computer Sup-
ported Collaborative Learning (CSCL 1999). Stanford, CA, December 12–15, 1999
An Agent-Based MicMac Model for Forecasting
of the Portuguese Population

Renato Fernandes^1(B), Pedro Campos^2, and A. Rita Gaio^3

^1 LIADD/INESC TEC, Faculdade de Ciências da Universidade do Porto,
Porto, Portugal
[email protected]
^2 LIADD/INESC TEC, Faculdade de Economia da Universidade do Porto,
Porto, Portugal
^3 Departamento de Matemática, Faculdade de Ciências da Universidade do Porto
and Centro de Matemática da Universidade do Porto, Porto, Portugal

Abstract. Simulation is often used to forecast human populations. In this paper we
use a novel approach by combining Micro-Macro (MicMac) models with an Agent-Based
perspective to simulate and forecast the behaviour of the Portuguese population. The
models include migrations and three scenarios corresponding to three different
expected economic growth rates. We conclude that the increase in the number of
emigrants leads to a reduction in the number of Portuguese women of fertile age. This
explains the decrease in births and therefore the general decrease of the total
Portuguese population.

Keywords: Agent-Based computational demography · Social simulation · MicMac model · Forecasting

1 Introduction
Agent-Based computational demography models (Billari et al. [4], Ferber [5]) can deal
with complex interactions between individuals, constituting an alternative to
mainstream modelling techniques. Conventional population projection methods forecast
the number of people at a given age and a given point in time, assuming that the
members of a cohort are identical with respect to demographic behaviour. Different
approaches include macro simulation, based on policy interventions and other external
events and conditions, and micro simulation, based on the life courses of individual
cohort members. Micro and Macro (MicMac) approaches (Gaag et al. [6]) offer a bridge
between aggregate projections of cohorts and the simulation of individual life
courses. These are important contributions to the sustainability of the health care
and pension systems, for example, as these are issues of current concern.
We construct an Agent-Based model, based on the MicMac approach, to simulate the
behaviour of the Portuguese population, open to migrations. The notation and the main
model components (fertility, mortality and migration) are firstly introduced. Then the
iterative simulation process is created and a forecast for the Portuguese population
from 2011 to 2041 is presented.


2 Population Model
2.1 Fertility and Mortality
We start by establishing the notation and the main components of the model.
The variables with A (resp. G) refer to agent variables (resp. global variables).
The indices a, s, k and y are used to denote age, sex, agent identification and
year, respectively. Any variable indexed by a, s, k, y represents the realization of
the variable in the agent k, aged a years-old and of sex s, in the year y. A similar
interpretation applies to any subset of these indices. The following variables are
then defined:
$A^{\text{Alive}}_{a,s,k,y}$: vital status, taking the value 1 if the agent is alive and 0 otherwise
$G^{\text{Alive}}_{a,s,y}$: number of living agents; clearly $G^{\text{Alive}}_{a,s,y} = \sum_k A^{\text{Alive}}_{a,s,k,y}$
$G^{\text{MaleFreq}}_{y}$: relative frequency of male agents, equal to $\sum_a G^{\text{Alive}}_{a,M,y} \,/\, \sum_{a,s} G^{\text{Alive}}_{a,s,y}$
$G^{\text{Births}}_{a,s,y}$: number of births of sex $s$ given by female agents aged $a$ years-old
$G^{\text{FertR}}_{a,y+1}$: global fertility rate
$G^{\text{Deaths}}_{a,s,y}$: number of deaths; it satisfies
$$G^{\text{Deaths}}_{a,s,y} = \#\{\,A^{\text{Alive}}_{a,s,k,y} = 0 \;\wedge\; A^{\text{Alive}}_{a-1,s,k,y-1} = 1\,\}, \quad a \neq 0,$$
$$G^{\text{Deaths}}_{0,s,y} = \sum_a G^{\text{Births}}_{a,s,y} - G^{\text{Alive}}_{0,s,y}$$
$G^{\text{MortR}}_{a,s,y+1}$: global mortality rate.

Real data from the 2011 Portuguese Census is used as the base population, in
a 2% size scale. The updating of the mortality and fertility rates is ensured by
MicMac models, as in Gaag et al. [6]. The Mac part is ruled by the predictions
obtained from Statistics Portugal [7], controlling the overall evolution of the
variables. The Mic part is based on the results obtained from the previous year.
The controlling factor for the fertility (resp. mortality) rate is the expected mean
fertility (resp. mortality) growth rate for the year $y$, denoted by
$G^{\text{FertEvo}}_{y}$ (resp. $G^{\text{MortEvo}}_{y}$). Then

$$G^{\text{FertR}}_{a,y+1} = \frac{\sum_s G^{\text{Births}}_{a,s,y}}{G^{\text{Deaths}}_{a,F,y} + G^{\text{Alive}}_{a,F,y}}\, G^{\text{FertEvo}}_{y}, \qquad G^{\text{MortR}}_{a,s,y+1} = \frac{G^{\text{Deaths}}_{a,s,y}}{G^{\text{Deaths}}_{a,s,y} + G^{\text{Alive}}_{a,s,y}}\, G^{\text{MortEvo}}_{y}.$$

Whenever the population size is very small, the mortality formula is replaced by
$G^{\text{MortR}}_{a,s,y+1} = G^{\text{MortR}}_{a,s,y} \times G^{\text{MortEvo}}_{y}$.
The following random variables are created in order to achieve heterogeneity among
the population of agents (the division by 3 in the fractions ensures that the values
lie between 0 and 1):
$$X^{\text{FertR}}_{a,y+1} \sim N\big(G^{\text{FertR}}_{a,y+1}, \sigma^{\text{FertR}}_{a,y+1}\big), \quad \sigma^{\text{FertR}}_{a,y+1} = \min\Big\{0.02,\ \tfrac{G^{\text{FertR}}_{a,y+1}}{3},\ \tfrac{1-G^{\text{FertR}}_{a,y+1}}{3}\Big\}$$
$$X^{\text{MortR}}_{a,s,y+1} \sim N\big(G^{\text{MortR}}_{a,s,y+1}, \sigma^{\text{MortR}}_{a,s,y+1}\big), \quad \sigma^{\text{MortR}}_{a,s,y+1} = \min\Big\{0.02,\ \tfrac{G^{\text{MortR}}_{a,s,y+1}}{3},\ \tfrac{1-G^{\text{MortR}}_{a,s,y+1}}{3}\Big\}.$$
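The sampling of an individual rate around the global one can be sketched as follows
(our own rendering, not the authors' code; the clamping to [0, 1] is an added
safeguard):

# Minimal sketch (our assumptions): sampling an agent-level rate around the
# global rate, with the standard deviation capped as described in the text.
import random

def individual_rate(global_rate):
    sigma = min(0.02, global_rate / 3, (1 - global_rate) / 3)
    x = random.gauss(global_rate, sigma)
    return min(max(x, 0.0), 1.0)   # safeguard: keep the sampled probability in [0, 1]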

The parameters described next are assigned to each agent:

$A^{\text{FertR}}_{a+1,k,y+1}$: probability for a female agent to give birth in the year $y+1$; it is given by a value $x^{\text{FertR}}_{a+1,y+1}$ sampled from $X^{\text{FertR}}_{a+1,y+1}$
$A^{\text{MortR}}_{a+1,s,k,y+1}$: probability for an agent to die in the year $y+1$; it is given by a value $x^{\text{MortR}}_{a+1,s,y+1}$ sampled from $X^{\text{MortR}}_{a+1,s,y+1}$.

The evolution process is done according to the following steps:

Step 1. Increase the simulation year by one and age every living agent by one.
Step 2. Give birth to new agents according to the fertility rates: randomly sample $u_1, u_2$ from $U(0,1)$; if $u_1 < A^{\text{FertR}}_{a,k,y}$ then a new agent is born; if $u_2 < G^{\text{MaleFreq}}_{y}$ then set the agent to male sex, else set it to female.
Step 3. Randomly “kill” agents: randomly sample $u_3$ from $U(0,1)$; if $u_3 < A^{\text{MortR}}_{a,s,k,y}$, set $A^{\text{Alive}}_{a,s,k,y} = 0$.
Step 4. Compute the next year's fertility and mortality parameters and male proportion rates, and define each agent's fertility and mortality.
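A compact sketch of this yearly cycle, under our own simplifying assumptions
(dictionary-based agents and a plain lookup of next-year rates instead of the sampling
described above), could look as follows; it is not the authors' implementation:

# Minimal sketch (our assumptions) of the yearly demographic cycle (Steps 1-4).
import random

def evolve_one_year(agents, male_freq, next_fert_rates, next_mort_rates):
    """agents: list of dicts with keys 'age', 'sex', 'alive', 'fert_r', 'mort_r'."""
    newborns = []
    for a in agents:
        if not a["alive"]:
            continue
        a["age"] += 1                                              # Step 1: ageing
        if a["sex"] == "F" and random.random() < a["fert_r"]:      # Step 2: births
            sex = "M" if random.random() < male_freq else "F"
            newborns.append({"age": 0, "sex": sex, "alive": True,
                             "fert_r": 0.0, "mort_r": 0.0})
        if random.random() < a["mort_r"]:                          # Step 3: deaths
            a["alive"] = False
    agents.extend(newborns)
    for a in agents:                                               # Step 4: next-year rates
        if a["alive"]:
            a["fert_r"] = next_fert_rates.get((a["age"], a["sex"]), 0.0)
            a["mort_r"] = next_mort_rates.get((a["age"], a["sex"]), 0.0)
    return agents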

2.2 Migration
The Portuguese population is also affected by migrations, with a high number of
entries and exits that sum to a negative net migration. Our model also includes such a
process. Throughout, $c$ is an index denoting a given country, while $c_0$ denotes
Portugal.

$G^{\text{Health}}_{c}$: health indicator, with $A^{\text{HealthW}}_{k}$ as its corresponding weight
$G^{\text{Safety}}_{c}$: safety indicator, with $A^{\text{SafetyW}}_{k}$ as its corresponding weight
$G^{\text{Wage}}_{c,y}$: wage indicator, with $A^{\text{WageW}}_{k}$ as its corresponding weight
$G^{\text{Pop}}_{c}$: indicator for the Portuguese population size, with $A^{\text{PopW}}_{k}$ as its corresponding weight
$G^{\text{Lang}}_{c}$: indicator for the Portuguese language, with $A^{\text{LangW}}_{k}$ as its corresponding weight
$G^{\text{Limit}}_{c}$: emigration limits for country $c$, defined by the destination country
$G^{\text{ECounter}}_{c}$: emigration counter
The first four indicators range between 0 and 1. The indicator used must be the same
for all countries, and it is preferable that the data source is the same, because the
same indicator may vary across different sources. $G^{\text{Lang}}_{c}$ equals 1 if
Portuguese is the native language and 0 otherwise. The wage indicator changes every
year according to the country's expected mean wage growth. Data were obtained from the
UN and OECD databases [9], [8].
The above weights are assigned to each agent by a value randomly sampled from
$N(\mu, 0.75\mu)$; the values of $\mu$ for the first three and the last weights are
obtained from Balaz [3]. The value of $A^{\text{PopW}}_{k}$ was based on findings from
Anjos and Campos [2].
The gains of migrating also depend on the will to migrate, which is highly dependent
on the age of the agent and its employment status. We define

$G^{\text{EmpProp}}_{a,s}$: proportion of employed individuals, obtained from the 2011 Portuguese rates in the INE database [7]
$G^{\text{EcoGrow}}$: expected economic growth for Portugal
$A^{\text{Emp}}_{a,s,k,y}$: agent's employment status, coded $-1$ if employed and 1 otherwise.

Every year, and for each agent, $A^{\text{Emp}}_{a,s,k,y}$ is obtained by randomly sampling $u$ from $U(0,1)$: if $u < G^{\text{EmpProp}}_{a,s} \big(G^{\text{EcoGrow}}\big)^{(y-y_0)}$, the agent is employed; otherwise it is unemployed. The variable $A^{\text{EmpW}}_{k}$ is created as the weight for $A^{\text{Emp}}_{a,s,k,y}$ and is sampled from $N(\mu, 0.75\mu)$, where $\mu$ is chosen to fit the emigration data.
The Weibull distribution function $W(x; \lambda, k)$ (and its derivative $w(x; \lambda, k)$) is used to model the age effect on the will to emigrate. Its shape ($\lambda$) $A^{\text{Shape}}_{k}$ and scale ($k$) $A^{\text{Scale}}_{k}$ are estimated by statistical fitting. These values are also jittered for each agent, and three variables are created: $A^{\text{x-axis}}_{k}$ (resp. $A^{\text{y-axis}}_{k}$) to perform a dilatation of the function on the x-axis (resp. y-axis), and a third variable $A^{\text{Base}}_{k}$ to define a base level for the will to emigrate.
For an agent $k$, its will to emigrate in the first year $y_0$ of the simulation, or when it becomes 18 years-old, is
$$A^{\text{Will}}_{a,k,y_0} = A^{\text{y-axis}}_{k} \times W\big(A^{\text{x-axis}}_{k} \times a;\, A^{\text{Shape}}_{k}, A^{\text{Scale}}_{k}\big) + A^{\text{Base}}_{k}$$
and is updated by
$$A^{\text{Will}}_{a+1,k,y+1} = A^{\text{y-axis}}_{k} \times w\big(A^{\text{x-axis}}_{k} \times a;\, A^{\text{Shape}}_{k}, A^{\text{Scale}}_{k}\big) + A^{\text{SuccessW}}_{k} \times A^{\text{Success}}_{k,y}.$$

The gain from emigration is then given by
$$A^{\text{Gain}}_{c,a,k,y} = \Big(A^{\text{HealthW}}_{k} G^{\text{Health}}_{c} + A^{\text{SafetyW}}_{k} G^{\text{Safety}}_{c} + A^{\text{WageW}}_{k} G^{\text{Wage}}_{c,y} + A^{\text{PopW}}_{k} G^{\text{Pop}}_{c} + A^{\text{LangW}}_{k} G^{\text{Lang}}_{c}\Big) \times A^{\text{Will}}_{a,k,y}, \quad \text{if } c \neq c_0,$$
while the gain to remain in Portugal corresponds to
$$A^{\text{Gain}}_{c_0,a,k,y} = \Big(A^{\text{HealthW}}_{k} G^{\text{Health}}_{c_0} + A^{\text{SafetyW}}_{k} G^{\text{Safety}}_{c_0} + A^{\text{WageW}}_{k} G^{\text{Wage}}_{c_0,y} + A^{\text{PopW}}_{k} G^{\text{Pop}}_{c_0} + A^{\text{LangW}}_{k} G^{\text{Lang}}_{c_0}\Big) \times \big(1 - A^{\text{Will}}_{a,k,y}\big).$$

The emigration process is now done by the following steps:

Step 1. Initialize the emigration counter $G^{\text{ECounter}}_{c}$ at 0.
Step 2. Update each agent's $A^{\text{Will}}_{a,k,y}$ and $A^{\text{Gain}}_{c,a,k,y}$.
Step 3. For each agent, determine its desired emigration destination
$$A^{\text{Dest}}_{k} = \arg\max_{c}\{A^{\text{Gain}}_{c,a,k,y}\},$$
restricted to $G^{\text{ECounter}}_{A^{\text{Dest}}_{k}} < G^{\text{Limit}}_{A^{\text{Dest}}_{k}}$, and add 1 to the counter $G^{\text{ECounter}}_{A^{\text{Dest}}_{k}}$.
Step 4. Remove all agents with $A^{\text{Dest}}_{k}$ different from Portugal.
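The destination choice can be sketched as follows (our own simplified rendering; the
weight names, indicator values and helper functions are hypothetical, not the authors'
code):

# Minimal sketch (our assumptions) of the emigration decision: each agent picks
# the destination with the highest gain, subject to per-country emigration limits.
def gain(agent, country, indicators, home="PT"):
    base = (agent["w_health"] * indicators[country]["health"]
            + agent["w_safety"] * indicators[country]["safety"]
            + agent["w_wage"]   * indicators[country]["wage"]
            + agent["w_pop"]    * indicators[country]["pop"]
            + agent["w_lang"]   * indicators[country]["lang"])
    # Staying home is weighted by (1 - will); emigrating is weighted by the will itself.
    will = agent["will"]
    return base * (1 - will) if country == home else base * will

def choose_destinations(agents, indicators, limits, home="PT"):
    counters = {c: 0 for c in indicators}                       # Step 1
    emigrants = []
    for agent in agents:                                        # Step 2 assumed done
        ranked = sorted(indicators,
                        key=lambda c: gain(agent, c, indicators, home),
                        reverse=True)
        for c in ranked:                                        # Step 3: argmax under limits
            if c == home or counters[c] < limits.get(c, float("inf")):
                agent["dest"] = c
                if c != home:
                    counters[c] += 1
                    emigrants.append(agent)                     # Step 4: leaves the population
                break
    return emigrants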

As this mechanism needs some iterations to converge, the emigration amount


in the first 6 years will be equal to that of 2011.
The immigration is exogenously defined by linear regressions on the countries
with non-negative immigration rate in 2011 using OECD data since 2002 [8] and
the age distribution of immigrants is fitted with UN data [9].
Finally, by defining

$G^{\text{Immi}}_{c,y}$: estimated amount of immigrants;
$X^{\text{ImmiAge}}$: Weibull distribution for the immigrants' age;
$G^{\text{ImmiProp}}_{c}$: male proportion of immigrants;
$G^{\text{FertF}}_{c}$: immigrants' multiplying fertility factor, as in Adsera and Ferrer [1];
$A^{\text{ImmiC}}_{k}$: origin country of the immigrant agent $k$;
the immigration process is done according to:

Step 1. Create the immigrant agents as determined by $G^{\text{Immi}}_{c,y}$ and define the origin country $A^{\text{ImmiC}}_{k}$ accordingly;
Step 2. For each newly arrived immigrant agent, set its age to a value $x$ randomly sampled from $X^{\text{ImmiAge}}$;
Step 3. For each newly arrived immigrant agent $k$, randomly sample $u$ from $U(0,1)$; if $u < G^{\text{ImmiProp}}_{A^{\text{ImmiC}}_{k}}$ then set agent $k$ as a male;
Step 4. For each immigrant agent $k$, define its birth and death parameters for the following year;
Step 5. For each immigrant agent $k$ of age $a$ and sex $s$, set $A^{\text{BirthR}}_{a,s,k,y} = A^{\text{BirthR}}_{a,s,k,y} \times G^{\text{FertF}}_{A^{\text{ImmiC}}_{k}}$.
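A compact sketch of these immigration steps, under our own assumptions (Python's
random.weibullvariate for the age distribution; all dictionary keys are hypothetical):

# Minimal sketch (our assumptions) of the immigration steps.
import random

def create_immigrants(counts, male_prop, age_scale, age_shape):
    """counts: estimated number of immigrants per origin country for the current year."""
    immigrants = []
    for country, n in counts.items():
        for _ in range(n):                                           # Step 1: create agents
            age = int(random.weibullvariate(age_scale, age_shape))   # Step 2: sample age
            sex = "M" if random.random() < male_prop[country] else "F"  # Step 3: sex
            immigrants.append({"age": age, "sex": sex, "origin": country,
                               "alive": True, "fert_r": 0.0, "mort_r": 0.0})
    return immigrants

def apply_fertility_factor(immigrants, fert_factor):
    # Step 5: multiply each immigrant's birth probability by the origin-country factor
    # (Step 4, assigning birth and death parameters, is assumed to have run already).
    for a in immigrants:
        a["fert_r"] *= fert_factor[a["origin"]]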

3 Results
The results from the previously presented model, for three different expected
economic growth rates for Portugal, $G^{\text{EcoGrow}} \in \{0.9, 1.0, 1.1\}$, are now
presented. For each scenario, 300 simulations were considered for the period
2011-2041. The outputs of the model are: total population size, number of births,
number of deaths and the total number of emigrants, by age and for each of the
considered years. Totals across all ages (and subsequently their means) are obtained.
Whatever the economic scenario, the population size is expected to be a decreasing
function of time. Moreover, the decrease is deepest when the economic growth rate
attains the lowest value. This derives from the fact that economic growth plays a
major role in the emigration decision, and a decrease in this parameter increases
emigration. Such an expectation is confirmed by Fig. 1.
In addition, although economic growth does not directly affect the fertility rate, a
decrease in this parameter leads to a faster decrease in the number of births over the
years. This is most likely due to the fact that the primary age interval of the
Portuguese emigrant population falls within women's fertile ages. So the increase in
the number of emigrants leads to a reduction in the number of Portuguese women of
fertile age. This explains the decrease in births and further decreases the total
Portuguese population.

Fig. 1. Mean Population projection for different Economic Growth values

Acknowledgments. The first and second authors were partially financed by the FCT
Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Tech-
nology) within project UID/EEA/50014/2013. The last author was partially supported
by CMUP (UID/MAT/00144/2013), which is funded by FCT (Portugal) with national
(MEC) and European structural funds through the programs FEDER, under the part-
nership agreement PT2020.

References
1. Adsera, A., Ferrer, A.: Factors influencing the fertility choices of child immigrants
in Canada. Population Studies: A Journal of Demography 68(1), 65–79 (2014)
2. Anjos, C., Campos, P.: The role of social networks in the projection of international
migration flows: an Agent-Based approach. Joint Eurostat-UNECE Work Session
on Demographic Projections. Lisbon, April 28–30, 2010
3. Baláž, V., Williams, A., Fifeková, E.: Migration Decision Making as Complex
Choice: Eliciting Decision Weights Under Conditions of Imperfect and Complex
Information Through Experimental Methods. Popul. Space Place (2014)
4. Billari, F., Ongaro, F., Prskawetz, A.: Introduction: Agent-Based Computational
Demography. Springer, New York (2003)
5. Ferber, J.: Multi-Agent Systems - An Introduction to Distributed Artificial
Intelligence. Addison-Wesley Longman, Harlow (1999)
6. Gaag, N., Beer, J., Willekens, F.: MicMac Combining micro and macro approaches
in demographic forecasting. Joint Eurostat-ECE Work Session on Demographic
Projections. Vienna, September 21–23, 2005
7. INE: Statistics Portugal (2012). www.ine.pt/en/
8. OECD: Organization for Economic Co-operation and Development (2015). www.
stats.oecd.org/
9. UN: United Nations (2015). www.data.un.org/
Text Mining and Applications
Multilingual Open Information Extraction

Pablo Gamallo^1(B) and Marcos Garcia^2

^1 Centro Singular de Investigación en Tecnoloxías da Información (CITIUS),
Universidade de Santiago de Compostela, Santiago de Compostela, Spain
[email protected]
^2 Cilenis Language Technology, Santiago de Compostela, Spain
[email protected]

Abstract. Open Information Extraction (OIE) is a recent unsupervised strategy to
extract great amounts of basic propositions (verb-based
triples) from massive text corpora which scales to Web-size document col-
lections. We propose a multilingual rule-based OIE method that takes
as input dependency parses in the CoNLL-X format, identifies argu-
ment structures within the dependency parses, and extracts a set of
basic propositions from each argument structure. Our method requires
no training data and, according to experimental studies, obtains higher
recall and higher precision than existing approaches relying on train-
ing data. Experiments were performed in three languages: English, Por-
tuguese, and Spanish.

1 Introduction
Recent advanced techniques in Information Extraction aim to capture shallow
semantic representations of large amounts of natural language text. Shallow
semantic representations can be applied to more complex semantic tasks involved
in text understanding, such as textual entailment, filling knowledge gaps in text,
or integration of text information into background knowledge bases. One of the
most recent approaches aimed at capturing shallow semantic representations is
known as Open Information Extraction (OIE), whose main goal is to extract a
large set of verb-based triples (or propositions) from unrestricted text. An OIE
system reads in sentences and rapidly extracts one or more textual assertions,
consisting of a verb relation and two arguments, which try to capture the main
relationships in each sentence [1]. Wu and Weld
[2] define an OIE system as a function from a document d, to a set of triples,
(arg1, rel, arg2), where arg1 and arg2 are verb arguments and rel is a textual
fragment (containing a verb) denoting a semantic relation between the two verb
arguments. Unlike other relation extraction methods focused on a predefined set
of target relations, the Open Information Extraction paradigm is not limited
to a small set of target relations known in advance, but extracts all types of
(verbal) binary relations found in the text. The main general properties of OIE
systems are the following: (i) they are domain independent, (ii) they rely on
unsupervised extraction methods, and (iii) they are scalable to large amounts of
text [3].


The objective of this article is to describe a heuristic-based OIE system, called
ArgOE, which uses syntactic analysis to detect the argument structure
of each verb, as well as a set of rules to generate the corresponding triples (or
basic propositions) from each argument structure. In our work, an argument structure
has a very broad sense, since it includes all the syntactic dependencies headed by a
verb except specifiers, auxiliaries, and adverbs. It thus includes all main clause
constituents: subjects, objects, attributes, and prepositional phrases referring to
locations, instrumentals, manners, causes, etc. Hence, there is no distinction between
traditional arguments and adjuncts; both are used to build the argument structure.
Consider for example the sentence:
In May 2010, the principal opposition parties boycotted the polls after accusa-
tions of vote-rigging.

First, our OIE system detects the argument structure of the verb boycotted in
this sentence: there is a subject, a direct object, and two prepositional phrases
functioning as verb adjuncts. Then, a set of basic rules transforms the argument
structure into a set of triples:

(“the principal opposition parties”, “boycotted”, “the polls”),


(“the principal opposition parties”, “boycotted the polls in”, “May”),
(“the principal opposition parties”, “boycotted the polls after”, “accusations of vote-rigging”)

ArgOE requires no training data, generates triples without any post-processing, and
takes as input dependency parses in the CoNLL-X format [4,5].
Given that such a dependency-based representation is provided by many robust
parsers including multilingual systems, e.g., MaltParser [6] or DepPattern [7],
ArgOE can be seen as a multilingual open information extractor. We describe triple
extraction experiments performed on English, Portuguese, and Spanish text. ArgOE's
source code configured for English, Spanish, Portuguese, French, and Galician, as well
as other resources, is released under a GPL license.
This article is organized as follows. Section 2 introduces previous work on
OIE: in particular it describes different types of OIE systems. Next, in Section 3,
the proposed method, ArgOE, is described in detail. Then, some experiments are
performed in Section 4, where the ArgOE system is compared against several systems
and evaluated in several languages, including Portuguese. Finally, conclusions and
future work are addressed in Section 5.

2 Related Work

The goal of an OIE system is to extract triples (arg1, rel, arg2) describing basic
propositions from large amounts of text. A great variety of OIE systems has been
developed in recent years. They can be organized in two broad categories: those
systems requiring automatically generated training data to learn a classifier, and
those based on hand-crafted rules or heuristics. In addition, each system category can
also be divided into two subtypes: those systems making use of shallow syntactic
analysis (PoS tagging and/or chunking), and those based on dependency parsing. In sum,
we identify four categories of OIE systems:

(1) Training data and shallow syntax: The first OIE system, TextRunner
[8], belongs to this category. A more recent version of TextRunner, also using
training data (even if hand-labeled annotated) and shallow syntactic analysis
is R2A2 [9]. Another system of this category is WOE^pos [2], whose classifier
was trained with corpus obtained automatically from Wikipedia.
(2) Training data and dependency parsing: These systems make use of
training data represented by means of dependency trees: WOE^dep [2] and
OLLIE [10].
(3) Rule-based and shallow syntax: They rely on lexico-syntactic patterns
hand-crafted from PoS tagged text: ReVerb [11], ExtrHech [12], and LSOE
[13].
(4) Rule-based and dependency parsing: They make use of hand-crafted
heuristics operating on dependency parses: ClauseIE [3], CSD-IE [14],
KrakeN [15], and DepOE [16].

Our system belongs to the fourth category and, thus, is similar to ClausIE and
CSD-IE, which are the best OIE extractors to date according to the results reported in
both [3] and [14]. However, these two systems are dependent on the output format of a
particular syntactic parser, namely the Stanford dependency parser [17]. In the same
way, DepOE, reported in [16], relies on a specific dependency parser, DepPattern [7],
since it only operates on the by-default output given by this parser. ArgOE, by
contrast, uses as input the standard CoNLL-X format and, therefore, does not depend on
a specific dependency parser.
Another significant difference between ArgOE and the other rule-based sys-
tems is that ArgOE does not distinguish between arguments and adjuncts. As this
distinction is not always clear and well identified by the syntactic parsers, we
simplify the number of different verb constituents within the argument structure: all
prepositional phrases headed by a verb are taken as verb complements, regardless of
their degree of dependency (internal arguments or external adjuncts) on the verb.
Consequently, the set of rules used to generate triples from this simplified argument
structure is smaller than in other rule-based approaches.
In addition, we make extraction multilingual. More precisely, our system has
the following properties:

– Extraction of triples represented at different levels of granularity: surface
forms and dependency level.
– Multilingual extraction based on multilingual parsing.

3 The Method

Our OIE method consists of two steps: detection of argument structures and
generation of triples.

3.1 Step 1: Argument Structure Detection

For each parsed sentence in the CoNLL-X format, all verbs are identified and,
for each verb (V), the system selects all dependents whose syntactic function
can be part of its argument structure. Each argument structure is the abstract
representation of a clause. The functions considered in such representations are
subject (S), direct object (O), attribute (A), and all complements headed by a
preposition (C). Five types of argument structures were defined and used in
the first experiments: SVO, SVC+, SVOC+, SVA, SVAC+, where “C+” means
one or more complements. All these argument structures are correct syntactic
options in our working languages: English, Portuguese, and Spanish. Table 1
shows English examples for each type of argument structure.

Table 1. Examples of argument structures extracted from our testing dataset.

SVO. Example: A Spanish official offered what he believed to be a perfectly reasonable explanation for why the portable facilities weren’t in service.
Constituents: S=”A Spanish official”, V=”offered”, O=”what he believed to be a perfectly reasonable explanation for why the portable facilities weren’t in service”

SVC1C2. Example: Output was reduced in 1996 after one of its three furnaces exploded.
Constituents: S=”Output”, V=”was reduced”, C1=”in 1996”, C2=”after one of its three furnaces exploded”

SVOC. Example: These immigrants deserve consideration under the laws that were in place.
Constituents: S=”These immigrants”, V=”deserve”, O=”consideration”, C=”under the laws”

SVA. Example: Koplowitz’s next concert will be a more modest affair.
Constituents: S=”Koplowitz’s next concert”, V=”will be”, A=”a more modest affair”

SVAC. Example: Gallery hours are 11 a.m. to 6 p.m. daily.
Constituents: S=”Gallery hours”, V=”are daily”, A=”11 a.m.”, C=”to 6 p.m.”

Within a sentence, it is possible to find several argument structures corresponding
to different clauses. For instance, the SVO example in Table 1 repre-
sents the argument structure associated with the clause introduced by the verb
offered, but there are three more clauses introduced by other verbs (in bold): he
believed to be a perfectly reasonable explanation for why the portable facilities
weren’t in service, what be a perfectly reasonable explanation for why the portable
facilities weren’t in service, and the portable facilities weren’t in service, giving
rise to the different argument structures shown in Table 2.

Table 2. Argument structures extracted from the sentence A Spanish official offered
what he believed to be a perfectly reasonable explanation for why the portable facilities
weren’t in service.
Type Constituents
SVO S=”A Spanish official”, V=”offered”, O=”what he believed to be a perfectly
reasonable explanation for why the portable facilities weren’t in service”
SVO S=”he”, V=”believed to”, O=”be a perfectly reasonable explanation for why the
portable facilities weren’t in service”
SVA S=”what”, V=”be”, A=”a perfectly reasonable explanation for why the portable
facilities weren’t in service”
SVA S=”the portable facilities”, V=”weren’t”, A=”in service”

The constituents of an argument structure are the full phrases or clauses


playing different syntactic functions within the structure. Each constituent is
built by finding all dependency paths from its head to all its (direct and indirect)
dependents. For instance, consider the SVA example in Table 1. To build the
full constituents, the first step is to identify the head word of each constituent:
S=”concert”, V=”be”, A=”affair”. Then, each head is extended with all its
dependency words by exploring the full dependency path and by taking into
account the position in the sequence. This results in full phrases representing all
constituents of the clause: S=”Koplowitz’s next concert”, V=”will be”, A=”a
more modest affair”.
There is, however, an important exception in the process of building full con-
stituents: namely, relative clauses. The constituents we generate do not include
those clauses introduced by a verb modifying a noun. For instance, the SVOC
example in Table 1 contains the constituent C=”under the laws”, extracted
from the expression under the laws that were in place. In this case, the rela-
tive clause was not taken into account to generate the constituent C within the
argument structure of the main verb deserve. However, relative clauses and their
antecedents also introduce argument structures. In the same example, we iden-
tify a SVA argument structure from the chain “the laws that were in place”,
where S=”the laws”, V=”were”, and A=”in place”. The main reason for remov-
ing relatives from constituents is to guarantee the generation of coherent and
non over-specified propositions, as we will report in the next section.
Moreover, coordinated conjunctions in verbal phrases are split into different
argument structures, one for each coordinated verb. However, taking into account the
experiments performed in [3], coordinated phrases in the verb arguments are not
processed.
Finally, notice that the argument structure SVO1O2 (e.g. John gave Mary a present) is
not considered here, since it is not a correct syntactic structure in Spanish (nor in
the rest of the Latin languages). In order for the system to be multilingual, we have
defined only those argument structures that are shared by our working languages.
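A minimal sketch (ours, not ArgOE's actual code) of how argument structures could be
grouped from CoNLL-X rows is given below; the dependency labels are hypothetical, and
for brevity only constituent heads are kept instead of the full expanded phrases:

# Minimal sketch (our assumptions) of Step 1: grouping CoNLL-X tokens into
# argument structures, one per verb. Labels such as "subj" are hypothetical.
def argument_structures(tokens):
    """tokens: list of dicts with keys 'id', 'form', 'pos', 'head', 'deprel'."""
    structures = {}
    for t in tokens:
        if t["pos"].startswith("V"):                  # one structure per verb
            structures[t["id"]] = {"V": t["form"], "S": None, "O": None,
                                   "A": None, "C": []}
    for t in tokens:
        head = t["head"]
        if head in structures:
            if t["deprel"] == "subj":
                structures[head]["S"] = t["form"]
            elif t["deprel"] == "dobj":
                structures[head]["O"] = t["form"]
            elif t["deprel"] == "attr":
                structures[head]["A"] = t["form"]
            elif t["deprel"] == "prep":               # any prepositional complement
                structures[head]["C"].append(t["form"])
    return list(structures.values())

In the real system, each head would then be expanded with its full dependency subtree
(excluding relative clauses) to build the complete constituent.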

3.2 Step 2: Generation of Triples

One of the most discussed problems of OIE systems is that about 90% of the
extracted triples are not concrete facts [1] expressing valid information about
one or two named entities, e.g. “Obama was born in Honolulu”. However, the
vast amount of high confident relational triples (propositions) extracted by OIE
systems are a very useful starting point for further NLP tasks and applications,
such as common sense knowledge acquisition [18], and extraction of domain-
specific relations [19]. It follows that OIE systems are not suited to extract facts,
but to transform unstructured texts into structured and coherent information
(propositions), closer to ontology formats. Having this in mind, our objective
is to generate propositions from argument structures, where propositions are
defined as coherent and non over-specified pieces of basic information.

From each argument structure detected in the previous step, our OIE sys-
tem generates a set of triples representing the basic propositions underlying the
linguistic structure. We assume that every argument structure can convey dif-
ferent pieces of basic information which are, in fact, minimal units of coherent,
meaningful, and non over-specified information. For example, consider again the
sentence:
In May 2010, the principal opposition parties boycotted the polls after accusa-
tions of vote-rigging.

which gives rise to the following SVOC1 C2 argument structure:


S=”the principal opposition parties” , V=”boycotted”, O=”the polls”,
C1 =”In May”,
C2 =”after accusations of vote-rigging”
An incoherent and over-specified extraction would generate from this struc-
ture the following odd propositions:

P1 =(“the principal opposition parties”, “boycotted in”, “May”)


P2 =(“the principal opposition parties”, “boycotted after”, “accusations of
vote-rigging”)
P3 =(“the principal opposition parties”, “boycotted the polls after accusations
of vote-rigging in”, “May”)

Propositions P1 and P2 are incoherent extractions because the direct object
constituent (O) is not optional and, therefore, may not be omitted from any
proposition built from that argument structure. In addition, P3 contains an
over-specified relation made up of several constituents of the argument structure.
To ensure a correct extraction, we defined a set of simple rules allowing us to
extract only those propositions that are considered as coherent and non over-
specified. For this purpose, direct objects are never omitted and relations cannot
contain more than one clause constituent. This way, the three coherent proposi-
tions generated from the above argument structure are the following:

P1 =(“the principal opposition parties”, “boycotted”, “the polls”)


P2 =(“the principal opposition parties”, “boycotted the polls after”, “accusa-
tions of vote-rigging”)
P3 =(“the principal opposition parties”, “boycotted the polls in”, “May”)

As has been said, another restriction to avoid over-specification is to remove
relative clauses from the constituents. In the same way, that-clauses that are direct
objects are never inserted in the relation, so as to avoid long and over-specified
relations.
Propositions are generated using trivial extraction rules that transform argu-
ment structures into triples. Table 3 shows the set of rules we used to extract
triples from our five types of argument structures. As in the case of all current
OIE systems, we only consider the extraction of verb-based triples. We took this
decision in order to make a fair comparison when evaluating the performance of our
system against similar systems (see Section 4). However, nothing prevents us
from defining extraction rules to generate several triples from non-verbal struc-
tures: noun-prep-noun, noun-noun, adj-noun, and verb-adverb dependencies.

Table 3. Rules applied to the five argument structures to generate the corresponding
triples

SVO: arg1=S, rel=V, arg2=O
SVC+: for i = 1 to n, where n is the number of complements C:
      C_i is decomposed into prep_i and Term_i
      arg1=S, rel=V+prep_i, arg2=Term_i
SVOC+: if O is not a that-clause, then:
      arg1=S, rel=V, arg2=O
      for i = 1 to n, where n is the number of complements C:
      C_i is decomposed into prep_i and Term_i
      arg1=S, rel=V+O+prep_i, arg2=Term_i
      if O is a that-clause, then:
      arg1=S, rel=V, arg2=O
      for i = 1 to n, where n is the number of complements C:
      C_i is decomposed into prep_i and Term_i
      arg1=S, rel=V+prep_i, arg2=Term_i
SVA: arg1=S, rel=V, arg2=A
SVAC+: arg1=S, rel=V, arg2=A
      for i = 1 to n, where n is the number of complements C:
      C_i is decomposed into prep_i and Term_i
      arg1=S, rel=V+A+prep_i, arg2=Term_i
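These rules translate almost directly into code. The sketch below (our own rendering,
not ArgOE's source) implements the SVOC+ case, assuming each complement has already
been split into a preposition and a term:

# Minimal sketch (our rendering) of the SVOC+ rule from Table 3.
def triples_svoc(S, V, O, complements, o_is_that_clause=False):
    """complements: list of (prep_i, term_i) pairs."""
    triples = [(S, V, O)]                      # arg1=S, rel=V, arg2=O
    for prep, term in complements:
        if o_is_that_clause:
            rel = f"{V} {prep}"                # rel=V+prep_i
        else:
            rel = f"{V} {O} {prep}"            # rel=V+O+prep_i
        triples.append((S, rel, term))
    return triples

print(triples_svoc("the principal opposition parties", "boycotted", "the polls",
                   [("in", "May"), ("after", "accusations of vote-rigging")]))
# -> the three coherent propositions shown earlier for this sentence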

The output of ArgOE does not offer confidence values for each extraction; as the
system is rule-based, there is no probabilistic information to be considered. Finally,
with regard to the output format, it is worth mentioning that most OIE systems produce
triples only in textual, surface form. This can be a problem if the triples are used
for NLP tasks requiring more linguistic information. For this reason, in addition to
surface-form triples, ArgOE also provides syntax-based information, with PoS tags,
lemmas, and heads. If more syntactic information were required, it could easily be
obtained from the dependency analysis.

4 Experiments
We conducted three experimental studies: with English, Spanish, and Portuguese
texts. Preliminary studies were performed to select an appropriate syntactic
parser as input of ArgOE. Two multilingual dependency parsers were tested:

MaltParser 1.7.1^1 and DepPattern 3.0^2, the latter provided with a format converter
that changes the standard output of the parser into the CoNLL-X format. We opted for
DepPattern as input of ArgOE because the tagset and dependency names of DepPattern are
the same for all the languages it is able to analyze, and therefore there is no need
to configure and adapt ArgOE for each new language. The use of MaltParser with
different languages would require implementing converters from the tagsets and
dependency names defined for a particular language to a common set of PoS tags and
dependency names. Besides DepPattern, we also use two different PoS taggers as input
of the syntactic analyzer: TreeTagger [20] for English texts and FreeLing [21] for
Spanish and Portuguese. All datasets, extractions and labels of the experiments, as
well as a version of ArgOE configured for English, Spanish, Portuguese, French, and
Galician, are freely available^3.

4.1 English Evaluation

We compare ArgOE against several existing OIE systems for English, namely TextRunner,
ReVerb, OLLIE, WOE^parse, and ClausIE. In this experiment, we report the results
obtained by the best version of ClausIE, i.e., without considering redundancy and
without processing conjunctions in the arguments. Note that we are comparing four
systems based on training data (TextRunner, ReVerb, OLLIE, and WOE^parse) against two
rule-based methods: ClausIE and ArgOE.
The dataset used in the experiment is the Reverb dataset^4 manually labeled for the
evaluation reported in [3]^5. The dataset consists of 500 sentences with
manually-labeled extractions for the five systems enumerated above. In addition, we
manually labeled the extractions obtained from ArgOE for the same 500 sentences. To
maintain consistency between the labels associated with the five systems and those
associated with ArgOE, we automatically identified the triples extracted by ArgOE that
also appear in at least one of the other labeled extractions. As a result, we obtained
355 triples extracted by ArgOE that had been labeled by the annotators of previous
work. Then, the extractions of ArgOE were given to two annotators who were instructed
to consider the 355 already labeled extractions as a starting point. Thus, our
annotators were required to study and analyze the evaluation criteria used by the
other annotators before starting to annotate the rest of the extracted triples. We
also instructed the annotators to treat as incorrect those triples denoting incoherent
and uninformative propositions, as well as those triples constituted by over-specified
relations, i.e., relations containing numbers, named entities, or excessively long
phrases (e.g., boycotted the polls after accusations of vote-rigging in). An
extraction was considered as correct

1 http://www.maltparser.org/
2 http://gramatica.usc.es/pln/tools/deppattern.html/
3 http://172.24.193.8/ArgOE-epia2015.tgz (anonymous version)
4 http://reverb.cs.washington.edu/
5 http://www-mpi-inf.mpg.de/departments/d5/software/clausie

if it was labeled as correct by both annotators. The two annotators agreed on
75% of extractions (Cohen's kappa κ = 0.50), which is considered a moderate
agreement. In sum, we followed criteria similar to those defined in previous OIE
evaluations [9].
The results of our evaluation are summarized in Table 4 and Figure 1. Table
4 shows the number of correct extractions as well as the total number
of extractions for each system. Precision is defined as the number of correct
extractions divided by the number of returned extractions. Recall is estimated
by identifying a pool of relevant extractions, which is the total number of different
correct extractions made by all the systems (this pool is our gold standard). So,
recall is the number of correct extractions made by a system divided by the
total number of correct extractions in the pool (3,222).
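As a small illustration of this evaluation scheme, the sketch below computes pool-based precision and recall; the counts used in the example are simply the ArgOE figures from Table 4.

def pool_precision_recall(num_correct, num_returned, pool_size):
    """Precision over returned extractions; recall against the pooled gold standard."""
    precision = num_correct / float(num_returned)
    recall = num_correct / float(pool_size)
    return precision, recall

# ArgOE in Table 4: 582 correct out of 1,162 returned, pool of 3,222 correct extractions.
print(pool_precision_recall(582, 1162, 3222))   # approximately (0.50, 0.18)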

Table 4. Number of correct extractions and total number of extractions in the Reverb
dataset, according to the evaluation reported in [3] and our own contribution with
ArgOE.

Systems      Correct extractions   Total extractions
TextRunner   286                   798
ReVerb       388                   727
WOE          447                   1028
OLLIE        547                   1242
ArgOE        582                   1162
ClausIE      1706                  2975

Fig. 1. Evaluation of six OIE systems

The results show that the two rule-based systems, ClausIE and ArgOE,
perform better than the classifiers based on automatically generated training
data. This is in accordance with previous work reported in [3,14]. Moreover,
the four systems based on dependency analysis (ClausIE, ArgOE, OLLIE, and
WOEparse ) improve over those relying on shallow syntax (TextRunner and
ReVerb). And finally, ClausIE clearly outperforms the other systems, in terms
of both precision and recall. A common problem for parse-based OIE systems
is the large influence of parser errors. So, the quality of the parser can deter-
mine the quality of the OIE extractor. ClausIE uses the Stanford Dependency
Parser, while ArgOE uses DepPattern, and OLLIE the MaltParser. One possible
reason for the comparatively low precision of our system against ClausIE might be
the lower parsing performance of DepPattern compared to the Stanford Dependency
Parser for English.

4.2 Spanish Evaluation


In this experiment, we compare ArgOE against the only OIE system that has
been evaluated for a language other than English: ExtrHech [12]. This is also a
rule-based system, but it does not operate on dependency parsing; it relies on shallow
syntax (patterns of PoS tags). The Spanish dataset, called Raw Web⁶, contains
159 sentences randomly extracted with a web crawler from over 5 billion web
pages in Spanish. Each extraction was labeled by two independent annotators.
An extraction was considered correct if it was labeled as correct by both
annotators. They agreed on 81% of extractions (Cohen's kappa κ = 0.62). Table
5 depicts the results obtained by the two systems on these sentences. Unfortunately,
the extractions made by ExtrHech are not available, so it is not possible
to create a pool of correct triples extracted by the two systems to measure recall.
Only precision can be compared, even though we were not able to unify the criteria
given to our annotators with those defined in [12]. Notice that the precision
of ArgOE is identical to that obtained for English (50%), which can be seen
as indirect evidence that the two parsers used by our system have similar
performance.
Table 5. Precision of both ArgOE and ExtrHech on the Spanish dataset

Systems    Correct extractions   Total extractions   Precision (%)
ArgOE      107                   214                 50%
ExtrHech   -                     -                   55%

Most errors made by our OIE system come from three different sources: the
syntactic parser, the PoS tagger, and the Named Entity Recognition module used
by the PoS tagger. Thus, improving our system depends on improving the performance
of these other NLP tasks.

4.3 Portuguese Evaluation


For this evaluation, we selected 103 test sentences from a domain-specific cor-
pus, called CorpusEco [22], containing texts on ecological issues. ArgOE was
6 http://www.gelbukh.com/resources/spanish-open-fact-extraction

applied to the sentences and 190 triples were extracted. One annotator labeled
the extracted triples, and Table 6 shows the number of correct triples and the
precision achieved by the system. To the best of our knowledge, this is the first
experiment that reports an OIE system working on Portuguese. Precision is
again similar (53%) to that obtained in the previous experiments. Again, most
errors are due to problems in the syntactic parser and PoS tagger.

Table 6. Precision of ArgOE on the Portuguese dataset

Systems   Correct extractions   Total extractions   Precision (%)
ArgOE     95                    190                 53%

5 Conclusion
We have described a rule-based OIE system that extracts verb-based triples and
takes as input dependency analyses in the CoNLL-X format. It may thus take
advantage of efficient, robust, and multilingual syntactic parsers. Even if our
system is outperformed by other similar rule-based methods, it reaches better
results than strategies based on training data. As far as we know, ArgOE
is the first OIE system working on more than one language. In future work, we
will include NLP modules to find linguistic generalizations over the extracted
triples: e.g., co-reference resolution to link the arguments of different triples, and
synonymy detection of verbs to reduce the open set of extracted relations and
thus enable semantic inference.
Acknowledgments. This work has been supported by projects Plastic and Celtic,
Innterconecta (CDTI).

References
1. Banko, M., Cafarella, M.J., Soderland, S., Broadhead, M., Etzioni, O.: Open infor-
mation extraction from the web. In: International Joint Conference on Artificial
Intelligence (2007)
2. Wu, F., Weld, D.S.: Open information extraction using wikipedia. In: Annual Meet-
ing of the Association for Computational Linguistics (2010)
3. Corro, L.D., Gemulla, R.: Clausie: clause-based open information extraction. In:
Proceedings of the World Wide Web Conference (WWW-2013), Rio de Janeiro,
Brazil, pp. 355–366 (2013)
4. Hall, J., Nilsson, J.: CoNLL-X shared task on multilingual dependency parsing. In:
The Tenth CoNLL (2006)
5. Nivre, J., Hall, J., Kübler, S., McDonald, R., Nilson, J., Riedel, S., Yuret, D.: The
CoNLL-2007 shared task on dependency parsing. In: Proceedings of the Shared
Task Session of EMNLP-CoNLL 2007, Prague, Czech Republic, pp. 915–932 (2007)
6. Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryigit, G., Kübler, S., Marinov, S.,
Marsi, E.: Maltparser: A language-independent system for data-driven dependency
parsing. Natural Language Engineering 13(2), 115–135 (2007)
7. Gamallo, P., González, I.: A grammatical formalism based on patterns of part-of-
speech tags. Journal of Corpus Linguistics 16(1), 45–71 (2011)
8. Banko, M., Etzioni, O.: The tradeoffs between open and traditional relation extrac-
tion. In: ACL-08 (2008)
9. Etzioni, O., Fader, A., Christensen, J., Soderland, S., Mausam: Open information
extraction: the second generation. In: International Joint Conference on Artificial
Intelligence (2011)
10. Mausam, Schmitz, M., Soderland, S., Bart, R., Etzioni, O.: Open language learning
for information extraction. In: EMNLP-12, pp. 523–534 (2012)
11. Fader, A., Soderland, S., Etzioni, O.: Identifying relations for open information
extraction. In: EMNLP-11 (2011)
12. Zhila, A., Gelbukh, A.: Comparison of open information extraction for English and
Spanish. In: Dialogue 2014 (2014)
13. Xavier, C.C., Souza, M., de Lima, V.S.: Open information extraction based
on lexical-syntactic patterns. In: Brazilian Conference on Intelligent Systems,
pp. 189–194 (2013)
14. Bast, H., Haussmann, E.: Open information extraction via contextual sentence
decomposition. In: ICSC 2013, pp. 154–159 (2013)
15. Akbik, A., Loser, A.: Kraken: N-ary facts in open information extraction. In: Joint
Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge
Extraction, pp. 52–56 (2012)
16. Gamallo, P., Garcia, M., Fernández-Lanza, S.: Dependency-based open information
extraction. In: ROBUS-UNSUP Workshop at EACL-2012, Avignon, France (2012)
17. Klein, D., Manning, C.D.: Accurate unlexicalized parsing. In: ACL-03, pp. 423–430
(2003)
18. Lin, T., Mausam, Etzioni, O.: Identifying functional relations in web text. In:
Conference on Empirical Methods in Natural Language Processing (2010)
19. Soderland, S., Roof, B., Qin, B., Xu, S., Mausam, Etzioni, O.: Adapting open
information extraction to domain-specific relations. AI Magazine 31(3), 93–102
(2010)
20. Schmid, H.: Improvements in part-of-speech tagging with an application to German.
In: ACL SIGDAT Workshop, Dublin, Ireland (1995)
21. Padró, L., Stanilovsky, E.: Freeling 3.0: towards wider multilinguality. In: LREC
2012, Istanbul, Turkey (2012)
22. Zavaglia, C.: O papel do léxico na elaboração de ontologias computacionais: do seu
resgate à sua disponibilização. In: Lingüística IN FOCUS - Léxico e morfofonologia:
perspectivas e análises. Uberlândia: EDUFU, pp. 233–274 (2006)
Classification and Selection of Translation
Candidates for Parallel Corpora Alignment

K.M. Kavitha1,3(B), Luís Gomes1,2, José Aires1,2, and José Gabriel P. Lopes1,2

1 NOVA Laboratory for Computer Science and Informatics (NOVA LINCS),
  Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa,
  2829-516 Caparica, Portugal
  [email protected], [email protected], [email protected], [email protected]
2 ISTRION BOX-Translation & Revision, Lda., Parkurbis, 6200-865 Covilhã, Portugal
3 Department of Computer Applications, St. Joseph Engineering College,
  Vamanjoor, Mangaluru 575 028, India
  [email protected]

Abstract. By incorporating human feedback in parallel corpora align-
ment and term translation extraction tasks, and by using all human
validated term translation pairs that have been marked as correct, the
alignment precision, term translation extraction quality and a number of
closely correlated tasks improve. Moreover, such a labelled lexicon with
entries tagged for correctness enables bilingual learning. From this per-
spective, we present experiments on automatic classification of transla-
tion candidates extracted from aligned parallel corpora. For this pur-
pose, we train SVM based classifiers for three language pairs, English-
Portuguese (EN-PT), English-French (EN-FR) and French-Portuguese
(FR-PT). The approach enabled micro f-measure classification rates of
95.96%, 75.04% and 65.87% respectively, for the EN-PT, EN-FR and
FR-PT language pairs.

1 Introduction
Annotated bilingual lexica with their entries tagged for (in)correctness can be
mined to discover the nature of new term translation extractions and/or align-
ment errors. An automated classification system can then be trained when a sufficient
amount of positive and negative evidence is available. Such a classifier can
facilitate and speed up the manual validation process of automatically extracted
term translations, and contribute to make the human validation effort easier
while augmenting the number of validated (rejected and accepted) bilingual
entries in a bilingual term translation lexicon. Bionic interaction between linguists
and highly precise machine classifiers in a continuous common effort,
without losing knowledge, contributes to improving alignment precision and, at
another level, translation quality. It is therefore important to have term trans-
lation extractions automatically classified prior to having them validated by
human specialists.


In this paper, we assume sentence aligned parallel corpora for extracting term
translations, constructing translation tables, or obtaining parallel corpora
aligned at a sub-sentence granularity [7,13]. In this setting, translation correspondences
are identified between term pairs by computing their occurrence frequencies or
similarities within the aligned sentences rather than in the entire corpus.
In the completely unsupervised models based on parallel corpora, all the
phrase pairs that are considerably consistent with the word alignment are
extracted and gathered into a phrase table along with their associated probabil-
ities [4,19]. Naturally, the resulting table extracted from the alignment, with no
human supervision, contains alignment errors. Moreover, many of the transla-
tions in the phrase table produced are spurious or will never be used in any
translation [10]. A recent study shows that nearly 85% of phrases gathered
in the phrase table can be reduced without any significant loss in translation
quality [21].
A different approach [1], which deviates from this tradition, acknowledges the
need to blend knowledge of language, linguistics and translation as
relevant for research in Machine Translation [27]. Being semi-supervised and
iterative, this approach has the advantage of informing the machine so that it does
not repeat the same kind of errors in subsequent iterations of alignment and extraction. In
this partially supervised, iterative strategy, first a bilingual lexicon is used to
align parallel texts [7]. New1 term-pairs are then extracted from those aligned
texts [1]. The newly extracted candidates are manually verified and then added
to the existing bilingual lexicon with the entries manually tagged as accepted
(Acc) and rejected (Rej). Iteration over these three steps (parallel text alignment
using an updated and validated lexicon, extraction of new translation pairs and
their validation) results in an improved alignment precision, improved lexicon
quality, and in more accurate extraction of new term-pairs [7]. Human feedback
is particularly significant in this scenario, as incorporating it prevents alignment
and extraction errors from being fed back into subsequent alignment and
extraction iterations. The work described in this paper may be easily integrated
in such a procedure.
Several approaches for extracting phrase translations exist [1,4,8,15]. How-
ever, it is important to have the extractions automatically classified prior to
having them validated by human specialists. We view classification as a pre-
validation phase that allows a first-order separation of correct entries from incor-
rect ones, so that the human validation task becomes lighter [11]. We extend our
previous work by using a larger set of extracted translation candidates for the
language pair EN-PT and by additionally adopting other extraction techniques
[4,15] and others not yet published. Experimental evaluations of the classifier
for additional language pairs EN-FR and FR-PT are also presented. Further,
the performance of the classifier with additional features is discussed.
In Section 2, we provide a quick review of related work. In
Section 3, we present the classification approach for selecting translation candidates
and the features used. The data sets used, the classification results, and
1 Not seen in the bilingual lexicon that was used for alignment.

their subsequent analysis are presented in Section 4. We conclude in Section 5
by reflecting on future work.

2 Selection of Bilingual Pairs


The translation selection process might be approached from the perspective of
improving alignment precision and extraction quality, or from the translation
perspective itself [11,24]. Nevertheless, different researchers have expressed
varied views regarding the influence of alignment on translation quality, predominantly
from the perspective of entries in a phrase table. It has been shown that
better alignment presents a threefold benefit: a phrase table of manageable size
with fewer phrase pairs, a reduced decoding time when searching the phrase
table for the most probable translation, and a better quality of word or phrase
level translation [21]. However, it has also been observed that a decreased
alignment error rate does not necessarily imply a significant increase in
translation quality [6,18,26]. We reiterate that we aim at improved alignment
precision and extraction accuracy.
The task of selecting appropriate translation candidates may be cast as the
problem of filtering spurious bilingual pairs from the associated tables², or, as
we view it, as the learning phase (training) of a classifier that is then used for
classifying the extracted bilingual pairs as ‘Accepted ’ or ‘Rejected ’ for further
manual verification. Various filtering approaches have been proposed and used
in selecting appropriate translations [1,5,10,17,22,23,28,29].

2.1 Support Vector Machines in Selecting Bilingual Pairs


Ever since its introduction, the Support Vector Machine (SVM) [25] has been
successfully adapted for various translation related machine learning tasks.
Related applications include learning a translation model for extracting word
sequence correspondences (phrase translations) and the automatic annotation of
cognate pairs [3,20].
The use of SVM-based classifiers in selecting translation candidates has
been proposed earlier [2,11,14]. Common criteria for selection include translation
coverage, source and target term co-occurrence, and orthographic similarity.
Further, a One-Class SVM has been used with the Mapping Convergence (MC)
algorithm to differentiate between usable and useless phrase pairs based on the
confidence scores assigned by the classifier [24]. While the focus is on translation
quality and avoiding alignment errors, the classifier is trained with a corpus
that comprises only useful instances. All phrase pairs involved in best phrasal
derivations3 by the Oracle decoder are labeled as positive phrase pairs. Unla-
belled examples of phrase pairs, however, are employed in addition to the positive
examples in a semi-supervised framework⁴ to improve the performance. We, on
2 Phrase table or a bilingual lexicon.
3 One that maximises a combination of model score and translation quality metric.
4 MC algorithm.

the other hand, view the selection of translation candidates as a supervised clas-
sification problem with labeled training examples for both the classes (positive
and negative instances).

3 Classification Model
In the current section, we discuss the use of an SVM-based classifier in segregating
the extracted translation candidates as accepted, ‘Acc’ or rejected, ‘Rej ’. The
classification task involves training and testing data representing bilingual data
instances. Each bilingual pair is a data instance represented as a feature vector
and a target value known as the class label5 . We train the learning function
with the scaled training data set, where each sample is represented as a feature
vector with the label +1 (‘Acc’) or -1 (‘Rej’). The estimated model is then
used to predict the class for each of the unknown data instance kept aside for
testing, represented similarly as any sample in the training set, but with the class
label 0. We use the Radial Basis Function (RBF) kernel K(xi, xj) = exp(−γ‖xi − xj‖²),
parameterised by (C, γ), where C > 0 is the penalty parameter of the error term
and γ > 0 is the kernel parameter.

3.1 Features
Adequate feature identification for representing the data in hand is fundamental
to enable good learning. An overview of the features used in our classification
model is discussed in this section. We use the features derived using the ortho-
graphic similarity measures (strsim) and the frequency measures (freq) discussed
in the section below as baseline (BLstrsim+f req ) for our experiments.

Orthographic Similarity. Two orthographic similarity measures based on
edit distance are used to quantify the similarity between the terms in a bilingual
pair: the Levenshtein Edit Distance [16] (Equation 1) and the Spelling Similarity
measure [8] (Equation 2).
EditSim = 1.0 − EditDist(X, Y) / Max(|X|, |Y|)    (1)
where EditDist(X,Y) is the edit distance between the term X in first language
and the term Y in second language.
SpSim(X, Y) = 1.0 − D(X, Y) / Max(|X|, |Y|)    (2)
where the distance function D(X,Y) is the EditDist discounting characteristic
spelling differences that were learnt previously. In Equations 1 and 2, |X| repre-
sents the length of X and |Y | represents the length of Y.
5 Positive and negative examples are respectively labeled as +1 and -1. Data to be classified is labeled 0.

We use the ‘accepted ’ entries in the training dataset with EditSim ≥ 0.65 as
examples to train SpSim and a dictionary containing the substitution patterns
is learnt. For instance, the substitution pattern extracted from EN-PT cognate
word pair ‘phase’ and ‘fase’ is (‘ˆph’, ‘ˆf’), after eliminating all matched (aligned)
characters, ‘a’ ⇔ ‘a’, ‘s’ ⇔ ‘s’ and ‘e’ ⇔ ‘e’. The caret (ˆ) at the beginning of
the aligned strings indicates that the pattern appears as a prefix.
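A minimal sketch of EditSim (Equation 1), using a standard dynamic-programming Levenshtein distance; SpSim additionally discounts the learnt substitution patterns and is not reproduced here.

def edit_dist(x, y):
    """Classic Levenshtein distance between strings x and y."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, 1):
        curr = [i]
        for j, cy in enumerate(y, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cx != cy)))   # substitution
        prev = curr
    return prev[-1]

def edit_sim(x, y):
    """EditSim = 1.0 - EditDist(X, Y) / Max(|X|, |Y|)  (Equation 1)."""
    return 1.0 - edit_dist(x, y) / float(max(len(x), len(y)))

print(edit_sim("phase", "fase"))   # the cognate pair scores 0.6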

Frequency of Occurrence. To represent the translational equivalence, based
on the frequencies of the terms in a bilingual pair, two measures are used: the
Dice association measure and the MinMaxRatio.
The Dice association measure for a pair of terms (X,Y) takes into account
the frequency of the term X in the first language text, F(X); the frequency of
the term Y in the second language text, F(Y); and the co-occurrence frequency
of the terms in aligned segments of the parallel texts, F(X,Y) and is given by
the equation,
Dice(X, Y) = 2 · F(X, Y) / (F(X) + F(Y))    (3)
Another measure that efficiently substitutes the individual frequencies F(X)
and F(Y) is the minimum to maximum frequency ratio given by the equation,

MinMaxRatio(X, Y) = Min(F(X), F(Y)) / Max(F(X), F(Y))    (4)
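A minimal sketch of the two frequency-based features of Equations 3 and 4, assuming the monolingual and co-occurrence frequencies have already been counted over the aligned segments.

def dice(f_x, f_y, f_xy):
    """Dice(X, Y) = 2 * F(X, Y) / (F(X) + F(Y))  (Equation 3)."""
    return 2.0 * f_xy / (f_x + f_y)

def min_max_ratio(f_x, f_y):
    """MinMaxRatio(X, Y) = Min(F(X), F(Y)) / Max(F(X), F(Y))  (Equation 4)."""
    return float(min(f_x, f_y)) / max(f_x, f_y)

# Example: a term pair co-occurring 40 times, with individual frequencies 50 and 60.
print(dice(50, 60, 40), min_max_ratio(50, 60))   # 0.727..., 0.833...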

Table 1. The Similarity and Bad Ending Scores

Term EN               Term PT                   EdSim  SpSim  BE_SW         BE_PatR-A
general indifference  indiferença geral         0.15   1.0    (0.00, 0.00)  (0.00, 0.00)
official              comercial                 0.56   0.66   (0.00, 0.00)  (0.00, 0.00)
commitments           compromissos de crédito   0.29   0.24   (0.00, 0.00)  (0.00, 0.00)
limits of the         limites de a              0.54   0.82   (1.00, 1.00)  (1.00, 1.00)
impact on the         impacto em a indústria    0.39   0.47   (1.00, 0.00)  (1.00, 0.00)

Bad Ends. The bilingual pair ‘limits of the ⇔ limites de a’ instantiates a
particular type of inadequate translation wherein, the term (on both sides) ends
with a determiner following which a noun or a noun phrase is anticipated. It
is the absence of the noun or a noun phrase after the determiner that makes
the translation incomplete. If we allowed this entry into the lexicon as a correct
translation, we could not prevent other entries ending with ‘o’, ‘os’, and so forth
from occupying the determiner's position. We refer to such translations
with inadequate endings as having bad ends (BE). To keep a check on such
entries, we use a binary valued feature signifying whether a translation ends with
a determiner (1) or not (0). This introduces two features, each representing the
goodness of the translation endings on each side of the bilingual pair.

We use two different approaches to identify bad ends: one set of two features
based on endings that are stop words (BESW ) and the other set of two features
based on endings seen in the rejected, but not in the accepted training dataset
(BEP atR−A ). We consider only those endings that occur more than 5 times in
the rejected but not in the accepted training dataset. To prevent content words
from being considered as bad ends, the term length is restricted to fewer than 5
characters.
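A minimal sketch of the two bad-end indicator features; the list of suspicious endings below is a small placeholder for the stop-word list (BESW) or the rejected-only endings (BEPatR-A) actually derived from the training data.

# Placeholder endings; in practice BESW uses a stop-word list and BEPatR-A uses
# short endings (under 5 characters) seen more than 5 times only in rejected pairs.
SUSPICIOUS_ENDINGS = {"the", "of", "a", "de", "o", "os"}

def bad_end(term, endings=SUSPICIOUS_ENDINGS):
    """Return 1 if the term ends with a suspicious ending (e.g. a determiner), else 0."""
    last = term.strip().split()[-1].lower()
    return 1 if last in endings else 0

def bad_end_features(term_l1, term_l2):
    """One binary feature per side of the bilingual pair."""
    return bad_end(term_l1), bad_end(term_l2)

print(bad_end_features("limits of the", "limites de a"))               # (1, 1)
print(bad_end_features("general indifference", "indiferença geral"))   # (0, 0)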

Translation Miscoverage. A typical error observed in the extracted candi-
dates represents the lack of parallelism with respect to content words. An exam-
ple is the bilingual pair ‘commitments ⇔ compromissos de crédito’. For this pair
to be considered as correct, ‘crédito’ needs to be translated either as ‘lending’
or as ‘loan’ in EN. So the correct term translation would be ‘lending commit-
ments ⇔ compromissos de crédito’ or ‘loan commitments ⇔ compromissos de
crédito’. Likewise, the bilingual pair ‘union level ⇔ união’ is incorrect because
no translation exists for the English word ‘level ’ on the right hand side.
To assess the bilingual candidates for parallelism, we introduce two features.
We say that a translation candidate has a translation gap with respect to the
first language (gapL1 = 1) when the term in the first language does not have a
translation in the second language, in whole or in part, and vice versa. Lack of
parallelism implies a gap in translation.

Stemmed Coverage. While looking for coverage, if the expressions on both
sides are not covered by the lexicon, we set the features gapL1 and gapL2 to
0.5⁶. To deal with such situations reflecting our lack of support, we extract two
features representing coverage using the stemmed training data. These features
work in the same way as discussed above except that they look only at the
word stems. To instantiate, while looking for coverage for the bilingual pair
‘bronchitically ⇔ bronquiticamente’, we use its stemmed version ‘bronchit ⇔
bronquit’, as the coverage is examined using the stemmed training and test
datasets. If the training dataset contains the term ‘bronchit’ in EN and ‘bronquit’
in PT, then (gapL1, gapL2) would be (0.0, 0.0). This feature would find fewer gaps
in translations that are indeed parallel, and thus decrease the number of false
negatives (i.e., good translations that are classified as bad).
For identifying the translation gaps, we use the Aho-Corasick set-matching
algorithm, which checks whether the terms in the keyword tree⁷ occur as (sub-)expressions
in the bilingual pair to be validated and, if they occur, whether they are accepted
translations [9]. Similarly, to find the stemmed coverage, we use the stemmed
training and test datasets, obtained using the Snowball stemmer. Here, each
keyword tree is constructed using the stemmed part of the term. Translation

6 A neutral value reflecting our lack of support in deciding whether to accept or to reject that pair.
7 Constructed separately using the first and second language terms in the accepted bilingual training data.

gaps are identified using the Aho-Corasick set-matching algorithm, as mentioned
previously.
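A simplified sketch of the gap features: instead of an Aho-Corasick automaton it uses plain substring search over the accepted pairs, which is slower but illustrates the intended 0 / 1 / 0.5 feature values; the accepted pairs in the example are hypothetical.

def coverage_gaps(term_l1, term_l2, accepted_pairs):
    """Rough approximation of the (gapL1, gapL2) features.

    A gap (1.0) is flagged when a side contains an accepted (sub-)term whose known
    translation is missing from the other side; (0.5, 0.5) is the neutral value used
    when neither side matches anything in the accepted data.
    """
    matched_l1 = [(a1, a2) for a1, a2 in accepted_pairs if a1 in term_l1]
    matched_l2 = [(a1, a2) for a1, a2 in accepted_pairs if a2 in term_l2]
    if not matched_l1 and not matched_l2:
        return 0.5, 0.5
    gap_l1 = 1.0 if any(a2 not in term_l2 for _, a2 in matched_l1) else 0.0
    gap_l2 = 1.0 if any(a1 not in term_l1 for a1, _ in matched_l2) else 0.0
    return gap_l1, gap_l2

accepted = [("union", "união"), ("level", "nível")]
print(coverage_gaps("union level", "união", accepted))   # (1.0, 0.0): 'level' is untranslated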

4 Experimental Setup and Evaluation


We use the SVM-based tool LIBSVM⁸ to learn the binary classifier, which tries
to find the hyperplane that separates the training examples with the largest
margin. We scale the data to the range [0, 1]. We perform a grid search on the
RBF kernel parameters (C, γ) using cross-validation, so that the classifier can
accurately predict unknown data (testing data).
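A minimal sketch of this setup using scikit-learn, whose SVC is backed by LIBSVM; the random feature matrix is only a placeholder for the real feature vectors of Section 3.1, and the exponential grid follows the usual LIBSVM guide rather than the exact grid used here.

import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import GridSearchCV

X = np.random.rand(200, 8)                   # 8 features per bilingual pair (placeholder)
y = np.random.choice([1, -1], size=200)      # +1 = 'Acc', -1 = 'Rej'

X_scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)   # scale to [0, 1]

grid = GridSearchCV(SVC(kernel="rbf"),
                    param_grid={"C": [2 ** k for k in range(-5, 16, 2)],
                                "gamma": [2 ** k for k in range(-15, 4, 2)]},
                    cv=5)
grid.fit(X_scaled, y)
print(grid.best_params_, grid.best_score_)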

4.1 Data Sets


The translation candidates used in our experiments were acquired using various
extraction techniques applied to (sub-)sentence aligned parallel corpora⁹
[1,4,8,15]. We experimented with 3 language pairs, EN-PT, EN-FR and FR-PT.
The suffix array based phrase translation extraction technique was employed
only for the language pair EN-PT and was excluded in extracting EN-FR and
FR-PT bilingual pairs [1]. The statistics of the training and test datasets (val-
idated bilingual lexicon) are shown in Table 2. We randomly set aside 5%
of the validated lexicon as the test set. We repeated the experiments to compare
results with respect to the size of the training corpus, using randomly extracted
subsets of 50%, 75%, 80% and 90%, as well as the entire 95%, of the
training set.

Table 2. Training and Testing Data Statistics

4.2 Results
In the current section, we discuss the classification results and the performance of
the classifier with respect to various features using the complete data set (95%)
introduced in Section 4.1 for each of the language pairs EN-PT, EN-FR and
FR-PT.
8 A library for support vector machines - software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
9 DGT-TM - https://open-data.europa.eu/en/data/dataset/dgt-translation-memory
  Europarl - http://www.statmt.org/europarl/
  OPUS (EUconst, EMEA) - http://opus.lingfil.uu.se/

Table 3 shows the precision (PAcc, PRej), recall (RAcc, RRej) and the
accuracy of the estimated classifier in predicting each of the classes (Acc and
Rej) while using different features. Micro-average Recall (μR), Micro-average
Precision (μP), and Micro-average f-measure (μF)¹⁰ are used to assess the global
performance over both classes.
As can be seen from Table 3, for EN-PT, a substantial improvement is
achieved by using the feature that looks for translation coverage on both sides
of the bilingual pair. We observe an increase in μF of 22.85% over the baseline
and 19.32% over a combination of the features representing the baseline and bad
ends. The best μF is obtained when the stemmed¹¹ lexicon is used to look for stem
coverage rather than the original lexicon. However, for EN-FR, training with
the stemmed lexicon did not show a meaningful improvement.

Table 3. Classifier Results using different features for EN-PT, EN-FR and FR-PT

FR-PT results are worse than the results obtained for other language pairs:
the best μF and accuracy of 65.87% and 84.13% respectively are obtained when
we use a combination of features BL+BEP atR−A + CovStm + SpSim. How-
ever, the improvement is negligible (ranging from approximately 0.01% to 0.14%)
against the baseline (BLstrsim+freq) in all terms (precision, recall and micro
f-measure) over both classes. This may be explained by the fact that the number of 'sin-
gle word - single word’ pairs is comparatively larger than for the other language
pairs and the number of ‘multi-word - multi-word’ pairs is small (50,552 for the
accepted). Approximately 250K French multi-words are paired with single Por-
tuguese words and approximately 9K Portuguese multi-words are paired with
single French words. Moreover, approximately 130K are single word pairs for
this pair of languages which is quite different from the EN-PT scenario.
10 Computed as discussed in [11].
11 Stemmed using the Snowball stemmer.

Also, patterns indicating bad ends that are stop words (BESW) are substantially
fewer in number for the FR-PT¹² and EN-FR¹³ lexicon corpora than for
EN-PT¹⁴. This is because extractions for these language pairs use
all of the techniques mentioned in Section 4.1 except for the suffix array based
extraction technique [1]. Hence, the EN-FR and FR-PT data were much cleaner.

4.3 Classifier Performance by Training Set Size


We analyzed the impact of varying the size of training datasets on the improve-
ment given by various features. Table 4 shows the results obtained using the fea-
tures BLstrsim+f req +BESW +Cov (EN-PT) and BLstrsim+f req +BEP atR−A +
Cov + SpSim (EN-FR and FR-PT) respectively.

Table 4. Classifier Results for EN-PT, EN-FR and FR-PT by training set sizes

Looking at the classification results for EN-PT using SVM and the training
set, we observe that the larger the training set, the larger the recall (RAcc is 92.6%
against 92.22%) for the 'Accepted' class. Meanwhile, when we augment the training
set we lose precision, from 99.45% to 98.38%. However, by augmenting
the training set we augment the precision (PRej from 89.59% to 89.91%) for the
'Rejected' class, whereas the recall drops (RRej from 99.24% to 97.74%). As the
training set is much larger than for other language pairs (95% of the corpus), we
12 5 in FR and 8 in PT; most frequent are 'de' in FR with 27 occurrences and 'de' in PT with 43 occurrences.
13 43 in EN and 15 in FR; most frequent are 'to' in EN with 210 occurrences and 'pas' in FR with 237 occurrences.
14 112 in EN and 86 in PT; most frequent are 'the' in EN with 27,455 occurrences and 'a' in PT with 22,242 occurrences.

do not necessarily gain much. Thus, precision and recall for EN-PT evolve in
such a way that, while one increases, the other tends to decrease, partially
deviating from the trend observed in our earlier experiments [11]. It is possible
that some sort of overfitting occurs.
Unlike EN-PT, for the language pairs EN-FR and FR-PT, with larger train-
ing sets the performance of the trained classifier improved. For the features listed
in Table 3, best results were obtained with 95% and 90% of the training set.

4.4 Classifier Trained on One Language Pair in Classifying Others

Motivated by the classifier performance for the EN-PT language pair, we conducted
a few more experiments: we trained the classifier using the full set of features
on one language pair, and tested on the other. Training on EN-PT data and
testing on EN-FR and FR-PT resulted in μF of 55.64% and 54.99%, far below
the baseline for EN-FR (a drop by approximately 15% from 71.18%) and FR-
PT (a drop by approximately 11% against 65.81%) respectively. Training the
system with EN-FR and testing on FR-PT did even worse, leading to a micro
f-measure of 52.96%. Training on FR-PT data and testing on EN-FR led to a
μF of 47.8%. This leads us to conclude that it does not make sense to use a
classifier trained on one language pair to classify data from other language
pairs. The related results are shown in Table 5.

Table 5. Performance of Classifier trained on one language pair when tested on others.

5 Conclusion
We have discussed the classification approach as a means for selecting appro-
priate and adequate candidates for parallel corpora alignment. Experimental
results demonstrate the use of the classifiers on EN-PT, EN-FR and FR-PT lan-
guage pairs under small, medium and large data conditions. Several insights are
useful for distinguishing adequate candidates from inadequate ones, such as the
lack (or presence) of parallelism, spurious terms at translation ends, and the base
properties (similarity and occurrence frequency) of the translation pairs.
This work is motivated by the need for a system that evaluates the trans-
lation candidates automatically extracted prior to their submission for human

validation. After human validation, automatically extracted bilingual translations
are subsequently used for realigning parallel corpora and extracting new
translations, forming an indefinite cycle of iterations. Automatic classification
prior to validation contributes to speed up the process of distinguishing the cor-
rect translations from naturally occurring alignment and extraction errors. The
positive side effect is an enriched annotated lexicon suitable for machine learning
systems such as bilingual morphology learning and translation suggestion tool,
apart from its primary use as an aid in alignment, extraction and translation.
In future work, the use of bilingual stem and suffix correspondences in classifying
FR-PT and EN-FR word-to-word translations will be studied [12]. Looking
for coverage in word pairs might be cast as a morphological coverage problem
that would enable us to classify word-to-word translations with high accuracy.
Further, some experiments should be done on EN-FR and FR-PT using the suffix
array based extractor [1]. Experiments must also be carried out to determine an
optimal interval for the number of positive and negative bilingual lexicon entries
in obtaining optimal classification results.

Acknowledgments. K. M. Kavitha and Luís Gomes acknowledge the Research
Fellowship by FCT/MCTES with Ref. nos., SFRH/BD/64371/2009 and SFRH/
BD/65059/2009, respectively, and the funded research project ISTRION (Ref.
PTDC/EIA-EIA/114521/2009) that provided other means for the research carried out.
The authors thank NOVA LINCS, FCT/UNL for providing partial financial assistance
to participate in EPIA 2015, and ISTRION BOX - Translation & Revision, Lda., for
providing the data and valuable consultation.

References
1. Aires, J., Lopes, G.P., Gomes, L.: Phrase translation extraction from aligned par-
allel corpora using suffix arrays and related structures. In: Lopes, L.S., Lau, N.,
Mariano, P., Rocha, L.M. (eds.) EPIA 2009. LNCS, vol. 5816, pp. 587–597.
Springer, Heidelberg (2009)
2. Aker, A., Paramita, M.L., Gaizauskas, R.J.: Extracting bilingual terminologies
from comparable corpora. In: Proceedings of the 51st Annual Meeting of the
Association for Computational Linguistics, vol. 2, pp. 402–411 (2013)
3. Bergsma, S., Kondrak, G.: Alignment-based discriminative string similarity. In:
Annual meeting-ACL, vol. 45, p. 656 (2007)
4. Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., Mercer, R.L.: The mathematics of
statistical machine translation: Parameter estimation. Computational linguistics
19(2), 263–311 (1993)
5. Chen, B., Cattoni, R., Bertoldi, N., Cettolo, M., Federico, M.: The ITC-irst SMT
system for IWSLT-2005, pp. 98–104 (2005)
6. Fraser, A., Marcu, D.: Measuring word alignment quality for statistical machine
translation. Computational Linguistics 33(3), 293–303 (2007)
7. Gomes, L.: Parallel texts alignment. In: New Trends in Artificial Intelligence, 14th
Portuguese Conference in Artificial Intelligence, EPIA 2009, Aveiro, October 2009
8. Gomes, L., Pereira Lopes, J.G.: Measuring spelling similarity for cognate identifica-
tion. In: Antunes, L., Pinto, H.S. (eds.) EPIA 2011. LNCS, vol. 7026, pp. 624–633.
Springer, Heidelberg (2011)
9. Gusfield, D.: Algorithms on strings, trees, and sequences: computer science and
computational biology. Cambridge Univ Pr., pp. 52–61 (1997)
10. Johnson, J.H., Martin, J., Foster, G., Kuhn, R.: Improving translation quality by
discarding most of the phrasetable. In: Proceedings of EMNLP (2007)
11. Kavitha, K.M., Gomes, L., Lopes, G.P.: Using SVMs for filtering translation
tables for parallel corpora alignment. In: 15th Portuguese Conference in Arificial
Intelligence, EPIA 2011, pp. 690–702, October 2011
12. Kavitha, K.M., Gomes, L., Lopes, J.G.P.: Identification of bilingual suffix classes
for classification and translation generation. In: Bazzan, A.L.C., Pichara, K. (eds.)
IBERAMIA 2014. LNCS, vol. 8864, pp. 154–166. Springer, Heidelberg (2014)
13. Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N.,
Cowan, B., Shen, W., Moran, C., Zens, R., et al.: Moses: open source toolkit for
statistical machine translation. In: Proceedings of the 45th Annual Meeting of the
ACL on Interactive Poster and Demonstration Sessions, pp. 177–180. ACL (2007)
14. Kutsumi, T., Yoshimi, T., Kotani, K., Sata, I., Isahara, H.: Selection of entries
for a bilingual dictionary from aligned translation equivalents using support vector
machines. In: Proceedings of PACLING (2005)
15. Lardilleux, A., Lepage, Y.: Sampling-based multilingual alignment. In: Proceedings
of RANLP, pp. 214–218 (2009)
16. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and
reversals. Soviet Physics Doklady 10, 707–710 (1966)
17. Melamed, I.D.: Automatic evaluation and uniform filter cascades for inducing
n-best translation lexicons. In: Proceedings of the Third Workshop on Very Large
Corpora, pp. 184–198. Boston, MA (1995)
18. Och, F.J., Ney, H.: A systematic comparison of various statistical alignment
models. Computational linguistics 29(1), 19–51 (2003)
19. Och, F.J., Ney, H.: The alignment template approach to statistical machine trans-
lation. Computational Linguistics 30(4), 417–449 (2004)
20. Sato, K., Saito, H.: Extracting word sequence correspondences based on support
vector machines. Journal of Natural Language Processing 10(4), 109–124 (2003)
21. Tian, L., Wong, D.F., Chao, L.S., Oliveira, F.: A relationship: Word alignment,
phrase table, and translation quality. The Scientific World Journal (2014)
22. Tiedemann, J.: Extraction of translation equivalents from parallel corpora. In:
Proceedings of the 11th NoDaLiDa, pp. 120–128 (1998)
23. Tomeh, N., Cancedda, N., Dymetman, M.: Complexity-based phrase-table filtering
for statistical machine translation (2009)
24. Tomeh, N., Turchi, M., Allauzen, A., Yvon, F.: How good are your phrases? Assess-
ing phrase quality with single class classification. In: IWSLT, pp. 261–268 (2011)
25. Vapnik, V.: The Nature of Statistical Learning Theory. Data Mining and Knowl-
edge Discovery 1–47 (2000)
26. Vilar, D., Popovic, M., Ney, H.: AER: Do we need to “improve” our alignments?
In: IWSLT, pp. 205–212 (2006)
27. Way, A., Hearne, M.: On the role of translations in state-of-the-art statistical
machine translation. Language and Linguistics Compass 5(5), 227–248 (2011)
28. Zens, R., Stanton, D., Xu, P.: A systematic comparison of phrase table pruning
techniques. In: Proceedings of the 2012 Joint Conference on EMNLP and CoNLL,
EMNLP-CoNLL 2012, pp. 972–983. ACL (2012)
29. Zhao, B., Vogel, S., Waibel, A.: Phrase pair rescoring with term weightings for
statistical machine translation (2004)
A SMS Information Extraction Architecture
to Face Emergency Situations

Douglas Monteiro(B) and Vera Lucia Strube de Lima

PUCRS, Faculdade de Informática, Programa de Pós-Graduação em Ciência da
Computação, Porto Alegre, Brazil
[email protected], [email protected]

Abstract. In disasters, a large amount of information is exchanged via
SMS messages. The content of these messages can be of high value and
strategic interest. SMS messages tend to be informal and to contain
abbreviations and misspellings, which are problems for current infor-
mation extraction tools. Here, we describe an architecture designed to
address the matter through four components: linguistic processing, tem-
poral processing, event processing, and information fusion. Thereafter,
we present a case study over a SMS corpus of messages sent to an elec-
tric utility company and a prototype built with Python and NLTK to
validate the architecture’s information extraction components, obtain-
ing Precision of 88%, Recall of 59% and F-measure (F1) of 71%. The
work also serves as a roadmap to the treatment of emergency SMS in
Portuguese.

Keywords: Information extraction · Short messages · Emergencies

1 Introduction
Currently, it is hard to imagine any line of business that does not use textual
information. Aside from the available content on the Internet, a large amount
of information is generated and transmitted by computers and smartphones all
over the world. Gary Miner et al. estimate that 80% of the information available
in the world is in free text format and therefore not structured [7]. With such a
large amount of potentially relevant data, an information extraction system can
structure and refine raw data in order to find and link relevant information amid
extraneous information [3,5]. This process is made possible by understanding
the information contained in texts and their context, but this complex task
faces difficulties when processing informal languages, such as SMS messages or
tweets [1,6,10].
Messages using the Short Message Service (SMS), as well as tweets, are widely
used for numerous purposes, which makes them rich and useful data for informa-
tion extraction. The content of these messages can be of high value and strategic
interest, specially during emergencies1 . Under these circumstances, the amount
1 Also referred to as crisis events, disasters, mass emergencies and natural hazards by other researchers in the area.


of messages tends to increase considerably. However, users of these services write
messages freely, with abbreviations, slang and misspellings. Short messages tend
to be brief, informal and to present similarities to speech.
In light of this, we propose an architecture to extract information from SMS
messages exchanged during emergency situations. This Information Extraction
architecture has as input a corpus of SMS messages and comprises four compo-
nents: a linguistic processing component, a temporal processing component, an
event processing component, and an information fusion component. The linguistic
processing component preprocesses messages, handling abbreviations and
punctuation, and performing sentence splitting, tokenization and stopword removal. The
temporal processing component uses a set of rules and a list of temporal key-
words to identify and classify temporal expressions. The event processing com-
ponent is responsible for identifying events according to a set of domain-defined
categories and provides additional information regarding situation awareness. As
output, the architecture consolidates information so that one can visualize strategic
information and support the decision-making process. We built a prototype
to validate the architecture and evaluated the information extraction taggers
resulting in Precision of 88%, Recall of 59% and F-Measure (F1) of 71%.
This paper is organized in six sections, the first one being this introduction. In
Section 2, we review related work on information extraction from short messages
and its applications. In Section 3, we introduce the SMS information extraction
architecture of messages sent during emergencies. Section 4 details the case study
conducted to validate this architecture over a corpus built from SMS Messages
sent by costumers to an electric utility company during emergencies. In Section 5,
we discuss the evaluation performed over the prototype and the results obtained.
Finally, in Section 6, we comment on challenges faced, as well as on future work.

2 Related Work
Corvey et al. introduce a system that incorporates linguistic and behavioral
annotation on tweets during crisis events to capture situation awareness infor-
mation [2]. The system filters relevant and tactical information intending to
help the affected population. Corvey et al. collected data during five disaster
events and created datasets for manual annotation. The authors linguistically
annotated the corpus, looking for named entities of four types: person, name,
organization and facilities. A second level of behavioral annotation assesses how
community members tweet during crisis events. Tweets receive different and
non-exclusive qualitative tags according to the type of information provided.
Tweets containing situational awareness information are collected and tagged
with macro-level (environmental, social, physical or structural) and micro-level
(regarding damage, status, weather, etc.) information. The results indicated that,
under emergencies, “users communicate via Twitter in a very specific way to con-
vey information” [2]. Becoming aware of such behavior helped the framework’s
machine learning classifier to achieve accuracy of over 83% using POS tags and
bag of words. To classify location, they used Conditional Random Fields (CRFs)

with lexical and syntactic information and POS as features. The annotated cor-
recall of 63% for the complete match and 79% for the partial match.
Sridhar et al. present an application of statistical machine translation to
SMS messages [11]. This research details the data collection process and the
steps and resources used in a SMS message translation framework, which uses
finite state transducers to learn the mapping between short texts and canonical
form. The authors used a corpus of tweets as surrogate data and a bitext corpus
from 40,000 English and 10,000 Spanish SMS messages, collected from transcrip-
tions of speech-based messages sent through a smartphone application. Another
1,000 messages were collected from the Amazon Mechanical Turk2 . 10,000 tweets
were collected and normalized by removing stopwords, advertisements and web
addresses. The framework processes messages segmented into chunks using an
automatic scoring classifier. Abbreviations are expanded using expansion dictio-
naries and translated using a translation model based on sentences. The authors
built a static table to expand abbreviations found in SMS messages, where a
series of noisy texts have the corresponding canonical form mapped. For example,
“4ever” is linked to the canonical form “forever”. Next, the framework segments
phrases using an automatic punctuation classifier trained over punctuated SMS
messages. Finally, the Machine Translation component uses a hybrid translation
approach with phrase-based translation and sentences from the input corpus
represented as a finite-state transducer. The framework was evaluated over a set
of 456 messages collected in a real SMS interaction, obtaining a BLEU score of
31.25 for English-Spanish translations and 37.19 for Spanish-English.
Ritter et al. present TwiCAL, an open-domain event extraction and catego-
rization system for Twitter [9]. This research proposes a process for recognizing
temporal information, detecting events from a corpus of tweets and outputting
the extracted information in a calendar containing all significant events. The
authors focused on identifying events referring to unique dates. TwiCAL extracts
a 4-tuple representation of events, including a named entity, an event phrase, and
an event type. The authors trained a POS tagger and a NE tagger on in-domain
Twitter data. To build an event tagger, they trained sequence models with a
corpus of annotated tweets, and a rule-based system and POS to mark tempo-
ral expressions on a text. The open-domain event categorization uses variable
models to discover types that match the data and discards incoherent types.
The result is applied to the categorization of extracted events. The classifica-
tion model is evaluated according to the event types created from a manual
inspection of the corpus. The authors compared the results with a supervised
Maximum Entropy baseline, over a set of 500 annotated events using 10-fold
cross validation. Results achieved a 14% increase in maximum F1 score over the
supervised baseline. A demonstration of the system is available at the Status
Calendar webpage3 .

2 https://www.mturk.com/
3 http://statuscalendar.com

Dai et al. present SoMEST (Social Media Event Sentiment Timeline), a
framework for competitive intelligence analysis for social media and the archi-
tecture of a NLP tool combining NER, event detection and sentiment analy-
sis [4]. This research presents an architecture to extract information from social
media texts and the visualization of these information. The authors use Event
Timeline Analysis (ETA) to detect events and display them in a timeline, high-
lighting trends or behaviors of competitors, consumers, partners and suppliers.
Dai et al. also use Sentiment Analysis to measure human opinions from texts
written in natural language, searching for the topic, its author, and whether it is a
positive or negative opinion. The process comprises three phases: data collection,
extraction and classification, and synthesis. From social media texts generated
by customers, SoMEST focus on detecting events published from companies and
opinions shared by customers. The extraction and classification phase consists
of analyzing data and generating event extracts and opinion extracts, which are
synthesized into a social media profile, unifying events and opinions linked to
brands, services and products of a corporation over a period of time. The timeline
displays, in chronological order, the corporation's events, the competitors' events
and changes in customers' opinions.
Accordingly, since even IE systems built for different tasks present similarities,
mainly due to the nature of short text messages, we could identify common
points across different IE architectures. From this analysis, we could elaborate
an information extraction architecture for SMS messages around core components
shared by most IE systems reviewed here, such as POS tagging, tokenization,
and normalization, while adding other components to treat domain-specific
characteristics.

3 SMS Information Extraction Architecture


SMS messages contain information that can be extracted, providing valuable
resources to support decision-making under emergency situations. With this in
mind, we detail here our proposal of an architecture to extract information
from messages under these circumstances. As seen in Figure 1, the proposed
IE architecture takes as input a corpus of SMS messages. The linguistic
processing component then preprocesses each message and prepares it for
Information Extraction. The temporal expression tagger component recognizes and tags
all temporal information within the messages, while the event tagger identifies
and tags domain-related events accordingly. The information fusion component
displays the extracted information so that users of the system can interpret the
results. The output of the system is the extracted information organized in a
readable display appropriate to the application. We detail each component in
Sections 3.1 to 3.4.

3.1 Linguistic Processing Component


Fig. 1. IE Architecture overview

The linguistic processing component comprises a preprocessing module, including
four steps: normalization, sentence splitting, tokenization, and stopword
removal; and steps specifically designed for linguistic processing: POS tagging
and spell-checking. The normalization step is responsible for adjusting the text
in the face of spelling variations and abbreviations, treating special characters and
other features of the short message language. Next, the sentence splitting step
divides each message into a list of sentences in order to process them individ-
ually. After tokenization, every token is compared to a list of stopwords, which enables discarding
unnecessary items and speeding up the process of information extraction.
Accordingly, the tokens are tagged with a part-of-speech tagger, which is
trained with an annotated corpus of messages. The following step in the linguistic
processing component comprises a Spell Checker, which makes use of an external
dictionary to label untagged tokens and submits them to the POS tagger for
revision. This component outputs a set of preprocessed sentences that serve as
input for the temporal processing component.
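A minimal sketch of these preprocessing steps with NLTK; the abbreviation table and the example message are hypothetical stand-ins, and the POS-tagging and spell-checking steps are omitted.

# -*- coding: utf-8 -*-
import nltk
from nltk.corpus import stopwords   # requires nltk.download("punkt") and nltk.download("stopwords")

ABBREVIATIONS = {u"vcs": u"vocês", u"q": u"que", u"hrs": u"horas"}   # illustrative only

def preprocess(message):
    # normalization: lowercase and expand known abbreviations
    words = [ABBREVIATIONS.get(w, w) for w in message.lower().split()]
    text = u" ".join(words)
    # sentence splitting, tokenization and stopword removal
    stop = set(stopwords.words("portuguese"))
    return [[t for t in nltk.word_tokenize(s) if t not in stop]
            for s in nltk.sent_tokenize(text, language="portuguese")]

print(preprocess(u"LUZ 12345. Estamos sem energia desde as 8h de domingo."))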

3.2 Temporal Processing Component


The temporal processing component is responsible for applying regular expres-
sions in order to identify temporal expressions related to events in SMS messages.
Since temporal expressions are limited to a fixed set of syntactic patterns, most
Temporal Expression Recognition systems make use of rule-based methods to
recognize syntactic chunks [8].
Initially, the temporal expression recognizer uses a rule-based approach to
identify variations of temporal references mentioned in the sentences. Although
the rule set is able to identify simple temporal expressions present in messages,
more complex expressions still have to be treated. For cases like “desde às 8h
de domingo” (since 8am on Sunday), the temporal expression recognizer relies on
a list of temporal keywords, such as times of the day and days of the week, to
determine the extent of temporal expressions.

The temporal reference classifier analyzes the expression according to its lex-
ical triggers and defines the type and value of the temporal expression. Finally,
the component tags the temporal expression according to the TIMEX2 tag sys-
tem provided by TimeML4 .
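An illustrative sketch of a rule-based recognizer of this kind; the patterns and keyword list below are small examples rather than the prototype's actual rules, and the TIMEX2 attributes (type and value) are omitted.

import re

# Small illustrative rule set: clock times, dates, and trigger words such as days of the week.
TIME = r"\d{1,2}\s*h(?:\s*\d{2})?|\d{1,2}:\d{2}"
DATE = r"\d{1,2}/\d{1,2}(?:/\d{2,4})?"
DAYS = r"hoje|ontem|amanh[ãa]|domingo|segunda|ter[çc]a|quarta|quinta|sexta|s[áa]bado"

TIMEX = re.compile(
    r"(?:desde\s+)?(?:[àa]s?\s+)?(?:%s)(?:\s+de\s+(?:%s))?|(?:%s)" % (TIME, DAYS, DATE),
    re.IGNORECASE)

def tag_timex(sentence):
    """Wrap each recognized temporal expression in a TIMEX2 tag."""
    return TIMEX.sub(lambda m: "<TIMEX2>%s</TIMEX2>" % m.group(0), sentence)

print(tag_timex("Estamos sem luz desde às 8h de domingo"))
# Estamos sem luz <TIMEX2>desde às 8h de domingo</TIMEX2>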

3.3 Event Processing Component


The event processing component starts from the event detection step, which is
responsible for finding relevant events in a sentence. This step counts on a set
of rules to identify the event.
Since the proposed IE architecture aims to extract information from messages
during emergency situations, one can determine a certain set of categories of
events to detect during this step. For instance, as discussed in Section 2, Corvey
et al. proposed a situational awareness annotation level with the intention of
understanding crisis events as a whole [2]. To address that matter, the authors
define categories such as ‘Social Environment’, ‘Built Environment’ and ‘Physical
Environment’. Each category has subcategories with specific information, such
as ‘Crime’, ‘Damage’ or ‘Weather’.
Accordingly, in order to be executed, the event processing component requires
a previous definition of a set of domain-related observable categories. Conse-
quently, sentences that match any of these categories pass through a classifica-
tion step which relates the event to the categories. This step makes use of a list
of domain-related keywords. Then, the component can assign the correspondent
tags to the event mention.

3.4 Information Fusion Component


This component groups and organizes all tagged information in a human under-
standable manner. All relevant information is “fused” to show the results of
the IE application. As there may exist several ways to represent the results,
this decision is linked to the intended purpose of their application. For instance,
Ritter et al. [9] display their results in the form of a calendar, where the respec-
tive events are shown. On the other hand, Dai et al. [4] present the Information
Extraction results in a timeline, showing the progress of event mentions and the
amount of associated opinions.
Given a set of tagged events and their corresponding temporal information,
the relation between an event and when it occurred must be clearly expressed.
An IE system built on this architecture must display the extracted information
in a meaningful and relevant way to provide situation awareness and to aid
decision-making during emergencies.

4 Case Study
In order to validate our proposal, we present a case study conducted over the IE
Architecture. In this section, we detail the choices and decisions we made.
4 http://www.timeml.org/site/publications/timeMLdocs/timeml_1.2.1.html

The input data for the process was organized from a set of 3,021 short mes-
sages received by an electric utility company. Clients notify the company when
there is a power outage, sending short messages with the word “LUZ” (light)
and the installation number (provided by the company). As observed in mes-
sages received, the company’s clients use this communication channel to provide
situation awareness information, which is currently not yet processed but could
be of great help in services provision. It is important to extract information from
these messages to deliver relevant and strategic information about emergencies
so as to restore power to customers as quickly and safely as possible. The corpus
was built in an XML format, comprising the messages and their delivery dates.
We split the corpus into a ‘learning corpus’, containing 2,014 messages; a ‘gold
standard corpus’, containing 100 messages, to perform an evaluation of the pro-
totype’s taggers; and a ‘test corpus’ to improve the prototype based on the
evaluation results.
We prototyped the architecture using Python5 (version 2.7), mainly due to
its ease of use, productivity and features for handling strings, lists, tuples and
dictionaries, along with the Natural Language Toolkit (NLTK6 ) (version 3.0).
NLTK provides some interesting features for Portuguese, like tokenizers, stem-
mers, Part-of-Speech taggers, and annotated corpora for training purposes. We
highlight the main aspects of the components’ implementation as follows.

4.1 Linguistic Processing

The component standardizes the text input. SMS messages contain many mis-
spellings, as texters tend not to follow spelling and grammar rules, which led us
to address this matter beforehand, covering the most common cases found on
the learning corpus. Some variations are caused by different levels of literacy,
besides idiosyncratic SMS language characteristics. In this step, we lowercased
messages, removed commas, hyphens and special characters, such as ‘#’ and ‘@’,
and unnecessary full stops, such as in zip codes or abbreviations. Each sentence
undergoes a tokenization step, using whitespaces to mark word boundaries. The
prototype uses wordpunct tokenize7 to split strings into lists of tokens. We built
an external list containing 45 stopwords found on the learning corpus, as well as
common word shortenings and phonetic abbreviations, such as “vc” and “q”.
The component uses MacMorpho8 , a tagged training corpus with news in
standard Brazilian Portuguese. However, the lack of a tagged corpus of texts in
SMS language hampers the POS tagging step. Even though Normalization han-
dles some misspellings, many words not written in standard Portuguese remain
untagged. To address this matter, we used PyEnchant9 , a spell checking library
for Python, as a step to mitigate the spelling variation problem. We added the

5 https://www.python.org/
6 http://www.nltk.org
7 http://www.nltk.org/api/nltk.tokenize.html
8 http://www.nilc.icmc.usp.br/macmorpho/
9 https://pythonhosted.org/pyenchant

Open Office10 Brazilian Portuguese dictionary extension. The prototype only


checks the spelling of untagged tokens. We also built an external list of domain-
related words the POS tagger cannot resolve, like “transformador” (transformer)
or “estouro” (burst), along with their corresponding POS tag, so the prototype
can use this list to review untagged tokens.
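
As an illustration, a minimal sketch of this preprocessing chain is given below. It is not the authors' code: the stopword sample, the normalization rules, the domain word list and the availability of the pt_BR dictionary for PyEnchant are assumptions made for the example.

# Illustrative preprocessing sketch (not the authors' implementation):
# normalization, tokenization, stopword removal, POS tagging with a unigram
# tagger trained on Mac-Morpho, and a spell-checking fallback for untagged
# tokens. Resource samples below are assumptions.
import re
import nltk
import enchant  # PyEnchant; requires the pt_BR dictionary to be installed

from nltk.tokenize import wordpunct_tokenize
from nltk.corpus import mac_morpho  # requires nltk.download('mac_morpho')

SMS_STOPWORDS = {"vc", "q", "de", "a", "o", "e", "em"}   # assumed sample of the 45-word list
DOMAIN_TAGS = {"transformador": "N", "estouro": "N"}     # assumed sample of the domain list

tagger = nltk.UnigramTagger(mac_morpho.tagged_sents())
speller = enchant.Dict("pt_BR")

def normalize(message):
    message = message.lower()
    message = re.sub(r"[#@,\-]", " ", message)           # drop special characters
    return re.sub(r"\s+", " ", message).strip()

def preprocess(message):
    tokens = [t for t in wordpunct_tokenize(normalize(message))
              if t not in SMS_STOPWORDS]
    revised = []
    for token, tag in tagger.tag(tokens):
        if tag is None:                                  # untagged: domain list, then speller
            if token in DOMAIN_TAGS:
                tag = DOMAIN_TAGS[token]
            elif not speller.check(token) and speller.suggest(token):
                token = speller.suggest(token)[0].lower()
                tag = tagger.tag([token])[0][1]
        revised.append((token, tag))
    return revised

print(preprocess("caiu um transformador na rua, sem luz #ajuda"))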

4.2 Temporal Processing


Once tagged, messages proceed to temporal processing, which comprises the fol-
lowing steps: a Temporal Expression Recognizer, a Temporal Reference Classifier
and a Temporal Tagger. There are two external resources associated with the
component: a set of regular expressions and a list of lexical triggers.
From a linguistically processed message containing a time anchor, the Tem-
poral Expression Recognizer must be able to identify and extract existing tempo-
ral information. For instance, the duration of an event, such as a power outage,
may be of great importance to indicate the severity of the problem. A client may
inform the existence of a natural disaster that causes a blackout that lasts for
hours and affects an entire region.
We found many variations for the same temporal information like ‘10:30’,
‘10h30’, ‘10h30min’, and so on. In light of this, we resorted to a rule-based
matcher, which relies on a set of regular expressions. Moreover, we added rules
to identify days of the week and times of the day.
Since incoming messages express ongoing situations, the existing temporal
expressions refer to past or present events. In order to identify more complex
expressions like “desde ontem às 14h” (since yesterday 2pm), we built a list
of lexical triggers, containing the most common temporal keywords found in
the learning corpus and their corresponding TIMEX2 value, such as “ontem”
(yesterday) with value ‘-1D’ (minus one day) and “noite” (night) with a time
modifier ‘NI’. Tokens that correspond to words in the list are considered part of
a temporal expression.
Each tagged temporal expression can contain a value and a modifier, accord-
ing to its type, which, along with a time anchor (the delivery date), allows us to
determine the beginning or duration of an event. For instance, the lexical trig-
ger “amanhã” (tomorrow) has a type DATE and value ‘1D’, indicating one day
must be added to the time anchor to determine the TIMEX value. The Tempo-
ral Expression Classifier verifies the list of lexical triggers to determine modifiers
and values of each expression, considering first lexical triggers expressing largest
periods of time. Duration expressions have precedence over dates and dates have
precedence over times. Finally, the Tagger groups values and modifiers in a single
form and assigns the corresponding time tag to the TEs.
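
A minimal sketch of this kind of rule-based recognition is shown below; the regular expression, the trigger entries and the way values are resolved against the delivery-date anchor are simplified assumptions, not the authors' actual rule set.

# Illustrative sketch of rule-based temporal expression recognition (not the
# authors' rules): a regex for clock-time variants plus a small sample of
# lexical triggers with TIMEX2-like values, resolved against the delivery
# date of the message, which is used as the time anchor.
import re
from datetime import datetime, timedelta

TIME_RE = re.compile(r"\b([01]?\d|2[0-3])(?::|h)([0-5]\d)?(?:min)?\b")  # 10:30, 10h30, 10h30min, 8h
TRIGGERS = {                                 # assumed sample of the lexical trigger list
    "ontem": ("DATE", timedelta(days=-1)),   # value '-1D'
    "hoje": ("DATE", timedelta(days=0)),
    "noite": ("TIME_MOD", "NI"),             # time-of-day modifier
    "domingo": ("DAY_OF_WEEK", None),
}

def recognize(sentence, anchor):
    """Return (expression, type, resolved value) triples found in the sentence."""
    found = []
    for m in TIME_RE.finditer(sentence):
        hour, minute = int(m.group(1)), int(m.group(2) or 0)
        found.append((m.group(0), "TIME", anchor.replace(hour=hour, minute=minute)))
    for token in sentence.lower().split():
        if token in TRIGGERS:
            ttype, value = TRIGGERS[token]
            resolved = anchor + value if isinstance(value, timedelta) else value
            found.append((token, ttype, resolved))
    return found

anchor = datetime(2015, 9, 8, 14, 0)         # assumed delivery date of the message
print(recognize("sem luz desde ontem as 8h30", anchor))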

4.3 Event Processing Component


To detect events, we focused on a verb-triggered rule-based approach to identify
specific features that may be useful in the context, considering the urgency of
10 http://www.openoffice.org/

the messages. From understanding the verb, its meaning and its complements, one
can determine the structure of the sentence of which it is part. The prototype
considers sentences in the following structure: Noun Phrase + Verb + Object.
Both the Noun phrase and the Object can play semantic roles of agent and
patient.
The prototype iterates through sentences looking for POS tags assigned dur-
ing the linguistic processing in order to find verbs. Afterwards, the Event Detection
step marks the boundaries of the event mention by greedily searching for nouns,
prepositions, noun compounds, adjectives or pronouns on the surroundings of
the verb. Verbs interspersed with other POS tags mark different event mentions,
while adjectives and nouns (or noun compounds) mark the boundaries of event
mentions.
Next, the prototype classifies the detected events. Through an extensive study
of the learning corpus, we observed how clients communicate during emergen-
cies as well as what they notify. Then, we listed the most relevant events and
related words we found whilst defining the annotation standard. We defined
three non-mutually exclusive categories of events, according to the observed
events and thirteen notification types to provide situation awareness informa-
tion. “Instalação” refers to messages containing information regarding the con-
sumer unit (electrical installation), such as power outages, voltage drops and
instabilities. “Rede” groups information about the electrical grid status and its
components, such as short circuits or fallen utility poles. “Ambiente” comprises
information regarding the environment that might affect the electrical grid, like
fallen trees, storms and lightning.
To properly classify the event, we split sentences and analyze separately noun
phrases, verbs and objects, searching for domain-related words. The component
depends on a list built from 83 words related to the notification types, collected
from sources such as Dicionário Criativo11 and Wordnet12 . For instance, the sen-
tence “caiu uma arvore na rede” (a tree fell over the power grid) is divided into
two parts: “caiu” (verb) and “uma arvore em rede” (object). The Event Classifier
then determines that, while the verb does not indicate a notification type, the
object indicates the notification type “Queda de Árvore” (fallen tree), due to the
presence of the words “arvore” and “rede”. Finally, the Tagger groups all the
information in a set of tags, according to its categories and notification types.
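
The sketch below illustrates this verb-triggered detection and keyword-based classification on the example above; the tag names, the set of tags allowed around the verb and the keyword samples are assumptions, while the real component works over the three categories and thirteen notification types described in the text.

# Illustrative sketch of verb-triggered event detection and keyword-based
# classification (not the authors' rules). Tag names and keyword samples are
# assumptions for the example.
CONTENT_TAGS = {"N", "ADJ", "PREP", "PRO"}        # tags kept around the verb (assumed)
CATEGORY_KEYWORDS = {                              # assumed sample of the 83-word list
    "Queda de Árvore": {"arvore", "galho"},
    "Falta de Energia": {"luz", "energia"},
    "Rede": {"rede", "poste", "transformador"},
}

def detect_events(tagged_sentence):
    """Group each verb with its neighbouring content words into an event mention."""
    events, current = [], []
    for token, tag in tagged_sentence:
        if tag == "V":
            if current and any(t == "V" for _, t in current):
                events.append(current)             # a new verb starts a new mention
                current = []
            current.append((token, tag))
        elif tag in CONTENT_TAGS and current:
            current.append((token, tag))
    if current:
        events.append(current)
    return events

def classify(event):
    words = {token for token, _ in event}
    return [label for label, keys in CATEGORY_KEYWORDS.items() if words & keys]

tagged = [("caiu", "V"), ("uma", "ART"), ("arvore", "N"), ("na", "PREP"), ("rede", "N")]
for event in detect_events(tagged):
    print(event, "->", classify(event))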

4.4 Information Fusion

Once the information is tagged, one can use different approaches to
visualize and understand such data. Being aware of other possibilities that could
be explored in a more extended study, we generated charts from tagged mes-
sages and their corresponding notification types, allowing the visualization of
the application of the proposed model.

11 http://dicionariocriativo.com.br/
12 http://wordnetweb.princeton.edu/

We exported the output of the prototype to a spreadsheet containing messages
and their corresponding notification types. Structured information can be more
easily manipulated, in order to speed up the recognition and handling of
occurrences with more situation awareness.
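
A trivial sketch of such an export, assuming a CSV spreadsheet with one row per message, could look as follows (Python 3 is used here for brevity, although the prototype was written in Python 2.7).

# Minimal sketch of the spreadsheet export (assumed CSV layout): one row per
# message with its delivery date and the notification types assigned to it.
import csv

tagged_messages = [  # assumed output of the previous components
    {"date": "2015-09-08 14:00", "text": "caiu uma arvore na rede",
     "notifications": ["Queda de Árvore", "Rede"]},
]

with open("notifications.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["delivery_date", "message", "notification_types"])
    for msg in tagged_messages:
        writer.writerow([msg["date"], msg["text"], "; ".join(msg["notifications"])])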

5 Evaluation and Discussion


In order to evaluate the prototype’s taggers, we elaborated a three-step plan
comprising: confirming the model of categories and notification types; providing
a gold standard - a manually annotated corpus considered as “definitive answer”;
and comparing the prototype’s results to the gold standard. Furthermore, we
assessed the results of the Information Fusion component over the gold standard
and the test corpus.
Since some answers must be provided by domain experts, we invited three judges
with domain knowledge. They gave their opinion on the model of categories of events
and temporal expressions and evaluated a set of 100 SMS messages according to this model.
From their answers, we composed the gold standard, compared it to
the output of the IE prototype, and obtained a precision of 88%, recall of 59% and
F-measure (F1) of 71%.
The prototype correctly identified relevant events, with 125 true positives
over 16 false positives and 84 false negatives. The results indicate that the set
of defined rules is accurate while detecting the events and temporal information
mentioned in the gold standard corpus. However, a low recall score alerts us
that there are other events, domain-related words and temporal information
still uncovered by the model created.
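
Up to rounding, the reported scores follow directly from these counts, as the short check below shows.

# Sanity check of the reported scores from the counts above:
# 125 true positives, 16 false positives and 84 false negatives.
tp, fp, fn = 125, 16, 84
precision = tp / float(tp + fp)                       # ~0.89
recall = tp / float(tp + fn)                          # ~0.60
f1 = 2 * precision * recall / (precision + recall)    # ~0.71
print("P=%.2f R=%.2f F1=%.2f" % (precision, recall, f1))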
Table 1 shows the hit percentage of the prototype when compared to the
Gold Standard, as well as the amount of notifications found in the corpus. The
prototype could not resolve mentions to “Wind” and “Rain” events. Analyzing
the messages in the gold standard, we can see that some event mentions were not
detected by the prototype as they omit verbs, like in “toda nossa comunidade
sem luz devido muita chuva ventos fortes” (our entire community without elec-
tricity because of a lot of rain and strong winds). There were also problems in

Table 1. Hit Percentage by information type

Extracted Information Prototype Gold Standard Percentage


Temporal Information 24 27 89%
Power Outage 86 88 98%
Downed Power Line 7 32 22%
Short Circuit 3 5 60%
Broken Power Pole 4 10 40%
Power Line Fire 5 7 71%
Fallen Tree 5 12 42%
Wind 0 4 0%
Rain 0 2 0%

differentiating “está” (is) and its popular contraction “tá” (absent in the training
corpus) from “esta” (this), which compromised the detection of some events.
The temporal processing component behaved well over the gold standard. In
fact, in one of the evaluated messages, the component tagged “15 minutos” (15
minutes) as a temporal expression, while the judges did not recognize it, showing
that this understanding is not clear even for humans.

6 Considerations
During emergencies, any detail can help services provision. For that matter, SMS
messages can be an important source of valuable information, as it is one of the
most widely used means of communication. However, SMS messages are usually
written in a language of their own, containing abbreviations, slang and misspellings,
which hamper their processing in their context of operation.
As observed in Section 2, even IE systems built for different tasks may present
similarities. From this learning, we could propose an architecture for information
extraction from SMS messages according to core components shared by most IE
systems reviewed, while adding other components to treat domain-specific char-
acteristics. The architecture comprises a linguistic processing component, which
prepares messages for information extraction; a temporal processing component,
which identifies and tags existing temporal information within messages; an event
processing component, which detects and classifies events according to a list of
domain-related categories; and an information fusion component that interprets
information and displays it in a human-understandable manner.
To validate the architecture, we conducted a case study over a corpus of SMS
messages sent to an electric utility company. We studied how users communicate
during emergencies and defined categories of information that could aid services
provision. We validated the architecture against a gold standard corpus built
with the assistance of judges with domain knowledge. Among the tagging stages,
we established a degree of severity (varying from 1 to 5) to distinguish the
categories of events. We assessed the range of scores given by the judges, resulting
in a kappa coefficient of 0.0013, i.e., a poor level of agreement, which led us to use
fewer severity degrees.
As, to the best of our knowledge, there is no architecture to address this mat-
ter, especially for the Portuguese language, we expect this proposal to bring focus
to this area and encourage other researchers to contribute to its improvement.
IE systems built on this architecture may attend other electric utility companies
or address other types of disasters or emergencies and other short messages, such
as tweets. Among the improvement opportunities unveiled, we could mention
resorting to a tagged corpus more appropriate to the SMS language. Such a
resource would decrease the number of untagged tokens, which in turn would
increase the accuracy of the event detection step. However, at present, we do
not know of such a resource for the Portuguese language.
For future work, we intend to continue our research, revising the Case Study
results, and refine the prototype according to other approaches, such as machine

learning to automate the category definition step. Envisaged features com-


prise adding named entity recognition towards gathering geographic informa-
tion, which was not considered during this stage of our work, but can be of great
importance for services provision. Moreover, we intend to use temporal expres-
sions to determine the start and duration of detected events. We will also assess
the information fusion component results as well as alternatives to enhance it.

References
1. Bernicot, J., Volckaert-Legrier, O., Goumi, A., Bert-Erboul, A.: Forms and func-
tions of SMS messages: A study of variations in a corpus written by adolescents.
Journal of Pragmatics 44(12), 1701–1715 (2012)
2. Corvey, W.J., Verma, S., Vieweg, S., Palmer, M., Martin, J.H.: Foundations of a
multilayer annotation framework for twitter communications during crisis events.
In: 8th International Conference on Language Resources and Evaluation Confer-
ence (LREC), p. 5 (2012)
3. Cowie, J., Lehnert, W.: Information extraction. Communications of the ACM 39(1),
80–91 (1996)
4. Dai, Y., Kakkonen, T., Sutinen, E.: SoMEST: a model for detecting competi-
tive intelligence from social media. In: Proceedings of the 15th International Aca-
demic MindTrek Conference: Envisioning Future Media Environments, pp. 241–248
(2011)
5. Jurafsky, D., Martin, J. H.: Speech and language processing, 2nd edn. Prentice
Hall (2008)
6. Melero, M., Costa-Jussà, M.R., Domingo, J., Marquina, M., Quixal, M.: Holaaa!!
writin like u talk is kewl but kinda hard 4 NLP. In: 8th International Conference
on Language Resources and Evaluation Conference (LREC), pp. 3794–3800 (2012)
7. Miner, G., Elder, J.I., Hill, T., Nisbet, R., Delen, D.: Practical Text Mining and
Statistical Analysis for Non-structured Text Data Applications. Elsevier, Burling-
ton (2012)
8. Pustejovsky, J., Stubbs, A.: Natural Language Annotation for Machine Learning.
O’Reilly Media, Inc. (2012)
9. Ritter, A., Etzioni, O., Clark, S., et al.: Open domain event extraction from twitter.
In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, pp. 1104–1112 (2012)
10. Seon, C.-N., Yoo, J., Kim, H., Kim, J.-H., Seo, J.: Lightweight named entity extrac-
tion for korean short message service text. KSII Transactions on Internet and
Information Systems (TIIS) 5(3), 560–574 (2011)
11. Sridhar, V.K.R., Chen, J., Bangalore, S., Shacham, R.: A Framework for trans-
lating SMS messages. In: Proceedings of COLING 2014, the 25th International
Conference on Computational Linguistics: Technical Papers, pp. 974–983 (2014)
Cross-Lingual Word Sense Clustering for Sense
Disambiguation

João Casteleiro, Joaquim Ferreira da Silva(B) , and Gabriel Pereira Lopes

NOVA LINCS FCT/UNL, 2829-516 Caparica, Portugal


[email protected], {jfs,gpl}@fct.unl.pt

Abstract. Translation is one of the areas where word disambiguation


must be solved in order to find out adequate translations for such words
in the contexts where they occur. In this paper, a Word Sense Disam-
biguation (WSD) approach using Word Sense Clustering within a cross-
lingual strategy is proposed. Available sentence-aligned parallel corpora
are used as a reliable knowledge source. English is taken as the source
language, and Portuguese, French or Spanish as the targets. Clusters
are built based on the correlation between senses, which is measured by
a language-independent algorithm that uses as features the words near
the ambiguous word and its translation in the parallel sentences, together
with their relative positions. Clustering quality reached 81% (V-measure)
and 92% (F-measure) in average for the three language pairs. Learned
clusters are then used to train a support vector machine, whose clas-
sification results are used for sense disambiguation. Classification tests
showed an average (for the three languages) F-measure of 81%.

Keywords: Word Sense Disambiguation · Clustering · Parallel cor-


pora · V-measure · F-measure · Support vector machine

1 Introduction
Word sense ambiguity is present in many words no matter the language, and
translation is one of the areas where this problem is important to be solved. So,
in order to select the correct translation, it is necessary to find the right meaning,
that is, the right sense, for each ambiguous word. Although multi-word terms
tend to be semantically more accurate than single words, multi-word terms may
also have some ambiguity, depending on the context.
Thus, a system for automatic translation, for example, from English to Por-
tuguese, should know how to translate the word bank as banco (an institution for
receiving, lending, exchanging and safeguarding money), or as margem (the land
alongside or sloping down to a river or lake). As the efficiency and effectiveness
of a translation system depends on the meaning of the text being processed,
disambiguation will always be beneficial and necessary.
Approaches to tackle the issue of WSD may be divided in two main types:
the supervised and the unsupervised learning. The former requires semantically


tagged training data. Although supervised approaches can provide very good
results, the need for tagging may become a limitation: semantic tagging depends
on more or less complex approaches and it may occur that tagging is not pos-
sible for some languages; and POS-tagging, if used, needs good quality tag-
gers that may not exist for some languages. On the other hand, by working
with untagged information, unsupervised approaches are more easily language-
independent. However, the lack of tags may be a limitation to reach the same
level of results as those achieved by supervised approaches.
One way to work around the limitations of both supervised and unsupervised
approaches, keeping their advantages, is the use of a hybrid solution. We propose
the use of a reliable and valid knowledge source, automatically extracted from
sentence-aligned untagged bilingual parallel corpora.
In this paper we present a cross-lingual approach for Word Sense Clustering
to assist automatic and human translators on translation processes when faced
with expressions which are more complex, more ambiguous and less frequent
than general. The underlying idea is that the clustering of word senses provides
a useful way to discover semantically related senses, provided that each clus-
ter contains strongly correlated word senses. To achieve our target we propose
a semi-supervised strategy to classify words according to their most probable
senses. This classification uses a SVM classifier which is trained by the informa-
tion obtained in the process of the sense clustering. Clusters of senses are built
according to the correlation between word senses taking into account the combi-
nations of their neighbor words and the relative position of those neighbor terms;
those combinations are taken as features, which are automatically extracted [1]
from a sentence-aligned parallel corpora.

2 Related Work

Several studies combining clustering processes with word senses and parallel
corpora have been carried out in the past years. In [3], the authors
present a clustering algorithm for cross-lingual sense induction that generates
bilingual semantic resources from parallel corpora. These resources are composed
of the senses of words of one language that are described by clusters of their
semantically similar translations in another language. The authors proved that
the integration of sense-clusters resources leads to important improvements in
the translation process. In [4], the authors proposed an unsupervised method for
clustering translations of words through point-wise mutual information, based
on a monolingual and a parallel corpora. Comparing the induced clusters to ref-
erence clusters generated from WordNet, they demonstrated that their method
identifies sense-based translation clusters from both monolingual and parallel
corpora.
Brown et al. described in [5] a statistical technique for assigning senses to
words based on the context in which they appear. By incorporating this method
in a machine translation system, a significant reduction of the translation error
rate was achieved. In [7], Diab addresses the problem of WSD from a multilingual

perspective, expanding the notion of context to encompass multilingual evidence.


Given a parallel corpus and a sense inventory for one of the languages in the
corpus, an approach to resolve word sense ambiguity in natural language was
proposed. In [15], the authors present a method that exploits word clustering
based on automatic extraction of translation equivalents, supported by available
aligned wordnets. Apidianaki in [2] described a system for SemEval-2013 Cross-
lingual WSD task, where word senses are represented by means of translation
clusters in a cross-lingual strategy. The WSD method clusters the translations
of target words in a parallel corpus using source language context vectors. These
vectors are exploited in order to select the most appropriate translations for new
instances of the target words in context.
With the goal of increasing the accuracy of WSD systems when faced with
expressions that are more complex, ambiguous and less frequent than general,
we propose extensions and changes to several works in the field [2–5,7,15]. It
has differences from those mentioned above, since specific and validated bilingual
lexicons, automatically extracted, are used to provide neighbor contexts enabling
the calculation of the statistical correlation between senses, which is the basis
to build sense clusters, therefore being a language independent approach.

3 System Description
3.1 Dataset
The experiments performed to support the research presented in this article
rely on the datasets presented in Table 1.

Table 1. Datasets of ambiguous words and possible senses for English-Portuguese
(EN-PT), English-French (EN-FR) and English-Spanish (EN-SP)

Dataset Source-Words Target-Words


(Ambiguous) (Senses)
EN-PT 15 94
EN-FR 15 70
EN-SP 15 83

Thus, Table 1 shows that, in the experiments performed to support this
research, we used, for example, 15 English ambiguous words that could be trans-
lated into 94 different Portuguese words, each one having a meaning, that is, a
sense.

3.2 The Gathering of Word Senses


The gathering of word senses consists of extracting meanings of words for a given
ambiguous word. For this, we use the ISTRION (EN-PT; EN-FR; EN-SP) lex-
icon, which is a bilingual and strongly validated data source, resulting from

the project ISTRION1. This lexicon contains 810,000 validated entries for the
English-Portuguese language pair, 380,000 for the English-French and 290,000 for
the English-Spanish one. This knowledge was automatically extracted and man-
ually validated. For each ambiguous word in the source language (eg. English) we
get all different senses existing in the target language (eg. Portuguese, French,
Spanish) by consulting the bilingual lexica database; see tables 2 and 3 con-
taining an example for Portuguese and French respectively. These tables show a
set of different senses for the same English word “sentence”, each one expressed
in a word in the target language. According to the content of each table, the
reader may predict that the senses could be divided in two semantically different
groups (clusters): those signed with a “*”, which are related to textual units;
and those with a “+”, related to Court resolutions. Thus, one of the purposes of
this approach is to build clusters of senses according to the semantic closeness
among word senses.

Table 2. Example of the different senses for the ambiguous word “sentence” concerning
the translation to Portuguese. Senses signed with a “*” are textual units of one or more
words. Those signed with a “+” are related to Court resolutions

Ambiguous Word Sense


(English) (Portuguese)
sentence oração (clause) *
sentence expressão (expression) *
sentence frase (phrase) *
sentence sentença (sentence) +
sentence pena (penalty) +
sentence condenação (condemnation) +

Table 3. Example of the different senses for the ambiguous word “sentence” concerning
the translation to French. Senses signed with a “*” are textual units of one or more
words. Those signed with a “+” are related to Court resolutions

Ambiguous Word Sense


(English) (French)
sentence condamnation (condemnation) +
sentence jugement (sentence) +
sentence phrase (phrase) *
sentence peine (penalty) +
sentence condamner (condemn) +

3.3 Feature Extraction


According to the authors in [10], the use of local context features with bilingual
word evidence starts from the assumption that incorporating knowledge from more than
1 http://citi.di.fct.unl.pt/project/project.php?id=97

one language into the feature vector will be more informative than only using
monolingual features. By using sentence-aligned parallel corpora, the pro-
posal we present in this paper confirms this principle. Thus, we use sentence-
aligned parallel corpora (composed of Europarl2 and DGT3), from which we
extract features from the neighbor context of the target pair (Ambiguous Word)
\t (Sense N) that fall within a window of three words to the left and three
words to the right of each word of the pair, discarding stop-words. Each tar-
get pair has a set of features where each one is a combination of one of the
words in the window and its relative position. For a better understanding, let
us take the example of the target pair “sentence” – “frase” and one of the
sentence-pairs containing it, retrieved from the bilingual parallel corpora (EN \t
PT): Besides being syntactically well-formed, the sentence is correctly translated
\t Para além de estar sintaticamente bem formada, a frase está corretamente
traduzida. Thus, the context words of the target pair “sentence” – “frase” in this
sentence-pair are “Besides”, “syntactically”, “well-formed”, “correctly”, “trans-
lated”, “sintaticamente”, “bem”, “formada”, “corretamente” and “traduzida”,
taking into account the limits of the window (three words to the left and three
words to the right of each word of the pair). Following this, the correspond-
ing features include a tag indicating the language and the relative position of
the context word to the corresponding word of the target pair: “enL3 Besides”,
“enL2 syntactically”, “enL1 well-formed”, “enR1 correctly”, “enR2 translated”,
“ptL3 sintaticamente”, “ptL2 bem”, “ptL1 formada”, “ptR1 corretamente” and
“ptR2 traduzida” —Recall that stop-words are discarded. “L” and “R” stands
for Left and Right respectively.
However, for each target pair, there are usually several sentence-pairs
retrieved from the bilingual parallel corpora (EN \t PT), containing that target
pair. This means that probably several contexts will neighbor the same target
pair, generating several features. In our approach, every time a feature occurs in
a sentence-pair, its frequency is incremented for the corresponding target pair.
In other words, taking the feature “enL2 syntactically”, it may have for exam-
ple: 3 occurrences for target pair “sentence” – “frase” (meaning that the word
“syntactically” occurs 2 positions left to the target word “sentence”, in 3 of
the sentence-pairs containing this target pair); 2 occurrences for “sentence” –
“oração”; 0 occurrences for “sentence” – “pena”, etc..
We consider that there is a tendency such that the closer the relative posi-
tion of the context word to the target word, the stronger the semantic relation
between both words. So, in this approach a different importance is assigned
to each feature, according to its relative position. Thus, we use the criterion
we called $\sqrt[p]{f}$, that is: for features whose relative position is p, the root
of degree p is applied to frequency f , which is the number of times the feature
occurs in the set of sentence-pairs containing the target pair. This criterion was
chosen empirically as it showed good results after some experiments. Table 4
shows part of the feature extraction concerning the ambiguous word “sentence”.

2 http://www.statmt.org/europarl/
3 http://ipsc.jrc.ec.europa.eu/?id=197

Table 4. Feature extraction for the target pairs concerning the ambiguous word “sen-
tence” (only a small part is shown)

Sense        Feature               Frequency   Final Assigned Value

oração       ...                   ...         ...
...          ...                   ...         ...
expressão    ...                   ...         ...
...          ...                   ...         ...
frase        enL3 Besides          1           ∛1
frase        enL2 syntactically    3           √3
frase        enL1 well-formed      2           2
frase        enR1 correctly        3           3
frase        enR2 translated       3           √3
frase        ptL3 sintaticamente   2           ∛2
frase        ptL2 bem              4           √4
frase        ptL1 formada          3           3
frase        ptR1 corretamente     2           2
frase        ptR2 traduzida        3           √3
frase        ...                   ...         ...
sentença     ...                   ...         ...
...          ...                   ...         ...
pena         ...                   ...         ...
...          ...                   ...         ...
condenação   ...                   ...         ...
...          ...                   ...         ...

For reasons of space, only values for some features of one of the target pairs (“sen-
tence” – “frase”) are shown. Values in column Final Assigned Value contain the
result of the application of the $\sqrt[p]{f}$ criterion on the values of column Frequency.
The information contained in all columns of Table 4, except Frequency, forms
a matrix which is the basis for obtaining the Word Sense Clustering concerning
the ambiguous word “sentence”. At the end of the feature extraction task we
obtained 15 matrices per language pair, corresponding to each of the 15 ambigu-
ous words used, as referred in Table 1.
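
A minimal sketch of this feature extraction and of the $\sqrt[p]{f}$ weighting is shown below; the stopword samples and the tokenization by whitespace are simplifying assumptions, and the sentence pair is the example used in the text.

# Illustrative sketch of the positional feature extraction and the p-th-root
# weighting (not the authors' code). Stopwords are removed before computing
# the window of three words on each side; the stopword samples are assumed.
from collections import Counter

EN_STOP = {"the", "is", "being", "a", "of"}
PT_STOP = {"para", "além", "de", "estar", "a", "está"}

def window_features(tokens, target, lang, stop, size=3):
    """Features such as 'enL2 syntactically': language + relative position + word."""
    content = [t for t in tokens if t == target or t not in stop]
    idx = content.index(target)
    feats = []
    for pos in range(1, size + 1):
        for side, j in (("L", idx - pos), ("R", idx + pos)):
            if 0 <= j < len(content):
                feats.append(("%s%s%d %s" % (lang, side, pos, content[j]), pos))
    return feats

def weighted_values(sentence_pairs, en_word, pt_word):
    """Accumulate frequency f per feature, then apply the root of degree p."""
    freq, position = Counter(), {}
    for en, pt in sentence_pairs:
        feats = window_features(en.lower().split(), en_word, "en", EN_STOP) + \
                window_features(pt.lower().split(), pt_word, "pt", PT_STOP)
        for feat, pos in feats:
            freq[feat] += 1
            position[feat] = pos
    return {feat: f ** (1.0 / position[feat]) for feat, f in freq.items()}

pairs = [("besides being syntactically well-formed the sentence is correctly translated",
          "para além de estar sintaticamente bem formada a frase está corretamente traduzida")]
print(weighted_values(pairs, "sentence", "frase"))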

3.4 Feature Reduction by Sense Correlation

As we have seen in the previous subsection, the number of features associated


to each ambiguous word tends to be huge when compared to the number of
senses which may correspond to just a few words as in the case of tables 2
and 3. So, considering the purpose of clustering the senses, we transform each
of the previously obtained matrices into a new and more compact matrix of
correlations (similarities) between each pair of senses. This is a N×N symmetric
matrix where N is the number of senses of the ambiguous word. Each line of
this matrix corresponds to one of the senses now characterized by the correlation

between that sense and each of the N senses. Each correlation is given by (1),
which is based on the Pearson’s correlation coefficient.

Corr(S_i, S_j) = \frac{Cov(S_i, S_j)}{\sqrt{Cov(S_i, S_i) \times Cov(S_j, S_j)}}    (1)

Cov(S_i, S_j) = \frac{1}{|\mathcal{F}| - 1} \sum_{F \in \mathcal{F}} \big( f(S_i, F) - f(S_i, .) \big) \times \big( f(S_j, F) - f(S_j, .) \big)    (2)

where F is an element of the feature set F and f (Si , F ) stands for the Final
Assigned Value (a column of Table 4) of feature F for sense Si ; f (Si , .) gives the
average Final Assigned Value of the features for sense Si , which is given by (3).

f(S_i, .) = \frac{1}{|\mathcal{F}|} \sum_{F \in \mathcal{F}} f(S_i, F)    (3)

The correlation given by (1) measures how semantically close senses Si and
Sj are. However, a qualitative explanation for this can be given through (2), rather
than by (1). Thus, (2) shows that, for each feature F , two deviations are taken:
one is given by the Final Assigned Value of feature F for sense Si , subtracted
from the average Final Assigned Value for Si , that is, f (Si , F ) − f (Si , .); the
other one is obtained by the Final Assigned Value of the same feature F for
sense Sj , subtracted from the average Final Assigned Value for Sj , that is,
f (Sj , F ) − f (Sj , .). If both deviations have the same algebraic sign (+/−), the
product will be positive, which means that both senses present similar deviations
concerning feature F . And, if positive products happen for most of features
resulting in high values, then there will be a strong positive covariance value
(Cov(Si , Sj )), and therefore, a high correlation (Corr(Si , Sj )) — notice that (1)
has the effect of just standardizing Cov(Si , Sj ) values, ranging from -1 to +1.
Still analyzing (2), if the partial sum of the positive products has a similar
value to the partial sum of the negative ones (when deviations are contrary), then
the correlation is close to 0, which means that the semantic closeness between
both senses is very weak (or even null). In other words, Corr(Si , Sj ) gives close
to +1 values, meaning a high correlation, when both senses tend to occur in the
same contexts. If one of the senses occur in contexts where the other sense never
occurs, and vice versa, then there is a negative correlation between them.
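
Since (1)–(3) amount to the sample Pearson correlation over the rows of the Final Assigned Value matrix, the whole matrix of sense correlations can be obtained in one call, as in the sketch below; the feature matrix used here is a toy example.

# Sketch of the sense-by-sense correlation matrix. Rows of M hold the Final
# Assigned Values of each sense over the same feature set (toy values);
# numpy.corrcoef over the rows is exactly the Pearson correlation of (1)-(3).
import numpy as np

senses = ["oração", "expressão", "frase", "sentença", "pena", "condenação"]
M = np.array([
    [1.7, 0.0, 1.4, 0.0, 1.0, 0.0],
    [1.5, 0.0, 1.2, 0.0, 0.9, 0.1],
    [1.6, 0.1, 1.3, 0.0, 1.1, 0.0],
    [0.0, 1.8, 0.0, 1.5, 0.0, 1.2],
    [0.1, 1.6, 0.0, 1.4, 0.0, 1.1],
    [0.0, 1.7, 0.1, 1.6, 0.0, 1.0],
])

corr = np.corrcoef(M)                 # N x N symmetric matrix, values in [-1, +1]
for i, sense in enumerate(senses):
    closest = senses[np.argsort(corr[i])[-2]]    # most correlated other sense
    print(sense, "->", closest)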

3.5 Finding Clusters

Our goal is to join similar senses of the same ambiguous word in the same
cluster, based on the correlation matrix obtained as explained in Subsec. 3.4. To

create clusters we used the WEKA tool [8] with the X-means [12] algorithm. With X-
means the user does not need to supply the number of clusters, contrary to other
clustering algorithms such as k-means or k-medoids. The algorithm returns the
best solution for the correlation matrix presented as input. As a matter of fact,
for the example of the ambiguous word “sentence”, regarding the Portuguese as
the target language, it assigned the words “oração”, “expressão”, and “frase” to a
cluster, while “pena” and “condenação” were assigned to another one. In other
words, it returned the results expected that were presented in Table 2. With
respect to the possible translations of the same ambiguous word “sentence”
to French, the clusters were correctly formed too, according to the expected
distribution shown in Table 3.
The results of the clustering phase for all ambiguous words gave rise to the
evaluation presented in tables 5, 6 and 7.
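
The prototype relies on WEKA's X-means, which chooses the number of clusters by itself; as a rough stand-in for readers working in Python, the sketch below uses k-means with a silhouette criterion over candidate values of k, applied to a toy correlation matrix.

# Rough stand-in for the clustering step (the authors used WEKA's X-means):
# k-means over the rows of the correlation matrix, with the number of clusters
# chosen by the silhouette criterion. The correlation matrix is a toy example.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

senses = ["oração", "expressão", "frase", "sentença", "pena", "condenação"]
corr = np.array([
    [1.0, 0.9, 0.8, 0.1, 0.0, 0.1],
    [0.9, 1.0, 0.7, 0.0, 0.1, 0.0],
    [0.8, 0.7, 1.0, 0.1, 0.0, 0.1],
    [0.1, 0.0, 0.1, 1.0, 0.8, 0.9],
    [0.0, 0.1, 0.0, 0.8, 1.0, 0.7],
    [0.1, 0.0, 0.1, 0.9, 0.7, 1.0],
])

best_labels, best_score = None, -1.0
for k in range(2, 5):                                # candidate numbers of clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(corr)
    score = silhouette_score(corr, labels)
    if score > best_score:
        best_labels, best_score = labels, score

clusters = {}
for sense, label in zip(senses, best_labels):
    clusters.setdefault(label, []).append(sense)
print(list(clusters.values()))    # expected: the two semantic groups of Table 2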

4 Experiments and Results


4.1 Evaluation of the Clusters
In order to determine the consistency of the obtained clusters, all of them will
be evaluated with V-measure and F-measure. V-measure introduces two criteria
presented in [14]: homogeneity (h) and completeness (c). A clustering process
is considered homogeneously well-formed if all of its clusters contain only data
points which are members of a single class. Comparatively, a clustering result
satisfies completeness if all data points that are members of a given class are
elements of the same cluster. So, increasing the homogeneity of a clustering solu-
tion often results in decreasing its completeness. These two criteria run roughly
in opposition. This measure will be used to evaluate the resulting clusters. The
value of homogeneity varies between 1 and 0. In the perfectly homogeneous case,
the homogeneity is 1. In the worst case, it is 0, which happens when the
class distribution within each cluster is equal to the overall class distribution.
Similarly to determination of the homogeneity, the results of the completeness
also vary between 1 and 0. In the perfectly complete case, it is 1. In the worst
case scenario, each class is represented by every cluster with distribution equal
to the distribution of cluster sizes; consequently completeness is 0. V-measure is
thus a measure for evaluating clusters, which studies the harmonic relationship
between homogeneity and completeness. It is given by Vβ (see (4)) where β = 1,
which is the value usually used. V-measure values vary between 0 and 1.

V_\beta = \frac{(1 + \beta) \times h \times c}{(\beta \times h) + c} \qquad F_\beta = \frac{(1 + \beta^2) \times Precision \times Recall}{(\beta^2 \times Precision) + Recall}    (4)
To establish a comparison between different criteria of cluster evaluation, we
also compute the F-measure [13], given by Fβ (see (4)) where β = 1, which is also a
frequently used measure. As it is shown, this well known metric is based on both
Precision and Recall measures. In this context, Precision is determined according

Table 5. English-Portuguese sense-clusters evaluation

Ambiguous V-measure F-measure


Word (Portuguese) (Portuguese)
plant 1.0 1.0
train 0.81 0.96
motion 1.0 1.0
general 1.0 1.0
fair 0.45 0.81
sentence 1.0 1.0
cold 0.91 0.98
chair 1.0 1.0
break 0.51 0.86

Table 6. English-Spanish sense-clusters evaluation

Ambiguous V-measure F-measure


Word (Spanish) (Spanish)
heart 1.0 1.0
plant 1.0 1.0
joint 1.0 1.0
motion 0.62 0.89
train 1.0 1.0
right 0.43 0.80
chair 1.0 1.0
sentence 1.0 1.0
break 0.51 0.86

to the following procedure: for each sense-cluster of a clustering, Precision is


given by the size of the largest semantic group of senses contained in the cluster
(the number of True Positives), divided by the size of the cluster (the sum of
True Positives and False Positives); the average Precision of the clusters gives
the Precision of the clustering. In order to calculate Recall, for each cluster of a
clustering, this measure is given by the same number of True Positives, divided
by the real size of the corresponding semantic group of senses (the sum of True
Positives and False Negatives); the average Recall of the clusters is taken as the
Recall of the clustering. Like V-measure, F-measure also varies between 0 and 1,
the former value being the worst scenario and the latter the optimal.
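
Both criteria are available off the shelf; for instance, scikit-learn implements homogeneity, completeness and V-measure directly, as in the minimal example below (the gold and predicted labels are invented).

# Minimal example of computing homogeneity, completeness and V-measure for a
# clustering against gold sense groups (labels are invented for the example).
from sklearn.metrics import homogeneity_score, completeness_score, v_measure_score

gold =      [0, 0, 0, 1, 1, 1]    # e.g. textual-unit senses vs. court-resolution senses
predicted = [0, 0, 0, 1, 1, 0]    # one sense placed in the wrong cluster

h = homogeneity_score(gold, predicted)
c = completeness_score(gold, predicted)
v = v_measure_score(gold, predicted)     # harmonic relation of h and c (beta = 1)
print(round(h, 2), round(c, 2), round(v, 2))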
Due to lack of space, tables 5, 6 and 7 contain the results of subset samples
of the clusterings obtained for each of the 15 ambiguous English words regard-
ing each target language. These tables show good results for sense-clusters for
Portuguese, Spanish and French language, getting average V-measure and F-
measure values of 0.81 and 0.92, respectively. Values tend to be higher for the F-
measure criterion. The tables also show that a significant part of the clusterings were
perfectly formed, that is, V-measure = 1.0 and F-measure = 1.0.

Table 7. English-French sense-clusters evaluation

Ambiguous V-measure F-measure


Word (French) (French)
plant 1.0 1.0
motion 0.62 0.90
train 1.0 1.0
heart 0.51 0.86
joint 1.0 1.0
tank 1.0 1.0
fair 0.56 0.88
break 1.0 1.0
cold 0.62 0.90

However, for some target pairs existing in the bilingual lexica database, there
were very few occurrences in the sentence-aligned parallel corpora, which pre-
vents the accurate calculation of the correlation between senses. This is the
reason why clustering results are relatively poor for some ambiguous words: for
example “motion” for English-French and English-Spanish, among others, as
shown in tables 5, 6 and 7.

4.2 Classification — Assigning Sense Clusters to Ambiguous Words


To accomplish the classification process, potentially ambiguous sentences (con-
taining ambiguous words) were extracted from a corpus that was not used in
the learning stage, totaling 96 expressions. The purpose is to determine how the
disambiguation system behaves when faced with a set of potentially ambiguous
sentences. The classification task usually involves the use of two separate training
and testing sets. The training phase is closely related with the stage of clustering
achieved previously, since we used the acquired knowledge from clusters to train
the system. Each cluster is encoded by the presence or absence of all features
that belong to all clusters related with a particular ambiguous word, and a target
value (i.e. the class label) corresponding to the sense cluster. Concerning
the testing phase, our goal is to encode each testing sentence, extracted from
a corpus which was not used in the training phase, and confront it with the
training set. So, since each ambiguous word was taken to be translated to a
target language, to encode each expression we analyze the presence or absence
in the sentence of just the features from English used in the training set.
The classifier used was a support vector machine (LIBSVM) [6] with the
Radial Basis Function (RBF) as kernel type, which allows handling the case where
the relation between class labels and attributes is non-linear.
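
A sketch of this training and testing encoding is given below; scikit-learn's SVC wraps LIBSVM, and the feature vocabulary, training examples and test sentence are toy assumptions.

# Sketch of the classification step (toy data): training examples come from the
# sense clusters, encoded by presence/absence of features; a new English
# sentence is encoded with the English features only. SVC wraps LIBSVM and is
# used here with the RBF kernel reported in the paper.
from sklearn.svm import SVC

feature_vocab = ["enL2 syntactically", "enR1 correctly", "enR1 imposed", "enL1 prison"]

def encode(features_present):
    return [1 if f in features_present else 0 for f in feature_vocab]

X_train = [encode({"enL2 syntactically", "enR1 correctly"}),   # textual-unit cluster
           encode({"enR1 imposed", "enL1 prison"})]            # court-resolution cluster
y_train = [0, 1]                                               # class label = sense cluster

clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

test_features = {"enL1 prison", "enR1 imposed"}                # unseen test sentence
print(clf.predict([encode(test_features)]))                    # -> cluster 1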

4.3 Classification Results


The results obtained with the application of the SVM classifier were evaluated using
F-measure [13], again given by Fβ (see (4)) where β = 1. All 96 ambiguous

sentences (containing ambiguous words) were classified and the F-measure was
calculated for each target language, as shown in Table 8. The fact that the F-
measure for French and Spanish did not reach the same value as for Portuguese
(0.79 vs 0.85) is probably due to the fact that the EN-PT language pair lexicon
used in the clustering process was considerably larger (810,000 entries) than
the ones used for the other two language pairs (380,000 and 290,000), implying
therefore a better quality training phase for that pair.
In order to have a baseline for comparison, the same tests described above
were performed using the output of GIZA++ alignments on DGT [11], where the
most probable sense is used to disambiguate each sentence: the results obtained
were 0.43, 0.38 and 0.38 respectively for EN-PT, EN-FR and EN-SP pairs.

Table 8. Results for the assignment of sense-clusters to ambiguous words

Target Language F-measure


French 0.79
Portuguese 0.85
Spanish 0.79

5 Conclusions and Future Work


In this paper, a language-independent WSD approach using Word Sense Clus-
tering was proposed. Sentence-aligned parallel corpora revealed to be essential
for achieving the objectives accomplished as it provided the neighbor contexts of
ambiguous words, enabling the calculation of the statistical correlation between
senses, and therefore the building of sense-clusters. The results obtained for sense
clustering (V-measure and F-measure) allow us to conclude that the learned clus-
ters are reliable sources of information, supporting the whole process of disam-
biguation. In the classification process, results were very positive for all language
pairs tested, showing that well-formed sense clusters are a strong base for WSD.
As future work we would like to improve the approach by studying the opti-
mal size of the neighbor context of the ambiguous words, which will require
larger sentence-aligned parallel corpora, in order to keep, and gain, statistical
representativeness. Future experiments will include ambiguous multi-words, for
which this algorithm was also designed. The approach presented in this paper
will enable us to build a semantic translation tagger that can be useful for trans-
lation aligners or other translation systems. It is also our intention to test our
approach with existing datasets used in cross-lingual WSD tasks of SemEval [9].

References
1. Aires, J., Lopes, G.P., Gomes, L.: Phrase translation extraction from aligned par-
allel corpora using suffix arrays and related structures. In: Lopes, L.S., Lau, N.,
Mariano, P., Rocha, L.M. (eds.) EPIA 2009. LNCS, vol. 5816, pp. 587–597.
Springer, Heidelberg (2009)
2. Apidianaki, M.: Cross-lingual word sense disambiguation using translation sense
clustering. In: Proceedings of the 7th International Workshop on Semantic Evalu-
ation (SemEval 2013), pp. 178–182. *SEM and NAACL (2013)
3. Apidianaki, M., He, Y., et al.: An algorithm for cross-lingual sense-clustering tested
in a MT evaluation setting. In: Proceedings of the International Workshop on
Spoken Language Translation, pp. 219–226 (2010)
4. Bansal, M., DeNero, J., Lin, D.: Unsupervised translation sense clustering. In:
Proceedings of the 2012 Conference of the North American Chapter of the Associ-
ation for Computational Linguistics: Human Language Technologies, pp. 773–782.
Association for Computational Linguistics (2012)
5. Brown, P.F., Pietra, S.A.D., Pietra, V.J.D., Mercer, R.L.: Word-sense disambigua-
tion using statistical methods. In: Proceedings of the 29th annual meeting on Asso-
ciation for Computational Linguistics, pp. 264–270. Association for Computational
Linguistics (1991)
6. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM
Transactions on Intelligent Systems and Technology (TIST) 2(3), 27 (2011)
7. Diab, M.T.: Word sense disambiguation within a multilingual framework. Ph.D.
thesis, University of Maryland at College Park (2003)
8. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The
weka data mining software: an update. ACM SIGKDD Explorations Newsletter
11(1), 10–18 (2009)
9. Lefever, E., Hoste, V.: Semeval-2010 task 3: Cross-lingual word sense disambigua-
tion. In: Proceedings of the 5th International Workshop on Semantic Evaluation,
pp. 15–20. Association for Computational Linguistics (2010)
10. Lefever, E., Hoste, V., De Cock, M.: Five languages are better than one: an attempt
to bypass the data acquisition bottleneck for WSD. In: Gelbukh, A. (ed.) CICLing
2013, Part I. LNCS, vol. 7816, pp. 343–354. Springer, Heidelberg (2013)
11. Och, F.J., Ney, H.: A systematic comparison of various statistical alignment mod-
els. Computational Linguistics 29(1), 19–51 (2003)
12. Pelleg, D., Moore, A.W., et al.: X-means: Extending k-means with efficient esti-
mation of the number of clusters. In: ICML, pp. 727–734 (2000)
13. Rijsbergen, V. (ed.): Information Retrieval, 2nd edn. Information Retrieval Group,
University of Glasgow (1979)
14. Rosenberg, A., Hirschberg, J.: V-measure: a conditional entropy-based external
cluster evaluation measure. EMNLP-CoNLL 7, 410–420 (2007)
15. Tufiş, D., Ion, R., Ide, N.: Fine-grained word sense disambiguation based on paral-
lel corpora, word alignment, word clustering and aligned wordnets. In: Proceedings
of the 20th international conference on Computational Linguistics, p. 1312. Asso-
ciation for Computational Linguistics (2004)
Towards the Improvement of a Topic Model
with Semantic Knowledge

Adriana Ferrugento1(B) , Ana Alves1,2 , Hugo Gonçalo Oliveira1 ,


and Filipe Rodrigues1
1
CISUC, Department of Informatics Engineering,
University of Coimbra, Coimbra, Portugal
[email protected], {ana,hroliv,fmpr}@dei.uc.pt
2
Coimbra Institute of Engineering,
Polytechnic Institute of Coimbra, Coimbra, Portugal

Abstract. Although typically used in classic topic models, surface


words cannot represent meaning on their own. Consequently, redundancy
is common in those topics, which may, for instance, include synonyms.
To face this problem, we present SemLDA, an extended topic model
that incorporates semantics from an external lexical-semantic knowledge
base. SemLDA is introduced and explained in detail, pointing out where
semantics is included both in the pre-processing and generative phase
of topic distributions. As a result, instead of topics as distributions over
words, we obtain distributions over concepts, each represented by a set of
synonymous words. In order to evaluate SemLDA, we applied preliminary
qualitative tests automatically against a state-of-the-art classical topic
model. The results were promising and confirm our intuition towards the
benefits of incorporating general semantics in a topic model.

Keywords: Topic model · Semantics · WordNet · SemLDA

1 Introduction

Topic models allow us to infer probability distributions over a set of words,


called “topics”, which are useful for uncovering the main subjects in a collection
of documents. They improve searching, browsing and summarization in such
collections, and their application is not limited to text mining, as they revealed
to be useful in fields such as computer vision [20] or bioinformatics [6].
Classic topic modelling algorithms, such as LDA [1], rely on the co-
occurrences of surface words to capture their semantic proximity. They con-
sider a surface word to be identical in different contexts and leverage on its
co-occurrences with other words to differentiate topics. This fails to consider
additional semantic knowledge on the words, which may, on one hand, exclude
different senses of the same word from occurring in different topics and, on the
other hand, lead to redundant topics, for instance with synonyms, that do not
add information to the topic.


Whether it was during pre-processing [8], the generative process [18], or post
processing [2], incorporating semantics into topic modeling emerged as an app-
roach to deal with concepts rather than surface words. Since a word may have
different meanings (e.g. bank ) and since the same concept may be denoted by
different words (e.g. car and automobile), these attempts exploit external seman-
tic resources, such as WordNet [11] or, alternatively, follow a fully unsuper-
vised approach, for instance, using word sense induction techniques [3]. In those
approaches, topic distributions with synonymous and semantically similar words
are unified in concept representations, such as synsets.
In order to improve current topic models, we propose a new model, SemLDA,
which incorporates semantics in the well-known LDA model, using knowledge
from WordNet. Similarly to other semantic topic models, the topics produced
by SemLDA are sets of synsets, instead of words. The main difference is that
SemLDA considers all possible senses of the words in a document, together with
their probabilities. Moreover, it only requires a minimal intuitive change to the
classic LDA algorithm.
The remaining of the paper is organized as follows: in Section 2, there is a brief
enumeration of existing approaches to topic modelling; Section 3 introduces the
proposed model in detail, with special focus on the differences towards the classic
LDA. Section 4 reports on the performed experiments, with illustrative examples
of the obtained topics and their automatic evaluation against the classic LDA.
Finally, Section 5 draws some conclusions and future plans for this work.

2 Related Work

The first notable approach to reduce the dimensionality of documents was Latent
Semantic Indexing (LSI) [5], which aimed at retaining the most of the variance
present in the documents, thus leading to a significant compression of large
datasets. Probabilistic Latent Semantic Indexing (pLSI) [9] later emerged as a
variant of LSI, where different words in documents are modelled as samples from
a simple mixture model where the mixture components are multinomial random
variables that can be viewed as representations of “topics”. Nevertheless, pLSI
was still not a proper generative model of documents, given that it provides no
probabilistic model at the level of documents. Having this limitation in mind,
Blei et al. [1] developed the Latent Dirichlet Allocation (LDA), a generalization
of pLSI that is currently the most applied topic model. It allows documents
to have a mixture of topics, given that it enables to capture significant intra-
document statistical structure via the mixing distribution.
The single purpose of the previous models is to discover and assign different
topics – represented by sets of surface words, each with a different probability –
to the collection of documents provided. Those approaches have no concern with
additional semantic knowledge about words, which can lead to some limitations
in the generated topics. For instance, they might include synonyms, and thus
be redundant and less informative. Alternative attempts address this problem
using, for instance, WordNet [11], a lexical-semantic knowledge base of English.

WordNet is structured in synsets, which are groups of synonymous words that may be seen as concept representations of a language. Synsets may be connected according to different semantic relations, such as hypernymy (generalization) or meronymy (part-of).
In an attempt to include semantics in topic modelling and, at the same time,
perform word sense disambiguation (WSD), Boyd-Graber and Blei [2] presented
LDAWN, a modified LDA algorithm that includes a hidden variable for rep-
resenting the sense of a word, according to WordNet. Each topic consists of
a random walk through the WordNet hypernymy hierarchy, which is used to
infer topics and their synsets, based on the words from documents. LDAWN was
also applied to word sense disambiguation (WSD), although its authors acknowledge its worse performance when compared with state-of-the-art WSD algorithms.
One of the proposed solutions is to acquire local context to improve WSD, in
the future. However, there is additional work towards the discovery of concept-
based topics, not always relying on WordNet. For instance, LDA was used as
a ground model to generate topics based on concepts of an ontology [4]; and a
commonsense knowledge-based algorithm was used to transform documents into
commonsense concepts, which were then clustered to generate the topics [17].
Despite some similarities, the model proposed in this paper differs from the
previous in various ways. Instead of words, the produced topics are also distri-
butions over concepts (synsets) and, similarly to LDAWN, it exploits WordNet
and modifies the basic LDA by adding a sense variable. But SemLDA considers
all possible senses of a word, with a distribution over all the synsets that include
it. Indeed, we do not benefit from similar words in the same topic to improve
WSD, as in LDAWN. Rather, we try to avoid it. This is why, in the future, our
sense probabilities might be obtained from WSD. SemLDA is further explained
in the next section.

3 Proposed Model

In order to consider the general semantics of a language during topic discovery, we propose SemLDA, a topic model that exploits external semantic knowledge,
acquired from Princeton WordNet [11], a lexical-semantic knowledge base of
English. SemLDA is based on the Latent Dirichlet Allocation model [1], but it
introduces a new set of parameters η_{1:S}, where S is the number of synsets where
the word occurs (one for each of its senses). The parameters correspond to the
probabilities of each word belonging to a synset, which, in the current imple-
mentation, is obtained directly from the SemCor corpus [12] – in SemCor, words
are manually annotated according to their WordNet senses and, in WordNet, both
synsets and word senses are ordered according to their frequencies in SemCor.
Given a corpus D = {w_d}_{d=1}^{|D|} of size |D| and the probabilities η_{s,n} of each word n in a synset s, SemLDA estimates the most likely set of topics. Each
document is represented by a distribution of topics, θ, but, in contrast to the
traditional LDA, a topic is represented as a distribution over the synsets in the
vocabulary, β. This is a major difference because, in LDA, words are handled
according to the documents where they appear, regardless of their known seman-
tics. In SemLDA, instead of just words, all the possible senses for each word are
considered, although with different probabilities. We should notice that the more
informative output (synsets) does not necessarily imply an increase in complexity
in the topic representation. If needed, in order to have a comparable output to
other topic models, a single word can be selected from each synset. When using
WordNet, it makes sense to select the first word of a synset, which we recall to
be the most frequently used to denote the concept.
The graphical model of SemLDA is displayed in Figure 1, where D is the
number of documents in the corpus, K is the number of topics, N is the number
of words in a document and S is the number of synsets of a given word. In
this model, each word of a document, wn , is drawn from a concept, cn . This is
represented by using the synset’s distribution over words, parameterized by η,
which we shall assume to be fixed. The concept cn is determined by a discrete
topic-assignment zn , picked from the document’s distribution over topics θ and
a topic distribution β. It follows the same reasoning as the LDA model, but
includes a new layer corresponding to the concepts cn that the words wn express.
The generative process of a document d under SemLDA is the following:

1. Choose topic proportions θ|α ∼ Dir(α)
2. For each concept c_n:
   (a) Choose topic assignment z_n|θ ∼ Mult(θ)
   (b) Choose concept c_n|z_n, β_{1:K} ∼ Mult(β_{z_n})
   (c) Choose word to represent concept w_n|c_n, η_{1:S} ∼ Mult(η_{c_n})
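To make the generative story concrete, the following minimal sketch (an illustration only, with toy dimensions and a randomly drawn η instead of the fixed, SemCor-derived η used by SemLDA) samples a document step by step:

import numpy as np

rng = np.random.default_rng(0)

K, S, V = 3, 5, 8          # topics, synsets, surface words (toy sizes)
alpha = np.full(K, 0.5)    # symmetric Dirichlet prior, as in the experiments
beta = rng.dirichlet(np.ones(S), size=K)   # topic -> synset distributions
eta = rng.dirichlet(np.ones(V), size=S)    # synset -> word distributions (fixed)

def generate_document(n_words):
    theta = rng.dirichlet(alpha)           # 1. topic proportions for the document
    doc = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)         # 2(a) topic assignment
        c = rng.choice(S, p=beta[z])       # 2(b) concept (synset)
        w = rng.choice(V, p=eta[c])        # 2(c) surface word expressing the concept
        doc.append((z, c, w))
    return doc

print(generate_document(10))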

Fig. 1. Graphical model.

Our goal is thus to calculate, for every document, the posterior distribution over the latent variables, θ, z_{1:N}, c_{1:N}. However, as in LDA, performing exact inference is intractable, so we need to use an approximate inference method. In this paper, we use variational inference to perform approximate Bayesian inference. The purpose of variational inference is to minimize the KL divergence between the variational distribution q(θ, z_{1:N}, c_{1:N}) and the true posterior distribution p(θ, z_{1:N}, c_{1:N}|w_{1:N}). A fully factorized (mean field) variational distribution q,
of the form represented in equation 1, is employed, with γ, φ and λ as the variational parameters.

    q(θ, z_{1:N}, c_{1:N}) = q(θ|γ) ∏_{n=1}^{N} q(z_n|φ_n) q(c_n|λ_n)    (1)

KL minimization [10] is equivalent to maximizing the lower bound on the log marginal likelihood (equation 2) by using coordinate ascent.

    log p(w_{1:N}|α, β_{1:K}, η_{1:S}) = log ∫_θ ∑_{z_{1:N}} ∑_{c_{1:N}} [ p(θ, z_{1:N}, c_{1:N}, w_{1:N}|Θ) q(θ, z_{1:N}, c_{1:N}) / q(θ, z_{1:N}, c_{1:N}) ]
        ≥ E_q[log p(θ, z_{1:N}, c_{1:N}, w_{1:N})] − E_q[log q(θ, z_{1:N}, c_{1:N})]
        = L(γ, φ_{1:N}, λ_{1:N}|Θ)    (2)
The Greek letter Θ is used to denote the model's parameters Θ = {α, β, η}. By optimizing L w.r.t. γ, we get the same update as in the classic LDA [1], given by equation 3.

    γ_i = α_i + ∑_{n=1}^{N} φ_{n,i}    (3)

The variational parameter φ can be optimized by collecting only the terms in the lower bound that contain the parameter. Notice that this is a constrained maximization problem, since ∑_{k=1}^{K} φ_{n,k} = 1, which is necessary for it to be a valid probability distribution. Hence, we need to include the necessary Lagrange multipliers, yielding equation 4.

    L_{[φ]} = ∑_{n=1}^{N} ∑_{i=1}^{K} φ_{n,i} ( Ψ(γ_i) − Ψ(∑_{j=1}^{K} γ_j) ) + ∑_{n=1}^{N} ∑_{j=1}^{S} ∑_{i=1}^{K} λ_{n,j} φ_{n,i} log β_{i,j}
             − ∑_{n=1}^{N} ∑_{i=1}^{K} φ_{n,i} log φ_{n,i} + μ ( ∑_{k=1}^{K} φ_{n,k} − 1 )    (4)

Setting the derivatives of L_{[φ]} w.r.t. φ to zero gives the update in equation 5.

    φ_{n,i} ∝ exp( Ψ(γ_i) − Ψ(∑_{j=1}^{K} γ_j) + ∑_{j=1}^{S} λ_{n,j} log β_{i,j} )    (5)

In order to optimize L w.r.t. λ, we again start by collecting only the terms in the bound that contain λ. Notice that this is also a constrained maximization problem, since ∑_{k=1}^{S} λ_{n,k} = 1. Hence, we need to also add the necessary Lagrange multipliers (see equation 6).

    L_{[λ]} = ∑_{n=1}^{N} ∑_{j=1}^{S} ∑_{i=1}^{K} λ_{n,j} φ_{n,i} log β_{i,j} + ∑_{n=1}^{N} ∑_{j=1}^{S} ∑_{i=1}^{V_j} λ_{n,j} w_{n,i} log η_{j,i}
             − ∑_{n=1}^{N} ∑_{j=1}^{S} λ_{n,j} log λ_{n,j} + μ ( ∑_{k=1}^{S} λ_{n,k} − 1 )    (6)
Setting the derivatives of L_{[λ]} w.r.t. λ to zero gives the update in equation 7.

    λ_{n,j} ∝ exp( ∑_{i=1}^{K} φ_{n,i} log β_{i,j} + ∑_{i=1}^{V_j} w_{n,i} log η_{j,i} )    (7)

The variational inference algorithm iterates between the different updates presented until the convergence of the evidence lower bound. Since the param-
eters α and η are assumed to be fixed, we only need to estimate β, for which a
variational EM algorithm is used. In the E-step of the expectation-maximization
algorithm, variational inference is used to find an approximate posterior for
each document, as previously described. In the M-step, as in exact EM, we find
maximum likelihood estimates of the parameters using the expected sufficient
statistics computed in the E-step.
We start by collecting only the terms in the lower bound that contain β. Notice that this is a constrained maximization problem, since ∑_{k=1}^{V} β_{i,k} = 1, which is necessary to be a valid probability distribution. Hence, we also need to include the necessary Lagrange multipliers (see equation 8).

    L_{[β]} = ∑_{d=1}^{D} ∑_{n=1}^{N} ∑_{i=1}^{K} ∑_{j=1}^{S} λ^d_{n,j} φ^d_{n,i} log β_{i,j} + ∑_{i=1}^{K} μ_i ( ∑_{k=1}^{V} β_{i,k} − 1 )    (8)

Setting the derivatives of L_{[β]} w.r.t. β to zero gives the following update in equation 9, which is analogous to the update in standard LDA [1], but with the words w^d_{n,j} replaced by their probability in the j-th concept, λ^d_{n,j}.

    β_{i,j} ∝ ∑_{d=1}^{D} ∑_{n=1}^{N_d} λ^d_{n,j} φ^d_{n,i}    (9)
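A compact sketch of the resulting variational EM procedure (equations 3, 5, 7 and 9) is given below. It is an illustration under simplifying assumptions — every word position is allowed all S synsets and η is taken as a dense S×V matrix — and not the authors' implementation.

import numpy as np
from scipy.special import digamma

def variational_em(docs, K, S, alpha, eta, em_iters=20, doc_iters=10):
    """docs: list of documents, each a list of word ids.
    alpha: length-K array; eta: S x V array, eta[j, w] = p(word w | synset j)."""
    rng = np.random.default_rng(0)
    log_beta = np.log(rng.dirichlet(np.ones(S), size=K))          # K x S topics over synsets
    for _ in range(em_iters):
        beta_stats = np.zeros((K, S))
        for doc in docs:                                          # E-step
            N = len(doc)
            phi = np.full((N, K), 1.0 / K)
            lam = np.full((N, S), 1.0 / S)
            gamma = alpha + float(N) / K
            for _ in range(doc_iters):
                for n, w in enumerate(doc):
                    # eq. 7: synset responsibilities for word n
                    log_lam = phi[n] @ log_beta + np.log(eta[:, w] + 1e-12)
                    lam[n] = np.exp(log_lam - log_lam.max())
                    lam[n] /= lam[n].sum()
                    # eq. 5: topic responsibilities for word n
                    log_phi = digamma(gamma) - digamma(gamma.sum()) + log_beta @ lam[n]
                    phi[n] = np.exp(log_phi - log_phi.max())
                    phi[n] /= phi[n].sum()
                gamma = alpha + phi.sum(axis=0)                   # eq. 3
            beta_stats += phi.T @ lam                             # sufficient statistics
        # M-step, eq. 9: re-estimate the topic-synset distributions
        log_beta = np.log(beta_stats / beta_stats.sum(axis=1, keepdims=True) + 1e-12)
    return np.exp(log_beta)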

4 Results and Evaluation


In this section, the experiments performed towards the validation of SemLDA
are reported. The datasets used are described, some of the obtained synset-based
topics are shown, and the evaluation metrics are presented. The latter were used
to compare SemLDA with the classic LDA.

4.1 Datasets
Two freely available textual corpora were used in our experiments, namely: the
Associated Press (AP) and the 20 Newsgroups dataset, both in English. AP is a
large news corpus, from which we used only a part. More precisely, the sample
data for the C implementation of LDA, available in David Blei’s website1 , which
1 http://www.cs.princeton.edu/∼blei/lda-c/
includes 2,246 documents. The 20 Newsgroups2 is a popular dataset for experiments in text applications of machine learning techniques. It contains 20,000
documents, organized into 20 different newsgroups. Both datasets went through
the same pre-processing phase, which included stop-word and numbers removal,
as well as word lemmatization. For stop-word removal, we used the snowball stop
words list3 . Word lemmatization was based on the NLTK4 WordNet reader.
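The pre-processing phase can be sketched as follows (the stop-word set here is a small placeholder for the snowball list, and NLTK's WordNet lemmatizer stands in for the lemmatization step):

import re
from nltk.stem import WordNetLemmatizer   # requires the NLTK WordNet data

stopwords = {"the", "a", "an", "of", "and", "to", "in", "is"}   # placeholder set
lemmatizer = WordNetLemmatizer()

def preprocess(text):
    # keep only alphabetic tokens (removes numbers and punctuation), lowercased
    tokens = re.findall(r"[a-z]+", text.lower())
    return [lemmatizer.lemmatize(t) for t in tokens if t not in stopwords]

print(preprocess("The 20 patients were treated by 3 doctors in 1994."))
# -> ['patient', 'were', 'treated', 'by', 'doctor']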

4.2 Experiments
The experiments performed aimed at comparing the classic LDA algorithm [1]
with SemLDA. An implementation of the classic algorithm, written in C, is available from Blei's website5. No changes were made to his code. We just had
to pre-process the documents, generate a suitable input, and execute it.
For running SemLDA, extra work was needed. First, we retrieved all synsets
from the SemCor 3.0 annotations6 , and calculated their probability in this cor-
pus. This is a straightforward task for those WordNet synsets that are in Sem-
Cor. But SemCor is a limited corpus and does not cover all words and senses in
WordNet. To handle this issue, an extra pre-processing step was added, where
all documents were reviewed and, when a word did not occur in SemCor, a new
‘dummy’ synset was created with a special negative id, and probability equal to
the average probability of all the other synsets. This value was chosen to balance
the unknown probabilities of dummy synsets according to the probabilities of
the remaining synsets, and thus not favor any of them.
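A sketch of this step is shown below. The SemCor-derived counts (semcor_counts) are a hypothetical input, and the average probability is a stand-in value; only the handling of words missing from SemCor is meant to be illustrated.

from nltk.corpus import wordnet as wn   # requires the NLTK WordNet data

# Hypothetical sense counts extracted from the SemCor 3.0 annotations.
semcor_counts = {"bank.n.01": 20, "bank.n.09": 3, "deposit.v.02": 5}
avg_prob = 0.05   # stand-in for the average probability of all known synsets

next_dummy_id = -1   # 'dummy' synsets get special negative ids

def synsets_with_probs(word):
    """Return (synset, probability) pairs for a word; create a dummy synset
    when the word has no sense observed in SemCor."""
    global next_dummy_id
    candidates = wn.synsets(word)
    counts = [semcor_counts.get(s.name(), 0) for s in candidates]
    total = sum(counts)
    if total == 0:
        dummy = (next_dummy_id, avg_prob)
        next_dummy_id -= 1
        return [dummy]
    return [(s.name(), c / total) for s, c in zip(candidates, counts) if c > 0]

print(synsets_with_probs("bank"))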
For each dataset, the SemLDA input file had the synsets retrieved from
SemCor and the words that were in the documents, but not in SemCor. The
only difference in the text pre-processing is the use of part-of-speech (POS)
tagging, to consider only open-class words, namely nouns, verbs, adjectives and
adverbs.
Instead of trial and error with different numbers of topics, we used a Hierar-
chical Dirichlet Process (HDP) [19] to discover the appropriate number of top-
ics for each dataset. The results obtained suggested that the 20 Newsgroups
dataset contains 15 topics and the AP corpus 24. After the pre-processing
phase, each model was run for both datasets, with the α parameter fixed at
0.5. Tables 1 and 2 illustrate the results obtained with the classic LDA and
SemLDA, respectively for the 20 Newsgroups and for the AP corpus. For each
topic discovered by SemLDA that is presented, we tried to find an analogous topic from the classic LDA, in the sense that they share similar domains. For the sake of
simplicity, we only show the top 10 synsets for each SemLDA topic, with their
Synset ID, POS-tag, words and gloss. Underlined words are those present in
SemCor and WordNet, whereas the others only appear in WordNet. For each
LDA topic only the top 10 words are displayed.
2 http://qwone.com/∼jason/20Newsgroups/
3 http://snowball.tartarus.org/algorithms/english/stop.txt
4 http://www.nltk.org/
5 http://www.cs.princeton.edu/∼blei/lda-c/
6 http://web.eecs.umich.edu/∼mihalcea/downloads.html#semcor
Table 1. Illustrative (analogous) topics from 20 Newsgroups, obtained with the classic
LDA (top) and with SemLDA (bottom).

LDA
medical, health, use, patient, disease, doctor, cancer, study, infection, treatment
SemLDA
Synset ID | POS | Words | Gloss
14447908 | N | health, wellness | A healthy state of well being free from disease.
3247620 | N | drug | A substance that is used as a medicine or narcotic.
10405694 | N | patient | A person who requires medical care.
10020890 | N | doctor, doc, physician, MD, Dr., medico | A licensed medical practitioner.
14070360 | N | disease | An impairment of health or a condition of abnormal functioning.
14239918 | N | cancer, malignant neoplastic disease | Any malignant growth or tumor caused by abnormal and uncontrolled cell division.
47534 | ADV | besides, too, also, likewise, as well | In addition.
14174549 | N | infection | The pathological state resulting from the invasion of the body by pathogenic microorganisms.
1165043 | V | use, habituate | Take or consume (regularly or habitually).
7846 | N | person, individual, someone, somebody, mortal, soul | A human being.

The results show success on incorporating semantics into LDA. Topics are
based on synsets and WordNet can be used to retrieve additional information
on the concept they denote, including their definition (gloss), POS and other
words with the same meaning. With both models, the top words of each topic are
consistently nouns, which should transmit more content. The presented examples
clearly describe very close semantic domains. They share many words, and the others are closely related to each other (e.g. drug and treatment, or exchange and
trading). We call attention to topics where the same word is in different synsets
(Table 2). While this might sometimes be undesirable, and a possible sign of
incoherence, it also shows that the algorithm is correctly handling different senses
of the same word. These situations should be minimized in the future, as we
intend to acquire sense probabilities from word sense disambiguation (WSD) [14],
instead of relying blindly on SemCor for this purpose. This will also minimize
the number of dummy synsets.
We can say that the overall results are satisfying. Despite one or another less clear word association, we may say that we are moving in the right direction. Still, to measure progress against the classic LDA, we made an automatic
evaluation of the coherence of the discovered topics.

4.3 Evaluation
Although, at first glance, the results might seem promising, they were validated
automatically, using metrics previously applied to the context of topic modelling,
namely: pointwise mutual information (PMI) and topic coherence.

PMI is a measure of word association that, according to Newman et al. [15], is highly correlated with human-judged topic coherence. In this context, PMI
Table 2. Illustrative (analogous) topics from AP, obtained with the classic LDA (top)
and with SemLDA (bottom).

LDA
stock, market, percent, rate, price, oil, rise, say, point, exchange
SemLDA
Synset ID | POS | Words | Gloss
8424951 | N | market | The customers for a particular product or service.
13851067 | N | index | A numerical scale used to compare variables with one another or with some reference number.
8072837 | N | market, securities industry | The securities markets in the aggregate.
13342135 | N | share | Any of the equal portions into which the capital stock of a corporation is divided and ownership of which is evidenced by a stock certificate.
79398 | N | trading | Buying or selling securities or commodities.
3843092 | N | oil, oil color, oil colour | Oil paint containing pigment that is used by an artist.
7167041 | N | price | A monetary reward for helping to catch a criminal.
14966667 | N | oil | A slippery or viscous liquid or liquefiable substance not miscible with water.
5814650 | N | issue | An important question that is in dispute and must be settled.
13333833 | N | stock | The capital raised by a corporation through the issue of shares entitling holders to an ownership interest (equity).

is calculated for each topic, based on the co-occurrence probabilities of every pair of its words. In our case, we used the 10 most probable words of each
topic, which results in 45 different pairs. For both datasets, co-occurrence is
computed from Wikipedia, which provides a large and wide-coverage source of
text, completely independent from the datasets used and from WordNet. After
comparing different approaches for evaluating topic coherence automatically,
Newman et al. [16] concluded that PMI over Wikipedia provides a score that is
very consistent with human judgements. See equation 10 for the PMI’s formula,
where p(w) is the probability of a word w (in our case, the number of Wikipedia
articles using this word), and p(wi , wj ) is the probability of words wi and wj
co-occurring (in our case, the number of Wikipedia articles using both words).
After computing PMI for all topics, we computed the average score for the full
topic set.

    PMI-Score(w) = (1/45) ∑_{i<j} log[ p(w_i, w_j) / (p(w_i) p(w_j)) ],   i, j ∈ {1...10}    (10)
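A direct computation of the PMI score from article counts looks like the sketch below; doc_freq, co_doc_freq and n_docs stand for Wikipedia statistics and are hypothetical inputs (zero co-occurrence counts would need additional smoothing, which the formula leaves implicit).

from itertools import combinations
from math import log

def pmi_score(topic_words, doc_freq, co_doc_freq, n_docs):
    """topic_words: the 10 most probable words of a topic.
    doc_freq[w]: number of articles containing w;
    co_doc_freq[(wi, wj)]: number of articles containing both words."""
    pairs = list(combinations(topic_words, 2))    # 45 pairs for 10 words
    total = 0.0
    for wi, wj in pairs:
        p_i = doc_freq[wi] / n_docs
        p_j = doc_freq[wj] / n_docs
        p_ij = co_doc_freq[(wi, wj)] / n_docs
        total += log(p_ij / (p_i * p_j))
    return total / len(pairs)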

We recall that topics discovered by SemLDA are sets of synsets and not
of surface words. Therefore, to enable a fair comparison with the classic LDA,
before computing the PMI scores, we converted our topics to a plain word rep-
resentation. For this purpose, instead of full synsets, we used only their first
word. According to WordNet, this is the word most frequently used to denote
the synset concept, in the SemCor corpus. For instance, the SemLDA topic
in Table 2 becomes: market, index, market, share, trading, oil, price, oil, issue,
stock. On the one hand, this representation limits the extent of our results, which
are, in fact, synsets. On the other hand, by doing so, it might lead to duplicate
words in the same topic, though corresponding to different senses.
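With NLTK, the conversion of a synset-based topic to this plain word representation can be sketched as follows (the synset names in the comment are illustrative):

from nltk.corpus import wordnet as wn

def topic_to_words(synset_names):
    """Replace each synset of a topic by its first lemma (the word WordNet
    lists first for that concept)."""
    return [wn.synset(name).lemmas()[0].name() for name in synset_names]

# e.g. topic_to_words(["oil.n.01", "market.n.01"]) might yield ["oil", "market"]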
Coherence measures the co-occurrence, within the modelled documents, of pairs of words in the same topic [13]. As in the PMI measure, the 10 most probable words of each topic are used. It is also calculated for each topic and, in the end, the average is computed for the full topic set. Assuming that, in every document, there is an explicit theme, by calculating this, we can analyze if the grouping of words is coherent, given their co-occurrence. This measure is very similar to PMI but, in some situations, it achieved higher correlation with human judges [13]. Equation 11 shows the formula of this measure, where D(v_m^{(t)}, v_l^{(t)}) is the co-document frequency of two words, 1 is a smoothing count to avoid the logarithm of zero, and D(v_l^{(t)}) is the document frequency of a word.

    C(t; V^{(t)}) = ∑_{m=2}^{M} ∑_{l=1}^{m−1} log[ (D(v_m^{(t)}, v_l^{(t)}) + 1) / D(v_l^{(t)}) ]    (11)
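A minimal sketch of equation 11, computed from per-word document sets over the modelled corpus:

from math import log

def topic_coherence(topic_words, doc_sets):
    """topic_words: the 10 most probable words of a topic, most probable first.
    doc_sets[w]: set of ids of the modelled documents that contain word w."""
    score = 0.0
    for m in range(1, len(topic_words)):
        for l in range(m):
            v_m, v_l = topic_words[m], topic_words[l]
            co_docs = len(doc_sets[v_m] & doc_sets[v_l])      # co-document frequency
            score += log((co_docs + 1) / len(doc_sets[v_l]))  # +1 smoothing, as in eq. 11
    return score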

The results obtained with the two topic models, for both datasets, are pre-
sented in Table 3. Even though it was a close call, SemLDA outperformed the classic
LDA. On both metrics, SemLDA had better scores in the AP corpus, which was
the dataset originally used by Blei. For the 20 Newsgroups dataset, the topic
coherence measure was very close with both models, whereas the PMI score was
better for SemLDA. This confirms that we are heading towards a promising
approach that, by exploiting an external lexical-semantic knowledge base, may
improve the outcome of the classic LDA model. We should still stress that these
are just preliminary results. The following steps are explained with further detail
in the next section.

Table 3. Results obtained with the two evaluation metrics.

20 Newsgroups AP
PMI Coherence PMI Coherence
LDA 1.16 ± 0.39 -32.89 ± 19.77 1.12 ± 0.31 -13.62 ± 9.51
SemLDA 1.22 ± 0.46 -35.4 ± 17.65 1.43 ± 0.26 -9.18 ± 7.51

5 Concluding Remarks
We have presented SemLDA, a topic model based on the classic LDA that incor-
porates external semantic knowledge to discover less redundant and more infor-
mative topics, represented as concepts, instead of surface words. We may say that
we have been successful so far. The classic algorithm was effectively changed to
produce topics based on WordNet synsets, which, after an automatic validation,
were shown to have comparable coherence to the original topics. Despite the promising
results, there is still much room for improvement.
In fact, to simplify our task, we relied on some assumptions that should be
dealt with in a near future and, hopefully, lead to improvements. For instance,
the α parameter of LDA was simply set to a fixed value of 0.5. Its selection
should be made after testing different values and assessing their outcomes. We are also considering adding a Dirichlet prior over the variable concerning topics, β,
so that it produces a smooth posterior and controls sparsity. Additional planned
tests include the generation of topics considering just a subset of the open class
words, for instance, just nouns, which might be more informative. Last but
not least, we recall that we obtained the word sense probabilities directly from
SemCor. While this corpus is frequently used in WSD tasks and should thus have
some representativeness, this approach does not consider the context where the
words occur. Instead of relying on SemCor, it is our goal to perform all-words
WSD to the input corpora, and this way extract the probabilities of selecting
different synsets, given the word context. This should also account for words
that are not present in SemCor, and minimize the number of dummy synsets,
with the averaged probabilities assigned.
Moreover, we should perform an additional evaluation of the results, not
just through automatic measures, but possibly using people to assess the topic
coherence. For instance, we may adopt the intruder test, where judges have
to manually select the word not belonging to a topic (see [15]). It is also our
intention to evaluate SemLDA indirectly, by applying it in tasks that require
topic modelling, such as automatic summarization and classification.
We conclude by pointing out that, although the proposed model is language
independent, it relies on language-specific resources, especially the existence of a
wordnet, besides models for POS-tagging and lemmatization. One of our mid-
term goals is precisely to apply SemLDA to Portuguese documents. For such, we
will use available POS-taggers and lemmatizers for this language, as well as one
or more of the available wordnets (see [7] for a survey on Portuguese wordnets).
Since sense probabilities will soon be obtained from a WSD method, the
unavailability of a SemCor-like corpus for Portuguese is not an issue.

Acknowledgments. This work was supported by the InfoCrowds project - FCT-PTDC/ECM-TRA/1898/2012FCT.

References
1. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. Journal of Machine
Learning Research 3, 993–1022 (2003)
2. Boyd-Graber, J., Blei, D., Zhu, X.: A topic model for word sense disambigua-
tion. In: Proceedings of 2007 Joint Conference on Empirical Methods in Natural
Language Processing and Computational Natural Language Learning (EMNLP-
CoNLL), pp. 1024–1033. ACL Press, Prague, Czech Republic, June 2007
3. Brody, S., Lapata, M.: Bayesian word sense induction. In: Proceedings of 12th Con-
ference of the European Chapter of the Association for Computational Linguistics.
EACL 2009, pp. 103–111. ACL Press (2009)
4. Chemudugunta, C., Holloway, A., Smyth, P., Steyvers, M.: Modeling documents by
combining semantic concepts with unsupervised statistical learning. In: Sheth, A.P.,
Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T., Thirunarayan, K. (eds.)
ISWC 2008. LNCS, vol. 5318, pp. 229–244. Springer, Heidelberg (2008)
5. Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.:
Indexing by latent semantic analysis. JASIS 41(6), 391–407 (1990)
6. Flaherty, P., Giaever, G., Kumm, J., Jordan, M.I., Arkin, A.P.: A latent variable
model for chemogenomic profiling. Bioinformatics 21(15), 3286–3293 (2005)
7. Gonçalo Oliveira, H., de Paiva, V., Freitas, C., Rademaker, A., Real, L., Simões, A.:
As wordnets do português. In: Simões, A., Barreiro, A., Santos, D., Sousa-Silva, R.,
Tagnin, S.E.O. (eds.) Linguística, Informática e Tradução: Mundos que se Cruzam,
OSLa, vol. 7, no. 1, pp. 397–424. University of Oslo (2015)
8. Guo, W., Diab, M.: Semantic topic models: combining word distributional statistics
and dictionary definitions. In: EMNLP, pp. 552–561. ACL Press (2011)
9. Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the 22nd
annual international ACM SIGIR conference on Research and development in infor-
mation retrieval, pp. 50–57. ACM (1999)
10. Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., Saul, L.K.: An introduction to vari-
ational methods for graphical models. Machine learning 37(2), 183–233 (1999)
11. Miller, G.A.: Wordnet: a lexical database for english. Communications of the ACM
38(11), 39–41 (1995)
12. Miller, G.A., Chodorow, M., Landes, S., Leacock, C., Thomas, R.G.: Using a
semantic concordance for sense identification. In: Proceedings of ARPA Human
Language Technology Workshop. Plainsboro, NJ, USA (1994)
13. Mimno, D., Wallach, H.M., Talley, E., Leenders, M., McCallum, A.: Optimizing
semantic coherence in topic models. In: Proceedings of the Conference on Empirical
Methods in Natural Language Processing. EMNLP 2011, pp. 262–272. ACL Press
(2011)
14. Navigli, R.: Word sense disambiguation: A survey. ACM Computing Surveys 41(2),
1–69 (2009)
15. Newman, D., Bonilla, E.V., Buntine, W.: Improving topic coherence with reg-
ularized topic models. In: Advances in Neural Information Processing Systems,
pp. 496–504 (2011)
16. Newman, D., Lau, J.H., Grieser, K., Baldwin, T.: Automatic evaluation of topic
coherence. In: Human Language Technologies: The 2010 Annual Conference of the
North American Chapter of the Association for Computational Linguistics. HLT
2010, pp. 100–108. ACL Press (2010)
17. Rajagopal, D., Olsher, D., Cambria, E., Kwok, K.: Commonsense-based topic mod-
eling. In: Proceedings of the 2nd International Workshop on Issues of Sentiment
Discovery and Opinion Mining, p. 6. ACM (2013)
18. Tang, G., Xia, Y., Sun, J., Zhang, M., Zheng, T.F.: Topic models incorporat-
ing statistical word senses. In: Gelbukh, A. (ed.) CICLing 2014, Part I. LNCS,
vol. 8403, pp. 151–162. Springer, Heidelberg (2014)
19. Teh, Y.W., Jordan, M.I., Beal, M.J., Blei, D.M.: Hierarchical dirichlet processes.
Journal of the american statistical association 101(476) (2006)
20. Wang, C., Blei, D., Li, F.F.: Simultaneous image classification and annotation.
In: IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2009,
pp. 1903–1910. IEEE (2009)
RAPPORT — A Portuguese
Question-Answering System

Ricardo Rodrigues1,2(B) and Paulo Gomes1


1 Centre for Informatics and Systems, University of Coimbra, Coimbra, Portugal
{rmanuel,pgomes}@dei.uc.pt
2 College of Education of the Polytechnic Institute of Coimbra, Coimbra, Portugal

Abstract. We present a question answering system for Portuguese that depends on subject-predicate-object triples extracted from sentences in
a corpus. It is supported by indices that store triples, related sentences
and documents. The system processes the questions and retrieves answers
based on the triples.
For testing and evaluation purposes, we have used the CHAVE cor-
pus, which has been used in multiple editions of CLEF. The questions
from those editions were used to query and benchmark our system. In its
current stage, the system has found the answer to 42% of the questions.
This document describes the modules that compose the system and
how they are combined, providing a brief analysis on them, and also
some preliminary results, as well as some expectations regarding future
work.

Keywords: Question answering · Triple extraction · Portuguese

1 Introduction

When querying a system that provides or retrieves information about a given topic, with its contents in natural language, the user should not have to care
about system specific details, such as:

– knowing the best keywords to get an answer to a specific question;
– using system specific syntax in order to interact with it;
– perusing the multiple documents that may contain the eventual answer;
– being limited by the questions someone has compiled before.

It is also possible that the information sources are not, or cannot be, structured in such a way that they can be easily accessed by more conventional techniques
of information retrieval (IR) [2].
Most of these issues are addressed by question answering (QA) systems [22],
which allow the user to interact with the systems by using natural language, and
process documents whose contents are specified also using natural language.

In this context, we present a system that addresses QA for Portuguese, using triples extracted from sentences in a corpus, which are then used to present "short answers" (excerpts), alongside the sentences and documents they belong to.
In the remaining document, we briefly address the state of the art, describe
the overall proposed approach and each of its modules, and draw some conclu-
sions and reflections about future work.

2 Question Answering
Question answering, as in other subfields of IR, may include techniques such as:
named entity recognition (NER) or semantic classification of entities, relations
between entities, and selection of semantically relevant sentences, phrases or
chunks [14], beyond the customary tokenization, lemmatization, and part-of-
speech (POS) tagging. QA can also address a restricted set of topics, in a closed
world domain, or forgo that restriction, operating in an open world domain.
Most approaches usually follow the framework shown in Fig. 1, where most
of the processing stages are performed at run-time (except for document indexing).

Fig. 1. A Typical Framework for a QA System (based on [10])

Regarding specific approaches to Portuguese, below is a list of the most relevant works whose results are compared against our work.

2.1 Esfinge

Esfinge [4] is a general domain question answering system that tries to take
advantage of the great amount of information existing on the Web. Esfinge relies
on pattern identification and matching. For each question, a tentative answer is
created. For instance, a probable answer for a “What is X ?” question will start
with “X is...”. This probable beginning of the answer is then used to search the corpus,
through the use of a search engine, in order to find possible answers that match
the same pattern. In the following stages of the process, n-grams are scored and
NER is performed in order to improve the performance of the system.

2.2 Senso
The Senso Question Answering System [19] (previously PTUE [17]) uses a local
knowledge base, providing semantic information for text search terms expansion.
It is composed of two modules: the solver module, which uses two components to collect plausible answers (the logic and the ad-hoc solvers); and the logic solver, which starts by producing a first-order logic expression representing the question and a logic facts list representing the text information, and then looks for answers within the facts list that unify and validate the question logic form. There is also an ad-hoc solver for cases where the answer can be directly detected in the text. After all modules are used, the results are merged for answer list validation, to filter and adjust answer weights.

2.3 Priberam

Some of the most well-known works on NLP and QA have been done at Priberam. Priberam's QA System for Portuguese [1] uses a conservative approach, where the system starts by building contextual rules for performing morphological disambiguation, named entity recognition, etc. Then it analyses the questions and divides them into categories. These same categories are applied to sentences in the source text. This categorization is done according to question patterns, answer patterns and question answering patterns (where pattern matching between answer and question is performed).

2.4 NILC

Brazil's Núcleo Interinstitucional de Linguística Computacional (NILC) has built a summarization system to be used in the task of monolingual QA for Por-
tuguese texts. NILC’s system uses a text summarizer that comprises three main
processes: text segmentation, sentence ranking, and extract production [5] (asso-
ciating sentences to a topic). The questions are then matched against the sen-
tences, with the associated summaries being used to produce an answer.

2.5 RAPOSA

The RAPOSA Question Answering System [21] tries to provide a continuous on-line processing chain from question to answer, combining the stages of infor-
mation extraction and retrieval. The system involves expanding queries for event-
related or action-related factoid questions using a verb thesaurus automatically
generated using information extracted from large corpora.
RAPOSA consists of seven modules that are more or less typical of QA systems: a question parser, a query generator, a snippet searcher, an answer extractor,
answer fusion, an answer selector, and query expansion. It also deals with two
categories of questions: definition questions and factoid questions.

2.6 QA@L2 F
QA@L2 F [13], the question-answering system from L2 F, INESC-ID, is a system
that relies on three main tasks: information extraction, question interpretation
and answer finding.
This system uses a database to store information obtained by information
extraction, where each entry is expected to represent the relation between the
recognized entities. In a second step, the system processes the questions, creating SQL queries that represent the question and are run against the database. The records retrieved from the database are then used to find the wanted answers, through entity matching.

2.7 IdSay

IdSay: Question Answering for Portuguese [3] is an open domain question answering system that mainly uses techniques from the area of IR; the only external information that it uses, besides the text collections, is lexical information for the Portuguese language.
IdSay starts by performing document analysis and then proceeds to entity recognition. After that, the system makes use of patterns to define the type of the question and expected answers. IdSay uses a conservative approach to QA, its main stages being: information indexing, question analysis, document retrieval, passage retrieval, answer extraction and answer validation.

3 The Proposed System — RAPPORT

Our system, RAPPORT, follows most of the typical framework for a QA sys-
tem, while being an open domain system. It does, however, improve on some techniques that differ from other approaches to Portuguese.
One of the most differentiating elements is the use of triples as the basic
unit of information regarding any topic, represented by a subject, a predicate
and an object, and then using those triples as the base for answering questions.
This approach shares also some similarities with open information extraction,
regarding the storage of information in triples [7].
RAPPORT depends on a combination of four modules, addressing infor-
mation extraction, storage, querying and retrieving. The basic structure of the
system comprehends the following modules:

– triple extraction (performed offline);
– triple storage (performed offline);
– data querying (performed online);
– and answer retrieving (performed online).
Each of these modules is described next, specifying each of the main tasks.

3.1 Triple Extraction


The first module processes the contents of the corpus, picking each of the doc-
uments, identifying sentences and extracting triples. It includes multiple tasks,
namely sentence splitting, chunking, tokenization, POS tagging, lemmatization,
dependency parsing, and NER.
The sentence splitting, tokenization, POS tagging, and NER tasks are done
using tools included in the Apache OpenNLP toolkit 1 , with some minor tweaks
for better addressing the texts used. For instance, the system groups tokens
that should be processed together — e.g., person names and dates — and, at
the same time, it also splits composed tokens, as is the case with some
Portuguese verbal conjugations and clitics — e.g., the preposition “no”2 becomes
“em o”. The model used in the chunker was also created for Portuguese (although
following guidelines from OpenNLP), as there was no available pre-built model.
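The splitting of contracted forms can be pictured with a small lookup table; the entries below cover only a handful of Portuguese contractions and are an illustration, not the resource actually used by the system.

# A few preposition + article contractions (illustration only).
CONTRACTIONS = {
    "no": ["em", "o"], "na": ["em", "a"],
    "do": ["de", "o"], "da": ["de", "a"],
    "ao": ["a", "o"],  "à": ["a", "a"],
}

def expand_contractions(tokens):
    expanded = []
    for t in tokens:
        expanded.extend(CONTRACTIONS.get(t.lower(), [t]))
    return expanded

print(expand_contractions(["era", "alérgico", "a", "cenouras", "no", "mundo"]))
# -> ['era', 'alérgico', 'a', 'cenouras', 'em', 'o', 'mundo']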
For the lemmatization process, LemPORT [18], a Portuguese-specific lemmatizer, was used. Another tool was used for dependency parsing, namely, Malt-
Parser [16], with the model used by the parser being trained on Bosque 8.03
(available through Linguateca4 ). However, the output of MaltParser is further
processed in order to group the tokens around the main dependencies, such as:
subject, root (verb), and objects, among others.
Specifically regarding triple extraction, it is performed using two complemen-
tary approaches, both involving named entities, as a way of determining which
triples are of use. The triples are defined by three fields: subject, predicate, and
object. After the documents are split into sentences, each sentence is directly
processed in order to extract named entities. Then, the sentences either are
chunked or undergo tokenization, POS tagging and lemmatization before apply-
ing the MaltParser to identify the main dependencies. An algorithm describing
the process is found in Alg. 1.
As can be noticed, only the triples with entities in the subject or in the
object are further used. The triples extracted this way are then stored for future
querying. Also, the predicate has the verb stored in its lemmatized form in order
to facilitate later matches.
For clarification, in the triples that are based on the proximity between
chunks, most of the predicates comprise, but are not limited to, the verbs ser (to be), pertencer (to belong), haver (to have), and ficar (to be located). For
instance, if two NP chunks are found one after another, and the first chunk
contains a named entity, it is highly probable that it is further characterized by
1 http://incubator.apache.org/opennlp/
2 This corresponds to the combination of the preposition "in" with the article "the".
3 http://www.linguateca.pt/floresta/BibliaFlorestal/completa.html
4 http://www.linguateca.pt/
Data: Corpus documents
Result: Triples
Read documents;
foreach document do
    Split sentences;
    foreach sentence do
        Tokenize, lemmatize, POS tag and dependency parse;
        Extract named entities;
        Get proximity chunks;
        foreach chunk do
            if chunk contains any entity then
                if neighbouring chunk has a specific type then
                    Create triple relating both chunks, depending on the neighbouring chunk type and contents;
                end
            end
        end
        Get dependency chunks;
        foreach chunk do
            if chunk contains any entity and is a subject or an object then
                Create triple using the subject or object, the root, and corresponding object or subject, respectively;
            end
        end
    end
end
Algorithm 1: Triple Extraction Algorithm

the second chunk. If the second chunk starts with a determiner or a noun, the predicate of the future triple is set to ser; if it starts with the preposition em (in), the verb ficar is used; if it starts with the preposition de (of), the verb pertencer is used; and so on.
As an example, the sentence “Mel Blanc, o homem que deu a sua voz a o
coelho mais famoso de o mundo, Bugs Bunny, era alérgico a cenouras.”5 yields three distinct triples: “{Bugs Bunny} {ser} {o coelho mais famoso do mundo}” and “{Mel Blanc} {ser} {o homem que deu a sua voz ao coelho mais famoso do mundo}”, both using the proximity approach, and “{Mel Blanc} {ser} {alérgico a cenouras}”, using the dependency approach.
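The proximity rules just illustrated can be sketched as a simple decision on how the neighbouring chunk starts (a simplified illustration; chunking, NER and lemmatization are assumed to have been performed upstream, and only some of the rules are shown).

def proximity_predicate(first_token, first_tag):
    """Choose the (lemmatized) predicate of a proximity-based triple from the
    first token/POS tag of the chunk that follows the entity chunk."""
    if first_tag in ("DET", "NOUN"):   # determiner or noun -> ser (to be)
        return "ser"
    if first_token.lower() == "em":    # locative preposition -> ficar
        return "ficar"
    if first_token.lower() == "de":    # "of" -> pertencer (to belong)
        return "pertencer"
    return None                        # no rule applies: no triple is created

def proximity_triple(entity_chunk, next_chunk_tokens, next_chunk_tags):
    predicate = proximity_predicate(next_chunk_tokens[0], next_chunk_tags[0])
    if predicate is None:
        return None
    return (" ".join(entity_chunk), predicate, " ".join(next_chunk_tokens))

print(proximity_triple(["Bugs", "Bunny"],
                       ["o", "coelho", "mais", "famoso", "do", "mundo"],
                       ["DET", "NOUN", "ADV", "ADJ", "ADP", "NOUN"]))
# -> ('Bugs Bunny', 'ser', 'o coelho mais famoso do mundo')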

3.2 Triple Storage

After triple extraction is performed, Lucene 6 is used for storing the triples, the
sentences where the triples are found, and the documents that, in turn,
contain those sentences. For that purpose, three indices were created:

5 In English, “Mel Blanc, the man who lent his voice to the world’s most famous rabbit, Bugs Bunny, was allergic to carrots.”
6 For Lucene, please refer to http://lucene.apache.org.
– the triple index stores the triples (subject, predicate and object), their id,
and the ids of the sentences and documents that contain them;
– the sentence index stores the sentences’ id (a sequential number representing
its order within the document), the tokenized text, the lemmatized text and
the documents’ id they belong to;
– the document index stores the data describing the document, as found in
CHAVE (number, id, date, category and author);
Although each index is virtually independent from the others, they can refer
one another by using the ids of the sentences and of the documents. That way,
it is easy to determine the relations between documents, sentences, and triples.
These indices (mainly the sentence and the triple indices) are then used in the
next steps of the present approach.
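The fields kept in each index and the way the ids tie them together can be pictured with the schema sketch below (plain Python dictionaries and invented ids are used purely for illustration; the actual system stores these fields in Lucene indices).

# Schema illustration only; ids such as "t1", "d42" and "d42-s3" are invented.
triple_index = {
    "t1": {"subject": "Mel Blanc", "predicate": "ser",
           "object": "alérgico a cenouras", "sentence_id": "d42-s3", "doc_id": "d42"},
}
sentence_index = {
    "d42-s3": {"order": 3, "tokens": "...", "lemmas": "...", "doc_id": "d42"},
}
document_index = {
    "d42": {"number": "...", "id": "...", "date": "...", "category": "...", "author": "..."},
}

def triples_of_sentence(sentence_id):
    """Follow the sentence id from the sentence index back to its triples."""
    return [t for t in triple_index.values() if t["sentence_id"] == sentence_id]

print(triples_of_sentence("d42-s3"))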

3.3 Data Querying


In a similar way to the annotation made to the sentences in the corpus, the
questions are processed in order to extract tokens, lemmas and named entities,
and identify their types, categories and targets.
For building the queries, the system starts by performing NER and lemma-
tizing the questions. The lemmas are useful for broadening the matches and results beyond those that could be found by using the tokens directly. The queries are essentially made up of the lemmas found in the questions (including named entities and proper nouns). In those queries, all elements are, by default, optional, excluding the named entities. If no entities are present in the questions, proper nouns are made mandatory; in turn, if there are also no proper nouns in the questions, it is the nouns that are used as mandatory keywords in the queries. For instance, in order to retrieve the answer to the question “A que era alérgico Mel Blanc?”7, the query will end up being defined by the following terms: “+Mel Blanc a que ser alérgico”. We have opted for keeping all the lemmas because Lucene scores hits containing the optional lemmas higher, and virtually ignores them if
they are not present.
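A sketch of this query-building policy is given below ('+' marks mandatory terms, as in Lucene query syntax; the lemmas, POS tags and entities are assumed to come from the pipeline of Section 3.1).

def build_query(lemmas, pos_tags, named_entities, proper_nouns):
    """All lemmas are optional terms; named entities are mandatory.  If there are
    no entities, proper nouns become mandatory; failing that, nouns do."""
    if named_entities:
        mandatory = list(named_entities)
    elif proper_nouns:
        mandatory = list(proper_nouns)
    else:
        mandatory = [l for l, t in zip(lemmas, pos_tags) if t == "NOUN"]
    terms = ["+" + m for m in mandatory]
    terms += [l for l in lemmas if l not in mandatory]
    return " ".join(terms)

print(build_query(["a", "que", "ser", "alérgico"], ["DET", "PRON", "VERB", "ADJ"],
                  ["Mel Blanc"], []))
# -> '+Mel Blanc a que ser alérgico'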
The query is then applied to the sentence index — the system searches for
sentences with the lemmas previously identified, with the verb as an optional
term. When a match occurs, the associated triples are retrieved, along with the
document data.
The triples that are related to the sentence are then processed, checking for
the presence of the question’s entities in either the subject or the object of the triples, in order to select which triples are of interest.

3.4 Answer Retrieving


After a sentence matches a query, as stated before, the associated triples and
document data are retrieved. As the document data is only used for better
characterizing the answers, let us focus on the triples.
7 In English, “What was Mel Blanc allergic to?”
When a sentence contains more than one triple, the triple whose predicate matches the verb in the initial query is selected. If that fails, the triple that, as a whole, best matches the query is selected, according to the Lucene ranking algorithm for text matches. After a triple is selected, if the best match against the
query is found in the subject, the object is returned as being the answer. If, on
the other hand, the best match is found against the object, it is the subject that
is returned. An algorithm describing both the data querying and this process is
found in Alg. 2.

Data: Question
Result: Answers
Create query using named entities, proper nouns, or nouns as mandatory, and the remaining lemmas from the question as optional;
Run query against sentence index;
foreach sentence hit do
    Retrieve triples related to the sentence;
    foreach triple do
        if subject contains named entities from question then
            Add object to answers and retrieve sentence and document associated with the triple;
        end
        else if object contains named entities from question then
            Add subject to answers and retrieve sentence and document associated with the triple;
        end
    end
end
Algorithm 2: Answer Retrieval Algorithm

Continuing with the example used, given the correct sentence, of the three triples, the one that best matches the query is “{Mel Blanc} {ser} {alérgico a cenouras}”. Removing from the triple the known terms from the question, what remains must yield the answer: “[a] cenouras”. Besides that, as the named entity,
Mel Blanc, is found in the subject of the triple, the answer is most likely to be
found in the object.
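The final choice between subject and object, described above, reduces to a small check once the best triple has been selected (sketch only; the full system also strips the known question terms from the returned side).

def extract_answer(best_triple, question_entities):
    """best_triple: (subject, predicate, object).  Return the side of the triple
    that does not hold the question's entities as the candidate answer."""
    subject, _, obj = best_triple
    if any(e in subject for e in question_entities):
        return obj
    if any(e in obj for e in question_entities):
        return subject
    return None

print(extract_answer(("Mel Blanc", "ser", "alérgico a cenouras"), ["Mel Blanc"]))
# -> 'alérgico a cenouras', which contains the expected answer "[a] cenouras"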

4 Experimentation Results
For the experimental work, we have used the CHAVE corpus [20], a collection
of the 1994 and 1995 editions — a total of 1456 — of the newspapers “Público” and “Folha de São Paulo”, with each of the editions usually comprising over one
hundred articles, identified by id, number, date, category, author, and the text
of the news article itself.
CHAVE was used in the Cross Language Evaluation Forum (CLEF)8 QA
campaigns as a benchmark — although in the last editions of the Multilingual
QA Track at CLEF a dump of the Portuguese Wikipedia was also used.
8 http://www.clef-initiative.eu/
Almost all of the questions used in each of the CLEF editions are known. The results of each of the contestant systems are also known. The questions used in
CLEF adhere to the following criteria [12]: they can be list questions, embedded
questions, yes/no questions (although none was found in the questions used for
Portuguese), who, what, where, when, why, and how questions, and definitions.
For reference, in Table 1 there is a summary of the best results (all answers
considered) for the Portuguese QA task on CLEF from 2004 to 2008 (abridged
from [6,8,11,12,23]), alongside the arithmetic mean for each system over the editions where they were contenders. At the end of the table, the results of the proposed system are also shown.

Table 1. Comparison of the Results at CLEF 2004 to 2008

Approach    | 2004  | 2005  | 2006  | 2007  | 2008  | Average   (Overall Accuracy, %)
Esfinge     | 15.08 | 23.00 | 24.5  | 8.0   | 23.5  | 18.82
PTUE/Senso  | 28.54 | 25.00 | —     | 42.0  | 46.5  | 35.51
Priberam    | —     | 64.50 | 67.0  | 50.5  | 63.5  | 61.34
NILC        | —     | —     | 1.5   | —     | —     | 1.5
RAPOSA      | —     | —     | 13.0  | 20.0  | 14.5  | 18.83
QA@L2F      | —     | —     | —     | 13.0  | 20.0  | 16.5
IdSay       | —     | —     | —     | —     | 32.5  | 32.5
RAPPORT     | 47.78 | 33.33 | 48.89 | 41.07*| 37.78*| *

Although we are using the questions for Portuguese used in CLEF in those
years, a major restriction applies: only the questions posed over CHAVE with
known answers were selected, as made available from Linguateca. As such, we
are using a grand total of 641 questions for testing our system.
Notice that the years of 2004, 2005 and 2006 have only 180 (out of 200) known
questions each, with their answers found in CHAVE, and the other two years
have the remaining, with 56 in 2007, and 45 in 2008. In those two last years, the
majority of the questions had the answers found on the Portuguese Wikipedia
instead of just CHAVE. As such, the results for 2007 and 2008 represent the
overall accuracy (grouping CHAVE and Wikipedia) of the different systems in
those years, and not just for questions over CHAVE — unfortunately, the values
for CHAVE and Wikipedia were not available separately. That is the reason for
omitting the average result of our system in Table 1, and signalling the results
for the years 2007 and 2008.
For verifying if the retrieved triples contain the expected answers, the triples
must contain (in the subject or in the object) the named entities found in the
questions (or, in alternative, proper nouns, and, if that fails, just the remaining
nouns), and also match in the subject or in the object the known answers from
CHAVE (alongside the same document id).
Using the set of questions that were known to have their answers found on
CHAVE, in a baseline scenario, we were able to find the triples that answer
42.09% of the questions (274 in 641), grouping all the questions from the already
identified editions of CLEF, without a limit of triples for each question. (If the
maximum retrieved triples per question is reduced to 10, the number of answered
questions drops to 20.75%.)
Regarding the answers that have not been found, we have determined that in a few cases the failure is due to questions depending on information contained in other questions or their answers. In other situations, the problem lies in the use of synonyms, hyponyms, hypernyms and related issues: for instance, the question focusing on a verb and the answer having a related noun, as in “Who wrote Y?” for a question and “X is the author of Y.”. There are certainly also many shortcomings in the creation of the triples, mainly in the chunks that are close together, as opposed to the dependency chunks, which should and must be addressed in order to improve and create more triples. Furthermore, there are questions that refer to entities that fail to be identified as such by our system, and so no triples were created for them when processing the sentences.

5 Conclusions and Future Work


Although the proposed system currently only scores in an average place among
the other systems, the use of triples seems to be a promising way of selecting right and short answers to most of the questions addressed. However, there is
a lot that can be improved.
Triples could be improved, namely those that are built from the relations of
proximity between chunks, so the system is able to have a number of retrieved
triples on par with the sentences that contain the answers (and the triples). Another boost to the approach would be to differentiate the queries according to the types of the named entities found in the questions, and to improve NER, both on questions and on corpus sentences.
It is also our intention to use synonyms, hyponyms, hypernyms, and other
relations between tokens or lemmas, in order to expand and improve the queries
made to the indices, which will increase the number of retrieved sentences — and
the number of right triples would also increase — using a wordnet-like resource,
such as Onto.PT [9].
We are currently studying a way of relating words of the same family, such as “escritor” (writer) and “escrever” (to write), which, when lemmatized or even stemmed, end up being set apart. That can be an issue in situations where the
question uses a verb for characterizing the agent, and the candidate answer uses,
for instance, an adjective (of the same family of the verb) to characterize the
same agent. A solution may be a list of agents and corresponding verbs, applying
a set of rules to the verbs in order to generate the corresponding agents.
Another aspect that should be considered is the use of coreference resolu-
tion [10] in order to improve the recall of triples by way of replacing, for instance,
pronouns with the corresponding, if any, named entities, and hence increasing
the number of usable triples.
We believe that expanding the queries using the above techniques, and creating better models to extract triples, can achieve better results in a short time span.
Finally, the next major goal is to use the Portuguese Wikipedia as a repos-
itory of information, either alongside CHAVE, to address the later editions of CLEF, or by itself, as happened in Págico [15].

References
1. Amaral, C., Figueira, H., Martins, A., Mendes, A., Mendes, P., Pinto, C.: Prib-
eram’s question answering system for Portuguese. In: Peters, C., et al. (eds.) CLEF
2005. LNCS, vol. 4022, pp. 410–419. Springer, Heidelberg (2006)
2. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. ACM Press,
New York (1999)
3. Carvalho, G., de Matos, D.M., Rocio, V.: IdSay: question answering for Portuguese.
In: Peters, C., et al. (eds.) CLEF 2008. LNCS, vol. 5706, pp. 345–352. Springer,
Heidelberg (2009)
4. Costa, L.F.: Esfinge – a question answering system in the web using the Web. In:
Proceedings of the Demonstration Session of the 11th Conference of the European
Chapter of the Association for Computational Linguistics, pp. 410–419. Association
for Computational Linguistics, Trento, Italy, April 2006
5. Filho, P.P.B., de Uzêda, V.R., Pardo, T.A.S., das Graças Volpe Nunes, M.: Using
a text summarization system for monolingual question answering. In: CLEF 2006
Working Notes (2006)
6. Forner, P., et al.: Overview of the CLEF 2008 multilingual question answering
track. In: Peters, C., et al. (eds.) CLEF 2008. LNCS, vol. 5706, pp. 262–295.
Springer, Heidelberg (2009)
7. Gamallo, P.: An overview of open information extraction. In: Pereira, M.J.V.,
Leal, J.P., Simões, A. (eds.) Proceedings of the 3rd Symposium on Languages,
Applications and Technologies (SLATE 2014). OpenAccess Series in Informatics,
pp. 13–16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publish-
ing, Germany (2014)
8. Giampiccolo, D., et al.: Overview of the CLEF 2007 multilingual question answer-
ing track. In: Peters, C., Jijkoun, V., Mandl, T., Müller, H., Oard, D.W., Peñas, A.,
Petras, V., Santos, D. (eds.) CLEF 2007. LNCS, vol. 5152, pp. 200–236. Springer,
Heidelberg (2008)
9. Gonçalo Oliveira, H.: Onto.PT: Towards the Automatic Construction of a Lexical
Ontology for Portuguese. Ph.D. thesis, Faculty of Sciences and Technology of the
University of Coimbra (2012)
10. Jurafsky, D., Martin, J.H.: Speech and Language Processing, 2nd edn. Pearson
Education International Inc, Upper Saddle River (2008)
11. Magnini, B., et al.: Overview of the CLEF 2006 multilingual question answering
track. In: Peters, C., Clough, P., Gey, F.C., Karlgren, J., Magnini, B., Oard, D.W.,
de Rijke, M., Stempfhuber, M. (eds.) CLEF 2006. LNCS, vol. 4730, pp. 223–256.
Springer, Heidelberg (2007)
12. Magnini, B., et al.: Overview of the CLEF 2004 multilingual question answering
track. In: Peters, C., Clough, P., Gonzalo, J., Jones, G.J.F., Kluck, M., Magnini,
B. (eds.) CLEF 2004. LNCS, vol. 3491, pp. 371–391. Springer, Heidelberg (2005)
13. Mendes, A., Coheur, L., Mamede, N.J., Ribeiro, R., Batista, F., de Matos,
D.M.: QA@L2 F, first steps at QA@CLEF. In: Peters, C., Jijkoun, V., Mandl, T.,
Müller, H., Oard, D.W., Peñas, A., Petras, V., Santos, D. (eds.) CLEF 2007. LNCS,
vol. 5152, pp. 356–363. Springer, Heidelberg (2008)
14. Moens, M.F.: Information Extraction: Algorithms and Prospects in a Retrieval Context. Springer-Verlag, Heidelberg (2006)
15. Mota, C.: Resultados Págicos: Participação, Resultados e Recursos. Linguamática
4(1) (April 2012)
16. Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryiğit, G., Kübler, S., Marinov, S.,
Marsi, E.: MaltParser: A language-independent system for data-driven dependency
parsing. Natural Language Engineering 13(2), 95–135 (2007)
17. Quaresma, P., Quintano, L., Rodrigues, I., Saias, J., Salgueiro, P.: The University
of Évora approach to QA@CLEF-2004. In: CLEF 2004 Working Notes (2004)
18. Rodrigues, R., Gonçalo Oliveira, H., Gomes, P.: LemPORT: a high-accuracy cross-
platform lemmatizer for Portuguese. In: Pereira, M.J.V., Leal, J.P., Simões, A.
(eds.) Proceedings of the 3rd Symposium on Languages, Applications and Tech-
nologies (SLATE’14). OpenAccess Series in Informatics, pp. 267–274. Schloss
Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany (2014)
19. Saias, J., Quaresma, P.: The senso question answering approach to Portuguese
QA@CLEF-2007. In: Nardi, A., Peters, C. (eds.) Working Notes for the CLEF
2007 Workshop. Budapest, Hungary (2007)
20. Santos, D., Rocha, P.: The key to the first CLEF in Portuguese: topics, ques-
tions and answers in CHAVE. In: Proceedings of the 5th Workshop of the Cross-
Language Evaluation Forum, pp. 821–832. Springer-Verlag, Bath, September 2005
21. Sarmento, L., Oliveira, E.: Making RAPOSA (FOX) smarter. In: Nardi, A., Peters, C.
(eds.) Working Notes for the CLEF 2007 Workshop. Budapest, Hungary (2007)
22. Strzalkowski, T., Harabagiu, S. (eds.): Advances in Open Domain Question
Answering, Text, Speech and Language Technology, vol. 32. Springer-Verlag,
Heidelberg (2006)
23. Vallin, A., et al.: Overview of the CLEF 2005 multilingual question answering track.
In: Peters, C., et al. (eds.) CLEF 2005. LNCS, vol. 4022, pp. 307–331. Springer,
Heidelberg (2006)
Automatic Distinction of Fernando Pessoa's
Heteronyms

João F. Teixeira1(B) and Marco Couto2

1 University of Porto, Porto, Portugal
  [email protected]
2 HASLab / INESC TEC, University of Minho, Braga, Portugal
  [email protected]

Abstract. Text Mining has opened a vast array of possibilities concerning automatic information retrieval from large amounts of text documents. A variety of themes and types of documents can be easily analyzed. More complex features, such as those used in Forensic Linguistics, can extract a deeper understanding from the documents, making it possible to perform difficult tasks such as author identification. In this work we explore the capabilities of simpler Text Mining approaches for the author identification of unstructured documents, in particular the ability to distinguish poetic works from two of Fernando Pessoa's heteronyms: Álvaro de Campos and Ricardo Reis. Several processing options were tested and accuracies of 97% were reached, which encourages further developments.

Keywords: Authorship classification · Machine learning · SVM · Text mining

1 Introduction

With the dawn of Text Mining (TM), it became possible to retrieve a massive amount of information automatically. The intent is to find and quantify even subtle correlations over a large amount of data. This way, a wide variety of themes (economics, sports, etc.) with different levels of structure can be analyzed with little effort. Many TM solutions have been employed in security and web text analysis (blogs, news, etc.). TM has also been used in sentiment analysis, for example to evaluate movie reviews and estimate acceptability [1]. Forensic Linguistics enhances TM by considering higher-level features of text. Linguistic techniques are usually applied in legal and criminal contexts to problems such as document authorship and the analysis and measurement of content and intent.
The purpose of this study is to generate a small representation of a large corpus of poems, able to discern between authors or aliases. For this initial study we chose to classify the collection of poems by two of Fernando Pessoa's heteronyms. Ricardo Reis and Álvaro de Campos were chosen due to their contrasting themes and to initial concerns about the model's accuracy for this kind of dataset.


To the best of our knowledge, there are no pattern recognition studies on alias distinction for poetic texts; therefore, no direct comparison of this work can be made. On the other hand, there is research on generic alias identification [2]; however, the objective there is to find which aliases correspond to the same author, not to distinguish between personas.
The author whose works we analyze is Fernando Pessoa [3], who wrote under several heteronyms or aliases. Each one had his own life story and personal taste in writing style and theme.
Ricardo Reis is an identity with classical roots, considering his poems' structure, theatricality and the entities mentioned (ancient Greek and Roman references). He is fixated on death and avoids sorrow by trying to dissociate himself from anything in life. He seeks resignation and intellectual happiness.
Álvaro de Campos has a different personality, even presenting an internal evolution. Initially, he is shown to be a thrill seeker and machine enthusiast who wishes to live in the future. In the end, he feels defeated by time and devoid of the will to experience life. Consequently, he uses a considerable amount of interjections, in a loosely formatted writing style with expressive punctuation.
The remainder of this document is structured as follows: in Section 2 the dataset is presented and described; Section 3 shows the methodology employed; Section 4 details the experimental approach, along with the discussion of results; and in Section 5 the overall findings are presented, along with possible future work.

2 Dataset

The dataset used in this work consists of the complete known poetic works1 of
Ricardo Reis (class RR) and Álvaro de Campos (class AC). Table 1 presents
some statistics concerning the dataset.

Table 1. Class distribution

# of Words # of Verses
Class # of entries % Avg. Std. Min Max Avg. Std. Min Max
RR 129 54% 77.9 65.1 19 570 14 12 4 106
AC 108 46% 360.9 904.2 29 7857 46 103 5 909

3 Methodology

In this section, the steps taken and the experimental approach followed are presented. First, we tested the classifier with the tokenized documents and then progressively introduced other pre-processing steps, comparing their performance.
The SVM model was validated with 70% of the dataset using 5-fold cross-validation, while the remaining 30% was used to evaluate the generalization performance of the generated model, i.e., the voting result of the 5 fold models.

1 Available at: http://www.dominiopublico.gov.br

3.1 Document Pre-processing


S1 - Tokenization. Each document was turned into a sequence of word-level terms. These were then compacted into bags-of-words (BoW), disregarding their order, which is the most common document representation [4].
S2 - Casing Transformation. After tokenization, all words undergo a lower-case transformation, reducing the number of different terms. Here, we disregard the capitalization of the first word of every verse, while inadvertently removing the significance of capitalized names and some metaphoric references.
S3 - Length Filtering. We remove from the token bags terms with fewer than 4 or more than 15 letters. The reason is that shorter words are very likely irrelevant articles or connectors, which can lead to overfitting of the model, and few Portuguese words are that long. In fact, the average length of Portuguese words is 4.64 letters [5]. Moreover, removing words produces a more compact representation of the dataset and reduces possible dimensionality issues the classification model may experience with larger feature spaces.
S4 - Stemming. Word stemming also compacts document instances. It consists of removing word affixes, leaving only the root term. Generally, stemmers follow iterative replacement rules, some even dealing with irregular and rare terms. For this work, the Snowball Portuguese dictionary was used [6,7].
S5 - Stopword Removal. Finally, we include stopword removal, which consists of ignoring all terms in a given dictionary. This might help the classifier focus on meaningful terms instead of considering articles, connectors and overall writing style. Also, specific unwanted words can be eliminated.
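As an illustration only (not the exact implementation used in this work), a pipeline with steps S1–S5 could be assembled in Python with NLTK, which provides both a Portuguese Snowball stemmer and a Portuguese stopword list; the length bounds follow the description above.

```python
import re
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer

# Requires nltk.download("stopwords") beforehand.
stemmer = SnowballStemmer("portuguese")            # S4: Snowball Portuguese stemmer
stop_set = set(stopwords.words("portuguese"))      # S5: stopword dictionary

def preprocess(poem_text):
    tokens = re.findall(r"\w+", poem_text)              # S1: word-level tokenization
    tokens = [t.lower() for t in tokens]                 # S2: lower-case transformation
    tokens = [t for t in tokens if 4 <= len(t) <= 15]    # S3: length filtering (4 to 15 letters)
    tokens = [t for t in tokens if t not in stop_set]    # S5: stopword removal
    return [stemmer.stem(t) for t in tokens]             # S4: stemming to the root term
```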

3.2 Occurrence Metrics


To evaluate whether a word is distinctive for the classification task, some measures based on its occurrence can be used. The following metrics were tested:
Binary Term Occurrence. BTO only indicates whether a given term occurs in a document. It provides little information and is thus rarely used.
Term Occurrence. The TO metric provides the number of times a word occurs in each document of the collection. This can be viewed as a measure of the significance of a given word for each document.
Term Frequency. TF is a relative measure of word occurrence that takes into account the number of words in a document. Consequently, it can be misleading depending on the variability of document lengths.
TF-IDF is generally calculated as the product of TF and the Inverse Document Frequency (IDF) [8]. The IDF component depends on the number of documents that contain a given term: a term that occurs in many documents provides little discriminative power and should be given less importance (a lower weight) [9].
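For reference, and assuming the standard formulation of [8] (the exact normalization used by the mining tool may differ), the tf-idf weight of a term t in a document d can be written as tfidf(t, d) = tf(t, d) × log(N / df(t)), where tf(t, d) is the (relative) frequency of t in d, df(t) is the number of documents containing t, and N is the total number of documents in the collection.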
We used SVMs [10], which can linearly separate clusters of data in feature space by maximizing the margin of the separating hyperplane. The model was fed an array of occurrence metric values for the terms kept after pre-processing.
Since the focus of this work was on Text Mining, the SVM model employed was relatively simple: a linear kernel with shrinking heuristics, a termination tolerance ε = 0.001 and no penalty (C = 0).
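A minimal scikit-learn sketch of this setup is shown below, purely as an approximation of the description above (the original experiments were run with a different toolchain, and this library does not accept C = 0, so a default C is used instead):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def build_model():
    # tf-idf weighting of the pre-processed terms fed to a linear-kernel SVM
    # with shrinking heuristics and termination tolerance 0.001.
    return make_pipeline(TfidfVectorizer(),
                         SVC(kernel="linear", shrinking=True, tol=1e-3, C=1.0))

def evaluate(poems, labels):
    # 70/30 split: 5-fold cross-validation on the first part,
    # held-out evaluation on the remaining 30%.
    x_tr, x_te, y_tr, y_te = train_test_split(poems, labels, test_size=0.3,
                                              stratify=labels, random_state=0)
    model = build_model()
    cv_accuracy = cross_val_score(model, x_tr, y_tr, cv=5).mean()
    test_accuracy = model.fit(x_tr, y_tr).score(x_te, y_te)
    return cv_accuracy, test_accuracy
```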

4 Experimental Results
4.1 Estimation Using Cross-Validation

Following the methodology described in Section 3, we conducted several experiments in which the text pre-processing steps were incrementally included. These experiments considered the tf-idf occurrence metric, since it is intuitively the most appropriate for comparing the results of models trained with such different instance content. The accuracy of the experiments is S1: 93.35%, S2: 91.58%, S3: 90.44%, S4: 91.03% and S5: 90.44%.
With the length and stopword filtering, several of the top-scoring terms (by SVM weight) were removed. However, most of these were articles and connectors, which could lead to model overfitting. Their removal only slightly decreased the accuracy. Along with stemming and the lower-case transformation, the number of attributes considered was reduced by more than half (from 8941 to 4398 terms).
The following experiments aim to evaluate the influence of the different occurrence metrics on the classification model. Table 2 presents the results of those experiments. The results show that the model using term occurrence based metrics (BTO, TO) performs worse than with frequency based metrics (TF, tf-idf), also in terms of misclassification rate balance.
We note that this comparison is not entirely fair: the processing pipeline used in the experiments had previously been optimized for tf-idf, thus providing only a general comparison. The performances of these last two metrics are very close and, therefore, the best method cannot be directly determined.

Table 2. Occurrence Metrics Experiments (%)

Binary TO TO TF TF-IDF
Acc F1RR F1AC Acc F1RR F1AC Acc F1RR F1AC Acc F1RR F1AC
80.74 84.91 73.33 68.71 77.39 49.01 91.03 91.89 89.80 90.44 91.58 88.73

4.2 Evaluation of Validation Setup

In this section, we considered, from the previous experiments, the pipelines with the two best performances and the one with BTO (baseline), now using the remaining 30% of the dataset. The results are shown in Table 3.
As expected, BTO maintained the low accuracy and obtained a lower F1AC. The two best models kept the high accuracy; however, tf-idf managed to overcome TF's advantage from the validation phase, even if only by 3 instances. This suggests that, even though tf-idf presented lower validation results, it was somewhat underfitting. Either way, these comparative results were expected, given the term rarity information captured by idf.

4.3 Influence of Long Poems

Due to the large difference in length statistics between the two classes, we conducted a further analysis. The documents were segmented into multiple instances such

Table 3. Testing Set Results (%)

Binary TO TF TF-IDF
Acc F1RR F1AC Acc F1RR F1AC Acc F1RR F1AC
66.20 76.47 40.00 92.96 93.83 91.80 97.18 97.44 96.88

Table 4. Updated Class distribution

# of Words # of Verses
Class # of entries % Avg. Std. Min Max Avg. Std. Min Max
RR 169 51% 63.5 27.0 3 161 11 4 1 17
AC 165 49% 240.8 150.0 29 521 30 145 5 54

that each portion had at most as many verses as the previous class mean (plus a tolerance). Table 4 presents the updated class distribution.
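A simple way to implement this segmentation is sketched below; this is only an illustration, since the exact tolerance used is not specified and is therefore left as a parameter.

```python
def segment_poem(verses, class_mean_verses, tolerance=0):
    # Split a poem (given as a list of verses) into portions with at most
    # class_mean_verses + tolerance verses each.
    max_len = class_mean_verses + tolerance
    return [verses[i:i + max_len] for i in range(0, len(verses), max_len)]
```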
For this experiment, the complete pre-processing pipeline and the tf-idf scoring criterion were used (the best configuration of the testing phase). For the cross-validation and testing steps, respectively, the accuracy was 96.58% and 96.04%, the F1RR was 96.64% and 96.15%, and the F1AC was 96.49% and 95.92%.
The results show that imposing the upper bound on the number of verses per poem increased the accuracy of the model in the validation phase by around 5%. Beyond accuracy, the model appears to have genuinely improved, since the misclassification rates became more balanced.
It is safe to assume that longer poems might be more difficult to classify correctly, since they encompass more terms that can strongly influence the tf-idf metric (through emphatic repetition, etc.) without contributing positively to the accurate learning of attribute weights, which can hurt the classification. On the other hand, the test results somewhat contradict the cross-validation analysis, as they are slightly worse than the tf-idf test experiment. We cannot provide an acceptable hypothesis as to why this occurs, rendering this analysis inconclusive.

5 Discussion and Conclusions


In this work we aimed to distinguish the authorship of poetic texts from two heteronyms of Fernando Pessoa, solely using basic Text Mining approaches. To our surprise, the methods were able to predict quite accurately (most settings over 70%, the best ∼97%), further verifying a clear difference between the heteronyms. This comes as a revelation mainly because the author is, in fact, the same person behind both personas, and thus the vocabulary and parts of the writing style could be expected to be common to the heteronyms.
Many of the best discerning words were related to writing style, including several possessive-related terms for AC. However, some of the best terms were, obviously, theme-related keywords, such as grande and sentir, referring to the
magnificence of the feelings of AC, and cansaço, domingo and sonho, referring to the tiredness AC feels towards the end and to recollections of the past, while RR tries to remain forever calm and to avoid pain.
Among the settings tested, tf-idf demonstrated, as expected, the best balance
and generated the highest accuracy for the testing set.
The change in accuracy for the shorter instance set was not conclusive. These results suggest that dividing the larger poems was either not that relevant, or that additional instances would be needed to confirm any effect (the accuracy is already close to 100%).
Although our methodology produces good results, we intend to extend the study to the rest of the heteronyms, to evaluate whether this kind of simple analysis is still sufficient for discernibility. Additional relevant experiments and approaches could include evaluating the model's response to a few verses instead of large chunks or complete poems, and word n-gram analysis for the identification of style traits.
We also realized that, according to Zipf's law, both the most and the least frequent terms are relatively less frequent in large documents. Thus, our approach could be improved by enhancing term relevance instead of only minding frequency. This could be done by including a normalization term in the tf-idf formula [11].

References
1. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: sentiment classification using
machine learning techniques. In: Proceedings of the ACL 2002 Conference on
Empirical Methods in Natural Language Processing, EMNLP 2002, vol. 10,
pp. 79–86. Association for Computational Linguistics, Stroudsburg (2002)
2. Nirkhi, S., Dharaskar, R.V.: Comparative study of authorship identification tech-
niques for cyber forensics analysis (2014). CoRR abs/1401.6118
3. de Castro, M.G. (ed.): Fernando Pessoa’s Modernity Without Frontiers: Influences,
Dialogues, Responses. Tamesis Books, Woodbridge (2013)
4. Sebastiani, F.: Machine Learning in Automated Text Categorization. ACM Com-
put. Surv. 34(1), 1–47 (2002)
5. Quaresma, P., Pinho, A.: Análise de frequências da língua portuguesa. In: Livro de
Actas da Conferência Ibero-Americana InterTIC, pp. 267–272. IASK, Porto (2007)
6. Porter, M.F.: Snowball: A language for stemming algorithms, October 2001.
http://snowball.tartarus.org/texts/introduction.html
7. Porter, M.F.: Snowball: Portuguese stemming algorithm. http://snowball.tartarus.org/algorithms/portuguese/stemmer.html
8. Salton, G., Buckley, C.: Term-weighting Approaches in Automatic Text Retrieval.
Inf. Process. Manage. 24(5), 513–523 (1988)
9. Robertson, S.: Understanding inverse document frequency: on theoretical argu-
ments for IDF. Journal of Documentation 60(5), 503–520 (2004)
10. Vapnik, V.N.: An overview of statistical learning theory. Trans. Neur. Netw. 10(5),
988–999 (1999)
11. Singhal, A., Buckley, C., Mitra, M.: Pivoted document length normalization.
In: Proceedings of the 19th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval, SIGIR 1996, pp. 21–29. ACM,
New York (1996)
Social Impact - Identifying Quotes
of Literary Works in Social Networks

Carlos Barata1,2(B), Mónica Abreu1, Pedro Torres2, Jorge Teixeira2,3,
Tiago Guerreiro1, and Francisco M. Couto1

1 Departamento de Informática, Faculdade de Ciências,
  Universidade de Lisboa, Lisbon, Portugal
  [email protected]
2 SAPO Labs, Lisboa, Portugal
3 LIACC, Universidade do Porto, Porto, Portugal

Abstract. A non-negligible amount of the information shared in social networks contains quotes from literary works that, most of the time, are not linked to the original work or author. There are also erroneous quotes that do not fully match the original work, for example by including synonyms and slang words. Moreover, users sometimes associate their quotes with the wrong author, which creates misleading information. This paper presents the Social Impact framework as an approach to identify quotes in social networks and match them to the original literary works of a particular author. The framework was applied to two case studies: O Mundo em Pessoa and Lusica. In the first case study, the Social Impact evaluation achieved 98% precision and 59% recall, whereas in the latter it obtained 100% precision and 53% recall.

Keywords: Information retrieval · Information extraction · Web mining · Text mining · Pattern recognition

1 Introduction

Social networks emerged in the last decade and changed the way we communicate, becoming essential tools in human interaction. This happened possibly because, at the distance of a click, lies the possibility to send and share content. As Kwak et al. [4] refer, this wide use of social networks raises great research interest in many areas, such as information extraction and analysis.
Most of the information shared in social networks such as Twitter and Facebook is in text format, and an interesting amount of such information (messages) contains quotes from literary works (e.g.: "Tudo vale a pena, quando a alma não é pequena - Fernando Pessoa"). Nevertheless, in a non-negligible number of cases there is no reference to the text, book or literary work the quote refers to.
Due to the fact that quotes may contain inconsistencies (e.g., the quote differs from the original text), identifying the original text or author can be very challenging. These inconsistencies are more frequent on social networks


(compared with, for instance, opinion articles in the news) because of particular characteristics of these networks, namely short messages and reduced context. The use of synonyms and typos are among the most common causes of the lack of accuracy in the quotes published in social networks. One might think that hashtags can substantially reduce the complexity of this task, but unfortunately the usage of hashtags in messages quoting literary works is low, as reported in the study in [8]. That study concluded that, on Twitter, the fraction of tweets using hashtags ranges from 4% (in Japanese) to 25% (in German).
This paper presents the Social Impact platform as an approach to this problem. The framework's major goal is to identify quotes from literary works in social network messages, supported by the SocialBus1 and Apache Lucene systems. Evaluation was performed on two case studies: O Mundo em Pessoa and Lusica.

2 Related Work
The Social Impact platform is generally supported by two technological blocks: SocialBus, a social network crawling and analysis platform, and Lucene, a highly scalable infrastructure for indexing and querying documents.
SocialBus: Social networks such as Twitter and Facebook provide APIs that allow access to public messages, within certain limits, giving the possibility of analysing such content for a variety of purposes, including quote detection. We use the SocialBus platform [2,7], a framework that collects and analyses data from Twitter and Facebook for a pre-defined set of users representative of the Portuguese community.
Lucene: an open-source software library2 for text indexing and searching, written in the Java programming language and developed by the Apache Software Foundation. According to Gospodnetic et al. [3], this framework works through the indexing of documents, information parsers and queries to consult and retrieve the indexed information. The result is a ranked list of documents ordered by relevance [1,5,6].

3 Social Impact Platform

Social Impact's main objective is to find quotes in messages published in social networks and subsequently link them to their original literary work. The framework stores such data in a relational database and exposes it through RESTful APIs. More importantly, the Social Impact architecture is abstract enough to be applied in different contexts and scenarios.

1 http://reaction.fe.up.pt/socialbus/
2 http://lucene.apache.org/core/

3.1 Architecture

Social Impact's structure is based on a Service-Oriented Architecture (SOA), broadly used in web applications due to its standardisation approach. This architecture is represented in Figure 1 and has three main layers, described below.

Fig. 1. Social Impact Global Architecture

External Layer: represents information and knowledge external to the Social Impact platform that is collected into the system. The leftmost block, External Knowledge, represents data specific to each case study, including literary works (e.g., poems by Fernando Pessoa3) or domain-specific keywords used to narrow the search over the data collected by SocialBus (e.g., poem or music authors). The remaining two blocks represent the Twitter4 and Facebook5 APIs that feed Social Impact with data from the social networks.
Backend Layer: the core layer of Social Impact, responsible for processing the messages coming from SocialBus and analysing them through the Quotes Detector, as well as for storing those messages and the subsequently generated metadata in a suitable relational database (MySQL).
Application Layer: represents the interface with the applications using the Social Impact platform. This layer comprises a set of RESTful APIs that provide the information previously processed in the Backend layer to web applications.

3 Data obtained from "Arquivo Pessoa", available at http://arquivopessoa.net
4 https://dev.twitter.com/rest/public
5 https://developers.facebook.com

3.2 Quotes Detector


Figure 2 presents a detailed diagram of the Quotes Detector module, with two
essential flows of information:

Fig. 2. Quotes detector workflow

Pre-processing and Indexing of External Knowledge: represented in Figure 2 as "I", this flow is executed only once and includes, for instance, all the literary work of a particular author. Each of these documents (e.g., a single poem) is submitted to the Lucene engine, filtered through a stopword filter and indexed.
Identification and Indexing of Quotes: this module (refer to "II" in Figure 2) listens to SocialBus and imports new data as messages arrive. Those messages are then filtered with a stopword filter and a badword (curse word) filter. Each filtered message is transformed into Lucene query syntax. The "Search" operation compares the indexed documents of Social Impact (External Knowledge) with each new message and retrieves, if the score is above a given threshold, the most relevant document (poem, music, etc.) as a positive quote match. Moreover, all tokens from the matched message are isolated and stored in the database.
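The matching step can be illustrated by the following sketch (not the actual Lucene-based implementation; a simple vector-space model and an illustrative threshold stand in for the Lucene index and its score threshold): the author's documents are indexed, each incoming message becomes a query, and the top-ranked document is accepted only if its score is high enough.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class QuoteMatcher:
    def __init__(self, documents, threshold=0.3):
        # 'documents' holds the external knowledge (e.g. one poem per entry);
        # 'threshold' plays the role of the Lucene score threshold (value is illustrative).
        self.documents = documents
        self.vectorizer = TfidfVectorizer()
        self.index = self.vectorizer.fit_transform(documents)
        self.threshold = threshold

    def match(self, message):
        # Treat the (filtered) message as a query and rank the indexed documents.
        query = self.vectorizer.transform([message])
        scores = cosine_similarity(query, self.index)[0]
        best = scores.argmax()
        if scores[best] >= self.threshold:
            return self.documents[best], scores[best]   # positive quote match
        return None
```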

4 Case Studies
O Mundo em Pessoa6 is a web-based project that aims to depict the presence of Fernando Pessoa's poems on social networks, based on quotes from his literary work. The project is based on the Social Impact platform and covers the work of Fernando Pessoa and of all his heteronyms. The list of terms used to narrow the message crawling (refer to Section 3.1) contains the names of all Fernando Pessoa heteronyms. The project is supported by a web application that displays the identified quotes from Fernando Pessoa organized by timeframe, ranging from one
6 http://fernandopessoa.labs.sapo.pt/

day to one month. For each quote, the user has the possibility to explore the number of social network users that published that particular quote and to access the original message, among other features.
Lusica's7 main purpose was to study lusophone music and its presence on social networks, supported by the Social Impact platform. Two important aspects differentiate "Lusica" from "Mundo em Pessoa": (i) the domain is music instead of literature, and (ii) a large effort was put into the visualization of the information obtained from quote detection, through an interactive graph available online. "Lusica's" external knowledge (refer to Figure 1) is based on the song and album titles of lusophone artists. This information was obtained from the LastFM APIs8 (the list of lusophone artists) and from the MusicBrainz service9 (the album and song titles for each lusophone artist).

5 Results and Discussion


This section presents the results of the evaluation of Social Impact for both case studies. The evaluation aims to provide an overview of the system's performance.
Data Collection: For evaluation purposes, we used a subset of data published between January 2014 and June 2014. Regarding O Mundo em Pessoa, from a set of 56,212 collected messages, only approximately 8% (4,720 messages) were identified as quotes (with a Lucene score larger than 1.0). As expected, most of the collected messages are not classified as quotes, and this phenomenon can be explained by the fact that many of the collected messages are just references to Fernando Pessoa and not actual quotes from his literary work. For Lusica, the results are similar to O Mundo em Pessoa, with less than 5% (7,628) of the messages representing references to the song artists, in a set of approximately 420,000 messages.
Quotes Detector Evaluation: Precision and recall metrics were calculated based on the following types of documents: True positive (TP): messages correctly classified as quotes; False positive (FP): messages incorrectly classified as quotes; True negative (TN): messages correctly classified as not quotes; False negative (FN): messages incorrectly classified as not quotes. Precision is measured as P = TP/(TP + FP), while recall is R = TP/(TP + FN). Regarding recall, our assumption is that the SocialBus filtered messages correspond to all representative messages for the specific domain of the case study. Evaluation was performed manually on a sample of 200 randomly chosen messages for each of the case studies. Regarding "Mundo em Pessoa", the evaluation dataset was divided into 4 parts according to the Lucene score and to Twitter versus Facebook messages. Results for Twitter showed a precision of 19% and recall of 100% for low scores (between 0.5 and 1.0), and a precision of 98% and recall of 100% for high scores (between
7 http://lusica.labs.sapo.pt/
8 http://www.last.fm/api
9 https://musicbrainz.org/doc/MusicBrainz_Identifier

1.0 and 2.0). Concerning Facebook, the precision for low scores was 100% and the recall 21%, while for high scores the precision was 96% and the recall was 100%. The average precision for "Mundo em Pessoa" was 98%, while the average recall was 59%. With respect to "Lusica", the same principle was followed, by selecting a sample of 200 messages and dividing them into two groups (Twitter messages with low and high Lucene scores). The results showed an average precision of 100%, while the recall was 53%.
Execution Time: the performance of the Social Impact platform was also evaluated by measuring the execution time of processing a single message on a desktop with an Intel(R) Xeon(R) CPU E5405 @ 2.00GHz processor and 3 GB of RAM. For O Mundo em Pessoa, the average execution time was 0.01 seconds (±0.002). For Lusica, the average execution time was 0.02 seconds (±0.004).

6 Conclusions

This paper presented an approach to find quotes from original literary works in messages shared on social networks. The proposed approach is supported by the Social Impact platform presented in this paper. The framework was applied to two distinct case studies: O Mundo em Pessoa and Lusica.
The evaluation showed that most of the messages collected from SocialBus are not classified as quotes (fewer than 8% are), because they are just references to the author and do not contain any quote. The Social Impact evaluation achieved high precision values for both case studies: 98% for O Mundo em Pessoa and 100% for Lusica.
Future work includes: (i) automatically setting the threshold for the Lucene score based on machine learning approaches; (ii) improving the domain-specific list of keywords using automatic approaches; (iii) using user feedback to fill in missing information about authors and song lyrics; and (iv) applying the Social Impact platform to other, non-literary corpora and tasks, such as plagiarism detection.

Acknowledgments. This work was partially supported by SAPO Labs and FCT
through the project PEst-OE/EEI/UI0408/2013 (LaSIGE), and by the European Com-
mission through the BiobankCloud project under the Seventh Framework Programme
(grant #317871). The authors would like to thank Bruno Tavares, Sara Ribas and Ana Gomes from SAPO Labs, João Martins, Tiago Aparcio, Farah Mussa, Gabriel Marques and Rafael Oliveira from the University of Lisbon, and Arian Pasquali from the University of Porto for all their support, insights and feedback.

References
1. Baeza-Yates, R., Ribeiro-Neto, B., et al.: Modern information retrieval, vol. 463.
ACM press New York (1999)
2. Boanjak, M., Oliveira, E., Martins, J., Mendes Rodrigues, E., Sarmento, L.:
Twitterecho: a distributed focused crawler to support open research with twitter
data. In: Proceedings of the 21st WWW, pp. 1233–1240. ACM (2012)

3. Hatcher, E., Gospodnetic, O.: Lucene in action. Manning Publications (2004)


4. Kwak, H., Lee, C., Park, H., Moon, S.: What is twitter, a social network or a news
media? In: Proceedings of the 19th WWW, pp. 591–600. ACM (2010)
5. Liu, B.: Web data mining. Springer (2007)
6. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to information retrieval,
vol. 1. Cambridge University Press, Cambridge (2008)
7. Oliveira, E.J.S.L.: TwitterEcho: crawler focado distribuído para a Twittosfera portuguesa. Master's thesis, Faculdade de Engenharia da Universidade do Porto (2010)
8. Weerkamp, W., Carter, S., Tsagkias, M.: How people use twitter in different lan-
guages. Proceedings of the Web Science (2011)
Fractal Beauty in Text

João Cordeiro1,3(B), Pedro R.M. Inácio1,2, and Diogo A.B. Fernandes2

1 Department of Computer Science, University of Beira Interior,
  Rua Marquês d'Ávila e Bolama, 6200-001 Covilhã, Portugal
  {jpaulo,inacio}@di.ubi.pt
2 Instituto de Telecomunicações, Av. Rovisco Pais, 1 - Torre Norte,
  1049-001 Lisboa, Portugal
  [email protected]
3 INESC – TEC, University of Porto,
  Rua Dr. Roberto Frias, 378, 4200-465 Porto, Portugal
  [email protected]

Abstract. This paper assesses if text possesses fractal properties, namely if several attributes that characterize sentences are self-similar. In order to do that, seven corpora were analyzed using several statistical tools, so as to determine if the empirical sequences for the attributes were Gaussian and self-similar. The Kolmogorov-Smirnov goodness-of-fit test and two Hurst parameter estimators were employed. The results show that there is a fractal beauty in the text produced by humans and suggest that its quality is directly proportional to the self-similarity degree.

Keywords: Self-similarity in text · Statistical linguistic studies

1 Introduction
Since the advent of the World Wide Web, an increasing number of users publish text in a multitude of venues, namely newly created web pages, news in electronic newspapers, blogs, product reviews, and social networks. This opened new opportunities for linguistic studies and created the need for new applications that intelligently deal with all this text and make sense of it.
The question of automatically and effectively assessing the quality of a text remains unanswered. In general, an experienced human reader can judge the complexity and quality of a given text, a task not so easily attained by computational means. The human reader can not only determine if phrases are grammatically correct, but also assess the lexical richness and the structural and rhetorical combination of words, sentences, and ideas. There are aesthetic principles in the way of writing, yielding different types of texts. The spectrum ranges from almost telegraphic accretions posted on Twitter up to Nobel prize winning novels. The goal of this study covers the topic of text quality by looking for the mathematical principles underlying it.
This paper discusses a series of experiments conducted to find out if the text produced by humans exhibits a statistical property known as self-similarity. Self-similarity is a property of fractals, and it refers to the possibility of parts of a


(mathematical) object being similar to the whole. In the statistical sense, it refers more specifically to the fact that the statistical properties of the object look the same independently of the scale from which it is observed.
This paper presents our initial findings on the self-similar properties of text. Its main contributions are twofold: (i) it is shown that the sequences, constructed by measuring several attributes of text blocks produced by humans, are self-similar, suggesting also that the self-similarity degree is related to the quality of the text; and (ii) the way the analysis is performed lays the basic foundations for future research on the intersection of these two fields.

2 Related Work
To the best of our knowledge, no other work has addressed the determination of self-similarity in text and how it can be used to characterize and assess general aesthetic principles. However, there are a number of related works with similar goals. For example, McCarthy and Jarvis [9] compared different methods to determine lexical diversity (LD) in text.
The LD index measures the vocabulary diversity of a given text segment and is usually based on the ratio between the number of tokens and the number of types (the number of unique tokens in a segment of text), also known as the token-type proportion. As this proportion depends on the length of the considered text segment, it is not used directly to compute the LD index. Instead, a number of strategies have been proposed [6,9], some of them based on the division of a text into fixed segments of n tokens. The LD index has been used to assess the writing skills of a subject in a variety of studies, namely in measurements of children's language skills, English second language acquisition, Alzheimer's onset, and even speaker socioeconomic status [7,8].
Forensic linguistic analysis [10] is a recent field with diverse applications, such as plagiarism detection [1], authorship identification [12], and cybercrime and terrorism tracking [10], among others. The work in this field is based on the use of a number of text characteristics at different levels of analysis, such as morphological, lexical, syntactical, and rhetorical [1,12]. These textual characteristics necessarily exhibit different self-similar properties, which warrant investigation. The findings presented in this work are a first step toward that objective, specifically at the lexical level.

3 Self-Similarity and Statistical Tools


In this paper, fractality is studied for values representing attributes of text. It does not concern any particular visual interpretation of the text, in which some artifact is repeated as we graphically zoom in or out. It refers to an interpretation of the statistical behavior of the aforementioned values (e.g., the number of words or non-words in a sentence). Self-similarity refers to the property of a stochastic process that looks statistically identical at any (aggregation) scale from which it is observed. It is typical to define a (discrete-time) self-similar

process {X(t)}t∈N as one that fulfills the condition X(t) =d a^(-H) X(at), where =d denotes equality of all finite-dimensional distributions, a ∈ N and 0 <
H < 1 is the Hurst parameter, also referred to as the self-similarity degree or
the Hurst exponent. The most widely known example of a self-similar process
is the fractional Brownian motion (fBm), which has a Gaussian distribution.
Its first order differences process, denoted as fractional Gaussian noise (fGn) is
often useful too, since many natural or artificial processes occur in this form.
Thus, when performing self-similarity analysis, it is typical to assess whether the
empirical values are consistent with sampling a Gaussian variable. In this work,
the Kolmogorov-Smirnov goodness-of-fit test [3] was used for that purpose.
There are several methods for estimating the Hurst parameter from empirical
data, most of them based on repeatedly calculating a given statistic (e.g., variance
or maximum value) for the original process and for a finite number of aggregated
processes. The Hurst parameter estimators used in this study were the well-known
Variance Time (VT) and Rescaled Range Statistics (RS) estimators. The statisti-
cal tools mentioned herein are all implemented in the open-source TestH tool [4].
It accepts files containing raw values separated by space or newline, normalizes
them, and outputs the estimated values of the Hurst parameter and the p-value
concerning the Kolmogorov-Smirnov goodness-of-fit test.
When the Hurst parameter is 0.5, the process is memoryless and each occur-
rence is completely independent of any past or future occurrences. For values
of the Hurst parameter ranging from 0 to 0.5, the process is anti-persistent or
short-range dependent, while for values between 0.5 and 1, the process is said to
be persistent or long-range dependent. There are many examples of long-range
dependent processes in natural and artificial processes (e.g., the water level in
rivers [5]). Prior to starting this work, our expectation was that text would be self-similar, with a Hurst parameter larger than 0.5, and that the degree of self-similarity was perhaps related to the quality of the text.
Our base unit to construct processes of text attributes is a block of 100
words, meaning that all the sequences analyzed in the scope of this work refer to
attributes per 100 tokens. If a self-similar structure is embedded in the data, then
the statistical behavior of these attributes is the same (apart from scaling) for
each block of tokens, or for any number of them. Additionally, it is an indication
that human writing is done in bursts, which means that blocks with higher counts
in some attribute are probably followed by other blocks with higher counts also,
and vice-versa.
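For illustration purposes only (this is our own minimal sketch, not the TestH implementation), the Variance-Time estimator mentioned above can be coded by aggregating the series at several block sizes m, computing the variance of each aggregated series, and fitting a line in log-log scale, since for a self-similar process Var(X^(m)) is proportional to m^(2H-2):

```python
import numpy as np

def hurst_variance_time(x, scales=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate the Hurst parameter with the Variance-Time method.

    For each aggregation scale m, the series is averaged over non-overlapping
    blocks of size m; for a self-similar process, the variance of the
    aggregated series decays as m**(2H - 2).
    """
    x = np.asarray(x, dtype=float)
    log_m, log_var = [], []
    for m in scales:
        n_blocks = len(x) // m
        if n_blocks < 2:
            continue
        agg = x[:n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(agg.var()))
    slope, _ = np.polyfit(log_m, log_var, 1)   # slope is approximately 2H - 2
    return 1.0 + slope / 2.0
```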

4 Data and Experimentation


Five different types of English-written corpora were analyzed in the performed
experiments. Each corpus was prepared identically before being submitted for
testing. There are essentially three major genres that stand out: Literature, News
Stories, and Blogs. In order to strengthen the validation of the main hypothesis,
it was also decided to include a text corpus generated randomly from the words of
the English language. Below follows a more detailed description of each corpus:

The Blogs Corpus: also known as the Blog Authorship Corpus [11], this is a massive collection of 681K blog posts gathered from 19320 users of the blogger.com website. These blogs cover a wide range of subjects, such as Advertising, Biotechnology, Religion and Science, among others. It contains a total of 38 different subjects, with 300 million words (844 MB). There are three user age clusters:
13-17 years (8240 users), 23-27 years (8066 users) and 33-48 years (2944 users);
The News Corpus: This corpus is formed by a huge set of news stories, auto-
matically collected from the web. An amount of 4.2 MB of text was randomly
selected from the set. The news stories were collected for several main subjects,
namely Politics, Economics, Finance, Science, among others;
The Literature Corpus: (i) the complete works of Shakespeare and (ii) the set of 66 books of the Bible1 were selected for this corpus type;
The Random Corpus: A corpus of similar size was randomly generated in
order to validate the proposed self-similarity measure. Each word was randomly
taken from the English vocabulary, according to a uniform2 distribution.
The aim of this study is to determine whether certain text characteristics exhibit self-similarity properties, by resorting to the estimation of the Hurst parameter. The test described herein was designed to determine self-similarity in time series, performing a large number of measurements of attributes over time. Here, the origin of the several time series is a corpus consisting of a considerable amount of text. Thus, the experiment had to be designed to meet two principles: (i) the set of textual attributes to be measured and (ii) the reading structure of the text, in order to achieve a significant number of measurements.
Definition of Attributes: For the first principle we have considered six lexical
features that are measured for a given block (amount) of text: the number of
(A0 ) non-words; (A1 ) small words (|w| < 3); (A2 ) medium words (3 ≤ |w| < 7);
(A3 ) long words (|w| > 6); (A4 ) sentences; and (A5 ) the lexical diversity.
Reading Structure: To satisfy the second principle we decided to divide the text into sequential blocks with an equal number of words. The block size was chosen to be near the average paragraph length, assuming five sentences per paragraph, each with an average length of 20 words [2]. In previous studies of this kind, researchers usually take text chunks of this length [6,9].
Below is one such block of 100 tokens:
The guy’s license plate was a little obvious "68CONVT". I mean, you can see that
it’s a convertible please, because the top was down. Anyway, I stared straight
ahead. but could hear that low throaty rumble next to me. Suddenly, I felt tears
prickling in my eyes. It dawned on me that I was suffering from the maliblues.
What will happen at Hot August Nights? Those muscle cars cruise nightly and rev
and rev and rev. I’m thinking I should get a medical bracelet with maliblues
----------------------------------------------------------------------------------
(~W, |W|<3, 3=<|W|<6, |W|>=6, #Sentences, Lex. Div.) ---> (18, 16, 51, 15, 8, 67)
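A sketch of how the six attributes of such a block could be computed is shown below (our illustration only; the word-length boundaries follow the definitions of A1–A3 given above, sentences are approximated by terminal punctuation, and the lexical diversity is taken here simply as the number of distinct word tokens in the block, which may differ from the LD index discussed in Section 2).

```python
import re

def block_attributes(tokens):
    # 'tokens' is one block of 100 tokens (words plus punctuation marks).
    words = [t for t in tokens if re.fullmatch(r"\w+", t)]
    non_words = len(tokens) - len(words)                          # A0: non-words
    small = sum(1 for w in words if len(w) < 3)                   # A1: |w| < 3
    medium = sum(1 for w in words if 3 <= len(w) < 7)             # A2: 3 <= |w| < 7
    long_ = sum(1 for w in words if len(w) >= 7)                  # A3: |w| > 6
    sentences = sum(1 for t in tokens if t in {".", "!", "?"})    # A4 (rough count)
    lex_div = len({w.lower() for w in words})                     # A5: lexical diversity
    return non_words, small, medium, long_, sentences, lex_div
```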

1 We have chosen the English translation version from King James.
2 In the future a Zipfian law will be considered.

5 Results
Each corpus (Section 4) was processed to produce the necessary sequences of numbers representing the time series to be analyzed. Given the attributes and block size, these sequences consisted of integer numbers larger than 0 and smaller than 100. After being input to the TestH tool, they were normalized and the VT and RS estimators were applied to the resulting process. The p-value for the √nD statistic was also calculated, via the application of the Kolmogorov-Smirnov goodness-of-fit statistical test available in the tool. Note that the values

Table 1. Results obtained on seven corpora regarding six attributes.

A0 : num. non words A1 : num. words < 3 chars A2 : num. words 3-6 chars
Corpora
VT RS KS VT RS KS VT RS KS
Blogs 13-17 0.73778 0.86064 0.20301 0.70702 0.83312 0.27614 0.72485 0.83613 0.27614
Blogs 23-27 0.76648 0.86895 0.10449 0.81549 0.83606 0.34726 0.75532 0.81676 0.34726
Blogs 33-48 0.84432 0.85356 0.09445 0.83969 0.86252 0.11157 0.86863 0.84596 0.07483
News 0.63524 0.78496 0.01202 0.46540 0.81174 0.00491 0.73615 0.85698 0.00351
Bible 0.53201 0.83720 0.19772 0.65733 0.79622 0.19772 0.88749 0.81389 0.00688
Shakespeare 0.64795 0.84271 0.01136 0.63140 0.75893 0.03116 0.57925 0.78148 0.02035
Random 0.28842 0.46262 0.00432 0.68364 0.51305 0.97693 0.57995 0.46603 0.03807
A3 : num. words > 6 chars A4 : num. sentences A5 : Lexical Diversity (LD)
Corpora
VT RS KS VT RS KS VT RS KS
Blogs 13-17 0.71949 0.86469 0.69745 0.76482 ∗ VOR 0.43729 0.76005 0.82502 0.32957
Blogs 23-27 0.81764 0.88098 0.65178 0.82454 ∗ VOR 0.40002 0.76380 0.81367 0.13147
Blogs 33-48 0.88160 0.92609 0.28191 0.81082 0.88063 0.66304 0.81663 0.82250 0.18178
News 0.75006 0.87849 0.03398 0.66815 ∗ VOR 0.99286 0.64356 0.77951 0.25809
Bible 0.80095 0.84375 0.07482 0.81976 0.89219 0.59353 0.87111 0.86753 0.00039
Shakespeare 0.69241 0.75295 0.14385 0.66554 0.90009 0.29770 0.69826 0.81306 0.00000
Random 0.60186 0.40670 0.03351 0.26403 0.47593 0.67707 0.29526 0.46444 0.00495
∗ VOR: Estimated value out of range.

in the KS column (Table 1) suggest that the sequences seem to come from Gaussian processes, with only seven cases rejecting the null hypothesis if the significance level is set to 0.01 (bold values). This is interesting, and we will explore whether it may be a consequence of the central limit theorem, by verifying if the attributes are the result of the sum of several independent and identically distributed variables.
For the three age sub-sets of the blogs corpus, we can see that the VT and RS values improve consistently in almost all attributes, and this difference is very marked in the A3 and A5 attributes for the VT estimator. This shows that, even within the same genre, more mature and possibly more experienced authors produce text with a higher degree of self-similarity. The estimated values of the Hurst parameter are consistently larger for RS.
In the literary corpora, there is a significant difference between the text from Shakespeare and the Bible. In the latter, the self-similarity values are much higher for most of the attributes, and in particular for the lexical diversity (A5). Moreover, the Shakespeare text reveals low self-similarity in all attributes, with the exception of the RS estimator for A4. The news genre did not reveal high self-similarity for most attributes. There are, however, higher parameter values for the attributes A1, A2, and A3, which suggests strong self-similarity in the lexicon used. The low values obtained for the randomly generated corpus (Random) allow us to put in perspective the values obtained for the other corpora. This reveals a phenomenon of self-similarity in the human writing process.

6 Conclusions and Future Work


The paper shows that there is fractal beauty in text. It is naturally not a property that is consciously included by the authors of the text, which makes the results even more surprising. They suggest that there might be a relation between what is perceived as text quality and the self-similarity degree, although a more complete (more attributes) and exhaustive (more corpora) analysis is required.
Future research includes the assessment of self-similarity and goodness-of-fit for more attributes and a larger number of different corpora. It should also focus on a more detailed pre-processing of the data, which allows addressing the potential lack of stationarity in large corpora. Gaussianity will also be tested using the Chi-squared goodness-of-fit test, for consistency purposes. The RS estimator will be replaced by the DFA estimator in the future, since the latter is known to be more reliable than the former. Our efforts will be directed towards finding the (mathematical) explanations behind our initial findings. We also intend to explore self-similarity at different levels of linguistic attributes.

Acknowledgments. This work is financed by the FCT – Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project UID/EEA/50014/2013.

References
1. Alzahrani, S., Naomie, S., Ajith, A.: Understanding plagiarism linguistic patterns,
textual features, and detection methods. IEEE Transactions on Systems, Man, and
Cybernetics, Part C: Applications and Reviews 42(2), 133–149 (2012)
2. Cordeiro, J., Dias, G., Cleuziou, G.: Biology Based Alignments of Paraphrases for
Sentence Compression. Workshop on Textual Entailment (ACL-PASCAL) (2007)
3. Corder, G.W., Foreman, D.I.: Nonparametric Statistics for Non-Statisticians:
A Step-by-Step Approach. Wiley, New Jersey (2009)
4. Fernandes, D.A.B., Neto, M., Soares, L.F.B., Freire, M.M., Inácio, P.R.M.: A tool
for estimating the hurst parameter and for generating self-similar sequences. In:
Proceedings of the 46th Summer Computer Simulation Conference 2014 (SCSC
2014), Monterey, CA, USA (2014)
5. Hurst, H.: Long-Term Storage Capacity of Reservoirs. Transactions of the American
Society of Civil Engineers 116, 770–799 (1951)
6. Koizumi, R.: Relationships Between Text Length and Lexical Diversity Measures:
Can We Use Short Texts of Less than 100 Tokens? Vocabulary Learning and
Instruction 1(1), 60–69 (2012)
7. Malvern, D., Richards, B., Chipere, N., Durán, P.: Lexical diversity and language
development: Quantification and assessment. Houndmills, NH (2004)
8. McCarthy, P., Jarvis, S.: A theoretical and empirical evaluation of vocd. Language
Testing 24, 459–488 (2007)

9. McCarthy, P., Jarvis, S.: MTLD, vocd-D, and HD-D: A validation study of sophis-
ticated approaches to lexical diversity assessment. Behavior Research Methods
42(2), 381–392 (2010)
10. Olsson, J., Luchjenbroers, J.: Forensic linguistics. A&C Black (2013)
11. Schler, J., Koppel, M., Argamon, S., Pennebaker, J.W.: Effects of Age and Gender
on Blogging. AAAI Spring Symposium: Computational Approaches to Analyzing
Weblogs 6, 199–205 (2006)
12. Stamatatos, E.: A survey of modern authorship attribution methods. Journal of
the American Society for Information Science and Tech. 60(3), 538–556 (2009)
How Does Irony Affect Sentiment Analysis Tools?

Leila Weitzel1(), Raul A. Freire2, Paulo Quaresma3, Teresa Gonçalves3,


and Ronaldo Prati2
1
Universidade Federal Fluminense, Niterói, RJ, Brazil
[email protected]
2
Universidade Federal Do ABC, Santo André, SP, Brazil
{f.raul,ronaldo.prati}@ufabc.edu.br
3
Universidade de Évora, Évora, Portugal
{pq,tcg}@uevora.pt

Abstract. Sentiment analysis applications have spread to many domains: from consumer products, healthcare and financial services to political elections and social events. A common task in opinion mining is to classify an opinionated document into a positive or negative opinion. In this paper, a study of different methodologies for ranking polarity is conducted, in order to better understand how ironic messages affect sentiment analysis tools. The study provides an initial understanding of how irony affects polarity detection. From the statistical point of view, we observe that there are no significant differences between the methodologies. To better understand the phenomenon, it is essential to apply different methods, such as the lexicon-based SentiWordNet. In this sense, as future work, we aim to explore the use of lexicon-based tools, measuring and comparing the attained results.

Keywords: Social media · Irony · Sarcasm · Opinion mining · Polarity detection

1 Introduction and Motivation

Sentiment analysis and opinion mining have been growing topics of interest over the last few years, due to the large number of texts produced through Web 2.0. A common task in opinion mining is to classify an opinionated document as a positive or a negative opinion. A comprehensive review of both sentiment analysis and opinion mining as a research field of Natural Language Processing (NLP) is presented in Pang and Lee [1]. The demand for applications and tools to accomplish sentiment classification tasks has attracted researchers' attention to this area. Hence, sentiment analysis applications have spread to many domains: from consumer products, healthcare and financial services to political elections and social events. Sentiment classification is commonly categorized into two basic approaches: machine learning and lexicon-based. The machine learning approach uses a set of features, usually some function of vocabulary frequency, which are learned from annotated corpora or labelled examples. The lexicon-based approach uses a lexicon to provide the polarity, or semantic orientation, of each word or phrase in the text. Despite the considerable amount of research,

the classification of polarity is still a challenging task, mostly because it involves a deep understanding of the explicit and implicit information conveyed by language structures. Hence, irony and sarcasm have become important topical issues in NLP. Ironic writing is common in opinionated user-generated content such as blog posts and product reviews. As a whole, irony is the activity of saying or writing in such a way that the textual meaning of what is said is the opposite of what is meant. According to Rioff et al. [2], an ironic message typically conveys a negative opinion using only positive words. In this paper, we present a study of different methodologies to classify polarity, in order to better understand how ironic messages affect sentiment analysis tools.
The classifications were carried out by machine learning algorithms (mainly the Support Vector Machine – SVM – and the Naïve Bayes classifier). We aim to find out how irony affects sentiment classification. Therefore, our main research question is: what is the methodology best able to boost classification performance?
The outline of this paper is as follows: in Section 2, we present the related work; Section 3 explores the background in which this research is established; Section 4 presents our methodology; and Sections 5 and 6 provide our results and main conclusions.

2 Related Work

In [3] the authors presented a semi-supervised approach for the identification of sarcasm on two different data sets: a collection of 5.9 million tweets collected from Twitter, and a collection of 66,000 product reviews from Amazon. Using Mechanical Turk, they created a gold standard sample in which each sentence was tagged by 3 annotators, obtaining F-scores of 0.78 on the product reviews dataset and 0.83 on the Twitter dataset. Tayal et al. [4] proposed two algorithms, one to identify sarcastic tweets and another to perform polarity detection on sarcastic political tweets. Their main goal was to analyze and predict who would win the 2014 Indian Central Government Election based on sarcastic tweets. They came to the conclusion that, using a supervised approach and their proposed algorithm, they would be able to achieve this goal, and found that sarcastic tweets can predict election results effectively.

3 Background

Web 2.0 is the ultimate manifestation of User-Generated Content (UGC) systems. UGC can be about virtually anything, including politics, products, people, events, etc. One of the highlights is Twitter. Twitter constitutes a very open social network space with few barriers to access: even non-registered users are able to use Twitter to track breaking news on their chosen topics, from the "World Economic Crisis" to the "European Football Championship", for instance. Twitter users communicate with each other by posting tweets, allowing for a public interactive dialogue. On Twitter, users often post or update short messages referred to as tweets, describing their current status within a limit of 140 characters [5]. Beyond merely displaying news and reports, Twitter itself is also a large platform where different opinions are presented
and exchanged. The interest that users (companies, politicians, celebrities) show in online opinions about products and services, and the potential influence such opinions wield, is something that vendors of these items are paying more and more attention to. It is therefore important to correctly identify user opinions expressed in written text. In the general area of sentiment analysis, irony and sarcasm act as an interfering factor that can flip the polarity of a message. According to the Macmillan English Dictionary (2007), irony is "a form of humor in which you use words to express the opposite of what the words really mean"; in other words, it is the activity of saying or writing the opposite of what you mean. Unlike a simple negation, an ironic message typically conveys a negative opinion using only positive words or even intensified positive words [2]. As humans, when we communicate with one another, we have access to a wide range of spoken and unspoken cues that help create the intended message and ensure that our audience will understand what we are saying; some of these cues include body language, hand gestures, inflection, volume, and accent. Hence, the challenge for Natural Language Processing (NLP) is how to recognize sarcasm and gauge the appropriate sentiment of any given statement.

4 Methodology

4.1 Data Description and Corpus Generation


This research focuses on document-level irony detection on English Twitter datasets. The textual content of Twitter is usually ambiguous and rich in acronyms, slang and fashionable words. Apart from plain text, a tweet can contain other elements, such as hashtags, which are tags assigned by the user to identify a topic (e.g. #Obama) or a sentiment (#angry, #sarcasm); hyperlinks (typically a bitly URL, i.e., a URL produced by a shortening service); emoticons (pictorial representations of facial expressions); references to other users (@<user>); etc. In our experiments, the tweets were extracted by means of a Java package developed in-house for streaming posts [6]. We gathered about ten thousand tweets. Sarcastic tweets were collected using the previously selected hashtags {sarcasm, irony, lying, moresarcasm, notcool, notreally, notsarcasm, somuchsarcasm}, which were used as indicators of ironic or sarcastic tweets. Examples are <I just love all the support I get from my mom. #sarcasm> and <I love my highspeed internet connection #moresarcasm>. Our assumption is that the best judge of whether a tweet is intended to be sarcastic is the author of the tweet, as expressed through his or her hashtags.
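As an illustration of this hashtag-based labelling step, the short Python sketch below tags a tweet as sarcastic when it contains one of the selected hashtags. It is only a minimal sketch of the idea: the original collection was done with an in-house Java streaming package [6], and the function name and example usage here are our own assumptions.

SARCASM_HASHTAGS = {
    "#sarcasm", "#irony", "#lying", "#moresarcasm",
    "#notcool", "#notreally", "#notsarcasm", "#somuchsarcasm",
}

def label_tweet(text):
    # Tag a tweet as 'sarcastic' if any of the selected hashtags occurs in it.
    tokens = {token.lower() for token in text.split()}
    return "sarcastic" if tokens & SARCASM_HASHTAGS else "unlabelled"

# Example usage with the two tweets quoted above:
print(label_tweet("I just love all the support I get from my mom. #sarcasm"))
print(label_tweet("I love my highspeed internet connection #moresarcasm"))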

4.2 Preprocessing of Data


A semi-automatic cleaning of the corpus was performed to address concerns about corpus noise. Firstly, we removed: tweets starting with "RT", because they refer to a previous tweet, e.g., <RT@iFraseSincera: #Sarcasm… http://t.co/MkFuO4JKa>; posts that contain only a user name and a link, e.g., <@ZarethPanther @Ambybutt. #sarcasm http://t.co/zw2ieHVUGN>; tweets that have fewer than 3 words, e.g., <Price is low. #sarcasm>; and tweets that are meaningless, e.g., <@smackalalala yeah, ooooo yeap
#Sarcasm>. We also removed special characters ($, %, &, #, etc.), punctuation marks (full stops, commas, etc.), all hashtag words and emoticons (smileys). We applied automatic filtering to remove duplicate tweets and tweets written in other languages. Afterwards, we manually classified the ironic posts in order to ensure that all messages were truly ironic or sarcastic. We gathered about 10,000 tweets; after the preprocessing step, the sample contained about 7,628 tweets, of which 3,288 are positive, 3,600 are sarcastic and 740 are neutral.
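A rough Python approximation of these cleaning rules is sketched below. It is an illustration under our own assumptions rather than the actual filter chain: the manual verification and the language filtering are not reproduced, and the regular expressions (including the emoticon pattern) are ours.

import re

def clean_tweet(text):
    # Approximate the cleaning rules of Section 4.2; return None when a tweet is discarded.
    if text.startswith("RT"):             # retweets merely repeat a previous tweet
        return None
    text = re.sub(r"@\w+", " ", text)     # user mentions
    text = re.sub(r"http\S+", " ", text)  # hyperlinks
    text = re.sub(r"#\w+", " ", text)     # hashtag words (used only for labelling)
    text = re.sub(r"[:;=8][\-^]?[()DPpOo]", " ", text)  # a crude emoticon pattern (assumed)
    text = re.sub(r"[^\w\s]", " ", text)  # special characters and punctuation
    text = re.sub(r"\s+", " ", text).strip()
    if len(text.split()) < 3:             # drop tweets left with fewer than 3 words
        return None
    return text.lower()

def clean_corpus(tweets):
    # Clean every tweet and drop duplicates; posts with only a mention and a link
    # end up below the 3-word threshold and are discarded as well.
    seen, cleaned = set(), []
    for raw in tweets:
        t = clean_tweet(raw)
        if t is not None and t not in seen:
            seen.add(t)
            cleaned.append(t)
    return cleaned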

4.3 Set of Experiments


As mentioned above, we used two different approaches in our experiments. The first method was a Support Vector Machine (SVM) classifier, accessed through the wrapper class for the Library for Support Vector Machines (LIBSVM, with its default RBF, i.e., Gaussian, kernel) [7]; the second classifier was Naïve Bayes Multinomial (NBM), a learning algorithm that is frequently employed to tackle text classification problems. Our motivation to test multiple classifiers also stemmed from related work, which mostly tests more than one classifier. We used Weka [8] (Waikato Environment for Knowledge Analysis), where each document (a row, i.e., a tweet) is called an instance and each feature (term) is called an attribute. The SVM classifier requires that each data instance be represented as a vector of real numbers. Thus, in order to obtain vector-valued input data, we used an unsupervised Weka built-in filter called StringToWordVector (which converts string attributes into a set of attributes representing word occurrence), with the IDF and TF transforms enabled (to reflect how important a word is to a document in a collection or corpus). All tokens were converted to lowercase before being added to the dictionary. The minimum frequency of each term was restricted to five due to the tweet size limit (up to 140 characters). We also removed stopwords, and no stemming algorithm was used. The classification is based on word unigrams. The resulting model has 7,628 instances and 1,502 attributes.

All test modes are based on stratified ten-fold cross-validation: the data is randomly partitioned into 10 equal-sized parts, and each classifier is thus trained and evaluated ten times on subsamples of the data. This validation method reduces the variability of the classification results. After finding the best parameters and building the final model, the adopted classifiers were applied to the test set.

We considered the following metrics as the main measures of classifier performance: (i) precision (also called positive predictive value), (ii) recall (also known as sensitivity), and (iii) F-measure. These metrics are widely accepted in Information Retrieval (IR) for performance evaluation and are by far the most commonly used. Other measures were also considered, such as the true-positive (TP) rate (equivalent to recall) and the false-positive (FP) rate (equal to one minus the specificity), the ROC area (the area under the ROC curve measures how well the model can distinguish between classes), and the Kappa statistic (the kappa coefficient measures pairwise agreement among a set of coders making category judgments).
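Although the original experiments were run in Weka with the LIBSVM wrapper and the StringToWordVector filter, the following scikit-learn sketch is an approximate analogue, not the authors' actual setup: lowercase unigram TF-IDF features with a minimum document frequency of five (min_df, our approximation of Weka's minimum term frequency), English stopword removal, and stratified ten-fold cross-validation of an RBF-kernel SVM and a multinomial Naïve Bayes classifier, reporting precision, recall, F-measure and the kappa statistic.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import classification_report, cohen_kappa_score

def evaluate(tweets, labels):
    # Lowercased word unigrams, TF-IDF weighting, minimum document frequency of 5,
    # English stopword removal, no stemming (mirrors the Weka configuration above).
    vectorizer = TfidfVectorizer(lowercase=True, ngram_range=(1, 1),
                                 min_df=5, stop_words="english")
    X = vectorizer.fit_transform(tweets)

    classifiers = {
        "SVM (RBF kernel)": SVC(kernel="rbf"),        # analogue of LIBSVM's default kernel
        "Naive Bayes Multinomial": MultinomialNB(),
    }
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    for name, clf in classifiers.items():
        # Out-of-fold predictions over the stratified ten-fold split.
        predictions = cross_val_predict(clf, X, labels, cv=cv)
        print(name)
        print(classification_report(labels, predictions, digits=3))
        print("kappa:", round(cohen_kappa_score(labels, predictions), 3))

Here, tweets would be the preprocessed texts and labels the positive/irony/neutral classes; the exact figures will naturally differ from Table 1, since the Weka and scikit-learn implementations are not identical.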

5 Classification Results

The predictive performance of the models can be seen in Table 1, while Table 2 shows the accuracy and the kappa coefficient. We have not obtained a fully satisfactory model, in the sense that the number of misclassification errors is not small (the relative absolute error is about 34%). The simple mean of the ROC area is about 87%, which indicates a good performance of the models in terms of AUC. The accuracy (around 79%) and kappa (around 66%) indicate that there is a moderate statistical dependence between the attributes and the classes. The best performance was achieved by the NBM algorithm according to precision (86%), F-measure (83%) and ROC area (95%). However, the TP rate (84%) and recall (84%) of the SVM were better than those of the NBM. One can see that there is a considerable number of true positives, but the number of false positives is not small (16% incorrectly identified), especially if we consider only the irony category.

Table 1. The main results.

Method  Category        TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area
SVM     positive        80%      13%      82%        80%     81%        83%
        irony           84%      16%      80%        84%     82%        84%
        neutral         55%      6%       58%        55%     57%        74%
        weighted avg.   78%      13%      78%        78%     78%        82%
NBM     positive        83%      15%      81%        83%     82%        93%
        irony           80%      10%      86%        80%     83%        95%
        neutral         65%      7%       58%        65%     61%        90%
        weighted avg.   79%      12%      80%        79%     80%        94%
        weighted avg.   82%      12%      82%        82%     82%        87%

Table 2. Correctly classified instances (Accuracy) and Kappa statistic.

SVM NBM
Kappa 64% 66%
Accuracy 78% 79%
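
For reference, the kappa statistic reported in Table 2 is Cohen's kappa, which corrects the observed agreement p_o (here, the accuracy) for the agreement p_e expected by chance: kappa = (p_o − p_e) / (1 − p_e). The reported values of 64% and 66% therefore indicate agreement well above chance.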

6 Conclusions

Individuals post messages on the internet using e-mail, message boards and websites such as Facebook and Twitter, and these forms of contact are highly integrated into everyday life. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for
businesses willing to market their products, identify new opportunities and manage their reputations. Despite the considerable amount of research, the classification of polarity is still a challenging task, mostly because it involves a deep understanding of the explicit and implicit information conveyed by language structures. Consequently, irony and sarcasm have become important topics in NLP, mostly because irony (or sarcasm) flips the polarity of the message. Hence, this paper investigated how irony affects sentiment analysis tools. The classifications were conducted with the Support Vector Machine and the Naïve Bayes classifier. The results and conclusions of the experiments raise remarks and new questions. A first remark is that all experiments were performed with English texts; consequently, the results cannot be directly generalized to other languages. We believe that the results are likely to differ in other languages, for instance in more structured languages such as Brazilian Portuguese. Another interesting question is whether very similar results would be obtained if the experiments were carried out on data collected by a different method. From a statistical point of view, there are no relevant differences between the methodologies: for example, the total accuracy ranges between 78% and 79% and kappa from 64% to 66%, in spite of the inherently ambiguous nature of irony (or sarcasm), which makes it hard to analyze, not just automatically but often for humans as well. Our work indicates that the NBM and SVM were reasonably able to detect irony in Twitter messages, bearing in mind that our research deals with only one type of irony, which is common in tweets. The study provides us with an initial understanding of how irony affects polarity detection. To better understand the phenomenon, it is essential to apply different methods, such as the polarity given by SentiWordNet, which is lexicon-based. In this sense, for future work we aim to explore the use of lexicon-based tools and to measure and compare the obtained results.
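As a pointer to that future work, the sketch below illustrates the general lexicon-based idea using NLTK's SentiWordNet interface: it simply sums the positive minus negative scores of the first synset of each word. This is our own minimal illustration under that assumption, not a method evaluated in this paper, and it ignores word-sense disambiguation and part-of-speech information.

from nltk import word_tokenize
from nltk.corpus import sentiwordnet as swn
# Requires the NLTK data packages: punkt, wordnet, sentiwordnet.

def lexicon_polarity(text):
    # Sum positive minus negative SentiWordNet scores over the words of a tweet.
    score = 0.0
    for word in word_tokenize(text.lower()):
        synsets = list(swn.senti_synsets(word))
        if synsets:
            first = synsets[0]
            score += first.pos_score() - first.neg_score()
    return score  # > 0 suggests a positive tweet, < 0 a negative one

A literal reading of an ironic tweet such as "I just love all the support I get from my mom" would score positive here, which is precisely the failure mode that motivates comparing lexicon-based scores with the classifiers studied above.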

References
1. Pang, B., Lee, L.: Opinion Mining and Sentiment Analysis. Found. Trends Inf. Retr. 2, 1–135 (2008)
2. Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods, pp. 185–208. MIT Press (1999)
3. Weitzel, L., Aguiar, R.F., Rodriguez, W.F.G., Heringer, M.G.: How do medical authorities express their sentiment in twitter messages? In: 2014 9th Iberian Conference on Information Systems and Technologies (CISTI), pp. 1–6 (2014)
4. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38, 39–41 (1995)
5. Weitzel, L., Quaresma, P., Oliveira, J.P.M.D.: Measuring node importance on twitter microblogging. In: Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics, pp. 1–7. ACM, Craiova (2012)
6. Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 1–27 (2011)
7. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11, 10–18 (2009)
Author Index

Abdelwahab, Omar 238 Cardoso, Amílcar 664


Abelha, António 116, 122 Carneiro, Davide 15
Abreu, Mónica 789 Carneiro, João 3
Adamatti, Diana Francisca 673 Cascalho, José 696
Aires, José 723 Casteleiro, João 747
Al-Rifaie, Mohammad Majid 201 Castelli, Mauro 213
Alkabani, Yousra 493 Cavadas, Bruno 513
Almeida, Ana 27 Cavique, Luís 584
Alvelos, Filipe 143 Chessa, Stefano 54
Alves, Ana 759 Christensen, Anders Lyhne 189
Amaral, Filipe 433 Cimler, Richard 313
Analide, Cesar 33 Cordeiro, João 796
Andres, Emmanuel 110 Correia, Luís 189, 250
Anselma, Luca 79 Correia, Marco 41, 376, 480
Antunes, Francisco 181 Correia, Sara 340
Antunes, Luís 687 Cortez, Paulo 535
Araujo, Rodrigo L. 262 Costa, Ernesto 226
Artikis, Alexander 128 Costa, Paulo 487
Costa, Pedro Maurício 169
Baccan, Davi 402
Costa, Vera 169
Bacciu, Davide 54
Couto, Francisco M. 789
Badr-El-Den, Mohamed 298
Couto, Marco 783
Baía, Luís 560
Cruz, Jorge 41, 376, 480
Bamidis, Panagiotis D. 128
Cruz, Lúcia P. 353
Barata, Carlos 789
Cruz-Correia, Ricardo 91
Barbosa, Helio J.C. 262
Cunha, Bernardo 457
Barbosa, Vítor 143
Bedour, Hassan 493
da Silva, Joaquim Ferreira 747
Belo, Orlando 597
da Silva Soares, Anderson 274
Benmimoune, Lamine 110
Damásio, Carlos Viegas 363
Bento, Carlos 181
de Amorim Junior, João 658
Bernardino, Heder S. 262
de Brito, Maiquel 624
Bhatia, Ajay 67
De Felice, Matteo 213
Billis, Antonis S. 128
de Lima, Vera Lucia Strube 735
Blackwell, Tim 201
de Paula, Lauro Cássio Martins 274
Bodnarova, Agata 313
Dias, Cláudia Camila 91
Boissier, Olivier 624
Dias, Ricardo 433
Bonisoli, Andrea 103
Dias, Teresa Galvão 169
Boughaci, Dalila 298
Dimuro, Graçaliz P. 673
Branco, Paula 513
Dolezal, Rafael 313
Bryson, Joanna J. 292
Bydžovská, Hana 425
Faria, Ana Raquel 27
Campos, Pedro 702 Faria, Brígida Mónica 445
Cano, Alberto 305 Fernandes, Diogo A.B. 796
Fernandes, Kelwin 535 Knobbe, Arno 525


Fernandes, Renato 702 Knorr, Matthias 388, 611
Ferreira, Luís Miguel 445 Korabecny, Jan 313
Ferro, Erina 54 Kreiner, Karl 54
Ferrugento, Adriana 759 Krejcar, Ondrej 313
Figueiredo, Lino 27 Krenek, Jiri 313
Fonseca, Carlos M. 280 Kropf, Johannes 54
Fontes, Tânia 169 Kuca, Kamil 313
Fortunati, Luigi 54 Kumar, Suman 67
Franco, Andrea 41
Freire, Raul A. 803 Lau, Nuno 433, 445, 457
Fuad Muhammad 603 Leite, João 388, 611
Lemes, Cristiano Inácio 91
Lenca, Philippe 547
Gaio, A. Rita 702
Leon, Miguel 286
Gallicchio, Claudio 54
Lopes, Gabriel Pereira 747
Gama, João 501
Lopes, José Gabriel P. 723
Gamallo, Pablo 711
Lourenço, Nuno 226
Garcia, Marcos 711
Lowrance, Christopher J. 238
Gatta, Roberto 103
Gaudl, Swen E. 292 Mabunda, Pinto 696
Gerevini, Alfonso E. 103 Macedo, Luis 402
Géryk, Jan 578 Machado, José 116, 122
Ghodous, Parisa 110 Machado, Penousal 664
Gomes, Luís 723 Madeira, Sara C. 326
Gomes, Paulo 771 Magessi, Nuno Trindade 687
Gomes, Rui 181 Mago, Vijay Kumar 67
Gonçalves, Ivo 280 Manzoni, Luca 213
Gonçalves, Ramiro 27 Marques, Nuno Cavalheiro 584
Gonçalves, Ricardo 611 Marreiros, Goreti 3
Gonçalves, Teresa 803 Martinho, Diogo 3
Guerreiro, Tiago 789 Martins, Bruno 590
Martins, Constantino 27
Hajjam, Amir 110 Martins, Pedro 664
Hajjam, Mohamed 110 Marwan Muhammad 603
Hammad, Sherif 493 Mazzei, Alessandro 79
Hanke, Sten 54 Mazzini, Nicola 103
Henriques, Rui 326 McCluskey, Thomas Leo 134
Hübner, Jomi F. 624 Melo, Fernando 590
Husakova, Martina 313 Meshcheryakova, Olga 480
Micheli, Alessio 54
Mills, Rob 250
Ibarguren, Igor 572
Monteiro, Douglas 735
Inácio, Pedro R.M. 796
Moreira, António Paulo 445, 487
Ivanov, Vadim 388
Moura, João 363
Muguerza, Javier 572
Javaheri Javid, Mohammad Ali 201
Najman, Lukas 313
Katzouris, Nikos 128 Neves, António J.R. 433
Kavitha, K.M. 723 Neves, José 15
Kitchin, Diane 103 Novais, Paulo 3, 15
Oliveira, Bruno 597 Silva, Álvaro 122


Oliveira, Hugo Gonçalo 759 Silva, Ana M. 501
Oliveira, Sérgio 122 Silva, Fábio 33
Osborn, Joseph Carter 292 Silva, Fernando 189
Silva, Filipe 469
Palumbo, Filippo 54 Silva, João 433
Parodi, Oberdan 54 Silva, Paulo 181
Pedrosa, Eurico 457 Silva, Sara 280
Pereira, Artur 457 Silveira, Ricardo Azambuja 658
Pereira, Francisco B. 226 Soares, Carlos 525
Pereira, Luís Moniz 414 Somaraki, Vassiliki 134
Pereira, Sérgio 513 Soulas, Julie 547
Pereira, Sónia 116 Sousa, Pedro A.C. 376, 480
Pérez, Jesús María 572
Petry, Marcelo 445 Talha, Samy 110
Pimenta, André 15 Tavares, José Pedro 157
Pinheiro, David 305 Teixeira, João F. 783
Pinto, Andry 487 Teixeira, Jorge 789
Portela, Filipe 116, 122 Tiple, Pedro 584
Prati, Ronaldo 803 Torgo, Luís 560
Torres, Pedro 789
Quaresma, Paulo 803
Valentini, Vincenzo 103
Racakova, Veronika 313 Vallati, Mauro 103, 134
Rebelo, Carla 525 Vanneschi, Leonardo 213
Rebelo de Sá, Cláudio 525 Ventura, Sebastián 305
Reis, Luís Paulo 445 Vieira, Susana M. 353
Respício, Ana 143 Vilarinho, Cristina 157
Rezk, Nesma M. 493 Vinagre, Pedro 535
Rezoug, Abdellah 298 Vinga, Susana 353
Ribeiro, Rita P. 501 Von Laer, Andressa 673
Rocha, Miguel 340 Vozzi, Federico 54
Rodrigues, Ana 664
Rodrigues, Filipe 759 Weitzel, Leila 803
Rodrigues, Pedro Pereira 91 Woźna-Szcześniak, Bożena 651
Rodrigues, Ricardo 771
Rosado, José 469 Xiong, Ning 286
Rossetti, Rosaldo J.F. 157
Rua, Fernando 122 Yampolskiy, Roman V. 238

Santos, Manuel F. 116, 122 Zanin, Massimiliano 376


Santos, Vítor 469 Zbrzezny, Agnieszka M. 638, 651
Saptawijaya, Ari 414 Zbrzezny, Andrzej 638, 651
Sbruzzi, Elton 402 Zimmer, Robert 201
