ICIST 2014

4th International Conference on Information Society and Technology

Proceedings

Publisher: Society for Information Systems and Computer Networks

Editors: Zdravković, M., Trajanović, M., Konjović, Z.

ISBN: 978-86-85525-14-8

Issued in Belgrade, Serbia, 2014.

Print run: 100 copies

Printing by PC centar Magus, Zrenjanin


TABLE OF CONTENTS

4TH INTERNATIONAL CONFERENCE ON INFORMATION SOCIETY AND TECHNOLOGY (ICIST 2014)

TOWARDS THE INTEROPERABLE, HYPER-CONNECTED WORLD: A FOREWORD TO THE PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON INFORMATION SOCIETY AND TECHNOLOGY
   Milan Zdravković, Miroslav Trajanović (Faculty of Mechanical Engineering in Niš, University of Niš, Niš, Serbia); Zora Konjović (Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia) (page 1)

VOLUME 1: REGULAR PAPERS

BUSINESS PROCESSES MODELLING AND MANAGEMENT

1. BUSINESS REQUIREMENT NEGOTIATION BASED ON GENERALIZED REQUIREMENT APPROACH
   Aleksandar Bulajić (LANB, Denmark) (page 4)

2. THE CONCEPT OF THE INFORMATION SYSTEM FOR MANAGING BUSINESS PROCESSES OF DESIGNING AND MANUFACTURING OF OSTEOFIXATION MATERIAL
   Dragan Mišić, Miloš Stojković, Nikola Vitković, Miroslav Trajanović, Miodrag Manić, Nikola Korunović, Jelena Milovanović (Faculty of Mechanical Engineering Niš, Serbia) (page 10)

3. A FRAMEWORK TO ENHANCE SUPPLIER SEARCH IN DYNAMIC MANUFACTURING NETWORKS
   Miguel Ferro Beca, Joao Sarraipa, Carlos Agostinho (UNINOVA, Portugal); Fernando Gigante, Maria Jose Nunez (AIDIMA, Spain); Ricardo Jardim Goncalves (UNINOVA, Portugal) (page 16)

4. LIFERAY AND ALFRESCO: A CASE STUDY IN INTEGRATED ENTERPRISE PORTALS
   Milorad Filipović, Gajo Petrović, Aleksandar Nikolić (Fakultet tehničkih nauka, Novi Sad, Serbia); Vidan Marković (DDOR Novi Sad, Serbia); Branko Milosavljević (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 22)

5. A SHORT SURVEY OF EXISTING EMERGENCY MANAGEMENT TOOLS FOR INFORMATION COLLECTION, COMMUNICATION, AND DECISION SUPPORT
   Bogdan Pavković, Uroš Milošević, Vuk Mijović, Sanja Vraneš (Institut "Mihajlo Pupin" Beograd, Serbia) (page 28)

6. PREDICTIVE ANALYTICAL MODEL FOR SPARE PARTS INVENTORY REPLENISHMENT
   Nenad Stefanović (Fakultet tehničkih nauka, Čačak, Serbia) (page 34)

7. ONTOLOGY BASED FRAMEWORK FOR COLLABORATIVE BUSINESS PROCESS ASSESSMENT
   Maroua Hachicha, Néjib Moalla, Yacine Ouzrout (University of Lyon 2 - DISP Laboratory, France) (page 40)
ENERGY MANAGEMENT

1. SOLARENERGO - NEW WAY TO BRING RENEWABLE ENERGY CLOSER
   Matej Gomboši (Municipality Beltinci, Slovenia) (page 46)

2. DEVELOPMENT OF DISTRIBUTED HYDRO-INFORMATION SYSTEM FOR THE DRINA RIVER BASIN
   Vladimir Milivojević, Nikola Milivojević, Milan Stojković, Vukašin Ćirović, Dejan Divac (Institut "Jaroslav Černi" Beograd, Serbia) (page 50)

3. INFORMATION SYSTEM FOR DAM SAFETY MANAGEMENT
   Nikola Milivojević (Institut "Jaroslav Černi" Beograd, Serbia); Nenad Grujović (Faculty of Engineering, University of Kragujevac, Serbia); Dejan Divac, Vladimir Milivojević, Rastko Martać (Institut "Jaroslav Černi" Beograd, Serbia) (page 56)

4. GENETIC ALGORITHM BASED ENERGY DEMAND-SIDE MANAGEMENT
   Nikola Tomasević, Marko Batić, Sanja Vraneš (Institut "Mihajlo Pupin" Beograd, Serbia) (page 61)

5. INTEGRATED ENERGY DISPATCH APPROACH BASED ON ENERGY HUB AND DSM
   Marko Batić, Nikola Tomasević, Sanja Vraneš (Institut "Mihajlo Pupin" Beograd, Serbia) (page 67)

E-SOCIETY AND E-LEARNING 1

1. SERVER SELECTION FOR SEARCH/RETRIEVAL IN DISTRIBUTED LIBRARY SYSTEMS
   Miroslav Zarić, Branko Milosavljević (Fakultet tehničkih nauka, Novi Sad, Serbia); Dušan Surla (Prirodno-matematički fakultet, Univerzitet u Novom Sadu, Serbia) (page 73)

2. A METHOD FOR EGOVERNMENT CONCEPTS INTEROPERABILITY ASSESSMENT
   José Marcelo Almeida Prado Cestari (Pontifical Catholic University of Parana, Industrial and Systems Engineering, Brazil); Mario Lezoche (CRAN - Université de Lorraine - CNRS, France); Eduardo Rocha Loures (Pontifical Catholic University of Parana, Industrial and Systems Engineering, Brazil); Hervé Panetto (CRAN - Université de Lorraine - CNRS, France); Eduardo Portela Santos (Pontifical Catholic University of Parana, Industrial and Systems Engineering, Brazil) (page 79)

3. ANALYSIS OF SENTIMENT CHANGE OVER TIME USING USER STATUS UPDATES FROM SOCIAL NETWORKS
   Milica Ćirić, Aleksandar Stanimirović, Leonid Stoimenov (Faculty of Electronic Engineering, University of Niš, Serbia) (page 86)

4. TAKING DBPEDIA ACROSS BORDERS: BUILDING THE SERBIAN CHAPTER
   Uroš Milošević, Vuk Mijović, Sanja Vraneš (Institut "Mihajlo Pupin" Beograd, Serbia) (page 91)
E-SOCIETY AND E-LEARNING 2

1. STATISTICAL COMPOSITE INDICATOR FOR ESTIMATING THE DEGREE OF INFORMATION SOCIETY DEVELOPMENT
   Marina Dobrota, Jovana Stojilković, Ana Poledica, Veljko Jeremić (Fakultet organizacionih nauka, Univerzitet u Beogradu, Serbia) (page 96)

2. SYSTEM FOR MODELLING RULEBOOKS FOR THE EVALUATION OF SCIENTIFIC-RESEARCH RESULTS. CASE STUDY: SERBIAN RULEBOOK
   Siniša Nikolić, Valentin Penca, Dragan Ivanović (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 102)

3. SRU/W SERVICE FOR CRIS UNS SYSTEM
   Valentin Penca, Siniša Nikolić, Dragan Ivanović (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 108)

4. DEVELOPMENT AND IMPLEMENTATION OF THE PUBLIC ELECTRONIC SERVICE FOR MANAGING OPEN COMPETITIONS FOR GOVERNMENT GRANTS: CASE STUDY AUTONOMOUS PROVINCE OF VOJVODINA
   Milan Paroški, Vesna Popović (Government of the AP of Vojvodina / Office for Joint Affairs of Provincial Bodies, Novi Sad, Republic of Serbia); Zora Konjović (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 114)

5. EFFECTIVE TABLET DASHBOARD INTERFACE FOR INNOVATIVE PIPELINED MULTI-TEACHER LAB PRACTICING
   Oliver Vojinović, Vladimir Simić, Ivan Milentijević (Faculty of Electronic Engineering, University of Niš, Serbia) (page 120)

6. TOWARD MORE GENERAL CRITERIA OF CONFORMITY BETWEEN LEARNER AND LEARNING OBJECTS
   Eleonora Brtka, Vladimir Brtka, Vesna Makitan, Ivana Berković (Tehnički fakultet "Mihajlo Pupin" Zrenjanin, Serbia) (page 126)

INTERNET OF THINGS

1. INCREASING THE LIFETIME OF HEXAGONAL DEPLOYED WIRELESS SENSORWEB NETWORK
   Mirjana Maksimović, Vladimir Vujović (Faculty of Electrical Engineering East Sarajevo, Bosnia and Herzegovina); Vladimir Milošević, Branko Perišić (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 131)

2. SDN-BASED CONCEPT FOR NETWORK MONITORING
   Vassil Gourov (E-Fellows, Bulgaria) (page 137)

3. VEHICLE CLASSIFICATION AND FALSE DETECTION FILTERING USING A SINGLE MAGNETIC DETECTOR BASED INTELLIGENT SENSOR
   Peter Sarcevic, Szilveszter Pletl (University of Szeged, Hungary) (page 144)

4. MOTION ANALYSIS WITH WEARABLE 3D KINEMATIC SENSORS
   Sara Stančin, Sašo Tomažič (University of Ljubljana, Faculty of Electrical Engineering, Slovenia) (page 150)

5. QUALISYS WEB TRACKER – A WEB-BASED VISUALIZATION TOOL FOR REAL-TIME DATA OF AN OPTICAL TRACKING SYSTEM
   Andraž Krašček, Jaka Sodnik (University of Ljubljana, Faculty of Electrical Engineering, Slovenia) (page 155)

6. USABILITY OF SMARTPHONE INERTIAL SENSORS FOR CONFINED AREA MOTION TRACKING
   Anton Umek, Anton Kos (University of Ljubljana, Faculty of Electrical Engineering, Slovenia) (page 160)
KNOWLEDGE MODELLING, EXTRACTION AND INTERPRETATION

1. ENHANCED GAUSSIAN SELECTION IN MEDIUM VOCABULARY CONTINUOUS SPEECH RECOGNITION
   Branislav Popović, Dragiša Mišković (Fakultet tehničkih nauka, Novi Sad, Serbia); Darko Pekar (AlfaNum – Speech Technologies, Novi Sad, Serbia); Stevan Ostrogonac, Vlado Delić (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 164)

2. BUILDING A VIRTUAL PROFESSIONAL COMMUNITY: THE CASE OF BULGARIAN OPTOMETRY AND EYE OPTICS
   Mila Dragomirova (Sofia University, Bulgaria); Boyan Salutski (Technical University - Sofia, Bulgaria); Elissaveta Gourova (Sofia University, Bulgaria) (page 169)

3. FUZZY INFLUENCE DIAGRAMS IN POWER SYSTEMS DIAGNOSTICS
   Zoran Marković (Mathematical Institute of the Serbian Academy of Sciences and Arts, Serbia); Aleksandar Janjić (Faculty of Electronic Engineering, Serbia); Miomir Stanković (Faculty of Occupational Safety, Serbia); Lazar Velimirović (Mathematical Institute of the Serbian Academy of Sciences and Arts, Serbia) (page 174)

4. LINEAR FUZZY SPACE BASED SCOLIOSIS SCREENING
   Marko Jocić, Dejan Dimitrijević (Fakultet tehničkih nauka, Novi Sad, Serbia); Milan Pantović, Dejan Madić (Faculty of Sport and Physical Education, University of Novi Sad, Serbia); Zora Konjović (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 180)

5. CONTEXT MODELING BASED ON FEATURE MODELS EXPRESSED AS VIEWS ON ONTOLOGIES
   Siniša Nešković (Fakultet organizacionih nauka, Univerzitet u Beogradu, Serbia); Rade Matić (Belgrade Business School, Serbia) (page 186)

6. APPROACH IN REALIZATION OF ANALOGY-BASED REASONING IN SEMANTIC NETWORK
   Milan Trifunović, Miloš Stojković, Miroslav Trajanović, Dragan Mišić, Miodrag Manić (Faculty of Mechanical Engineering, University of Niš, Serbia) (page 192)

7. MAPPING EBXML STANDARDS TO ONTOLOGY
   Branko Arsić, Marija Đokić, Nenad Stefanović (Faculty of Science, Kragujevac, Serbia) (page 198)

8. ADDRESSING THE COLD-START NEW-USER PROBLEM FOR RECOMMENDATION WITH CO-TRAINING
   Jelena Slivka, Aleksandar Kovačević, Zora Konjović (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 204)
SOFTWARE DEVELOPMENT

1. AN APPROACH TO CONSOLIDATION OF DATABASE CHECK CONSTRAINTS
   Nikola Obrenović (Schneider Electric DMS NS Llc., Novi Sad, Serbia); Ivan Luković (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 210)

2. EDITOR FOR AGENT-ORIENTED PROGRAMMING LANGUAGE ALAS
   Dušan Okanović, Milan Vidaković, Željko Vuković (Fakultet tehničkih nauka, Novi Sad, Serbia); Dejan Mitrović, Mirjana Ivanović (Prirodno-matematički fakultet, Univerzitet u Novom Sadu, Serbia) (page 216)

3. GRADER: AN LTI APP FOR AUTOMATIC, SECURE, PROGRAM VALIDATION USING THE DOCKER SANDBOX
   Gajo Petrović, Aleksandar Nikolić, Milan Segedinac, Aleksandar Kovačević, Zora Konjović (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 221)

4. TULIPKO INTERACTIVE SOFTWARE FOR VISUALIZATION OF MONTE CARLO SIMULATION RESULTS
   Tara Petrić, Predrag Rakić (Fakultet tehničkih nauka, Novi Sad, Serbia); Petar Mali (Prirodno-matematički fakultet, Univerzitet u Novom Sadu, Serbia); Lazar Stričević (Fakultet tehničkih nauka, Novi Sad, Serbia); Slobodan Radošević (Prirodno-matematički fakultet, Univerzitet u Novom Sadu, Serbia) (page 225)

5. PERFORMANCE EVALUATION OF THE ARPEGGIO PARSER
   Igor Dejanović, Gordana Milosavljević (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 229)

6. ROUGH SETS BASED MODEL AS PROJECT SUCCESS SUPPORT
   Vesna Makitan, Vladimir Brtka, Eleonora Brtka, Miodrag Ivković (Tehnički fakultet "Mihajlo Pupin" Zrenjanin, Serbia) (page 235)

SPECIAL TRACK ON INTEROPERABILITY OF UBIQUITOUS SYSTEMS: THE FUTURE OF THE INTERNET-OF-EVERYTHING

1. ENABLING INTEROPERABILITY AS A PROPERTY OF UBIQUITOUS SYSTEMS: TOWARDS THE THEORY OF INTEROPERABILITY-OF-EVERYTHING
   Milan Zdravković (Faculty of Mechanical Engineering, University of Niš, Serbia); Hervé Panetto (CRAN, University of Lorraine, CNRS, France); Miroslav Trajanović (Faculty of Mechanical Engineering, University of Niš, Serbia) (page 240)

2. INTEROPERABILITY AS A PROPERTY: ENABLING ADAPTIVE DISASTER MANAGEMENT
   Ovidiu Noran (Griffith University, Australia); Milan Zdravković (Faculty of Mechanical Engineering, University of Niš, Serbia) (page 248)

3. THE USE OF ONTOLOGIES IN CADASTRAL SYSTEMS
   Dubravka Sladić, Aleksandra Radulović, Miro Govedarica, Dušan Jovanović, Dejan Rašić (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 256)

4. AN APPROACH FOR THE DEVELOPMENT OF CONTEXT-DRIVEN WEB MAP SOLUTIONS BASED ON INTEROPERABLE GIS PLATFORM
   Miloš Bogdanović, Aleksandar Stanimirović, Leonid Stoimenov (Faculty of Electronic Engineering, University of Niš, Serbia) (page 262)

5. SOCIOTAL: CREATING A CITIZEN-CENTRIC INTERNET OF THINGS
   Nenad Gligorić, Srđan Krco (DunavNET, Serbia); Ignacio Elicegui, Carmen López, Luis Sánchez (University of Cantabria, Spain); Michele Nati (University of Surrey, United Kingdom); Rob van Kranenburg (University of Liepaja, Latvia / Netherlands); M. Victoria Moreno (University of Murcia, Spain); Davide Carboni (Information Society Research, CRS4, Parco Tecnologico, Italy) (page 270)

6. ENHANCING BPMN 2.0 INFORMATIONAL PERSPECTIVE TO SUPPORT INTEROPERABILITY FOR CROSS-ORGANIZATIONAL BUSINESS PROCESSES
   Marija Janković, Miroslav Ljubičić, Nenad Aničić, Zoran Marjanović (Fakultet organizacionih nauka, Univerzitet u Beogradu, Serbia) (page 278)

7. TOWARDS INTEROPERABILITY PROPERTIES FOR TOOLING A SOFTWARE BUS FOR ENERGY EFFICIENCY
   Alexis Aubry, Hervé Panetto (University of Lorraine, France) (page 285)

VOLUME 2: POSTER PAPERS

COMPUTING

1. SUITABILITY OF DATA FLOW COMPUTING FOR NUMBER SORTING
   Anton Kos (Faculty of Electrical Engineering, Slovenia) (page 293)

2. APPLICATION OF DIGITAL STOCHASTIC MEASUREMENT OVER AN INTERVAL IN TIME AND FREQUENCY DOMAIN
   Boris Ličina, Platon Sovilj (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 297)

3. PERFORMANCE COMPARISON OF LATTICE BOLTZMANN FLUID FLOW SIMULATION USING OPENCL AND CUDA FRAMEWORKS
   Jelena Tekić, Predrag Tekić, Miloš Racković (Prirodno-matematički fakultet, Univerzitet u Novom Sadu, Serbia) (page 303)

E-SOCIETY, E-GOVERNMENT AND E-LEARNING

1. AN APPROACH TO RANKING HOTELS' WEBSITES BY APPLYING MULTIMOORA METHOD
   Dragiša Stanujkić, Anđelija Plavšić (Faculty of Management in Zaječar, Serbia); Ana Stanujkić (Independent researcher, Serbia) (page 307)

2. MULTI-CRITERIA MODEL FOR EVALUATING QUALITY OF WEBSITES OF THE REGIONAL TOURISM ORGANIZATIONS
   Dragiša Stanujkić, Milica Paunković (Faculty of Management in Zaječar, Serbia); Goran Stanković (Policijska uprava Bor, Serbia) (page 311)

3. INFORMATION FLOW IN PARKING AREAS MANAGEMENT IN THE ENTERPRISE INFORMATION SYSTEM
   Zoran Nešić (Fakultet tehničkih nauka, Čačak, Serbia); Leon Ljubić (JKP "Parking Service Kragujevac", Kragujevac, Serbia); Miroslav Radojičić, Jasmina Vesić Vasović (Fakultet tehničkih nauka, Čačak, Serbia) (page 317)

4. ICT INFRASTRUCTURE AT SPORTS STADIUM: REQUIREMENTS AND INNOVATIVE SOLUTIONS
   Lidija Petrović (ALFA University, Belgrade, Serbia); Michel Desbordes (ISC School of Management, Paris, France & Université Paris Sud XI, France); Dragorad Milovanović (University of Belgrade, Faculty of Electrical Engineering, Serbia) (page 322)

5. SAKAI CLE IN SERBIAN HIGHER EDUCATION
   Goran Savić, Milan Segedinac, Nikola Nikolić, Zora Konjović (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 328)

6. PERSONALIZED DESIGN IN INTERACTIVE MAPPING AS A PART OF INFORMATION SOCIETY
   Mirjana Kranjac (Fakultet tehničkih nauka, Novi Sad, Serbia); Uroš Sikimić (Politecnico di Milano, Italy); Đorđije Dupljanin, Slaviša Dumnić (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 333)

7. DESIGN AND IMPLEMENTATION OF SOFTWARE ARCHITECTURE FOR PUBLIC E-PROCUREMENT SYSTEM IN SERBIA
   Vjekoslav Bobar (Uprava za zajedničke poslove republičkih organa, Serbia); Ksenija Mandić (Fakultet organizacionih nauka, Univerzitet u Beogradu, Serbia) (page 338)

8. HADOOP AND PIG FOR INTERNET CENSUS DATA ANALYSIS
   Aleksandar Nikolić, Goran Sladić, Branko Milosavljević, Stevan Gostojić, Zora Konjović (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 344)

9. INTERDEPENDENCIES OF COMMUNICATION AND ELECTRICAL INFRASTRUCTURES
   Goran Murić (Faculty of Transport and Traffic Engineering, Serbia); Dragan Bogojević (PE EPS, Serbia); Nataša Gospić (Faculty of Transport and Traffic Engineering, Serbia) (page 349)

10. ANALYSIS PLATFORM FOR THE PRESENTATION OF A SET OF OPEN DATA IN EDUCATION AND PUBLIC ADMINISTRATION
   Srđan Atanasijević, Miloš Miladinović (Comtrade Solution Engineering, Serbia); Vladimir Nedić (FILUM, Department of Philology & Arts, Serbia); Milan Matijević (Department of Engineering Science, Serbia) (page 353)

11. miniC PROJECT FOR TEACHING COMPILERS COURSE
   Zorica Suvajdžin Rakić, Predrag Rakić, Tara Petrić (FTN, Novi Sad, Serbia) (page 360)

12. USING SYNTAX DIAGRAMS FOR TEACHING PROGRAMMING LANGUAGE GRAMMAR
   Zorica Suvajdžin Rakić, Srđan Popov, Tara Petrić (FTN, Novi Sad, Serbia) (page 363)

13. MIGRATION FROM SAKAI TO CANVAS
   Nikola Nikolić, Goran Savić, Milan Segedinac, Zora Konjović (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 366)

14. IMPLEMENTING AN EFFECTIVE PUBLIC ADMINISTRATION INFORMATION SYSTEM: STATE OF PAIS IN THE CZECH REPUBLIC AND ITS POTENTIAL APPLICATION IN THE REPUBLIC OF SERBIA
   Martin Štufi (Solutia, s.r.o., Czech Republic); Nataša Veljković, Sanja Bogdanović-Dinić, Leonid Stoimenov (Faculty of Electronic Engineering, University of Niš, Serbia) (page 371)

15. E-GOVERNMENT INTEROPERABILITY IN THE CONTEXT OF EUROPEAN INTEROPERABILITY FRAMEWORK
   Vojkan Nikolić (MUP RS, Serbia); Jelica Protić (University of Belgrade, School of Electrical Engineering, Serbia); Predrag Đikanović (MUP RS, Serbia) (page 376)

HARDWARE AND TELECOMMUNICATIONS

1. INTEGRATING PROCESSING IN RAM MEMORY AND ITS APPLICATION TO HIGH SPEED FFT COMPUTATION
   Danijela Efnusheva, Aristotel Tentov (University "Sv. Kiril i Metodij" in Skopje, Faculty of Electrical Engineering and Information Technologies, Macedonia) (page 382)

2. CoAP COMMUNICATION WITH THE MOBILE PHONE SENSORS OVER THE IPv6
   Tomislav Dimčić, Dejan Drajić, Srđan Krco (Ericsson d.o.o., Serbia) (page 388)

3. COMMUNICATION NETWORKS 2-TERMINAL RELIABILITY AND AVAILABILITY ESTIMATION BY SIMULATION
   Radomir Janković (Union University School of Computing, Serbia); Slavko Pokorni (ITS Information Technology School, Serbia); Momčilo Milinović (Faculty of Mechanical Engineering, Serbia) (page 393)

4. ANALYSIS OF MONITORING DIPOLE AND MONOPOLE ANTENNAS INFLUENCE ON SHIELDING EFFECTIVENESS OF ENCLOSURE WITH APERTURES
   Vesna Milutinović, Tatjana Cvetković (RATEL, Serbia); Nebojša Dončov, Bratislav Milovanović (Faculty of Electronic Engineering, University of Niš, Serbia) (page 399)

5. THE CROSS LAYER MODEL FOR WIRELESS NETWORKS ENERGY EFFICIENCY
   Borislav Odadžić, Dalibor Dobrilović, Željko Stojanov, Dragan Odadžić (Tehnički fakultet "Mihajlo Pupin" Zrenjanin, Serbia) (page 405)

INFORMATION SYSTEMS

1. OLAP ANALYTICAL SOLUTION FOR HUMAN RESOURCE MANAGEMENT PERFORMANCE MEASUREMENT AND EVALUATION: FROM THEORETICAL CONCEPTS TO APPLICATION
   Ružica Debeljački, Olivera Grljević (Ekonomski fakultet Subotica, Serbia) (page 411)

2. APPROACH TO MULTIDIMENSIONAL DATA MODELING IN BI TECHNOLOGY
   Jelena Lukić (JP Elektromreža Srbije, Serbia) (page 416)

3. AN EDUCATIONAL APPLICATION COMPRISING SPEECH TECHNOLOGIES FOR SERBIAN ADAPTED TO VISUALLY IMPAIRED CHILDREN - ANMASTERMIND
   Stevan Ostrogonac, Nataša Vujnović-Sedlar, Branislav Popović, Milan Sečujski (Fakultet tehničkih nauka, Novi Sad, Serbia); Darko Pekar (AlfaNum – Speech Technologies, Serbia) (page 422)

4. MINING LOCATION IN GEOGRAPHIC INFORMATION SYSTEMS USING NEURAL NETWORK
   Željko Jovanović, Marija Blagojević, Vlade Urošević (Fakultet tehničkih nauka, Čačak, Serbia) (page 428)

5. A SYSTEM FOR TRACKING AND RECORDING LOCATIONS OF ANDROID DEVICES
   Milan Lukić, Goran Sladić, Stevan Gostojić, Branko Milosavljević, Zora Konjović (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 432)

6. SOFTWARE PROVIDED WASTE MANAGEMENT SUSTAINABILITY ASSESSMENT
   Gordana Stefanović (Faculty of Mechanical Engineering, University of Niš, Serbia); Michele Dassisti (Politecnico di Bari, DMM, Italy); Biljana Milutinović (The School of Higher Technical Professional Education, Serbia) (page 438)

INTEGRATION AND INTEROPERABILITY

1. A METHOD FOR WEB CONTENT SEMANTIC ANALYSIS: THE CASE OF MANUFACTURING SYSTEMS
   Goran Grubić, Miloš Milutinović (Fakultet organizacionih nauka, Univerzitet u Beogradu, Serbia); Vanjica Ratković Živanović (Radio Televizija Srbije, Serbia); Zorica Bogdanović, Marijana Despotović-Zrakić (Fakultet organizacionih nauka, Univerzitet u Beogradu, Serbia) (page 444)

2. AN OVERVIEW OF SELECTED VISUAL M2M TRANSFORMATION LANGUAGES
   Vladimir Dimitrieski, Ivan Luković, Slavica Aleksić, Milan Čeliković, Gordana Milosavljević (Fakultet tehničkih nauka, Novi Sad, Serbia) (page 450)

3. METHODOLOGY FOR INITIAL CONNECTION OF ENTERPRISES IN DIGITAL BUSINESS ECOSYSTEMS USING COST-BENEFIT ANALYSIS IN COLLABORATIVE PROCESSES PLANNING
   Ramona Markoska (Faculty of Technical Sciences, Macedonia); Aleksandar Markoski (UKLO, Faculty of Technical Sciences, Macedonia) (page 456)

4. PHYSICAL MEDICINE DEVICES WITH CENTRALIZED MANAGEMENT OVER COMPUTER NETWORK
   Vladimir Ćirić, Vladimir Simić, Teufik Tokić, Ivan Milentijević, Oliver Vojinović (Faculty of Electronic Engineering, University of Niš, Serbia) (page 462)

5. CONCEPTUAL MODEL OF EXTERNAL FIXATORS FOR FRACTURES OF THE LONG BONES
   Dragan Pavlović, Marko Veselinović, Milan Zdravković, Miroslav Trajanović, Milan Mitković (Faculty of Mechanical Engineering, University of Niš, Serbia) (page 468)

6. MAKING SENSE OF COMPLEXITY OF ENTERPRISE INTEGRATION BY USING THE CYNEFIN FRAMEWORK
   Mila Mitić (Institut "Mihajlo Pupin" Beograd, Serbia) (page 473)
Towards the Interoperable, Hyper-Connected World: A Foreword to the Proceedings of the 4th International Conference on Information Society and Technology

Milan Zdravković*, Miroslav Trajanović*, Zora Konjović**
* Faculty of Mechanical Engineering in Niš, University of Niš, Niš, Serbia
** Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
[email protected], [email protected], [email protected]

I. INTRODUCTION

The 4th International Conference on Information Society and Technology (ICIST 2014) was organized in the Kopaonik winter resort, Serbia, on 9-11 March 2014. The conference provided a venue for ICT researchers and practitioners from all over the world to present their research results and development activities, to exchange experiences and new ideas, and to establish relationships for future collaboration. The International Programme Committee (IPC) gathered 48 ICT experts from industry and academia, from 18 countries, giving a truly international dimension to the review process.

While numerous and diverse topics were considered relevant for the conference, the prospective authors were invited to discuss the specific challenges of enterprise and systems interoperability. The topic was selected by the conference chairs as a reflection of the first published ICT work programmes of the recently launched European framework for funding research and innovation, Horizon 2020 (H2020).

Today, the ICT research community is facing numerous challenges related to the need to strive towards a hyper-connected world with hundreds of billions of devices fuelled by ambient and pervasive services (H2020 ICT Challenge 3 – Future Internet). Driven by paradigms such as Wireless Sensor Networks (WSN), the Internet of Things (IoT) and Cyber-Physical Systems (CPS), this world will change the way we perceive computing capability. It will feature strong collaboration and coordination, smart and autonomous behavior, and collective intelligence. It will use extensive data, open and linked, from distributed sources. It will be capable of acquiring and managing knowledge based on this data. It will combine different services to swiftly implement and deploy business processes.

While the diversity of devices is also rapidly growing, it becomes more and more difficult to make them collaborate, due to the complexity caused by fragmented architectures and incoherent unifying concepts. One approach to address this challenge is to foster application-independent development of a new generation of connected components and systems (ICT Challenge 1 – A new generation of components and systems), complemented by externally located behavioral and context models. These challenges alone cannot be addressed without consideration of the computing issues (ICT Challenge 2 - Advanced Computing) and the information management of big data in distributed sources (ICT Challenge 4 – Content technologies and information management).

Defined by ISO/IEC 2382 as the "capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units", enterprise interoperability is a common denominator of all the challenges of the so-called hyper-connected world. The collaboration of devices cannot be achieved without enabling them with a seamless capability to exchange, use and re-use information and services, even without taking into account the nature and/or purpose of this information and these services. Thus, the devices need the capability not only to exchange this information, but also to perceive and understand it.

II. ABOUT THE PROGRAMME

The specific objective of ICIST 2014 was to gather top experts in the different ICT areas to discuss the specific aspects of enterprise and systems interoperability. A total of 90 papers were submitted to the conference, each peer-reviewed by 1-3 members of the IPC or external reviewers. 49 papers were accepted for oral presentation, resulting in a regular-paper acceptance rate of 54%. These papers are published in Volume 1 of this book. Based on their content, the oral presentations during the conference were organized in 8 sessions.

The session on Business process modeling and management dealt with the interesting topics of business requirements negotiation, dynamic manufacturing networks, emergency management tools and collaborative business process assessment. It also presented some practical solutions related to managing the processes of healthcare products manufacturing and the integration of enterprise portals.

This year's ICIST received a relatively large number of papers in the area of energy management. Given the significance of the different energy efficiency and sustainability issues, especially in the context of the emerging challenges of industrial symbiosis and the importance of enterprise interoperability for its resolution, one session was dedicated to this area.
It presented 5 papers on IT support to renewable energy exploitation and on novel methods for energy management, e.g. genetic algorithms.

The session on the Internet of Things hosted 6 papers, mostly dealing with the deployment of sensor-based technologies. The authors addressed the important issues of WSN and sensor lifetime, usability, false detection filtering, network monitoring and data visualization. An interesting case study of motion analysis with wearable sensors was also presented.

Knowledge acquisition, management and interpretation are recognized as some of the biggest challenges in handling big data from disparate sources in the hyper-connected world, collected from multi-modal and multi-dimensional stimuli. This topic was addressed by 8 papers in a separate ICIST 2014 session. The authors dealt with context modeling, mapping standards to ontologies, knowledge management in virtual communities, reasoning in semantic networks, and the interpretation of knowledge from speech and medical images.

While the new circumstances are changing our perception of computing technologies, the software development paradigms will also have to face new obstacles, mostly related to deployment platforms, which are now evolving from static computers to dynamic sensors and associated processing devices. The Software Development topic was addressed by 6 papers, presented in a dedicated session. The papers discussed database architectures, tools for agent-oriented languages, visualization and performance evaluation.

Most of the authors dealt with the social and technological challenges related to the paradigms of the information society. A total of 10 papers were presented in two sessions on E-Society and E-Learning. The topics addressed were: assessment of information society development and e-Government interoperability, linked open data, scientific and research results management and evaluation, electronic public services, new technology-enabled teaching methods, and sentiment data analysis.

This year's ICIST hosted Technical Committee (TC) 5.3 "Enterprise Integration and Networking" of the International Federation for Automatic Control (IFAC), which used this opportunity to organize a special session on Interoperability of Ubiquitous Systems: the future of the Internet-of-Everything.

The very high interest of the research community in the new paradigm of ubiquitous computing gave rise to the concepts synthesized in the term Internet-of-Everything (IoE). Today, different devices with digital information and services are embedded into our everyday environment, interacting with us and with other devices, sometimes without any mutual awareness of this interaction. One of the concepts foreseen to have a very big impact on future IoE initiatives is the interoperability of ubiquitous systems. This specific topic was addressed by 7 papers in the dedicated session. The session proposed the theoretical concept of interoperability as a property of a ubiquitous system and discussed it in the case of a disaster management scenario. The discussion on location-based services was facilitated by the papers which addressed the use of ontologies in cadastral systems and context-driven web-map solutions. Furthermore, interesting discussions on the interoperability issues in citizen-centric IoT, cross-organizational business processes and industrial symbiosis were presented.

A. Poster sessions

To foster the scientific discussion and to pursue collaboration opportunities, ICIST also organized poster sessions. The poster sessions included papers passing the minimum threshold of scientific merit, presenting and discussing interesting, relevant ideas.

36 papers were presented during the two poster sessions, organized in different groups on the topics of integration and interoperability, computing, hardware and telecommunications, information systems, and e-society, e-government and e-learning. All papers accepted for the poster sessions are published in Volume 2 of this book.

B. Invited Keynote

With the aim to provide truly inspiring settings for the scientific discussion, this year's ICIST 2014 invited Prof. Ricardo Jardim-Gonçalves, from UNINOVA, Portugal, to introduce the participants to the possible impact of pervasive computing on new organizational forms and business models.

In his talk, he addressed the actual developments and trends in the domain of the Digital and Sensing Enterprise (with objects, equipment, and technological infrastructures exhibiting advanced networking and processing capabilities), and he gave insights concerning the realization of these concepts in the advent of the Internet of Everything.

C. "Manufacturing the Future"

The scientific programme of ICIST 2014 was complemented with training activities, through the workshop "Manufacturing the Future: Automating and Connecting Dynamic Production Networks".

The workshop was organized by the IMAGINE FP7 Project, funded by the European Commission under the "Virtual Factories and Enterprises" theme of the 7th Framework Programme (FoF-ICT-2011.7.3, Grant Agreement No: 285132).

III. ACKNOWLEDGEMENT

The editors would like to thank all members of the organizing committee of the YUINFO conference for providing the full logistics and all other kinds of support to ICIST 2014.

The editors also wish to express their sincere appreciation to the members of the IPC and the external reviewers, who contributed to the quality of this year's programme by providing detailed and timely reviews.
ICIST 2014
Regular papers

Sections:
1. Business Process Modelling and Management

2. Energy Management

3. E-Society and E-Learning

4. Internet of Things

5. Knowledge Modelling, Extraction and Interpretation

6. Software Development

7. Interoperability of Ubiquitous Systems: The Future of the Internet-of-Everything
Business Requirement Negotiation based on Generalized Requirement Approach (GRA)

Aleksandar Bulajić*
* LANB, Kongens Lyngby, Copenhagen, Denmark
[email protected]

Abstract—Business software development is based on specific business requirements that are collected during the requirement negotiation process. Gathering business requirements, when the final product requirements are dictated by a known client, can be a difficult process. An idea about a new business product can be obscure and described in general terms, which contributes to common misunderstandings. Business requirement verification accomplished by using text and graphics, and manual review processes, can be slow, error prone and expensive. Misunderstandings and omitted requirements affect the future software product. This research work proposes a new approach to requirement negotiation, the Generalized Requirement Approach (GRA), focused on the demonstration of business requirements during the requirement negotiation process. The process of business requirement negotiation is guided by a set of predefined objects that store the requirement description in a common repository, in a structured text format. The object attributes and properties are guidelines for specifying a sufficient level of requirement detail for generating the source code that is used for requirement demonstration. The source code and executables are generated without manual programming.

I. INTRODUCTION

The business requirement specification is one of the most important documents in a software development project. Contract signing, budget, time scheduling and resource allocation depend heavily on a correct business requirement specification. An omitted or misunderstood requirement can cause huge revisions and code refactoring in late software development phases and affect the project budget and duration.

Gathering business requirements, when the final product requirements are dictated by a known client, can be a difficult process. An idea about a new business product can be obscure and described in general terms, which contributes to common misunderstandings. Business requirement verification accomplished by using text and graphics, and manual review processes, can be slow, error prone and expensive.

Research studies show that issues related to requirements that are discovered in later project phases produce even greater costs and delays. Discovering or modifying requirements in the Design Phase could be three to six times more expensive. In the Coding Phase it is up to 10 times more expensive, and in the Development Testing Phase it is 15 to 40 times more expensive. In the Acceptance Phase it is 30 to 70 times more expensive, and in the Operation Phase it could be 40 to 1000 times more expensive. [1]

The IBM Project Management presentation uses the Meta Group study to illustrate that 70% of large IT projects failed or did not meet customer expectations. [2]

This research work proposes a new method for the business requirement negotiation process, called the Generalized Requirement Approach (GRA). The GRA requires demonstration of business requirements during the requirement negotiation process. To be able to demonstrate a requirement, the GRA requires the GRA Framework. The GRA Framework is an implementation of the GRA method. The GRA method is described in the "Generalized Requirement Approach (GRA)" section. The GRA Framework is described in the "GRA Framework (GRAF)" section.

The GRAF guides the process of business requirement negotiation by a set of predefined objects that store the requirement description in a structured text format in the common repository. The object attributes and properties are guidelines for specifying a sufficient level of requirement detail for generating source code [3]. An automated build uses the source code to create executables and demonstrate the requirement on the fly. The source code and executables are generated automatically, without manual coding, by Generic Programming Units (GPU). A GPU is a class or module responsible for generating source code. The GPU is based on parameterized methods: it sets the method parameters to the values stored in the structured text format before generating the source code. Besides changing parameters and generating methods, the GPU is able to generate user interfaces, classes, SQL statements and configuration files. The GPU is described in the "GRA Framework (GRAF)" section.
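To make the idea of a parameterized GPU concrete, the following is a minimal illustrative sketch, not the paper's actual implementation: a hypothetical FormGpu reads field names and types from a structured description and emits a C# class with the corresponding properties. The class name, the "Name:Type" field syntax and the emitted shape are all assumptions made here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch of a Generic Programming Unit (GPU): a parameterized
// generator whose parameter values come from the structured text description.
public class FormGpu
{
    public string GenerateClass(string className, IEnumerable<string> fieldSpecs)
    {
        var code = new StringBuilder();
        code.AppendLine("public class " + className);
        code.AppendLine("{");
        foreach (var spec in fieldSpecs)
        {
            // Each spec carries the parameter values, e.g. "ProductName:string".
            var parts = spec.Split(':');
            code.AppendLine("    public " + parts[1] + " " + parts[0] + " { get; set; }");
        }
        code.AppendLine("}");
        return code.ToString();
    }
}

public static class GpuDemo
{
    public static void Main()
    {
        // Field specifications as they might be read from the repository.
        var gpu = new FormGpu();
        string source = gpu.GenerateClass("Product",
            new[] { "ProductName:string", "Price:decimal", "Quantity:int" });
        Console.WriteLine(source); // generated source, input to an automated build
    }
}
```

The point of the sketch is only the mechanism the paper describes: the generator itself is fixed, and everything that varies between requirements is supplied as parameter values read from the stored description.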
The GRA addresses the requirement management syndromes: specification at an insufficient level of detail [3], the IKIWISI syndrome ("I'll know it when I see it"), the "Yes, but" syndrome ("that is not exactly what I mean") and the "Undiscovered Ruin" syndrome ("Now that I see it, I have another requirement to add").

II. RELATED WORK

The traditional requirement management approach is often identified with the Waterfall [4][5] software development method, where comprehensive requirement analysis and documentation are completed before the start of the next project phases. On the contrary, Agile Requirement Management [6] does not wait until all requirements are specified, nor until a whole requirement is specified. Development starts as soon as a part of the requirement is understood [7]. The project is developed using an iterative and incremental approach. The Agile software development process is based on short development iterations.
Each iteration implements a limited number of requirements. The next iteration is planned on the basis of user feedback and experience collected during the iteration testing process [7]. An advantage of short iterations is the early discovery of requirement misunderstandings. However, if a requirement is misunderstood, then the time spent on code development can be wasted, which affects project scheduling and budget. Requirements that are implemented in the next iteration can require code refactoring, and substantial code refactoring can affect the project budget and scheduling.

McConnell [8] pointed to the importance of proper software project preparation and prerequisites such as planning, requirements, architecture and design.

Test Driven Development (TDD) is an Extreme Programming method based on the test-first approach: the test is created before the implementation code [9]. TDD improved test coverage and promotes a testing culture [10]. While low test coverage can mean that testing was not properly executed, high test coverage guarantees nothing [11].

The Microsoft Solutions Framework (MSF) is Microsoft's best-practice method for delivering software according to specification, on time and on budget. [12] "The MSF philosophy holds that there is no single structure or process that optimally applies to the requirements and environments for all projects. It recognizes that, nonetheless, the need for guidance exists." [12]

Hewlett-Packard experimented with the implementation of the Evolutionary Development method (EVO) "to improve software development process, reduce number of late changes in the user interface and reduce number of defects found during system testing" [13]. The first and second attempts, which used two-week delivery cycles and four to six delivery cycles over more than a year and a half, failed to deliver the expected features and results. [13] The third attempt, which used the first month to prototype, delivered a world-class product after 4.6 months of implementation. [13] These experiments on full-scale industrial projects confirmed the importance of prototyping as a tool for requirement clarification.

The Unified Software Development Process, an iterative and incremental component-based software development method that is use-case driven, architecture centric and risk focused, was created in 1999. [14] The road map in the Unified Process method is described as: the Problem Domain, Stakeholder Needs, Moving Toward the Solution Domain, Features of the System, and Software Requirements. [14] A Problem Domain is identified by Needs, while Features and Software Requirements belong to the Solution Domain. [14] The best-known implementation of the Unified Process (UP) is the IBM Rational Unified Process (RUP) component-based process.

However, the first step in the software development process is requirement description and clarification. Collecting and describing requirements in the Requirement Specification document can be a difficult job. Natural language is subject to different interpretations and causes ambiguities.

The IEEE standard (1998b) describes the characteristics of a good requirement specification as correct, unambiguous, complete, consistent, traceable and verifiable [15]. The Unified Approach [14] added to this list the characteristic "understandable". Other authors, such as Wiegers, describe the characteristics of excellent requirement statements as complete, correct, feasible, unambiguous and verifiable [16]. Wiegers distinguishes between the Requirement Description and the Requirement Specification, and describes a good Requirement Specification as complete, consistent, modifiable and traceable [16].

Requirements verification is a process of improving the requirement specification according to the recommendations of good requirement description practice. Wiegers' [16] favored technique for requirement verification is a formal inspection of the requirements document, accomplished inside small teams where different views are represented, such as the analyst view, customer view, developer view and tester view. This technique is supported by testing requirements through developing functional test cases and specifying acceptance criteria [16].

The Rational Unified Process [14] uses a traceability matrix for requirement verification. A requirement, or a "need" in RUP terminology, is linked to a feature. A Feature is linked to a Software Requirement and a Use Case. A Use Case is linked to Test Cases. If some of the links are missing, it is considered an indication that the requirement is not properly verified. Requirement verification in this case is considered done if a link to a Use Case and a Test Case exists [14].

Sommerville [17] specifies, for the requirement verification process, requirement reviews, test case generation and automated consistency analysis in the case when requirements are specified "as a system model or formal notation". The prototyping technique is used for requirement validation; Sommerville sees prototyping as a requirement verification technique [17].

Requirement validation is a process of "evaluating of software component during or at the end of development process" [18].

Prototyping is an effective method for requirements clarification, proof of concept and reducing the risk that the final product is significantly different than expected [16].

Requirement verification accomplished by using text and graphics, and manual review processes, can be slow, error prone and expensive. Omitted and misunderstood requirements can cause huge revisions and code refactoring in late software development phases and affect the project budget and duration.

III. SOFTWARE DEVELOPMENT METHODOLOGY (SDM)

A Software Development Methodology (SDM) is a software development process that can be described by the following development phases and activities:

- Analysis – system requirements management,
- Architecture & Design – system design,
- Development – internal design and coding,
- Test – test and validation,
- Deployment – operation and maintenance.
The SDM is a structured approach to software development. The SDM's purpose is the production of high-quality software in a cost-effective way [17]. The purpose of the structuring is to enable process planning and controlling. The SDM process structure is implemented in different software methodologies: sequential and iterative, incremental and evolutionary, rapid application development and prototyping.

The history of the Software Development Methodology (SDM) started in 1956, when Herbert D. Benington presented his paper "Production of Large Computer Programs" at the "Symposium on advanced programming methods for digital computers: Washington, D.C., June 28, 29, 1956" [19]. Dr. Winston W. Royce in 1970 presented his personal view on managing large software developments in his paper "Managing the Development of Large Software Systems", in "Proceedings of IEEE WESCON 26" [20]. While Herbert D. Benington called the first phase, where broad requirements are defined, the Operational Plan phase, Dr. Winston W. Royce called the first software development phase the System Requirements phase.

The process of requirement specification, verification and validation is described in Figure 1, "Traditional Requirement Management Approach".

[Figure 1: "Traditional Requirement Management Approach"]

The requirement verification is understood as a process of initial requirement evaluation, executed during requirement gathering, elicitation and specification. [18] The requirement validation is understood as a process of requirement evaluation after the development phase is completed. [18]

The output of Traditional Requirement Management is the Requirement Specification document. The Requirement Specification document is used as the reference document for further software development planning and activities (Design Specification, Code Writing, and Testing & Validating), even though it is well known that written texts, as well as graphics, are ambiguous and subject to different interpretations.

The choice of software development method affects the time distance between requirement specification and requirement validation. In the case of Agile development methods, this time distance can be a week or weeks long. In the case of more traditional approaches, it can be a month or months long.

Traditional requirement management, a Waterfall-like method, is most appropriate for a project where requirements are stable and do not change during the software development process. However, analysis shows that an average of 25% of requirements change in a typical project, and the change rate can go even higher, to 35% to 50%, for large projects [21]. The longer the time difference between requirement specification and requirement validation, the more likely it is that the requirement will change.

This process can be improved by introducing requirement demonstration as early as possible, to avoid wasting time and resources on the implementation and modification of misunderstood requirements.

IV. GENERALIZED REQUIREMENT APPROACH (GRA)

The Generalized Requirement Approach (GRA) solution proposes requirement validation prior to creating the Requirement Specification. Requirement validation requires creating executables from the source code. Writing source code manually can be a slow and error-prone process.

The GRA method proposes automatic source code generation from structured textual descriptions that are expressed in the customer's native language. The process of describing requirements, generating source code and demonstrating requirements is called the Requirement Normalization process. Figure 2, "Generalized Requirement Approach Overview", illustrates the proposed solution.

[Figure 2: "Generalized Requirement Approach (GRA) Overview"]

The Requirement Normalization process is responsible for the following (a hypothetical structured-text sketch is given after this list):

- Guiding the user to specify a sufficient level of details [3] by using the customer's native language,
- Storing the requirement description in the structured text format,
- Automatic source code and executables generation.
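The paper does not show a concrete syntax for the structured text format, so the following sketch is purely hypothetical: a Form-like requirement description held as labeled name/value lines and parsed into dictionaries that a code generator could later consume. The labels, the separators and the StructuredTextDemo class are all invented here for illustration and are not the GRAF's actual format.

```csharp
using System;
using System.Collections.Generic;

public static class StructuredTextDemo
{
    // Hypothetical structured-text requirement description, written in
    // plain-language terms; the real GRAF format may differ.
    const string Description =
        "Object: Form\n" +
        "Name: ProductForm\n" +
        "Field: ProductName; Type: Text; Required: Yes\n" +
        "Field: Price; Type: Decimal; Required: Yes\n" +
        "Field: Quantity; Type: Number; Required: No\n";

    public static void Main()
    {
        // Parse each "Label: Value" pair so that a generator can later
        // read the level of detail it needs from the repository.
        var fields = new List<Dictionary<string, string>>();
        foreach (var line in Description.Split('\n'))
        {
            if (!line.StartsWith("Field:")) continue;
            var field = new Dictionary<string, string>();
            foreach (var pair in line.Split(';'))
            {
                var kv = pair.Split(':');
                field[kv[0].Trim()] = kv[1].Trim();
            }
            fields.Add(field);
        }
        Console.WriteLine("Parsed " + fields.Count + " field descriptions.");
    }
}
```

Whatever the concrete syntax, the design intent described in the paper is the same: the description stays readable in the customer's native language while carrying enough detail for code generation.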
Besides requirement description, the primary goal of Requirement Normalization is to clarify obscure customer requirements. The Requirement Normalization process is considered complete when it is possible to describe the requirement with a sufficient level of detail, from which it is possible to generate source code and build executables. The outputs of Requirement Normalization are the generated Requirement Specification and source code. The source code can be used in the next project phases.

While traditional requirement management writes the Requirement Specification document, the Requirement Specification in the case of the GRA method is stored in the central repository and can be generated on demand. Direct update of the Requirement Specification is not recommended; updates should be accomplished through the Requirement Normalization process.

Based on the discussion in this section, the following GRA features are identified:

- Document and store requirements in the structured text format, described in the customer's native language,
- Generate source code without manual programming,
- Demonstrate working software during the requirement negotiation process.

V. GENERALIZED REQUIREMENT APPROACH FRAMEWORK (GRAF) OVERVIEW

The Generalized Requirement Approach Framework (GRAF) is an implementation of the Generalized Requirement Approach (GRA) method. The GRAF contains code, classes, objects and libraries that guide the user during the requirement negotiation process to provide a detailed requirements specification that is sufficient to generate source code and executables. The GRAF is responsible for the implementation of the GRA features. Figure 3, "The Generalized Requirement Approach Framework Design", illustrates the GRA framework high-level design.

[Figure 3: "The GRA Framework Design"]

The GRAF is organized around a central repository. Requirement descriptions are stored in the central repository and used by the Code Generator when necessary.

The Designer is responsible for storing structured text format descriptions in the Database and for guiding a user to specify a sufficient amount of detail. Omitting a sufficient number of details during requirement specification can affect project duration and increase overall cost [3].

The Code Generator is responsible for generating source code by using the structured text data stored in the Database. The source code is generated in a standard programming language, for example C# or Java. The generated source code is executed in the Runtime Environment. The Runtime Environment depends on the generated source code. For example, if C# source code is generated by the Code Generator, the Runtime Environment needs a Microsoft .NET and CLR installation. If Java source code is generated by the Code Generator, the Runtime Environment needs a JRE installation.

The Test & Validation process validates requirements by using the code that is executed in the Runtime Environment. If the requirement does not satisfy expectations, the process can be repeated and returned back to the Designer.
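As a minimal sketch of the "generate, then execute in the Runtime Environment" step, the following uses the .NET Framework CodeDOM API (CSharpCodeProvider) to compile a generated source string in memory and invoke it. This is one plausible realization under stated assumptions, not necessarily the GRAF's own mechanism; the Demo class and Run method are invented for the example.

```csharp
using System;
using System.CodeDom.Compiler;
using Microsoft.CSharp;

public static class RuntimeDemo
{
    public static void Main()
    {
        // Source code as it might come out of the Code Generator.
        string generated =
            "public static class Demo" +
            "{ public static string Run() { return \"Hello from generated code\"; } }";

        // Compile the generated source in memory (.NET Framework CodeDOM API).
        using (var provider = new CSharpCodeProvider())
        {
            var options = new CompilerParameters { GenerateInMemory = true };
            CompilerResults results =
                provider.CompileAssemblyFromSource(options, generated);
            if (results.Errors.HasErrors)
                throw new InvalidOperationException("Generation produced invalid code.");

            // Execute the freshly generated code: the Runtime Environment step.
            var type = results.CompiledAssembly.GetType("Demo");
            object output = type.GetMethod("Run").Invoke(null, null);
            Console.WriteLine(output);
        }
    }
}
```

A compile error at this point corresponds to the loop the paper describes: the process is repeated and returned to the Designer until the description is detailed enough to produce working code.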
The GRA method can be implemented by using different technologies, such as Microsoft .NET, Java or JavaScript. In this paper, the GRA Framework is implemented by using Microsoft .NET and the C# language. Each implementation can be based on different object types.

The GRA Framework used in this paper identifies the following groups of objects that are used by the Designer during the requirement negotiation process:

- Objects responsible for requirement description and documenting, such as Requirement, User Story, Use Case and Test Case,
- Objects responsible for storing data in the structured text format that is used to generate source code, such as Forms, Data Sources, Application Objects and Interfaces.

Each GRA Framework object is mapped to one or more corresponding database entities that are used for storing data in the structured text format and for retrieving data when the GRA Framework needs it.

Objects responsible for requirement description and documenting are designed according to best practice [22]. Objects responsible for storing data in the structured text format are business application building blocks; this particular GRAF implementation uses the following objects:

- The Form object describes entry fields and other predefined User Interface (GUI) controls that enable user and software application interactions,
- The Data Source object is responsible for creating database tables and relations,
- The Application Object is responsible for backend and batch job processing,
ICIST 2014 - Vol. 1 Regular papers

• The Interface object is at the same time an Application Object; specific to this kind of object is communication with sources of data external to the application.

The objects responsible for storing data in the structured text format are used to generate source code. The Code Generator designed for this paper is illustrated in Figure 4 "The GRA Framework Source Code Generation":

Figure 4 "The GRA Framework Source Code Generation"

The source code is generated from the structured text descriptions, the GRA Libraries and the Templates. The structured descriptions are stored in the Database tables. The GRA Libraries contain parameterized methods and templates. These methods and templates are adapted to the specifics of the requirement and inserted into the generated source code. The Templates contain controls and control attributes that are specific to the implementation technology. For example, if ASP.NET source code is generated, the Templates are adapted to the ASP.NET controls such as Textbox, Button or Dropdown list, as well as to the ASP.NET-specific syntax. The methods and templates are used as building blocks to create source code. The process of source code generation is initiated externally by an Actor, who sends the name of the object that needs to be generated.

The Generic Programming Unit (GPU) is a piece of code, a method, a class or a module that is able to generate source code. One example is a GPU that can generate a form. The form can be described by the form name, field names, field types, data types, data lengths and numbers of decimal places, and the form control types, such as text field, drop-down list, check box, button, etc. From this data, the GPU shall be able to create a form that can insert, modify and delete entries, and execute the action code assigned to the form fields.

The GPU is the glue that connects the structured text descriptions in the database, the library methods and the templates, and creates source code. The GPU reads the data stored in the Database for each particular object and creates source code according to the requirement description, using the GRA Libraries and Templates. The outputs of the GPU are an HTML Web Page, a Code-Behind Class, a Data Object Class and SQL. The Data Object Class is responsible for mapping data from the relational database to objects and is a part of the Data Access Object pattern implementation. The generated SQL statements are used in the implementation of CRUD database operations. The GPU shall be able to generate other source code if sufficiently detailed information is available and if the implementation technology can support it.

The Runtime Environment is responsible for the execution of the generated source code and uses the Database for storing and retrieving application data.
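To make the template-driven generation more concrete, the sketch below shows a minimal GPU-style form generator. It is written in Java purely for illustration (the GRAF described here is C#-based), and all class and method names (FieldSpec, FormGpu, generateForm) are assumptions for this example, not part of the actual framework:

```java
import java.util.List;

// Hypothetical structured description of one form field, mirroring the
// attributes listed above (field name, control type, data type, length...).
class FieldSpec {
    final String name;
    final String controlType; // e.g. "textfield", "dropdown", "checkbox"

    FieldSpec(String name, String controlType) {
        this.name = name;
        this.controlType = controlType;
    }
}

// A minimal "GPU": turns a structured form description into source code
// by filling technology-specific templates, here ASP.NET-style markup.
class FormGpu {
    private static final String CONTROL_TEMPLATE =
            "<asp:%s ID=\"%s\" runat=\"server\" />%n";

    String generateForm(String formName, List<FieldSpec> fields) {
        StringBuilder src = new StringBuilder();
        src.append(String.format("<!-- generated form: %s -->%n", formName));
        for (FieldSpec f : fields) {
            // Adapt the template to the control type stored in the repository.
            String control = switch (f.controlType) {
                case "dropdown" -> "DropDownList";
                case "checkbox" -> "CheckBox";
                case "button"   -> "Button";
                default         -> "TextBox";
            };
            src.append(String.format(CONTROL_TEMPLATE, control, f.name));
        }
        return src.toString();
    }
}
```

A complete GPU would emit the code-behind class, the data object class and the CRUD SQL statements in the same template-driven way.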
VI. EXPERIMENT

The GRA Framework implementation is tested on the Retail Store application. The Retail Store is a fictive E-Commerce application described by the following Retail Store User Story: "As the Retail Store we want to sell our products on-line through the Internet in order to increase product availability, get in touch with more customers and increase sales and profit". From the Retail Store User Story it is possible to identify:

• the ProductComponent object,
• the Sales Operation.

The Product Component requirements are described in the Salesman User Story as "a need to add, update and remove product from the product list". The Sales Operation requirements are further elaborated in the Buyer User Story as "a need to select product, add product to shopping cart, create order and enable online payment by credit card".

The source code is generated according to the process described in Figure 4 "The GRA Framework Source Code Generation". It has been generated from the descriptions of the Product, Shopping Cart, Order and Payment forms. Each form is described by the form name, field names, field types, data types, data lengths and numbers of decimal places, and the form control types, such as text field, drop-down list, check box, button, etc. From this data, the GPU generates forms that can insert, modify and delete entries, and execute the action code assigned to the form fields. Figure 5 "Product Form" illustrates the generated Product form:

Figure 5 "Product Form"

Other forms are generated in the same way, and the GPU generates a fully functional application that is able to demonstrate the Product and Sales Operation components.
The user of the generated application is able to enter, store, update and browse data, add data to the Shopping Cart, change the selected quantity and review the Order before executing payment operations. To the Order form are assigned calculations for the item prices, the handling fee and the VAT amount, and for the Order total amount. The limited space in this paper does not allow a full presentation of the generated application.

VII. CONCLUSION

The result of the experiment, the fictive E-Commerce application, demonstrates the feasibility of the proposed solution and shows that the predefined set of framework objects and code, using the data defined during the requirement negotiation process, is sufficient to generate source code and build the application without a need for writing code manually.

The Generalized Requirement Approach (GRA) proposed in this paper can improve software development productivity and the quality of the final product. However, effective use of the GRA method requires an implementation of the GRA Framework (GRAF). The GRAF object attributes serve as guidelines for specifying the requirement with a sufficient level of detail. While in existing software development methodologies source code is written by programmers manually, the GRAF generates source code from the requirement descriptions stored in the central repository.

The proposed solution can contribute to:

• clarifying requirements and improving requirement understanding,
• addressing the IKIWISI, "Yes, but", "Undiscovered Ruin" and "Insufficient Details Level" requirement syndromes,
• closing the gap between requirement specification and requirement validation,
• producing an environment where requirements can be executed, analyzed, observed and validated,
• promoting active customer participation.

The generated source code and executables form a fully functional application that can be executed and tested. The Retail Store demo application can demonstrate the workflow, data and algorithms, and can be used for ad-hoc testing.

According to the experience collected so far, the critical part of this approach is providing a sufficient number of the features that are represented in the GRAF by the Application Object. The Application Object represents classes and generic methods that solve a particular programming issue; for example, testing for a unique Id, moving rows from one relational table to another, or creating new entities that are combinations of the existing entities. In the Retail Store demo application, one such example is the addRowToDataSource generic method, which is able to add the current data source row to any other data source. In this GRAF implementation, the target data source is specified during requirement negotiation and is stored in the requirement description.
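For illustration, a generic method in the spirit of addRowToDataSource could look as follows. This is a hedged sketch in Java (the actual GRAF is implemented in C#), with the DataSource type and the row representation assumed for the example:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A data source reduced to its essentials: a named list of rows,
// each row being a column-name -> value map.
class DataSource {
    final String name;
    final List<Map<String, Object>> rows = new ArrayList<>();

    DataSource(String name) {
        this.name = name;
    }
}

class GenericMethods {
    // In the spirit of addRowToDataSource: copies one row of a source
    // data source into a target data source chosen during requirement
    // negotiation (e.g. Product -> Shopping Cart -> Order).
    static void addRowToDataSource(DataSource source, int rowIndex, DataSource target) {
        target.rows.add(new LinkedHashMap<>(source.rows.get(rowIndex)));
    }
}
```

Because the target data source is only a parameter, the same generic method can serve several requirements without any additional programming.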
This framework version has been developed for research and experimental purposes. Further development could create a product that, besides requirement negotiation, can also be used for estimation and, generally speaking, for project management purposes.

REFERENCES
[1] DragonPoint, Inc (2008), Company Newsletter issue No. 3, "Requirements Capture: Keys 6 Through 10 to a Successful Software Development Project", available at http://www.dragonpoint.com/CompanyNewsletters/RequirementsCaptureKeys610.aspx
[2] IBM (2007), "IBM Project Management", available at http://facweb.cs.depaul.edu/yele/Course/IS372/Guest/Dawn%20Goulbourn/IBM%20PM%20presentation%20for%20DePaul.ppt
[3] Bulajic, Aleksandar, Stojic, Radoslav, Sambasivam, Samuel (2013), "Gap Between Service Requestor and Service Provider", Applied Internet and Information Technologies, ICAIIT2013, Zrenjanin, Serbia, October 26, 2013
[4] Benington, Herbert D. (1956), "Production of Large Computers Programs", Symposium on Advanced Programming Methods for Digital Computers sponsored by the Navy Mathematical Computing Advisory Panel and the Office of Naval Research, June 1956
[5] Royce, Winston W. (1970), "Managing the Development of Large Software Systems", Proceedings of IEEE WESCON 26, August 1970
[6] Beck, Kent, Mike Beedle, Arie van Bennekum, Alistair Cockburn, Ward Cunningham, Martin Fowler, James Grenning, Jim Highsmith, Andrew Hunt, Ron Jeffries, Jon Kern, Brian Marick, Robert C. Martin, Steve Mellor, Ken Schwaber, Jeff Sutherland, Dave Thomas (2001), "Manifesto for Agile Software Development", available at http://agilemanifesto.org/
[7] Beck, Kent (2002), "Introduktion til Extreme Programming", IDG, 30-05-2002
[8] McConnell, Steve (2004), "Code Complete 2: A Practical Handbook of Software Construction", Microsoft Press
[9] Beck, Kent (2002a), "Test Driven Development by Example", Addison-Wesley, November 18, 2002
[10] Bulajic, Aleksandar, Sambasivam, Samuel, Stojic, Radoslav (2012), "Overview of the Test Driven Development Research Projects and Experiments", Informing Science and Information Technology Education 2012 Conference (InSITE), Montreal, Canada, June 22-27, 2012
[11] Cornett, S. (2011), "Minimum Acceptable Code Coverage", Bullseye Testing Technology, 2006-2011
[12] "Microsoft Solution Framework 3.0 Overview" (2003), Microsoft Solution Framework White Paper, Microsoft, 2003
[13] May, Elaine L., Zimmer, Barbara A. (1996), "The Evolutionary Development Model for Software", Hewlett-Packard Journal, August 1996
[14] Leffingwell, Dean, Widrig, Don (2000), "Managing Software Requirements: A Unified Approach", Addison-Wesley, 2000
[15] "IEEE Recommended Practice for Software Requirements Specification" (1998), Software Engineering Standards Committee of the IEEE Computer Society
[16] Wiegers, Karl E. (2003), "Software Requirements", Microsoft Press, Redmond, Washington, 2003
[17] Sommerville, Ian (2001), "Software Engineering", 6th Edition, Pearson Education Limited
[18] "IEEE Standard Glossary of Software Engineering Terminology" (1990), IEEE Std 610.12, IEEE Standards Board, September 28, 1990
[19] Benington, Herbert D. (1956), "Production of Large Computers Programs", Symposium on Advanced Programming Methods for Digital Computers sponsored by the Navy Mathematical Computing Advisory Panel and the Office of Naval Research, June 1956
[20] Royce, Winston W. (1970), "Managing the Development of Large Software Systems", Proceedings of IEEE WESCON 26, August 1970
[21] Larman, Craig (2005), "Applying UML and Patterns", Pearson Education, 2005
[22] Cockburn, Alistair (2001), "Writing Effective Use Cases", Addison-Wesley
The concept of the information system for managing business processes of designing and manufacturing of osteofixation material

Dragan Mišić*, Miloš Stojković*, Nikola Vitković*, Miroslav Trajanović*, Miodrag Manić*, Nikola Korunović*, Jelena Milovanović*
* Faculty of Mechanical Engineering, Nis, Serbia
[email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract — One of the characteristics of modern production is the adaptation of products to specific customer requirements. This principle has been applied in industry for some time, but this is not the case in medicine. The idea behind the information system described in this paper is to support, improve and accelerate the manufacturing of medical aids that are adapted to the patients (customers). This IS will be applied in the process of design and manufacturing of osteofixation material, in order to obtain high-quality products customized to the individual needs of patients. The MD system for business process management, developed at the Faculty of Mechanical Engineering, will be used as a tool for the implementation and integration of the various activities in these processes.

I. INTRODUCTION

The project VIHOS (Virtual human osteoarticular system and its application in preclinical and clinical practice) [1] is focused on developing various tools that ought to help doctors and engineers in specific segments of their work. The project deals with the development of geometrical and simulation models of human bones, internal and external fixators, and scaffolds. The project also addresses the development of mathematical parametric models that, based on different types of radiology images which come from doctors, generate the aforementioned models. Developing software which can be used as assistance in planning orthopedic operations is also one of the project's goals.

These tools are in fact independent software modules, often developed by using different software packages or in different programming languages.

One part of the project refers to the creation of a production environment which will enable the mentioned services to be used and will contribute to improving the quality of the services offered in orthopedics. Due to the variety of applied techniques and tools, the main problem in defining an information system which would try to automate this process is the integration of different software solutions.

By analyzing research related to the integration of information systems used in medical facilities and enterprises which manufacture medical equipment, we realized that there are very few papers dealing with this subject.

When talking about the use of information systems in medicine, this mainly refers to Health IS. Health Information Systems (HIS) deal with processing data, information and knowledge in health care environments [2]. These systems deal with the information flow in medical facilities, but they are rarely connected to the information systems of companies that make the equipment used in hospitals.

In [3], for example, integration in medicine manufacturing enterprises is described; the authors recommend the use of SOA and RFID integration technologies. Some authors are trying to apply supply chain management technologies in health care [4]. Those authors emphasize the fact that supply chain management in a health care setting is characterized by some unique features, which make it difficult to transfer knowledge from the industrial sector to the health care sector in a direct way. The authors conclude that existing concepts, models and supply chain practices can be extended to supply chain management in health services, and existing research underpins the assumption that the health sector can benefit from the lessons learned in the industrial sector.

In this paper, we describe the integration of parts of the information system of a hospital (orthopedic clinic) and of the enterprise(s) that manufacture osteofixation material. Because of its flexibility, we chose Business Process Management System (BPMS) technology as the tool for integration [5].

The flexibility which BPMSs offer can be noticed in several aspects. These systems provide a very simple way of creating a business process model, which is later used to execute particular instances. No coding is needed for this, so even a person without any programming experience is able to define and change the model.

The activities which are executed within a process instance can be automatic or manual. Automatic activities are those that the BPMS executes automatically by calling third-party software modules, while manual activities are executed by people (process participants). The processes referred to in this particular case use both automatic and manual activities. So far we have automated the parts of the process for which developed software exists (such as generating geometrical models of human bones based on radiology images coming from the hospital), while manual activities will be used for those parts of the process that are not yet automated.
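As an illustration of this automatic/manual split, an automatic activity can be thought of as a thin adapter that the engine invokes. The sketch below shows one possible shape of such an adapter in Java; the interface, class names and process-variable map are assumptions for illustration, not the actual MD or Enhydra Shark API:

```java
// Illustrative shape of an automatic activity: the BPMS engine calls such
// an adapter, which in turn invokes a third-party software module.
interface AutomaticActivity {
    void execute(java.util.Map<String, Object> processVariables) throws Exception;
}

class CloudOfPointsGeneration implements AutomaticActivity {
    @Override
    public void execute(java.util.Map<String, Object> processVariables) throws Exception {
        // The radiology images travel through the process as process data.
        Object images = processVariables.get("radiologyImages");
        // Call the external bone-geometry application and store its result
        // back into the process context for the following activities.
        processVariables.put("cloudOfPoints", runExternalTool(images));
    }

    private Object runExternalTool(Object images) {
        // Placeholder for the call to the developed software module.
        return images;
    }
}
```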

The BPMS MD is used for executing and monitoring the activities of this process.

II. BUSINESS PROCESS MANAGEMENT SYSTEM MD

The Business Process Management System MD has been developed at the Faculty of Mechanical Engineering in Nis. The architecture of this system is shown in figure 1. In MD, workflows are defined by the person who is in charge of them, usually a business process manager, i.e. the manager who is responsible for planning and managing the process. According to that definition, the system administrator, with the assistance of an editor, enters the process model. If the structure of the organization allows it, the process manager can enter the process definitions himself. The definition is entered by means of a graphic process editor [6].

This system has not been developed from scratch. The existing system Enhydra Shark was used as the system core, and it was later extended with elements related to artificial intelligence. That means that the system is connected to an expert system in which it is possible to define the rules that will be used for exception handling. We use an expert system created via the JESS expert system shell [7]. This is a Java rule-based system, created in Sandia National Laboratories, Livermore, California.
Fig. 1. MD BPMS architecture
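To illustrate the JESS connection, the following minimal Java sketch loads a rule file and hands a detected exception to the engine; the rule file name and the fact template are illustrative assumptions, not the actual MD rule base:

```java
import jess.Rete;

// Minimal sketch of wiring a Java BPMS core to JESS: load a rule file
// and assert a detected exception as a fact for the rules to act on.
public class ExceptionRules {
    private final Rete engine = new Rete();

    public ExceptionRules() throws Exception {
        engine.batch("exception-rules.clp"); // hypothetical rule file
    }

    public void onException(String processId, String activity) throws Exception {
        // Assert the exception as a fact; matching rules fire on run().
        engine.eval(String.format(
            "(assert (process-exception (process \"%s\") (activity \"%s\")))",
            processId, activity));
        engine.run();
    }
}
```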


III. THE PROCESS OF DESIGNING AND MANUFACTURING OSTEOFIXATION MATERIAL

The information system described here should support the creation of osteofixation material and its application to the patient. Osteofixation material is a term that typically refers to an assembly which consists of a fixator, a scaffold and/or a graft. When the bone suffers less mechanical load, the fixator is not needed.

In this scenario, the owner of the automated process is a fictional company whose mission is to deliver the final product (osteofixation material) to the doctors.

Depending on the type of the patient's injury, there is a possibility of using standardized osteofixation material (the doctor simply takes the aids that are already in stock), or it can happen that the injury is somehow specific, in which case the aids need to be adapted to the patient.

In cases when adaptation is necessary, the information system speeds up the whole process of designing and developing the aids, and it is also responsible for achieving adequate quality. That can be done by arranging the whole process, automating as many activities as possible and using knowledge management tools.

Depending on the type of the fracture, doctors and engineers have to deal with many problems. One of the possibilities is that the patient is missing a part of a bone. If the missing part is large, it must be made for the patient by reverse engineering and embedded. If the missing part is smaller, a scaffold is used to enable the bone to regenerate.

If it is a fracture with no missing parts, then the goal of this process is to make a fixator which will be adapted to the patient's needs (this is the reason for starting the process).
The processes which we plan to define and follow through the information system consist of the activities shown in figure 2.

As already mentioned, the activities in a process can be automatic or manual. Processes that have only automatic activities are very rare; those are mainly cases when the BPMS is used for the integration of computer applications, in order to create a new one. Real processes consist of both manual and automatic activities, and the process described here is like that. Some of the listed activities are realized as manual. These are the activities for which appropriate software does not exist yet. There are also automated activities, such as the activity Creation of cloud of points, in which the corresponding algorithm, able to define how the bone looks based on a radiology image, is called.

The process that we plan to define and monitor using the information system consists of numerous activities. After reviewing the patient's condition, a consilium of surgeons decides, based on radiology images, which osteofixation material is most appropriate in that particular case. If standard osteofixation material is suitable, the information system can be used for the selection of the size and type of the aids. If it is a fracture that requires a customized approach, the process of designing and developing the aids will be launched. The surgeons select the company which can offer what they need in this particular case. Based on that, a system administrator of the manufacturing company or a surgeon, depending on the organization of the process, starts a new process instance.

Figure 2: Process model


In the next activity, the radiology images that are required for work are collected. Those are X-ray images, CT images or MRI images, or a combination of the above. The images mentioned here are in digital form. As such, they enter the process and become a part of the data flowing through the process. That way they will be available in every activity of the process in which they are required.

After collecting the images (which is done in a hospital), the surgeons should inform the information system about the chosen treatment. This decision defines which branch of the process will be executed; it defines the method of designing and manufacturing the aids. There are several treatments that the information system can monitor. The doctor can decide whether it is necessary to make the missing part of the bone (which will be realized by using reverse engineering, in which case the personal model is required), whether a scaffold is needed, or whether just the fixator should be customized.

If a part of the bone is missing or the radiology images are not good enough, it is necessary to create a personal model of the bone. This model suits the particular patient. It is obtained on the basis of the parametric
bone model developed at the Faculty of Mechanical Engineering. If the radiology images are of good quality and the bone does not have any missing parts, then the geometric model of the bone is created. In order to create this model, we use reverse modeling.

If the personal model is not necessary, the next activity is reverse modeling of the bone. Reverse modeling of a human bone's geometry using CAD software means generating a digital 3D model of the bone's geometry from a radiology image (CT, MRI). Importing the raw data into the CAD system results in the generation of one or more clouds of points (discrete points of the tissue, scanned by one of the radiology methods). In the next phases of remodeling, the geometrical features of higher order (curves, surfaces and solids) are designed. The reverse modeling procedure is represented by the activities Creation of polygonal model and Creation of CAD model in the process model, and it consists of the following steps [8]:

1. Importing and editing (filtering, aligning, etc.) of clouds of points (activity Acquiring of clouds of points),
2. Tessellation of the polygonal model (mesh) by creating a huge number of small triangular planar surfaces between the points in the cloud, as well as editing of the polygonal model (activity Creation of polygonal model),
3. Identification of RGEs (points, directions, planes and views) (activity Creation of polygonal model),
4. Creating and editing the curves on the polygonal model of the bone (activity Creation of polygonal model).

In the next activity, the solid model is created based on the polygonal model (activity Creation of solid model).

If the surgeons decide that it is necessary to make the missing part of a bone, a model of the specific bone is created based on a parametric model. In that case, there is a software solution which partially automates the process. The developed software system prototype enables the creation of a polygonal human femur model based on input data from one or more X-ray images of a certain patient. The system is based on the application of a pre-created generic parametric point model, which is the most important component of the software system. By exchanging the values of the parameters, acquired from X-ray images, CT or MRI scans of the patient, the generic model is transformed into a subject-specific bone model. Parameters can be read from medical images manually (by measuring from X-ray images) or through adequate software (e.g. Mimics, Vitrea - DICOM).

The next activity in the process is parameter measuring. That can be done by a surgeon (which is recommended) or an engineer. Whether the entry (measuring) of the parameters was done by a surgeon or an engineer, the next activity is the verification of those parameters, done by a surgeon. The verification is done by comparison with already known and recommended values.

After the verification of the parameters comes the creation of a cloud of points using the developed software. That is an automatic activity, which means that the BPMS calls the corresponding application. The parametric point model can be treated as a cloud-of-points model and, as such, it can be used in any CAD application. The model is based on anatomical points defined on B-spline curves created over input polygonal femur models. The use of B-spline curves enables the creation of geometrical femur models with high geometrical and topological/morphological precision. The B-spline curves were defined in CAD software (CATIA) and are absolutely applicable for use in the generic shape design of free-form surfaces (human bones can be described as such) within this module. This application is implemented in the software package Matlab [9].
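For reference, the standard B-spline curve formulation (a textbook definition, not reproduced from [9]) expresses a degree-p curve over control points Pi and a knot vector ti as:

```latex
C(t) = \sum_{i=0}^{n} N_{i,p}(t)\,P_i ,
\qquad
N_{i,0}(t) =
\begin{cases}
1, & t_i \le t < t_{i+1} \\
0, & \text{otherwise}
\end{cases}
```
```latex
N_{i,p}(t) = \frac{t - t_i}{t_{i+p} - t_i}\, N_{i,p-1}(t)
           + \frac{t_{i+p+1} - t}{t_{i+p+1} - t_{i+1}}\, N_{i+1,p-1}(t)
```

In this setting, the anatomical points identified on the polygonal models would act as the data through which such curves are fitted.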
After creating the cloud of points, which is the output of the previous activity, the procedure is the same as in reverse modeling.

After getting the CAD model of a bone (whether it was parametric or realistic) come the design and manufacturing of the osteofixation material.

If it was decided in the beginning that a scaffold would be used, the process of designing and manufacturing the scaffold begins. This process has its sub-activities, which are presented in the diagram not individually but as two larger activities: designing and manufacturing the scaffold.

When designing the scaffold, its geometry is defined first. This is done based on the previously defined orthopedic treatment, in accordance with the anatomy of the missing part of the bone, the load and other parameters. The next step is defining the material from the aspects of biocompatibility and biodegradability. During design, manufacturing characteristics should be taken into account, because there is a possibility that the wanted design cannot be made. In the end, the method of implantation and fixation of the scaffold also affects the design. The output of this process is the CAD model of the scaffold.

After modeling the scaffold, the next activity is its optimization, using different CAE methods. Manufacturing of the scaffold comes after its optimization. It can be done in the company or by a sub-contractor, if the basic company does not have the conditions for manufacturing (such a case is represented in the process model).

If it was concluded in the beginning that a large part of the bone is missing, which must be replaced by an implant, then after creating the CAD model of the whole bone, the model of the missing part should be extracted. After processing the parametrically obtained model comes either manufacturing a mold or manufacturing the missing part of the bone, depending on the defined orthopedic treatment.

Another possibility is to manufacture a customized fixator, after getting a solid model of the bone. In that case, designing the fixator comes next.

For the treatment of bone fractures, orthopedic surgeons use methods of external and internal fixation. External fixation involves fixation of the bone by the use of elements that are positioned at some distance from the site of the injury, that is, outside of the human body. Internal fixation implies surgical implementation of an implant in the human body for the purpose of healing the bone.

For both internal and external fixation, standard fixation elements can be used. All the mentioned internal (and external) fixators are made in a specific dimension range (sizes), in order to enable the application of the fixators to
bones belonging to different patients. The application of predefined internal fixators to a specific patient may be problematic because of the difference in the size and shape of the particular bone and the fixator.

One of the solutions to this problem is the application of so-called customized fixators. The geometry and topology of those fixators are adjusted to the anatomy and morphology of the bone belonging to the specific patient. The application of customized fixators has a positive effect on patients, but on the other hand it requires more time for preoperative planning and fixator manufacturing. Therefore, these fixators are used in cases where the application of predefined fixators could lead to complications in the surgical intervention or in the recovery of the patient. For the creation of the geometrical models of customized fixators, a new design method has been developed and presented in paper [10]. Geometrical models of the internal fixator by Mitkovic for the tibia, created by this method, can be applied for the preparation of preliminary models for FEA, for preoperative planning, for the production of customized internal fixators, etc.

Designing of the fixator is followed by its manufacturing, which can be done in the company or by a sub-contractor. In the example it is represented as done in the company.

After manufacturing of the fixator comes the optimization of its implementation. On one hand, the location of the fixator on the bone is important because of mechanical factors (such as fixator stability and durability) and, on the other, because of biological factors (such as preservation of the underlying bone material and of the blood supply in the surrounding tissue). Experienced orthopedists use a set of empirical rules which help them set the fixator in a position that enables the healing of the fracture without significant damage to the tissue with which it comes into contact. At the same time, the fixator has to be sufficiently strong and durable, as well as correctly configured and positioned, in order to stay functional throughout the whole fracture healing process.

Reverse engineering and CAE methods may be used to find an optimal position of the fixator on the bone. Parametric CAD models of the bone (based on medical imaging) and of the fixator are composed into an assembly, and their optimal position, with respect to fixator stability and durability as well as to stresses in the bone, is sought. During this process, the constraints that prevent the fixator components from damaging the bone or blocking the blood supply must be obeyed.

Thus, the input parameters of the process are geometric parameters that define the shape of the bone and the fixator, parameters used in the material models representing the bone and fixator materials, and parameters that define the loads and supports on the bone-fixator assembly that originate from muscles, joints or tendons. The parameters acting as constraints are the allowed distances from bone surfaces or anatomical landmarks, which ensure the prevention of any damage to the bone or eventual blocking of the blood supply. The output parameters are the optimal dimensions of the fixator components (if they are allowed to be changed) and the number and locations of the fixator screws or clamps. At the moment, this process cannot be performed automatically; it requires that a case study is performed for each individual case.
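Purely as an illustration of how these inputs, constraints and outputs relate, the sketch below groups them into a single structure; all names are assumptions, since the paper performs this step as a manual case study rather than through such a data model:

```java
import java.util.List;

// Illustrative grouping of the optimization inputs, constraints and
// outputs described above; not an actual VIHOS data model.
class FixatorOptimizationCase {
    // Input: geometry of the bone and fixator (parametric CAD dimensions).
    List<Double> boneShapeParams;
    List<Double> fixatorShapeParams;

    // Input: material model parameters and load/support definitions.
    List<Double> materialParams;
    List<Double> loadAndSupportParams;

    // Constraint: minimum allowed distances from bone surfaces or
    // anatomical landmarks, to protect tissue and blood supply.
    List<Double> minAllowedDistances;

    // Output: optimal fixator dimensions and screw/clamp placements.
    List<Double> optimalFixatorDimensions;
    List<double[]> screwLocations;
}
```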
Manufacturing of all the osteofixation material elements is followed by their sterilization and usage during an operation. That, of course, happens in the clinic.

One of the branches not mentioned before is planning the surgery. As part of the project VIHOS, we are developing an application for planning and simulating the surgery. In that application, polygonal bone and fixator models are used. The application is based on the use of WebGL and HTML5 technologies and it supports the X3D ISO standard. X3D is a standard based on VRML (Virtual Reality Modeling Language) that allows the creation of 3D content which can be displayed in any compatible browser (IE, Mozilla and Chrome are supported). The application allows the transformation of the basic models (rotation, translation, scaling) and the pairing of bone and fixator models into the appropriate assembly. Practitioners have the ability to choose adequate fixator models from the model database and to pair them with the specific bone. Currently, models of fixators for the femur and tibia bones are implemented [9].

IV. CONCLUSION

Information systems that support the activities which happen at medical facilities and at companies that manufacture osteofixation material are very rare. This paper presents a proposal of such an information system, using a fictional company as an example. BPMS technology and the MD system developed at the Faculty of Mechanical Engineering are used as integration tools. The system integrates a large number of activities that exist in the process of designing, manufacturing and implementing osteofixation material. Some of the activities in this process are automated, some are computer-assisted (there are procedures developed) and some are still manual, but all of them are monitored via this system.

In the further development of this system we plan to integrate activities related to the use of artificial intelligence methods. The MD system can be programmed to react to exceptions, too, and we also plan to use the Active Semantic Model in cases when the computer should make its own conclusions.

ACKNOWLEDGMENT

The paper is part of the project III41017 - Virtual Human Osteoarticular System and its Application in Preclinical and Clinical Practice, sponsored by the Republic of Serbia for the period 2011-2014.

REFERENCES
[1] http://vihos.masfak.ni.ac.rs/site/
[2] Winter, A., Haux, R., Ammenwerth, E., Brigl, B., Hellrung, N., Jahn, F., Health Information Systems, Springer, 2011
[3] Wu Deng, Huimin Zhao, Li Zou, Yuanyuan Li, Zhengguang Li, Research on Application Information System Integration Platform in Medicine Manufacturing Enterprise, Springer, 2011
[4] Jan de Vries, Robbert Huijsman, Supply chain management in health services: an overview, Supply Chain Management: An International Journal 16/3 (2011) 159–165
[5] Marlon Dumas, Marcello La Rosa, Jan Mendling, Hajo A. Reijers, Fundamentals of Business Process Management, Springer, 2012
[6] Mišić D., Domazet D., Trajanović M., Manić M., Zdravković M., Concept of the exception handling system for manufacturing business processes, Computer Science and Information Systems (ComSIS), 2010, 7(3):489-509
[7] JESS, the Rule Engine for the Java Platform, Sandia National Laboratories, http://herzberg.ca.sandia.gov/jess/
[8] Stojkovic, M., Trajanovic, M., Vitkovic, N., Milovanovic, J., Arsic, S., Mitkovic, M. (2009), Referential Geometrical Entities for Reverse Modeling of Geometry of Femur, Computational Vision and Medical Image Processing – VipIMAGE, Porto, Portugal, CRC Press/Balkema, Taylor & Francis Group, 189-195
[9] Vitković, N., Milovanović, J., Korunović, N., Trajanović, M., Stojković, M., Mišić, D., Arsić, S., Software system for creation of human femur customized polygonal models, Computer Science and Information Systems, Vol. 10, No. 3, 1473-1497, 2013
[10] D. Stevanovic, N. Vitkovic, M. Veselinovic, M. Trajanovic, M. Manic, M. Mitkovic, Parametrization of Internal Fixator by Mitkovic, The 7th International Working Conference "Total Quality Management - Advanced and Intelligent Approaches", Belgrade, Serbia, Proceedings, pp 541-544, 3-7 June 2013
[11] Trajanović, M., Korunović, N., Milovanović, J., Vitković, N., Mitković, M. (2010), Primena računarskih modela samodinamizirajućeg unutrašnjeg fiksatora po Mitkoviću u saniranju trauma femura, Facta universitatis - series: Mechanical Engineering, 8(1), 27-38
[12] Mišić D., Stojkovic M., Domazet D., Trajanović M., Manić M., Trifunovic M., Exception detection in business process management systems, JSIR - Journal of Scientific & Industrial Research, Vol. 69(03), March 2010, pp 1038-1042
[13] Milos Stojkovic, Jelena Milovanovic, Nikola Vitkovic, Miroslav Trajanovic, Stojanka Arsic, Milorad Mitkovic, Analysis of femoral trochanters morphology based on geometrical model, JSIR - Journal of Scientific & Industrial Research, Vol. 71(03), March 2012, pp 210-216
[14] Vitković, N., Milovanović, J., Trajanović, M., Stojković, M., Korunović, N., Manić, M., Different Approaches for the Creation of Femur Anatomical Axis and Femur Shaft Geometrical Models
[15] Milan Zdravković, Miroslav Trajanović, Miloš Stojković, Dragan Mišić, Nikola Vitković, A case of using the Semantic Interoperability Framework for custom orthopedic implants manufacturing, Annual Reviews in Control, Volume 36, Issue 2, Pages 318-326
[16] Vidosav Majstorovic, Miroslav Trajanovic, Nikola Vitkovic, Milos Stojkovic, Reverse engineering of human bones by using method of anatomical features, CIRP Annals - Manufacturing Technology 62 (2013) 167–170
A Framework to Enhance Supplier Search in Dynamic Manufacturing Networks

Miguel Ferro-Beca*, Joao Sarraipa*, Carlos Agostinho*, Fernando Gigante**, Maria Jose-Nunez**, Ricardo Jardim-Goncalves*
* Centre of Technology and Systems, CTS, UNINOVA, 2829-516 Caparica, Portugal. {mfb, jfss, ca, rg}@uninova.pt
** AIDIMA - Institute of Technology for Furniture and Related Industry, Benjamín Franklin, 13. Parque Tecnológico - 46980 Paterna, Valencia, Spain. {mjnunez, fgigante}@aidima.es

Abstract — Supplier search can be daunting for manufacturers, given the vast number of suppliers available and the work required to filter through all the information. However, even if this process allows the selection of a few potential suppliers of interest, dealing with potential suppliers for the first time may bring uneasiness, as there is no guarantee that they will comply with the requested products or services. On the other hand, suppliers which want to be seen and differentiate themselves in order to win more customers also face similar challenges. The vast number of competitors can make it extremely difficult for any given supplier to stand out from amongst its competitors. Also, new entrants into the market with little or no reputation may find it difficult to attract their first customers. The IMAGINE project is developing a methodology and a software platform for the management of dynamic manufacturing networks which provides innovative Supplier Search capabilities. The present paper provides an overview of the developments behind the IMAGINE platform and proposes a framework to help address the challenges that suppliers and manufacturers face.

I. INTRODUCTION

The process of seeking new suppliers can take place in a variety of scenarios, from individual companies seeking a specific supplier for parts and/or services, to manufacturers seeking to establish manufacturing networks, which can aggregate several different suppliers. In either case, companies usually prefer to do business with suppliers with whom they have already worked previously, as the cost of choosing the wrong supplier can be very high. As an example, dealing with supplier problems costs small businesses in the United Kingdom around £10 billion every year [1]. In many instances, "actual supplier relationships represent one of the most important assets the company can make use of" [2].

However, there are instances when companies must take a chance during their supplier choice process. A new part or a production process may only be available from new suppliers. In other instances, a trusted supplier may go out of business or may face an unforeseen event (e.g. natural catastrophe, plant fire, transportation delays, etc. [3], [4]), requiring a company/manufacturer to seek a new one to replace it. According to a worldwide survey of more than 500 companies, 75% have experienced supply chain disruption problems, causing annual losses of €1 million for 15% of respondents, while 9% lost over €1 million in a single supply disruption event [5].

The search process for new suppliers is usually performed by searching for potential suppliers in online supplier directories (i.e.: Alibaba.com, Thomasnet.com), contacting potential suppliers directly, either remotely or face-to-face, attending trade show events, seeking information in sector-related publications, etc. All of the above activities can be quite time-consuming, as there can be hundreds of potential suppliers, and researching all the available information can soon become unmanageable. Additionally, most of the available supplier information still does not provide full assurance with regard to a supplier's reliability and capability to deliver the promised products or services on time. At most, suppliers can try to reassure potential customers through quality certifications, which are insufficient to deal with the aforementioned issues.

On the other side of the supplier search process, suppliers face the challenge of having to distinguish themselves from their competitors in order to attract potential customers. The process of building and maintaining business reputation takes time, and well-performing suppliers, if they are still unknown, do not have many tools to assert themselves in the marketplace.

There is therefore a need for mechanisms which can provide further details and insights into the capabilities and reliability of potential suppliers, thus providing further assurance to manufacturers seeking new suppliers and giving new suppliers the means to provide assurance to potential customers of their good performance. This paper aims to propose a framework which can assist in solving the aforementioned issues. It first provides an overview of Dynamic Manufacturing Networks and the role of the IMAGINE project. Next, it introduces concepts related to Web 2.0 and Web 3.0, such as Linked Data and the Internet of Things. Once this conceptual foundation has been laid, it proceeds to present the suggested framework, followed by its process flowchart and architecture. Lastly, it provides a description of how the proposed framework will be validated within the context of the Furniture Living Lab of the IMAGINE project.

II. DYNAMIC MANUFACTURING NETWORKS

The globalization trends of the last decades have changed the boundaries of modern manufacturing enterprises, as manufacturers moved many of their operations across various suppliers, thus forming manufacturing networks. Such networks, also known as value networks, can be described "as networks of facilities, possibly owned by different organizations, where time, place or shape utility is added to a good in various stages such that the value for the ultimate customer is increased" [6]. Manufacturers and
organizations which participate in manufacturing networks choose to focus on their core competencies and mission-critical operations, while outsourcing other, less relevant operations to suppliers [7].

Figure 1. Proposed Framework for Semantic Company Analysis

Lately, the manufacturing network concept has evolved into the Virtual Manufacturing Network (VMN), which is a manufacturing network usually built with the use of ICT to bring together different suppliers and alliance partners, creating in such a way a virtual network that is able to operate as a solely owned supply network. A special case of a VMN is the Dynamic Manufacturing Network (DMN), which is "a coalition, either permanent or temporal, comprising production systems of geographically dispersed Small and Medium Enterprises (SMEs) and/or Original Equipment Manufacturers (OEMs) that collaborate in a shared value-chain to conduct joint manufacturing" [7]. The DMN brings plenty of advantages when compared with the more traditional concepts. Based on a view of information from various manufacturing sources and systems, DMNs enable a service-enhanced product and production lifecycle, providing companies the capability to self-adapt production to the most appropriate suppliers. In addition, the concept envisages that, if there is a supplier or manufacturer failure during the production phase, the network is flexible enough to enable dynamic replacements of certain partners and therefore maintain the production scheduling [8].

A. IMAGINE Partner Search for Dynamic Manufacturing Networks

IMAGINE is an EU-funded R&D project which seeks to address the need of modern manufacturing enterprises for a novel end-to-end management of Dynamic Manufacturing Networks. With this objective in mind, the project consortium has been working on the development of "a multi-party collaboration platform for innovative, responsive manufacturing that encompasses globally distributed partners, suppliers & production facilities (SMEs and/or OEMs) that jointly conduct multi-party manufacturing", as well as on "a novel comprehensive methodology for the management of dynamic manufacturing networks that provides a consolidated and coordinated view of information from various manufacturing sources and systems" [9].

One of the essential features of the proposed platform is to serve as a hub for suppliers, where they can register themselves and provide relevant information about their skills and production capabilities. Based on this information, the IMAGINE platform provides a "Partner Search" component, implemented through a collaboration portal, which allows a manufacturer to seek potential suppliers for a manufacturing network. The search component allows the filtering of potential suppliers based on several criteria, such as production capacity, production skills, number of employees, quality certifications, production cost and estimated delivery time, amongst other factors. It also assists in finding a set of potential suppliers following a three-step process: (1) first, a Long List of potential suppliers is created based on criteria such as number of employees, turnover, location, company category and production category; (2) after this initial filtering is performed, an additional filtering step takes place, in which potential suppliers are sorted based on dynamic criteria such as capacity rate, duration, fixed cost and variable cost, thus creating a Short List of potential suppliers for the manufacturing network; (3) finally, the platform provides simulation capabilities, through the "DMN Evaluation" component, which simulates possible manufacturing network configurations based on the supplier choices defined in the Short List.

The envisioned capabilities provided by the "Partner Search" component are a major leap forward in facilitating the supplier search process, as the component not only assists in filtering out irrelevant suppliers, but also provides search and simulation capabilities which were previously unavailable and which give added assurance in the final selection of potential suppliers.
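The two filtering steps can be illustrated with a short sketch. The Java code below is an illustration of the Long List / Short List idea only; the Supplier fields and the ranking rule are invented for the example and are not the platform's actual criteria model:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative supplier record combining static and dynamic criteria.
record Supplier(String name, int employees, double turnover,
                String productionCategory, double capacityRate, double cost) {}

class PartnerSearch {
    // Step 1: Long List from static criteria (employees, turnover, category...).
    static List<Supplier> longList(List<Supplier> all, String category,
                                   int minEmployees, double minTurnover) {
        return all.stream()
                .filter(s -> s.productionCategory().equals(category))
                .filter(s -> s.employees() >= minEmployees)
                .filter(s -> s.turnover() >= minTurnover)
                .collect(Collectors.toList());
    }

    // Step 2: Short List by ranking on dynamic criteria such as cost and capacity.
    static List<Supplier> shortList(List<Supplier> longList, int size) {
        return longList.stream()
                .sorted(Comparator.comparingDouble(Supplier::cost)
                        .thenComparing(Comparator.comparingDouble(Supplier::capacityRate).reversed()))
                .limit(size)
                .collect(Collectors.toList());
    }
}
```

Step 3, the simulation of candidate network configurations, would then operate on the Short List produced here.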
B. Web 2.0, Web 3.0 and the Internet of Things

The World Wide Web has been in constant evolution since its inception. The Web 1.0 of static web pages evolved into the Web 2.0, where users provided and shared their own content through social media, blogs, podcasts, etc., "through an 'architecture of participation'" and going beyond the page metaphor of Web 1.0 to deliver rich user experiences [10]. In addition to user participation, the Web 2.0 has evolved into richer internet applications which can deliver desktop functionalities via a web browser (i.e. Google Docs, etc.), among other functionalities.

The Web 3.0 concept is the evolution of the Web 2.0. Although visions of what the Web 3.0 may be vary, one of the existing visions revolves around the Semantic Web and the generation of information by computers, in lieu of human-generated information [11]. The Semantic Web is thus "not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation" [12].

While the "Semantic Web is a vision of creating a Web of Data, Linked Data is a concrete means to achieve (a lightweight version of) that vision" [13]. To put it simply, whereas the Semantic Web represents a conceptual vision, Linked Data aims to be its practical and working implementation. The way in which Linked Data implements the vision of the Semantic Web is based on the following 'Linked Data principles' [14]:

1. Use URIs as names for things;
2. Use HTTP URIs so that people can look up those names;
3. When someone looks up a URI, provide useful information, using standards such as RDF* and SPARQL;
4. Include links to other URIs, so that they can discover more things.

By utilizing the above four principles, the vision of building a Web of Data can begin its fulfillment, thus mirroring for data what website links do for the web. As each website links to other websites, a Linked Data source is linked to another Linked Data source, thus allowing for the discovery and aggregation of various data sources.
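As a minimal illustration of principles 2-4, the sketch below dereferences a placeholder company URI and queries the returned RDF with SPARQL, using Apache Jena as one common Java toolkit for this; neither the library choice nor the URI comes from the IMAGINE platform itself:

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

// Dereference a company URI per the Linked Data principles and query
// the returned RDF with SPARQL. The URI used here is a placeholder.
public class LinkedDataLookup {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        // Principles 2/3: an HTTP URI that returns useful RDF when looked up.
        model.read("http://example.org/company/acme");

        String sparql =
            "SELECT ?p ?o WHERE { <http://example.org/company/acme> ?p ?o }";
        try (QueryExecution qe = QueryExecutionFactory.create(sparql, model)) {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.nextSolution();
                // Principle 4: objects that are themselves URIs can be followed
                // further, enabling discovery of additional data sources.
                System.out.println(row.get("p") + " -> " + row.get("o"));
            }
        }
    }
}
```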
In addition to the principles of Linked Data, which facilitate communication between machines, the Internet of Things (IoT) is another concept which also revolves around the issue of Machine-to-Machine (M2M) communication, as "it brings us into a new era in which everything from tires to toothbrushes can be identified and connected and things can exchange information and make decisions by themselves. The communication forms will be human-human, human-thing, thing-thing. Things will be the main traffic makers" [15].

The aforementioned concepts of the Semantic Web, Linked Data and the Internet of Things can be relevant for the improvement of supplier search mechanisms. The ways in which these concepts and their associated technologies may be helpful are discussed in further detail in section III.

Figure 2. Proposed Semantic Company Analysis Flowchart

C. Suggested Improvements to the IMAGINE Partner Search

The aforementioned capabilities provided by the IMAGINE platform assist in solving some of the current challenges in supplier search. The search capabilities of the IMAGINE platform greatly reduce the time required to find a set of suitable suppliers according to the needs and criteria of the manufacturer. However, some challenges still remain, namely those related to the trustworthiness and reliability of potential suppliers, as most supplier search sites do not provide any sort of information which can help assess whether any particular supplier will deliver its services or products on time and as specified. There is therefore a need for additional mechanisms which can fill this gap in supplier evaluation, by providing additional information and insights which can assist manufacturers in choosing the best possible suppliers.

The monitoring of production activities is sensitive information which most suppliers are only willing to provide after they are part of a manufacturing network. If production problems with any particular supplier arise, the monitoring components of the IMAGINE platform quickly warn the manufacturer and suggest possible alternatives to the issue at hand. However, well-performing companies who produce and deliver their products on time and according to the specified requirements may gain from publicly sharing some of their production data. In particular, new suppliers in a given market may wish to offer greater transparency about their production capabilities in order to provide assurance to potential customers.

Therefore, an improvement to the "Partner Search" component of the IMAGINE platform, as well as to applicable supplier search tools and sites, would be the inclusion of relevant production data, obtained from production monitoring tools, which can help manufacturers and other interested parties in their supplier search process. The inclusion of such data would give an additional guarantee that the production capabilities of potential suppliers are indeed in line with the production requirements. Additionally, the inclusion of historical data could add further insights into the reliability of a potential supplier.

Another source of information could be user feedback, similar to what already takes place on many consumer and auction websites. After the successful deployment and execution of a manufacturing network, manufacturers involved in sub-contracting work to suppliers could use the search platform to leave feedback on their experience with the suppliers sourced in the manufacturing network. Although this sort of mechanism is not without its drawbacks, if properly supervised to ensure that only legitimate and honest feedback is provided, it could help other potential users in their future decision-making regarding their supplier choices.

III. FRAMEWORK FOR SEMANTIC COMPANY ANALYSIS

A. Framework Overview

The aforementioned suggestions on how to further improve the existing supplier search functions in the IMAGINE platform are based on utilizing information which is available within the manufacturing network, assuming that potential suppliers are willing to share partial production data and that manufacturing network partners are willing to provide proper feedback on their supplier choices.

However, there are other sources of potentially helpful information which may also assist in the supplier search process. The advent of Social Media and the IoT brought with it a whole new variety of data sources, which can possibly enable a more in-depth and accurate context analysis for any potential supplier. Some of the possibilities include, but are not limited to: 1. analysis of social media to track the online reputation of a supplier and its competitors; 2. analysis of search analytics regarding a potential supplier; 3. access to a wide variety of open data sources, statistical data, etc.; 4. the ability to use web crawlers to extract more specific information and data from other websites; 5. access to real-time data from sensor networks, IoT data, etc.
While the new data sources provided by the new digital age are relevant by themselves, if combined with other types of data they can provide further insights which would otherwise not be available if only traditional data sources were to be utilized. Hence, there is a need for an appropriate framework and methodology for integrating these additional sources of data and information into the supplier search process. A possible framework which can be utilized to perform analysis on companies is shown in Figure 1.

Firstly, the framework shall take into account a wide variety of data sources, including but not limited to Linked Data, sensor data and XBRL data, as well as the more common data formats which can be found on the web (text, RSS feeds, spreadsheets, documents, etc.). As illustrated in Figure 1, Linked Data principles can be applied during the data acquisition phase in order to help solve possible data interoperability issues which may arise [16]. Linked Data will also assist in the data retrieval process by allowing the discovery of data which could be of interest for the company evaluation process, such as information about alternative suppliers, possible competitors and data related to the context in which the potential suppliers operate. Additionally, Linked Data can be explored for offering innovative data visualizations, such as browsing through potential suppliers in a more interactive way. With regard to sensor data, the framework proposes to encourage enterprises to publish relevant IoT-related data about their manufacturing and logistics supply chains. The push towards greater transparency is beneficial for both data consumers and providers:

• IoT data providers – Reliable and dependable manufacturers and suppliers can gain an advantage over their competitors by publishing IoT-related data which demonstrates their commitment and production efficiency, as the data will provide evidence to potential customers and thus assist in generating future business.

• IoT data consumers – The availability of IoT-related data regarding a potential supplier can provide data consumers with greater insight and greater confidence whenever choosing a potential supplier.

Once data has been gathered, in the "Data Preparation" phase, negotiation mechanisms can assist in the development of reference ontologies and in the structuring of the data collected. Negotiation mechanisms can assist in performing semantic negotiation of the various concepts described by the data acquired (e.g. [17]). The variety of data sources can lead to semantic mismatches which must be solved in order to structure the data acquired, while ontologies can assist in the storage of relevant relationships between data and data sources, as well as in the permanent storage of relevant static data [18], [19]. After the data acquired has been organized and structured into a processable form, it is then ready to be analyzed. In this phase, data mining algorithms may be employed; however, whenever applicable, Big Data methodologies may also be applied (e.g. for large data streams from sensor networks). It is expected that once the data has been analyzed, company analysts will be able to perform a proper company evaluation which suits their needs.

B. Semantic Company Analysis Flowchart

The process of performing a Semantic Company Analysis, based on the proposed framework, is illustrated in Figure 2.

It begins by searching the Blueprint Repository provided by the IMAGINE platform for existing information regarding a potential supplier [8], [20]. The IMAGINE platform and the associated methodology provide Partner Blueprints as a mechanism whereby companies provide information about themselves, such as contact information, number of employees, annual turnover, skills and capabilities. Therefore, it makes sense to first retrieve the information that is already available in the Partner Blueprints before seeking additional information in other sources. If already available, historic information regarding the partner's performance in previous networks should also be made available.

Once the information from the Partner Blueprints is retrieved, it is then possible to confirm basic information about the company under analysis (e.g. name, location, etc.) and then proceed to search the WWW for additional information.

The first search for additional information will be performed by approaching sources of data and information which are based on Linked Data principles. The usage of Linked Data will assist in uncovering additional facts and/or sources of data.

The results from the Linked Data search process will return additional data and information which must then be confirmed by the user, to ensure that the results are indeed relevant for the analysis at hand. The results which are deemed relevant are then stored in a different repository, the "Extended Partner Blueprint Repository", as an extension to the IMAGINE Blueprint Repository.

Based on the information retrieved from the Partner Blueprint in IMAGINE, as well as the information retrieved from Linked Data sources, the framework should then suggest additional data sources to continue the data acquisition step. Additional sources of data could include, for example, statistical data regarding the region where the company under analysis operates, social media data, Google Analytics data, weather data, etc. The framework should have a separate repository for data sources and their possible applications, as certain data sources may only be relevant according to certain criteria regarding the company being analyzed (e.g. application of UK statistical data for UK-based companies, industry-related sites according to the sector in which a company operates, etc.).

Once additional information is retrieved, the user again needs to confirm which information is relevant, as previously done with the Linked Data search results. As the relevant data may come in a wide variety of formats, it is foreseen that additional databases may be necessary to store relevant search results according to their type.
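As an illustration of the Linked Data search step described above, the following sketch queries a public SPARQL endpoint for basic facts about a company. It is a minimal example under stated assumptions, not part of the IMAGINE implementation: it assumes the Apache Jena library, uses the public DBpedia endpoint, and the resource http://dbpedia.org/resource/Example_Company stands in for a real company URI.

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;

public class LinkedDataCompanyLookup {

    public static void main(String[] args) {
        // Ask DBpedia for the company's English abstract and, if present, its employee count.
        String sparql =
            "PREFIX dbo: <http://dbpedia.org/ontology/> " +
            "SELECT ?abstract ?employees WHERE { " +
            "  <http://dbpedia.org/resource/Example_Company> dbo:abstract ?abstract . " +
            "  OPTIONAL { <http://dbpedia.org/resource/Example_Company> " +
            "             dbo:numberOfEmployees ?employees } " +
            "  FILTER (lang(?abstract) = \"en\") }";

        try (QueryExecution qe = QueryExecutionFactory.sparqlService(
                "http://dbpedia.org/sparql", sparql)) {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println("Abstract : " + row.get("abstract"));
                if (row.contains("employees")) {
                    System.out.println("Employees: " + row.get("employees"));
                }
            }
        }
    }
}

In the flowchart above, results of this kind would be shown to the user for confirmation and, if deemed relevant, written to the Extended Partner Blueprint Repository rather than printed.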


Figure 4. Proposed Integration of the Semantic Company Analysis component in the Furniture Living Lab Adapter [21]

After the previous data acquisition steps, it is then possible to perform data reasoning using appropriate data mining tools and algorithms, and then present the most relevant data and information, as well as recommendations regarding the company under analysis.

C. Proposed Framework Architecture

In order to implement the proposed framework, a software architecture is suggested in Figure 3.

Figure 3. Proposed Architecture for the Company Analysis Plug-in

The proposed Company Analysis Adapter is meant to be modular in nature, so that it may be integrated and suited to different applications, as mentioned in the previous section.

The Adapter is constituted by the following major components (a sketch of how they could be wired together follows this list):

1. Data Acquisition – This component is to be implemented through a database engine such as OpenLink Virtuoso or a similar tool, which allows the aggregation of data from various data sources, including Linked Data and web data, among others;

2. Data Preparation – The Data Preparation component will be comprised of a negotiation rules engine, assisted by an ontology, which will be responsible for evaluating the data acquired and deciding whether it should be preserved for analysis or discarded;

3. Data Reasoning – The Data Reasoning component of the adapter is to be implemented by adapting an open-source data mining tool such as R or RapidMiner. Where data mining of large data sets is necessary, appropriate open-source "Big Data" tools will be employed;

4. Company Information Databases – The Adapter will require the use of additional data repositories, including graph-based and relational databases, according to the data to be stored and analyzed;

5. Portlets – The user interface of the component will be implemented via the use of portlets, which can then later be integrated into a platform portal.
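The sketch below illustrates one possible way to express this pipeline in Java. It is a design sketch only, under assumptions not fixed by the paper: the interface and class names (DataSource, PreparationEngine, and so on) are hypothetical, and each stage simply delegates to whichever concrete tool (Virtuoso, an ontology-backed rules engine, R or RapidMiner) is plugged in.

import java.util.ArrayList;
import java.util.List;

// Hypothetical component contracts mirroring the architecture above.
interface DataSource {                       // 1. Data Acquisition
    List<String> fetchRecords(String companyId);
}

interface PreparationEngine {                // 2. Data Preparation
    boolean isRelevant(String record);       // negotiation/ontology decision: keep or discard
}

interface ReasoningEngine {                  // 3. Data Reasoning
    String analyze(List<String> records);    // e.g. delegates to R or RapidMiner
}

interface CompanyInfoRepository {            // 4. Company Information Databases
    void store(String companyId, String evaluation);
}

public class CompanyAnalysisAdapter {        // exposed to users through portlets (5)
    private final List<DataSource> sources;
    private final PreparationEngine preparation;
    private final ReasoningEngine reasoning;
    private final CompanyInfoRepository repository;

    public CompanyAnalysisAdapter(List<DataSource> sources, PreparationEngine preparation,
                                  ReasoningEngine reasoning, CompanyInfoRepository repository) {
        this.sources = sources;
        this.preparation = preparation;
        this.reasoning = reasoning;
        this.repository = repository;
    }

    // Runs acquisition, preparation and reasoning for one company and stores the result.
    public String evaluate(String companyId) {
        List<String> kept = new ArrayList<>();
        for (DataSource source : sources) {
            for (String record : source.fetchRecords(companyId)) {
                if (preparation.isRelevant(record)) {
                    kept.add(record);
                }
            }
        }
        String evaluation = reasoning.analyze(kept);
        repository.store(companyId, evaluation);
        return evaluation;
    }
}

The modularity claimed for the Adapter corresponds here to the fact that each stage is an interface, so individual tools can be swapped without touching the pipeline.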
IV. FURNITURE LIVING LAB IN IMAGINE

A. Framework Validation within the Context of the Furniture Living Lab

The Furniture Living Lab (LL) has been defined to demonstrate the interactions between the IMAGINE platform and legacy systems through Web Services and Adapters developed ad hoc. The objective is to pilot an end-to-end management of a DMN created on the basis of an arising business opportunity. This LL is focused on furniture manufacturing processes, covering the cycle from the detection of new business opportunities to the delivery of the produced goods.

The definition of this LL is mainly based on the furniture scenario proposed for the IMAGINE project, which proposes a production environment composed in order to service one special order. This was developed to demonstrate how some production problems can be identified and solved. It is also based on open standards in order to ensure a given level of interoperability, flexibility and security. The order meets some conditions to be considered special and to be managed through a DMN in the IMAGINE platform: custom configuration, high volume and tight delivery terms. The objective is the use of the platform in conjunction with the ERP legacy system in order to deal with the special order.

As mentioned previously, the IMAGINE platform provides capabilities to search for new suppliers and to simulate the current production processes inside the DMN. This simulation allows a better selection of production partners in order to build a more efficient production network. The platform also provides a system to communicate and exchange data inside the platform with punctual human interaction. The objective is to be able to connect the platform to any ERP system by using Adapters. This allows companies to react to production issues faster and more effectively.

On the other hand, the proposed LL introduces an approach that is not yet widespread in the furniture sector: the collaborative manufacturing approach. Although the proposed Furniture LL is initially defined for companies which are already managing a small production network in some way, the availability of the IMAGINE platform and its functionality fosters the adoption of this production approach by companies which are not following it at the moment. This approach is also increasingly in demand and it facilitates new business opportunities for many SMEs.

B. Company Analysis Integration in the Context of the Furniture Living Lab

The Furniture Living Lab has developed an adapter which facilitates the registration of furniture manufacturers in the IMAGINE platform. The adapter, through the use of Web Services, provides mechanisms for easy upload of Partner and Product information.

Given the modular nature of the Company Analysis adapter, it and its supporting databases may be integrated within the existing Furniture Living Lab adapter, as shown in Figure 4, thus providing additional functionalities to the adapter. The use of portlets as an interface for the Company Analysis adapter does not disrupt in any way the other functionalities of the adapter, serving thus as a complement to them.


In order to ease the communication between the adapter and the legacy systems, a web services application has also been developed. This application offers a set of services according to the IMAGINE blueprints data model to retrieve specific information from the ERP of the companies. Currently, the application implements a set of web services to retrieve values from GdP (a specific ERP for the furniture industry supported by AIDIMA) and funStep (ISO 10303-236) [22]. To ease the integration, the architecture of the application offers an additional empty class which can be used to implement the specific queries to retrieve values from any other ERP database. This way, the ERP system and the updates in the implementation of the data retrieval are transparent to the Adapters and the IMAGINE platform.
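The "additional empty class" mentioned above suggests a simple extension point. The sketch below shows one way such a hook could look; it is an assumption-laden illustration rather than the actual project code: the interface name ErpQueryService and the class name are hypothetical, and the real application defines its service contract according to the IMAGINE blueprints data model.

import java.util.Map;

// Hypothetical contract for retrieving blueprint values from an ERP database.
interface ErpQueryService {
    // Returns blueprint fields (e.g. turnover, capacity) for one partner.
    Map<String, String> retrievePartnerValues(String partnerId);
}

// Skeleton to be filled in for any other ERP: the specific queries are implemented
// here, and the Adapters and the IMAGINE platform remain unaware of the change.
public class CustomErpQueryService implements ErpQueryService {

    @Override
    public Map<String, String> retrievePartnerValues(String partnerId) {
        // TODO: open a connection to the target ERP database and map its
        // tables onto the blueprint data model expected by the platform.
        throw new UnsupportedOperationException("Implement for the target ERP");
    }
}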
In the particular context of the Furniture Living Lab's needs, the proposed framework could provide additional features, such as the ability to search and aggregate publicly available information on a given company and the ability to view, upload and edit feedback information on potential suppliers. Also, potential suppliers could be given the possibility of sharing certain production monitoring data as proof of their reliability and production capabilities, allowing the framework to provide recommendations based on the information gathered.

V. CONCLUSIONS AND FUTURE WORK

The proposed framework and the associated software architecture aim to provide an innovative approach towards solving some of the major challenges that manufacturers face when searching for potential suppliers. The integration of relevant manufacturing data into suppliers' profiles, as well as the provision of feedback mechanisms and the integration of additional data sources, can provide additional insights into a supplier's capabilities and reputation.

Future work, as part of the first author's PhD thesis, will include research into the most appropriate technologies for the implementation of the proposed software architecture, as well as its deployment. Future improvements to the framework may include the addition of machine learning algorithms to assist in the analysis and recommendation process.

ACKNOWLEDGMENT

The authors wish to acknowledge the European Commission for their support in funding this work through the projects IMAGINE: Innovative End-to-end Management of Dynamic Manufacturing Networks (No. 285132), and FITMAN – Future Internet Technologies for MANufacturing industries (No. 604674).

REFERENCES

[1] B. Lobel (2013, Nov 27). "Problems with suppliers costing small businesses". Smallbusiness.co.uk. [Online]. Available: http://www.smallbusiness.co.uk/news/outlook/2440947/problems-with-suppliers-costing-small-businesses.thtml [Accessed: Jan 3, 2014]
[2] L.-E. Gadde and I. Snehota, "Making the Most of Supplier Relationships", Industrial Marketing Management, Vol. 29, Issue 4, July 2000, pp. 305-316, ISSN 0019-8501
[3] D. Simchi-Levi, W. Schmidt, and Y. Wei. "From Superstorms to Factory Fires: Managing Unpredictable Supply-Chain Disruptions". Harvard Business Review [Online]. Available: http://hbr.org/2014/01/from-superstorms-to-factory-fires-managing-unpredictable-supply-chain-disruptions/ar/1 [Accessed: Jan 10, 2014]
[4] BBC News. "Toyota and Honda delay restart amid part supply issues". BBC News [Online]. Available: http://www.bbc.co.uk/news/business-12802495 [Accessed: Jan 10, 2014]
[5] G. Degun (2013, Nov 7). "Three quarters of businesses suffer supply chain disruption". Supply Management [Online]. Available: http://www.supplymanagement.com/news/2013/three-quarters-of-businesses-suffer-supply-chain-disruption [Accessed: Jan 11, 2014]
[6] M. Rudberg and J. Olhager, "Manufacturing networks and supply chains: an operations strategy perspective", Omega, Vol. 31, Issue 1, February 2003, pp. 29-39, ISSN 0305-0483, http://dx.doi.org/10.1016/S0305-0483(02)00063-4
[7] IMAGINE Project Consortium, "Introduction to Virtual Manufacturing Management & IMAGINE platform", Training Course
[8] Ferreira, J., Ferro de Beça, M., Agostinho, C., Nunez, M. J., and Jardim-Goncalves, R. (2013). "Standard Blueprints for Interoperability in Factories of the Future (FoF)". In Proceedings of the 7th IFAC Conference on Manufacturing Modelling, Management, and Control, 2013, pp. 1322–1327, St. Petersburg, Russia. doi:10.3182/20130619-3-RU-3018.00427
[9] IMAGINE Project Consortium, Document of Work
[10] T. O'Reilly, "What is Web 2.0: Design Patterns and Business Models for the Next Generation of Software", Communications & Strategies, No. 1, p. 17, First Quarter 2007, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1008839
[11] C. Wolfram, "Communicating with apps in web 3.0", IT PRO, 17 Mar 2010
[12] T. Berners-Lee, J. Hendler and O. Lassila, "The Semantic Web", Scientific American, May 2001
[13] V. Eisenberg (2011, Oct 29), "On the difference between Linked Data and Semantic Web" [Blog entry]. Vadim on Software and Semantic Web. Available: http://vadimeisenberg.blogspot.pt/2011/10/on-difference-between-linked-data-and.html [Accessed: Oct 10, 2013]
[14] C. Bizer, T. Heath and T. Berners-Lee (2009), "Linked Data - the story so far", International Journal on Semantic Web and Information Systems, 5(3), pp. 1-22. doi:10.4018/jswis.2009081901 [Accessed: Oct 12, 2013]
[15] L. Tan and N. Wang, "Future internet: The Internet of Things", 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), Vol. 5, pp. V5-376–V5-380, 20-22 Aug. 2010
[16] Lampathaki, F., Koussouris, S., Agostinho, C., Jardim-Goncalves, R., Charalabidis, Y., and Psarras, J. (2012). "Infusing scientific foundations into Enterprise Interoperability". Computers in Industry, 63(8), pp. 858–866. doi:10.1016/j.compind.2012.08.004
[17] A. Cretan, C. Coutinho, B. Bratu and R. Jardim-Goncalves, "NEGOSEIO: A framework for negotiations toward Sustainable Enterprise Interoperability", Annual Reviews in Control, Vol. 36, Issue 2, December 2012, pp. 291-299, ISSN 1367-5788
[18] Agostinho, C., Sarraipa, J., Goncalves, D., and Jardim-Goncalves, R. (2011). "Tuple-Based Semantic and Structural Mapping for a Sustainable Interoperability". In 2nd Doctoral Conference on Computing, Electrical and Industrial Systems (DOCEIS'11), Caparica, Portugal: Springer.
[19] Jardim-Goncalves, R., Sarraipa, J., Agostinho, C., and Panetto, H. (2011). "Knowledge framework for intelligent manufacturing systems". Journal of Intelligent Manufacturing, 22(5), pp. 725–735.
[20] IMAGINE, 2013. D3.3.1 – "Detailed Design of IMAGINE Platform"
[21] J. Ferreira, F. Gigante, P. Crespi, J. Sarraipa, M. J. Nunez and C. Agostinho, "IMAGINE Dynamic Manufacturing Networks: The Furniture Case", I-ESA 2014 (under review)
[22] Ferreira, J., Agostinho, C., Sarraipa, J., and Jardim-Goncalves, R. (2013). "IMAGINE Blueprints for the Furniture Industry: Instantiation with the ISO 10303-236". In Proceedings of the 19th ICE Conference, The Hague, the Netherlands


LIFERAY AND ALFRESCO: A CASE STUDY IN INTEGRATED ENTERPRISE PORTALS

Milorad Filipović*, Gajo Petrović*, Aleksandar Nikolić*, Vidan Marković**, Branko Milosavljević*
*Faculty of Technical Sciences, Novi Sad, Serbia
**DDOR, Novi Sad, Serbia
{mfili, gajop, anikolic, gladic, mbranko}@uns.ac.rs, [email protected]

Abstract – In this paper we propose a solution for the development of enterprise portal-like web applications that is based on the Liferay portal and the Alfresco document repository. Combining a robust and feature-rich platform for building the application layer such as Liferay with an industry-standard document management system such as Alfresco enables developers to rapidly provide their customers with a custom portal web application that suits their needs. The emphasis of this paper is on extending the combined solution by implementing a custom activity list that shows integrated user activities from both Liferay and Alfresco, a feature missing in the original combined platform.

Keywords – Liferay, portal, Alfresco, document management, CMS, system integration

1. INTRODUCTION

Building an enterprise information system is always a challenging task. Even though great effort has been invested in business application standardization and automation of the development process [1], each project and each customer bring their own custom requirements and views of the final product. One special case of software products that enterprise customers often need to support their day-to-day business are internal portals. By its formal definition, an enterprise web portal is a web-based interface for enterprise users which provides access to a broad array of information such as corporate databases, messages, document storage, internal applications and similar [2]. Portals are web applications that represent the company's central point for information sharing in its internal network.

When developing such applications, software engineers usually choose between two approaches: development of a custom portal from scratch or a combination of existing solutions such as content management system (CMS) tools and document management systems (DMS). Each choice has its own advantages and disadvantages. In the case of custom portals, the developers are in total control of every fragment and functionality of the system. There is no need to spend time learning documentation, source code and tutorials written by someone else. However, this approach can be rather time consuming, as every little feature needs to be implemented and tested. By embracing out-of-the-box solutions such as content management systems, programming time is drastically shortened; instead, the whole system needs to be configured to suit our needs via various configuration forms and files, which is preferred in some cases. The downside of this approach is that knowledge of the inner workings of the system is often needed. Another drawback is that the customization options are not as generous as with our own developed solution, so the system of choice needs to be able to support the project requirements.

In this paper we present an integration of the Liferay [3] portal development platform and the Alfresco [4, 5] document management system [6] as flexible and easy to learn solutions for enterprise portal development.

Although both Liferay and Alfresco offer user activity tracking services that display basic user actions, those logs are only available as separate interfaces. Since our efforts are aimed at providing seamless integration of both platforms, an essential requirement is that user activities from both Liferay and Alfresco can be uniformly shown in one list. To achieve this goal, we developed a custom Liferay extension that gathers user activity logs from both platforms and displays them in chronological order on the desired portal page.

The paper begins with a brief overview of both platforms, with Liferay covered in section 2 alongside its extension options discussed in subsection 2.1. Alfresco and its extension opportunities are covered in sections 3 and 3.1, respectively. The rest of the paper is structured as follows: in section 4 we present a solution for basic Liferay and Alfresco integration. In section 5 we discuss a way of browsing the Alfresco document repository through out-of-the-box Liferay portlets and we show some customization options. Section 6 discusses the problem of combining user activity logs from both platforms and showing them in the custom Liferay portlet, while section 7 provides the conclusion.

2. LIFERAY PORTAL

Liferay is an open-source web development tool developed in the Java programming language. Described as a content management system, Liferay is distributed in two different editions:

1. Liferay Portal Community Edition - A version with the latest features, supported by an active community. Distributed under the GNU Lesser General Public License.

2. Liferay Portal Enterprise Edition - A commercial version that includes additional services and updates and comes with full support.

Both editions can be downloaded bundled with a web server application. Liferay's core functionality is its built-in content management system specialized for intranet and extranet portal web development, but the latest installments of the platform offer additional features beyond basic content management. These features include an integrated document repository with advanced document management functionalities, out-of-the-box message boards, a wiki system, etc. Basic Liferay web applications can be assembled using standard HTML pages that consist of various portlets and basic web content, without the need for prior programming knowledge. Portlets [8] represent web applications that occupy portions of a web page and provide their own set of functionalities and data. Each portal page consists mainly of such portlets. Liferay portal comes with a number of pre-installed core portlets and themes, while additional ones can be downloaded via the Liferay marketplace [9].

2.1. LIFERAY PORTAL EXTENSION OPTIONS

For advanced extension of the core functionalities, Liferay portal provides its own development tool called Liferay IDE. Liferay IDE is a customized installation of the Eclipse Java IDE with a pre-installed set of Liferay plugins which enable development of Liferay platform extensions [7]. Liferay IDE supports development of 5 types of Liferay extension projects:

1. Portlet - Provides the basic file structure and libraries for development of Liferay portlets from scratch (a minimal example is sketched at the end of this subsection).

2. Hook - Hook is the Liferay extension used to catch portal life-cycle events and to override the default actions performed for those events.

3. Ext - Ext plug-ins present the programmatic extensions of the Liferay core functionalities.

4. Layout - By creating a layout project in the IDE, developers are provided with a graphical editor using which custom page layouts can be defined. Layouts specify the positions on which portlets can be placed on pages.

5. Theme - Enables development of custom portal themes, including custom images, color schemes, CSS rules and JavaScript code.

Upon creation of a Liferay project, the developer is provided with a basic structure of the chosen extension project, which includes standard configuration and resource files organized in a corresponding directory structure along with core libraries.

Even though the Liferay portal and its extensions are mainly written in the Java programming language, some feature implementations span multiple programming languages and frameworks. Core functionalities are implemented in the Java language, while data is displayed using the Freemarker and JSP template engines. Client-side functionalities are developed in the JavaScript programming language, with support for PHP and Ruby portlets. Besides that, developers are free to use arbitrary libraries and tools in their projects. Finished projects can be automatically exported in an archive format using built-in IDE scripts. Deployment of a portal extension includes exporting the project as a WAR archive and uploading it to Liferay's deploy directory. Upon upload, extensions get hot deployed and are ready to be used within seconds.

Besides developing custom extensions, the Liferay portal can be customized by configuring the portal and core portlet properties via the portal control panel or configuration files.
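The sketch below shows what the portlet project type boils down to: a class of the kind Liferay IDE generates, with rendering delegated to a JSP page. It is an illustrative minimum under assumptions: the class name, attribute and JSP are hypothetical, and the portlet would additionally be declared in the project's portlet.xml and liferay-portlet.xml descriptors, omitted here.

import java.io.IOException;
import javax.portlet.PortletException;
import javax.portlet.RenderRequest;
import javax.portlet.RenderResponse;
import com.liferay.util.bridges.mvc.MVCPortlet;

// Hypothetical portlet; MVCPortlet dispatches rendering to the configured JSP.
public class HelloPortlet extends MVCPortlet {

    @Override
    public void doView(RenderRequest renderRequest, RenderResponse renderResponse)
            throws IOException, PortletException {
        // Make a value available to the view before delegating to view.jsp.
        renderRequest.setAttribute("greeting", "Hello from a custom portlet");
        super.doView(renderRequest, renderResponse);
    }
}

Packaged as a WAR and dropped into the deploy directory, a class like this is hot deployed and immediately available for placement on portal pages.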
3. ALFRESCO

Alfresco is a free/libre enterprise management system (EMS) [10] written in Java, available in 3 editions:

1. Alfresco Community Edition - a free and open edition, with scalability and availability limitations, distributed under the LGPL license.

2. Alfresco Enterprise Edition - a commercially and proprietary licensed edition. Its design is geared towards users who require a high degree of modularity and scalable performance.

3. Alfresco Cloud Edition - a SaaS version of Alfresco.

Alfresco can be downloaded bundled with the Apache Tomcat or JBoss application servers.

Developed mainly as a document management system, Alfresco has grown into a fully fledged EMS providing features such as [10]:

• Document management - including a built-in document repository and providing advanced management options.
• Web content management
• Repository-level versioning
• Repository access via CIFS/SMB/WebDav/NFS and CMIS
• Lucene search
• Desktop integration with main office suites
• Clustering support

3.1. ALFRESCO EXTENSION OPTIONS

As access to all repositories is done using the Share application, the recommended way of extending Alfresco features is by extending Share. A possible way of customizing the core Alfresco functionalities is by modifying its source code. Some examples of this approach are shown in [10, 11, 12, 13]. Extending the Alfresco functionalities this way is done by using the Alfresco SDK, which contains the tools and libraries needed to start developing custom Alfresco plugins and extensions using the Eclipse or NetBeans Java IDEs.

In recent versions of Alfresco, the preferred way of developing custom extensions is using the Alfresco Web Script API [14]. By using the Web Script API, developers can access repository features by invoking web scripts. Web scripts are light-weight web services used to perform custom actions on the Alfresco document repository. Alfresco comes with a predefined set of web scripts. Each web script is uniquely accessed by its URL, which contains the script name and any optional parameters passed from a user.

Developing a custom web script is relatively easy. Each script needs to have at least 3 required files (a complete minimal example is sketched below):

1. Script description file - an XML file which provides basic information. The minimal set of XML elements in the description file includes:
<shortname> - script name
<description> - script description
<url> - URL address by which the script is accessed, with parameters specified as name={value}
<authentication> - user authentication level required for script actions; available values are: user, guest and admin.

2. Script implementation file - a JavaScript file that contains the script action implementation. Besides the standard JavaScript features, various Alfresco objects and methods are available through the Web Script API. The request parameter values can be accessed by the names provided in the description file.

3. Script template file(s) - each script can expose action results in the HTML, RSS or JSON format; therefore, each of the resulting formats needs to have its own template file with the corresponding extension. Objects are passed from the implementation files as attributes of the Model class instance and accessed by name in the template.

Deploying a web script is done by copying these files into one of the Alfresco repository directories reserved for web scripts and refreshing the script index page, after which the deployed script appears in the list of available scripts.
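The following minimal web script follows the three-file structure just described. It is a hedged illustration rather than project code: the script name "hello", its URL and the returned JSON are invented for this example, and only one output format (JSON) is shown.

hello.get.desc.xml (description file):

<webscript>
  <shortname>Hello</shortname>
  <description>Returns a greeting for the supplied name</description>
  <url>/demo/hello?name={name}</url>
  <authentication>user</authentication>
</webscript>

hello.get.js (implementation file):

// "args" exposes the URL parameters declared in the description file;
// values placed in "model" become available to the template.
model.name = (args.name != null) ? args.name : "guest";

hello.get.json.ftl (JSON template file):

{ "greeting": "Hello, ${name}" }

Once the three files are copied into a web scripts directory and the script index is refreshed, the script responds at the /demo/hello URL.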

4. BASIC INTEGRATION

By basic Liferay-Alfresco integration we consider the implementation of two mechanisms:

1. Configuration of both the Liferay and Alfresco instances to run on one application server instance.

2. Enabling the single sign-on (SSO) [15] mechanism between them.

Running both platforms on one server instance reduces server machine load and improves the mobility and maintenance of the whole system. By this form of integration we are basically creating one platform with a clean install that is easily reusable. The SSO mechanism increases portal safety and user experience by incorporating a single authentication point for all portal applications. With SSO, users don't have to enter their credentials on every application access, which can be highly error prone, or store or remember multiple username/password combinations for each restricted part of the portal, which affects system security.

Once deployed as a Liferay web application, Alfresco makes a selection of its portlets available in the Liferay portlets section. These portlets are:

1. Repository browser - enables browsing through the whole Alfresco document repository.

2. Site document library - used for browsing the document repository associated with a specific Alfresco site. The Alfresco document repository is divided into sites, each with its own document and user spaces.

3. Advanced search - provides the search functionality with a wide range of search filters.

4. My document libraries - provides a list of all document libraries (associated with sites) available for browsing.

The user interface and functionalities to a great degree resemble the ones from the Alfresco Share application, so users have a large number of actions available through them. Only the advanced management and administration functionalities are omitted.

When a page with one of the browsing Share portlets is loaded, the portlet points to the root of the selected document space. If we want to point our pages and their corresponding portlets to a specific directory, we need to add an additional path segment to the page URL in Liferay. The format of this segment is:

#filter=path/path/to/directory

where path/to/directory is the path to the directory in the Alfresco repository, starting from the root of its library.

5. COMBINING USER ACTIVITY LOG

Providing transparent insight into common user activities is an important feature of enterprise portals. Enabling users to see which pages have been created or modified, or which documents have been uploaded by their coworkers, increases information flow and makes the daily activities of every organization unit visible to everyone. Liferay and Alfresco both have their own generic user tracking services, but obtaining an automatically generated and integrated log has proven to be a challenging task.

Liferay tracks every user action using its User Activity service. It logs user activities from all portal applications and provides a detailed report via the corresponding portlet. The Activities portlet provides a list of tracked user actions which contains information about the users who performed an action, with the activity description and timestamp. Although this mechanism is intuitive and user-friendly, it can only be configured to show a selected number of activities, and there is no way to pick which types of activity will be shown to users. Besides that, page creation and modification activities are not logged, even though they represent some of the most important events in the portal. A slightly more flexible way of displaying tracked activities is to use Liferay's Asset publisher portlet. It can display activities related to each portal asset and filter them by multiple criteria, including asset type. However, even the Asset publisher doesn't record page-related activities.

Alfresco also offers two ways of logging user actions on the document repository. One is the activities dashlet available on the user's home page in the Share application. It is very similar to the Liferay Activities portlet and displays a list of recent activities that can be filtered by user, time period and sites within the repository. A more advanced way of trailing repository activities is the Alfresco auditing mechanism [16]. It represents the Alfresco service used to record and query user actions on the repository content and is very flexible and feature-rich. Since auditing provides a wide range of customization options and features, it is also more difficult to use, and it represents a full-grown framework with its own API and libraries.

Since our goal is to integrate both platforms in one web portal, it would be very useful to have an integrated user activity log which would display basic user actions from both the Liferay portal and the Alfresco document repository. Unfortunately, to this day, no solution has emerged from either the Alfresco or the Liferay community. In the next subsection we discuss the development of a custom Liferay portlet which collects and displays such an activity list.

5.1. CUSTOM PORTLET FOR INTEGRATED ACTIVITY LOG

The custom portlet needs to provide a combined list of user activities from both the Liferay portal and the Alfresco document repository and present them in chronological order and in a uniform format. The first problem that needs to be addressed is Liferay's lack of trails for page activities. Since page creations and modifications need to be transparent to all users, we need to provide a way to automatically capture and record these activities. Probably the easiest and most preferred way of achieving this is by implementing a custom Liferay hook extension. As described before, hooks can capture certain portal lifecycle events, so by implementing a hook on a desired event we can write an arbitrary programmatic reaction to it. For the purposes of capturing the page creation and modification events, we created a hook that implements the Liferay ModelListener interface, which provides the methods that react to events in the Layout class lifecycle. The Layout class is Liferay's representation of the portal page. An outline of the LayoutHook class is shown in Listing 1.

As can be noticed, the interface captures all the important events in the portal page lifecycle, but since we are only interested in recording the page's creation, modification and destruction events, only the onAfterCreate, onAfterRemove and onAfterUpdate methods need to be implemented with the code that persists information about the performed action in the database. The hook is then attached to the desired event by creating a custom portal configuration file with the following entry:

value.object.listener.com.liferay.portal.model.Layout=hooks.LayoutHook
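A minimal sketch of such an implementation is given below. It is illustrative rather than the project's actual code: it extends Liferay's BaseModelListener convenience class (which provides empty implementations of the remaining ModelListener methods), and the calls on the Activity entity and its Service Builder-generated ActivityLocalServiceUtil service, introduced in the following paragraphs, use hypothetical method names.

import com.liferay.counter.service.CounterLocalServiceUtil;
import com.liferay.portal.model.BaseModelListener;
import com.liferay.portal.model.Layout;

public class LayoutHook extends BaseModelListener<Layout> {

    @Override
    public void onAfterCreate(Layout layout) {
        recordActivity("created", layout);
    }

    @Override
    public void onAfterUpdate(Layout layout) {
        recordActivity("modified", layout);
    }

    @Override
    public void onAfterRemove(Layout layout) {
        recordActivity("removed", layout);
    }

    // Persists the event through the generated service layer; the entity and
    // service method names below are assumptions made for illustration.
    private void recordActivity(String action, Layout layout) {
        Activity activity = ActivityLocalServiceUtil.createActivity(
                CounterLocalServiceUtil.increment());
        activity.setDescription("Page " + layout.getFriendlyURL() + " " + action);
        activity.setDate(new java.util.Date());
        ActivityLocalServiceUtil.addActivity(activity);
    }
}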

Since Liferay relies on its service mechanism for storing data in the database, if we want to create tables for our custom entities and persist them, we need to build a service layer which will provide the basic CRUD operations. Building a custom service layer is easily doable using the Liferay IDE's graphic service builder tool. All we need to do is create an empty configuration file called service.xml, and the graphic interface that generates its contents becomes available. For our solution, we created an entity called Activity which is used to store the page-related user activity information in the database.

The service builder graphic editor with our custom entity is shown in Figure 1. It provides graphic controls for defining custom entities, which will be stored in the corresponding database tables, and their associations. The service.xml file is simultaneously populated with the corresponding entries for each entity.

Figure 1. Custom entity shown in Liferay Service builder

public class DDORLayoutHook implements ModelListener<Layout> {

  @Override
  public void onAfterAddAssociation() {}

  @Override
  public void onAfterCreate(Layout arg0) {}

  @Override
  public void onBeforeRemove(Layout arg0) {}

  @Override
  public void onAfterUpdate(Layout arg0) {}

  @Override
  public void onAfterRemove(Layout arg0) {}

  @Override
  public void onAfterRemoveAssociation() {}

  @Override
  public void onBeforeAddAssociation() {}

  @Override
  public void onBeforeCreate(Layout arg0) {}

  @Override
  public void onBeforeRemoveAssociation() {}

  @Override
  public void onBeforeUpdate(Layout arg0) {}
}

Listing 1. LayoutHook class outline

With the service layer built to store activity data, and the hook implemented to catch page actions, we are able to capture user activities associated with portal pages, a feature that is missing in the Liferay core activity portlet. We then build our own portlet, which reads the stored data along with the Alfresco document activities. Since the only events of importance in this case are document and folder creation, modification and destruction dates, the Alfresco audit framework proves to be too big and complicated, and the best thing to do is to find a way to access the Share Activities dashlet and collect its data. This dashlet provides an option for RSS subscriptions, so all we need to do is access its feed from our portlet and parse its entries in the desired way.

The model of our portlet is shown in Figure 2. The main portlet classes are:

• Activity - represents the portal user activity; it contains the activity description, the full name of the user that performed the action, and the date and time when the activity happened.

• LiferayActivityCollector - this class is used to collect the data stored by LayoutHook. It simply queries the activity table and converts each row to the corresponding Activity instance. It is important to note that, to be able to access our custom service layer classes, our portlet needs to reference the service JAR file generated by the service builder.

• AlfrescoActivityCollector - this collector class reads the Alfresco user activity RSS feed. The information about the Alfresco server and the site within the repository needed to construct the RSS URL is stored in and read from the portal configuration files. With the parameters obtained from the configuration files, the web script URL is constructed as follows:

http://hostname/share/feedservice/components/dashlets/activities/list?format=atomfeed&mode=site&site=siteName&dateFilter=28&userFilter=all

where hostname is the name or IP address of the Alfresco server followed by the port number, and siteName is the name of the site within the Alfresco repository from which we want to get activities. After obtaining the feed list, each entry is parsed and Activity class instances are constructed from the extracted data. Parsing the RSS feeds can be done using any third-party library (a sketch with one such library follows after this list).

• ActivityPortlet – the main portlet class, which extends Liferay's core MVCPortlet class. Besides implementing the standard portlet functionalities, this class instantiates the collector classes and calls their corresponding methods that return lists with the collected activities. It then joins the obtained lists and sorts the activities by date, so the most recent activity is shown first.

Figure 2. Activities portlet class diagram
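To make the collector concrete, the sketch below fetches the Alfresco activities feed and converts its entries using the open-source ROME library, one possible choice of third-party feed parser. The class names and the simplified Activity constructor are assumptions for illustration; as described above, the real portlet builds the feed URL from values in the portal configuration files.

import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;

public class AlfrescoFeedReader {

    // Reads the activities feed and maps each entry to a simplified activity record.
    public List<ActivityRecord> collect(String feedUrl) throws Exception {
        SyndFeed feed = new SyndFeedInput().build(new XmlReader(new URL(feedUrl)));
        List<ActivityRecord> activities = new ArrayList<ActivityRecord>();
        for (Object o : feed.getEntries()) {
            SyndEntry entry = (SyndEntry) o;
            // The entry title carries the action description; author and date map directly.
            activities.add(new ActivityRecord(entry.getTitle(),
                                              entry.getAuthor(),
                                              entry.getPublishedDate()));
        }
        return activities;
    }
}

// Simplified stand-in for the Activity class described in the paper.
class ActivityRecord {
    final String description;
    final String user;
    final java.util.Date date;

    ActivityRecord(String description, String user, java.util.Date date) {
        this.description = description;
        this.user = user;
        this.date = date;
    }
}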


Figure 3. Custom activities portlet

In addition to parsing Alfresco feed entries to extract actions, users and document names, we also need to convert the links to documents and directories from the feeds, so that they are shown in the Alfresco portlets on the portal instead of in the Share application.

The activity portlet class constructs and sorts the unified activity list and passes it to the corresponding JSP template that is used to display the data in the portlet section on the portal page. Since both collector classes convert the collected data to Activity instances, data from both sources are treated uniformly. The activity portlet can further be extended by interface controls for filtering and limiting the displayed activities based on any user-defined criteria. The deployed portlet with example activities is shown in Figure 3.

6. CONCLUSION

By integrating Liferay's flagship portal and an industry-standard document management system we are getting the best of both worlds: a proven Java-based portal framework paired with the best open-source ECM. In this paper we discussed some options for integrating the Liferay portal and Alfresco into one powerful enterprise portal platform by extracting the best features from both systems and enriching them with custom-made extensions to bridge the gaps. We presented an example of a tailor-made Liferay portlet and provided the information needed to develop Alfresco web scripts, which represent the backbone of our approach. This forms a communication channel in which Liferay portlets can call Alfresco scripts and display combined results seamlessly. We hope the paper presented enough information for developers who decide to follow this approach to be able to derive their own solutions that will suit their specific requirements.

7. REFERENCES

[1] G. Milosavljević, B. Perišić, "A Method and a tool for rapid prototyping of large-scale business information systems", Computer Science And Information Systems, Vol. 02, pp. 57-82, 2004.
[2] M. P. Ahmed Hasan, The Liferay Cookbook
[3] Liferay Portal, www.liferay.com
[4] Alfresco, www.alfresco.com
[5] Shariff, M., Alfresco – Enterprise Content Management Implementation, Packt Publishing, 2006.
[6] Gostojić, S., Sladić, G., Vidaković, M., "Arhiviranje dokumenata u Alfresco sistemu", Zbornik radova YUInfo 2009, Kopaonik, Srbija, 2009. (in Serbian)
[7] Liferay Portal 6.1 - Developers Guide, https://www.liferay.com/documentation/liferay-portal/6.1/development
[8] Java Portlet Specification (JSR-168), http://docs.liferay.com/portal/4.2/official/liferay-portlet-development-guide-4.2/multipage/ch01s02.html
[9] Liferay marketplace, http://www.liferay.com/marketplace
[10] Sladić, G., Gostojić, S., Milosavljević, B., and Konjović, Z., "Handling Structured Data in the Alfresco System", Proceedings of the International Conference on Information Society Technology and Management (ICIST), pp. 78-82, 2011. ISBN: 9788685525070
[11] Gostojić, S., Sladić, G., and Milosavljević, B., "Importing Document Hierarchy in the Alfresco System", Proceedings of the International Conference on Information Society Technology and Management (ICIST), pp. 88-91, 2011. ISBN: 9788685525070
[12] Pavić, I., Sladić, G., Milosavljević, B., "Integracija upravljanja poslovnim procesima u Alfresco sistemu", Zbornik radova YUInfo 2010, Kopaonik, Srbija, 2010. (in Serbian)
[13] Savić, G., Sladić, G., Milosavljević, B., "On-line uređivanje dokumenata u sistemu Alfresco", Zbornik radova YUInfo 2008, Kopaonik, Srbija, 2008. (in Serbian)
[14] Alfresco Web script API, http://wiki.alfresco.com/wiki/Web_Scripts
[15] Sladić, G., Zarić, M., Konjović, Z., Milosavljević, B., "Single Sign-On model za web aplikacije", Zbornik radova YUInfo 2008, Kopaonik, Srbija, 2008. (in Serbian)
[16] Auditing, http://wiki.alfresco.com/wiki/Auditing_%28from_V3.4%29

A short survey of existing emergency management tools for information collection, communication, and decision support

Bogdan Pavković*, Uroš Milošević*, Vuk Mijović*, Sanja Vraneš*
* University of Belgrade, Institute Mihajlo Pupin, Belgrade, Serbia
[email protected]

Abstract † — Over recent years, the international public has experienced an increased occurrence of natural and man-made disasters involving large numbers of human casualties and fatalities, displacing the affected population, and bringing immense material loss. To cope with such devastating and highly unpredictable events, disaster managers and first responders participating in the crisis aftermath need all the existing technological help to be able to provide a timely and accurate response, better coordination, and overall risk prevention and mitigation. In this paper we provide an overview of the current state-of-the-art commercial, open-source, and research emergency management tools aimed at the recovery phase: disaster in-situ data collection, communication and DSS (decision support system) tools. An overview of general requirements for such critical tools is given as well, and some integration challenges are identified.

I. INTRODUCTION

We are all witnesses of the rise in the number of natural disasters, even in regions that were previously unfamiliar with them. A recent example comes from the Italian island of Sardinia, which was struck by the unexpected cyclone Cleopatra, taking 18 lives and injuring many more [1]. Natural disasters (earthquakes, floods, tsunamis, landslides, avalanches, etc.), along with disasters produced by humans (oil and hazardous chemical spills, explosions, train derailments, etc.), produce every year a large number of human casualties and material loss that can be measured in billions of dollars [2]. Furthermore, after the disaster strikes, the population of the endangered area becomes harshly impacted and often displaced and lost in the surrounding area with little or no basic life necessities.

The immediate crisis aftermath (the rescue and response phase) [3] presents first responders (FRs) with time-consuming and extreme physical and mental challenges: efficient assessment of the situation and the potential risks on the terrain before requiring more help, search and rescue (S&R) of trapped and injured victims, provision of initial medical attention and triage to the most endangered, coordination of the evacuation of the affected people to emergency shelters, assessment of the needs for relief support goods, and assembly of support goods and dispatch to the concerned zone.

A disaster scene can involve several different groups of actors: local and international FR groups (military, firemen, policemen, medics, search and rescue teams, etc.). Furthermore, FRs operate in harsh environmental conditions (debris, fire, floods, toxic gases, etc.), along with ruptures in communication and power infrastructure.

Having in mind everything previously stated, it is clear that emergency response actors require all the necessary technological help, i.e. a system that will provide a holistic overview of the situation with potential risk assessment, continuously updated with the most recent and relevant inputs from the field. Such a system should also provide coordination and tracking of the involved people and goods during the entire rescue and recovery phase. Finally, the system should provide an efficient mode for voice and data communication, being able to provide independent operation regardless of the remaining power and communication infrastructure on the emergency site. A visionary glimpse of a future version of such a system can be appreciated in a video prepared by the US Department of Homeland Security [5].

In this paper, we start by introducing the general emergency management properties and end-user requirements. We continue by providing an overview of state-of-the-art commercial, open-source, and research tools and prototypes for emergency management aimed at the recovery phase. We have focused on solutions providing answers along the entire information flow from a disaster site to a disaster-safe segment: from the information collection at the disaster site, through the communication domain, up to the decision support system serving emergency managers and decision makers. For each of these segments, we provide a short overview of the most prominent and relevant tools with their features and shortcomings.

† This work was partly financed by the European Union (FP7 SPARTACUS project, Pr. No: 313002), and by the Ministry of Science and Technological Development of Republic of Serbia (Pr. No: TR-32043).


Figure 1 The disaster risk management cycle overview: preparedness, response, recovery, and mitigation

Figure 2 The disaster risk management cycle example with specific actions listed

II. EMERGENCY MANAGEMENT PROPERTIES & GENERAL END-USER REQUIREMENTS

Emergency management is composed of four phases: preparedness, response, recovery, and mitigation (cf. Figure 1). Preparedness includes a detailed plan on how to respond to a specific disaster, with a definition of the exact lines of command and control and the distribution of activities between responsible agencies. The response phase executes all envisioned plans with potential improvisations, due to the additional, unforeseen complexity of the disaster at hand. The goal of the recovery phase is to provide assistance in clearing up after the disaster and to help the affected people regain their normal ways of life. Mitigation insists on putting additional plans in place to prevent the re-occurrence of a disaster if possible, or at least to minimize the damage caused by the next occurrences. In this paper, we concentrate on existing technological solutions aimed at the response phase (cf. Figure 2).

Most natural and man-made disasters have a specific nature: they differ in type (earthquakes, floods, tsunamis, landslides, avalanches, oil and hazardous chemical spills, explosions, train derailments, etc.), affected area (from city blocks, entire cities and regions, up to entire countries, e.g. Pakistan), number of affected people (from the dozen people affected by the Sardinia cyclone to millions in the Haiti earthquake), economic loss, involved FR groups, etc. Nevertheless, a list of general requirements for EM tools for the response phase is available [6]:

• Holistic overview of the situation: decision makers coordinating the response should be provided with a clean visual overview (maps and rich data) with the possibility to explore further details. The abundance of information collected from the field (situation awareness reports, FR risk reports, images, videos, or crowd-sourced inputs) should be filtered, synthesized, and presented according to its quality, relevance, and timeliness.

• Efficient organization and coordination: decision makers should be provided with a visual way to track the advancement of the involved personnel and transported goods. Clustering and quantification of response goods and personnel should be provided.

• Compatibility: all emerging response tools should ensure compatibility with previously used EM tools whenever possible.

• Near zero configuration and training: provided tools should come pre-configured and ready to be deployed straight from the box (plug & play principle). Additionally, EM tools should be easy to use with little or no prior training. FRs' time should be maximized to save lives.

• Secured and independent two-way communication: in a time of emergency, the majority of public and private power sources and transmission lines, as well as communication infrastructure, is susceptible to being destroyed or, if it remains standing, to becoming overly congested [4]. Thus, an independent means of communication should be provided, offering two-way audio and preferably video calls.

III. RELATED WORK

In the following section, we summarize the current state-of-the-art solutions for the response phase, regarding three main aspects: in-situ data collection and management, disaster communication, and decision support and coordination systems.

A. Field (in-situ) data collection and management tools

Having an eye on the exact situation in the field has precious value for emergency managers. Up-to-date information provides a basis for a timely and accurate response, better coordination and overall risk prevention and mitigation.

Field information can come in different forms: textual (e.g. situation and risk assessment reports, lists of people and equipment in the field, lists of missing, injured, and dead people, etc.), audio-visual (images and photos, videos, earth observation satellite imagery, live and recorded voice, etc.), and specific (environmental sensor readings, weather data, location and timing information on people and objects).


Field information can be provided by trained specialists or crowd-sourced from the population in the affected zone. Abundant information from the field can be harvested through various platforms: web-based, commodity devices (PCs, smartphones, tablets), or specialized hardware (e.g. UAVs, weather balloons, EO satellites).

1) Web-based tools

The big technology player Google has publicized the use of web tools in crisis response. Such solutions are freely available to the international public, often support multiple languages by virtue of a large support community, and are convenient for people with little or even no previous technical knowledge (simple internet browsing competence could suffice).

We list some of the most prominent solutions:

• Google Crisis Response [7]: The technology giant has applied its widely accepted tools and created specific tools for the crisis sector. Public Alerts: a renowned platform for the collection of the latest critical information about weather alerts and approaching natural disasters, before they cause damage. The platform allows the creation of services through the combination of maps with overlays of relevant announcements. Services are seamlessly integrated with Google search, maps, and notifications. Person finder: a simple and effective open platform, allowing individuals and organizations to provide information on, and conversely to look for, missing persons. Crisis map: provides a mashup tool to collect, contribute, combine, and explore critical disaster-related geographic data, without any specialized software.

• Virtual OSOCC: an international information exchange and coordination portal for the early phase of major disasters. Created within GDACS, a cooperation framework between the United Nations and the European Commission, it is intended for exclusive use by disaster managers worldwide. Virtual OSOCC organizes information by event and handles several data types: field data (geo-tagged reports and photos/videos), in-situ sensor measurements, GIS, model output data, priority areas, baseline data, and satellite image derived data (e.g. flood extent, earthquake damage assessment). Virtual OSOCC incorporates maps produced worldwide by the UNITAR/UNOSAT network of collaborators, based on satellite and GIS data collected from many organizations. Virtual OSOCC also handles and analyses mass and social media specifically related to worldwide disaster events.

2) Smartphone-based tools

The high market adoption, performance boost, versatility, and portability of smart mobile devices have made their way into people's everyday lives, including emergency situations. We want to highlight some of the most prominent emergency applications optimized for mobile devices (prevalently Android based):

Figure 3 Emergency AUS: a simple and intuitive interface for emergency updates and reporting

• Emergency AUS / FireReady: issued by the Australian Government to provide timely and relevant emergency information to the affected population. Users (professionals and ordinary people) can provide relevant information about an ongoing emergency through an intuitive step-by-step interface, where geo-tagged images and text can be attached at the end (cf. Figure 3). Users can also follow the status of, and additional information about, various types of emergencies (e.g. fire, storm, earthquake, landslide) on a map, or through a subscription system. The application offers a community support mode where users can request or offer relief support goods and accommodation. Further promotion and international support are needed (for the moment, Australia only).

• UN-ASIGN: an application resulting from the EU FP7 project Geo-pictures, which allows the actors of an emergency (both trained personnel and the affected population) to provide geo-tagged input (photos and reports). During the Thailand floods, 2009, UN-ASIGN allowed EM decision makers to fuse all the inputs from the field in order to localize and visualize on a map the area covered by water, and finally to gain better overall situation awareness. Currently, UN-ASIGN is missing structured pre-made reports guiding the FRs through the process.

• HelpBridge: an application aiming to facilitate the collection and request of disaster relief, promoted by Microsoft and several relief organizations (American Red Cross, Care, CRS). Benefactors can directly provide financial and material donations, along with their time to volunteer in case of a crisis. The application allows the affected population to connect with some of the world's leading disaster relief organizations in order to request help and assistance. HelpBridge needs further promotion and international support.

Page 30 of 478
ICIST 2014 - Vol. 1 Regular papers

3) UAV-based tools
The recent development and commercialization of small UAVs (Unmanned Aerial Vehicles) assert them as a reliable tool for emergency response. UAVs equipped with various imagery sensors offer a unique visual input from the bird's-eye perspective (cf. Figure 4). UAVs can extend their operation even to hazardous and inaccessible zones (e.g. areas affected by radiation and toxic chemicals, steep mountain slopes, or swamps). UAVs (both quadcopters and gliding planes) demonstrate extraordinary agility and stability even in the presence of disturbing factors (strong winds or propeller/motor failure) [13].
Figure 4. senseFly drone for earth mapping
We can point out some notable use cases of UAVs in emergency scenarios:
• Earth surface mapping and modeling: the Swiss-based company senseFly offers a complete solution for accurate mapping used in environmental management, mining, the construction industry, and emergency management. An autonomous drone (cf. Figure 4), with a 1 m wingspan and 45 minutes of flight, coupled with photogrammetry software allows the mapping of an area of up to 10 km2 with up to 3 cm precision. senseFly successfully completed several missions at Tahiti providing an estimation of the number of new residents along with garbage concentration, as well as an assessment of the water drainage in several neighborhoods of Port-au-Prince.
• Indoor mapping: members of the GRASP Laboratory at Penn Engineering create and develop autonomous quadcopters able to explore and map unknown and complex 3D indoor environments [11]. The quadcopters, equipped with a range of sensors (inertial sensors, cameras, a laser range scanner, an altimeter and a GPS), allow precise mapping. Quadcopters have a large potential in urban search and rescue and first response missions, especially where hazardous materials are involved [12].
• Combined S&R in inaccessible terrain: the specialized FP7 SHERPA project aims at building smart ground and aerial robots that will collaborate with Alpine rescue teams. Ground robots will serve to probe the unstable terrain before rescuers engage in a potentially dangerous zone. Aerial units will continuously scan the monitored area after an avalanche looking for survivors, while Alpine rescue provides help to previously found victims. First working prototypes are expected in 2015.

B. Disaster site Communication
Efficient and reliable disaster site communication is the conditio sine qua non link in the chain of extracting the field input data and delivering it to the outside world (relevant emergency decision makers and journalists). A recent study from American Homeland Security [4] indicates that during disasters, communication networks become highly compromised. On one hand, disasters bring destruction to both power (primary and backup) and communication infrastructure, limiting its use and reach. On the other hand, a crisis situation produces an increased demand for communication, leading to network congestion and failure. As a result, a communication outage can last even up to a couple of weeks, depending on the severity of a disaster, where the first 24-36 h after the disaster are the most critical for the response and relief.
Currently, there are several solutions that can help bridge the communication gap between the imminent disaster site and the outside world:
Figure 5. CISCO NERV vehicle - a complete telecommunication emergency recovery solution
• CISCO NERV: a specialized CISCO truck vehicle (cf. Figure 5) provides an integral and robust telecommunication solution for emergency scenarios. NERV is entirely independent from the remaining power and communication infrastructure at the disaster site. NERV comes equipped with a complete communication solution (UHF/VHF radio, 2G/3G mobile base station, WiFi) with a high-bandwidth satellite backhaul. NERV comes with built-in support for VoIP and video calls, and data exchange. A dedicated videoconference room is situated at the back of the truck. CISCO Tactical Ops have already provided rapid deployment and
assistance in case of 30 major emergencies around the globe.
Figure 6. Emergency.lu rapid deployable kit with an inflatable satellite dish
• Emergency.lu: offers a complete emergency communication service: a pre-configured rapid deployment kit (cf. Figure 6) featuring an inflatable satellite dish with all necessary communication gear, pre-booked satellite capacity and free standby airborne transportation, available within 2 h of the demand. Once deployed, it offers a rich set of services (VoIP, IM, tracking & tracing of people and equipment, map assessment, asset management, situational reports). Emergency.lu is supported by the UN and financed by the government of Luxembourg.
• FP6 WISECOM project: provided, through live trials, an integrated communication infrastructure - a standalone, independent, portable communication unit combining the advantages of different technologies ranging from TETRA and WiFi to GSM and 3G, with both Inmarsat BGAN and DVB-RCS satellite backhauling systems. Location-based services were incorporated in the solution to offer tracking and triage of patients in the field.
Larger network coverage, beyond the satellite-backhauled equipment described previously, can be obtained through wireless mesh solutions. Generally, a deployed wireless mesh network becomes more robust by adding more nodes. We can point out some of the following prominent solutions:
• Rajant: a reliable provider of wireless multi-frequency mesh solutions for challenging military, underground mining, railway, and emergency use. Battery-powered, rugged BreadCrumb nodes extend coverage even in harsh environments and inaccessible areas. Several wireless interfaces provide higher resilience to interference and jamming. Rajant solutions allow VoIP and video communication, along with data and remote monitoring information. The provided solution remains functional even when the network becomes fragmented - communication continues locally, awaiting connection to the backbone of the network.
• Mesh Dynamic: another proven provider of robust wireless mesh solutions, specialized for challenging use in emergencies. A robust multi-frequency solution is suited for hastily formed networks requiring easy installation, low set-up effort and high data rates over many wireless hops. Mesh Dynamic provides connectivity and video streaming even in the case of fast moving vehicles (up to 90 km/h).
• Serval project: an open source initiative from the University of Auckland for anytime, anywhere, secure communication outside mobile tower coverage. The key idea is to use available smartphones with in-built WiFi capability. Smartphones with the Serval app allow building impromptu networks in areas with no or low mobile coverage (e.g. 75% of Australia lacks mobile coverage) as a cost-effective solution (no mobile subscription required). Serval supports VoIP calls, IM and file exchange with 256-bit ECC encryption. Wireless mesh network coverage can be additionally increased 10 to 100 times compared to standard WiFi coverage with additional equipment called MeshExtenders: battery-powered mesh-enabled devices mounted on a 6 m pole. The Serval project develops and evolves with the support of the open-source community.

C. Decision Support System (DSS)
The final step in the EM tool chain, laying atop the in-field data input and emergency communication, is the EM DSS. Such tools must be capable of absorbing a potentially large amount of relevant, up-to-date and heterogeneous data, filtering it by relevance, and further extracting and recombining it in order to build a common, overall, holistic situational awareness picture with rich underlying details.
The DSS platform should provide decision makers with a support tool for more efficient reasoning based on up-to-date information. We can select some prominent examples:
• Ushahidi [9]: a non-profit tech company that develops free and open source software for information collection, visualization and interactive mapping. The Ushahidi platform was initially developed to map reports of violence in Kenya at the beginning of 2008. Eventually the Ushahidi platform found widespread use in crisis information mapping after the Haiti earthquake [10]. Ushahidi and its web version Crowdmap offer seamless collection of crowdsourced information (optionally geo-localized) from multiple sources (SMS, email, Twitter and the web) and its further presentation on an interactive map. The SwiftRiver platform complements Ushahidi by providing open-source tools for filtering and mining of real-time information.
Figure 7. GeoFES: a powerful and intuitive DSS tool for emergency managers
• GeoFES: ESRI ArcGIS-based software providing support for decision makers at fire brigades and disaster management services. GeoFES is an efficient EM DSS tool in the event of a wide range of natural and man-made disasters (storms, floods, fires, nuclear, biological and chemical (NBC) incidents, epidemics) (cf. Figure 7). It can also be used for preventive planning and training purposes. GeoFES focuses on the following main topics: a. fast identification and preview of the emergency location on the interactive map; b. a synthetic and holistic overview of all current risks, enabling better preparation and decision-making; c. simulation, modeling and estimation of hazardous substance propagation in the air and water to guide and support S&R and evacuation actions; d. evaluation of the endangered zones - population and building statistics; e. operational management of emergency services; f. adaptation of digital content for fire fighters without digital equipment.
• Sahana [8]: an open source foundation providing software and services that help solve concrete problems related to disaster response coordination. Sahana provides tools for the management of missing and found persons, tracking of organizations and programs responding to the disaster, providing transparency in the response effort, and project tracking by enabling relevant sharing of information across independent organizations. Additionally, Sahana provides a management tool for hospital triage.
• EmerGeo Fusionpoint: a powerful web-based Crisis Information Management System (CIMS) with DSS integration allowing secure access from anywhere. Fusionpoint connects to the customer's existing systems to merge and publish data. It combines logging and reporting, real-time data fusion, OpenGIS and ESRI mapping and web portal technology to bring together data and processes from multiple emergency and non-emergency applications. Users can customize their web dashboard to extract only the most relevant and critical information for decision making. EmerGeo Fusionpoint integrates with alert notification systems, dispatch systems, GIS mapping systems, CCTV, hazard models and simulation tools, and live data feeds (weather, news, GeoRSS).

IV. CONCLUSIONS
The increased occurrence of natural and man-made disasters has led EM managers, decision makers, and responders to reach for any available help allowing them to work more efficiently, in a coordinated way, helping them to mitigate and relieve the devastating effects of disasters.
In this short survey we have provided an overview of state-of-the-art technology solutions for emergency management aimed at the recovery phase. We have selected representative tools for information collection at the disaster site, disaster communication equipment, and decision support systems.
We believe that our survey can be beneficial to both EM personnel and technology developers and researchers seeking a concise overview of the main EM technology tools.

REFERENCES
[1] Sardinia hit by deadly Cyclone Cleopatra and floods, BBC Europe, 19.11.2013, http://www.bbc.co.uk/news/world-europe-24996292
[2] UN Global Environment Outlook, Socio-economic effects, http://www.unep.org/geo/geo3/english/448.htm
[3] D. P. Coppola, Introduction to International Disaster Management, Oxford: Butterworth-Heinemann, 2007.
[4] G. O'Reilly, A. Jrad, R. Nagarajan, T. Brown, S. Conrad, "Critical Infrastructure Analysis of Telecom for Natural Disasters", Telecommunications Network Strategy and Planning Symposium, 2006.
[5] Precision Information - Vision Video, Department of Homeland Security, USA, http://precisioninformation.org/?p=32
[6] World Disasters Report, Focus on technology and the future of humanitarian action, International Federation of Red Cross and Red Crescent Societies, 2013.
[7] Google Crisis Response Tools presentation, http://www.google.org/crisisresponse/
[8] M. Careem, C. De Silva, R. De Silva, L. Raschid, S. Weerawarana, "Sahana: Overview of a Disaster Management System", International Conference on Information and Automation (ICIA 2006), pp. 361-366, 15-17 Dec. 2006.
[9] Ushahidi Platform, http://www.ushahidi.com
[10] How Crisis Mapping Saved Lives in Haiti, National Geographic, http://newswatch.nationalgeographic.com/2012/07/02/crisis-mapping-haiti/
[11] S. Shen, N. Michael, and V. Kumar, "Autonomous indoor 3D exploration with a micro-aerial vehicle", in Proc. IEEE Int. Conf. Robot. Automat., St. Paul, Minneapolis, 2012.
[12] N. Michael, S. Shen, K. Mohta, Y. Mulgaonkar, V. Kumar, K. Nagatani, Y. Okada, S. Kiribayashi, K. Otake, K. Yoshida, K. Ohno, E. Takeuchi, S. Tadokoro, "Collaborative mapping of an earthquake-damaged building via ground and aerial robots", Journal of Field Robotics, Volume 29, 2012.
[13] R. D'Andrea, "The astounding athletic power of quadcopters", TED conference.
Predictive analytical model for spare parts inventory replenishment
Nenad Stefanovic*
* Faculty of Technical Sciences, Cacak, University of Kragujevac, Serbia
[email protected]

Abstract — In today's volatile and turbulent business environment, supply chains face great challenges when making supply and demand decisions. Making optimal inventory replenishment decisions has become critical for successful supply chain management. Existing traditional inventory management approaches have shown to be inadequate for these tasks. The current business environment requires new methods that incorporate more intelligent technologies and tools capable of making accurate and reliable predictions. This paper deals with data mining applications for supply chain inventory management. It describes the use of business intelligence (BI) tools, coupled with a data warehouse, to employ data mining technology to provide accurate and up-to-date information for better inventory management decisions and to deliver this information to relevant decision makers in a user-friendly manner. Experiments carried out with a real data set showed very good accuracy of the model, which makes it suitable for more informed inventory decision making.

I. INTRODUCTION
The success of many organizations depends on their ability to manage the flow of materials, information, and money into, within, and out of the organization. Such a flow is referred to as a supply chain. Because supply chains may be distributed and complex and may involve many different business partners, there are frequent problems in the operation of supply chains. These problems may result in delays, in customers' dissatisfaction, in lost sales, and in high expenses of fixing the problems once they occur.
The aim of integrated supply chain planning and operations management is to combine and evaluate from a systemic perspective the decisions made and the actions undertaken within the various processes which compose the supply chain.
The need to optimize the supply chain, and therefore to have models and computerized tools for medium-term inventory planning and replenishment, is particularly critical in the face of the high complexity of current supply chain systems, which operate in a dynamic, uncertain and truly competitive environment.
It is not enough to know only what happened and what is happening now, but also what will happen in the future and how/why something happened. Due to the complex interactions occurring between the different components of a supply chain, traditional methods and tools intended to support inventory management activities seem inadequate today. Thus, predictive analytics and data mining have become indispensable and valuable tools for making more intelligent decisions.
On the other hand, predictive models themselves are not enough. Information and knowledge derived from these analytical models need to be delivered to all parties involved in supply chain inventory management. As a result, collaboration through dedicated web-based workspaces has become essential for more efficient and effective coordination and decision making [1].
This paper presents a predictive inventory management approach and describes the corresponding data mining models for making out-of-stock predictions for automotive spare parts. The models are designed on top of a data warehouse which is loaded with sales data from the retail spare parts stores. Model accuracy is demonstrated through testing and evaluation of the results. Finally, the specialized analytical web portal which provides collaborative, personalized and secure analytical services is presented.

II. BACKGROUND RESEARCH
Inventory control is the activity which organizes the availability of items to the customers. It coordinates the purchasing, manufacturing and distribution functions to meet marketing needs.
Inventory management is one of the most important segments of supply chain management. Companies face the common challenge of ensuring adequate product/item stock levels across a number of inventory points throughout the supply chain. Additionally, uncertainty of demand, lead time and production schedule, and also the demand information distortion known as the bullwhip effect [2], make it even more difficult to plan and manage inventories. The basis for decision making should be information about customer demand. Demand information directly influences the inventory control, production scheduling, and distribution plans of individual companies in the supply chain [3]. Making decisions based on local data leads to inaccurate forecasts, excessive inventory, and lower capacity utilization.
Generally, determining adequate stock levels balances the following competing costs:
• Overstocking costs – these include costs for holding the safety stocks, and for occupying additional storage space and transportation.
• Costs of lost sales – these are costs incurred when a customer wants to buy a product that is not available at that moment.
Commonly, managers have relied on a combination of ERP, supply chain, and other specialized software packages, as well as their intuition, to forecast inventory. However, today's highly uncertain environment and large quantities of disparate data demand new approaches for forecasting inventory across the entire chain. Data mining tools can be used to more accurately forecast a particular product at the right location.
The best way to deal with these competing costs is to use data mining techniques to ensure that each inventory point
(internal warehouse, work-in-process, distribution center, retail store) has the optimal stock levels.
Forecasting and planning for inventory management have received considerable attention from the scientific community over the last 50 years because of their implications for decision making, both at the strategic level of an organization and at the operational level. Many influential contributions have been made in this area, reflecting different perspectives that have evolved in divergent strands of the literature, namely: system dynamics, control theory and forecasting theory [4].
A number of research projects have demonstrated that the efficiency of inventory systems does not relate directly to demand forecasting performance, as measured by standard forecasting accuracy measures. When a forecasting method is used as an input to an inventory system, it should therefore always be evaluated with respect to its consequences for stock control through accuracy-implication metrics, in addition to its performance on the standard accuracy measures [5].
Chandra and Grabis used simulation modeling to investigate the impact of forecasting method selection on the bullwhip effect and inventory performance for the most downstream supply chain unit [6]. The study showed that the application of autoregressive models compares favorably to the other forecasting methods considered, according to both the bullwhip effect and inventory performance criteria.
Liang and Huang employed multi-agents to simulate a supply chain [7]. Agents are coordinated to control inventory and minimize the total cost of a SC by sharing information and forecasting knowledge. The demand is forecasted with a genetic algorithm (GA). The results show that the total cost decreases and the ordering variation curve becomes smooth.
Spare parts are very common in many industries and forecasting their requirements is an important operational issue. In recent years, there have been advances in forecasting methods for spare parts, demand information sharing strategies and the design of forecast support systems. Boylan and Syntetos give a thorough review of these developments and explore avenues for further research [8].
Accurate demand forecasting is of vital importance in the inventory management of spare parts in process industries, while their intermittent nature makes demand forecasting for spare parts especially difficult. Hua et al. proposed an approach that provides a mechanism to integrate the demand autocorrelated process and the relationship between explanatory variables and the nonzero demand of spare parts when forecasting occurrences of nonzero demands over lead times [9]. The results show that this method produces more accurate forecasts of lead time demands than do exponential smoothing, Croston's method and the Markov bootstrapping method.
Bala proposed an inventory forecasting model which uses purchase-driven information instead of customers' demographic profiles or other personal data for developing the decision tree for forecasting [10]. The methodology combines neural networks, ARIMA and decision trees.
Dhond et al. [11] used neural-network-based techniques for inventory optimization in a medical distribution network, which resulted in 50% lower stock levels. Symeonidis et al. [12] applied data mining technology in combination with an autonomous agent to forecast the price of the winning bid in a given order.
Even though forecasting is seen as a crucial segment of effective inventory management and supply, there are not many reports from industry which demonstrate successful application of prediction models and solutions. This is especially true when it comes to automotive spare parts supply management, which is characterized by high uncertainty of demand and thousands of different parts. Most of the existing research is focused on specific segments of analytical solutions (i.e. only predictions). In this work, data mining models and analytical cubes are designed in such a way as to accommodate the specificity of automotive spare parts management in terms of products, stores, inventory, and time dimensions. Since every prediction model is unique (in terms of products, demand, etc.) and related to a specific data set, it is not possible to make a concrete comparative analysis. However, the proposed predictive models provide top-level accuracy (even up to 99% in certain cases). Also, in contrast to existing research methods, the approach presented in this paper introduces a complete business intelligence model which combines a specialized data warehouse, a two-phase data mining modeling approach, and an analytical web portal for information delivery.

III. INVENTORY FORECASTING MODEL
This section describes the business intelligence solution for a real automotive supply chain, which utilizes data warehouse and data mining technology to provide timely information for spare parts inventory management decisions. The presented methodology is designed to provide out-of-stock predictions at the location/product level. For a particular product, a data mining model is built that makes out-of-stock predictions for each store in the chain. This approach enables a more effective balance between the competing costs related to stocking.

A. Data Warehouse Design
In order to gather data from many distributed sources, we needed to extract, clean, transform and load data into the data warehouse, which summarizes sales data from 36 retail stores and for more than three thousand different spare parts. These data are distributed among multiple heterogeneous data sources and in different formats (relational databases, spreadsheets, flat files and web services).
We have used the Unified Dimensional Model (UDM) technology to provide a bridge between the user/developer and the data sources [13]. A UDM is constructed over many physical data sources, allowing us to issue queries against the UDM using one of a variety of client tools and programming technologies. The main advantages are a simpler, more readily understood model of the data, isolation from heterogeneous backend data sources, and improved performance for summary-type queries.
The following data sets are used for the out-of-stock predictive modeling:
• Sales data aggregated at the store, product (part), and day level. Daily sales are stored for each product that is sold, for each store in the retailer's chain.
• Inventory data aggregated at the store, product (part), and day level. This is the number of days that the product has been in stock, for each product, for each day, and for each store.
• Product (part) information such as product code, name, description, price, and product category.
• Store information such as store description, store classification, store division, store region, store district, city, zip code, space capacity, and other store information.
• Date information that maps fact-level date identifiers to appropriate fiscal weeks, months, quarters, and years.
The data warehouse is the basis for all business intelligence applications and particularly for data mining tasks. The data warehouse allows us to define data mining models based on the constructed data warehouse to discover trends and predict outcomes.

B. Data Mining Methodology
In order to increase the quality and accuracy of the forecasts, we have applied a two-phase modeling process. Phase I of the modeling process consists of clustering stores in the supply chain based upon aggregate sales patterns. After store-cluster models have been constructed, in phase II, these clusters are used to more accurately make out-of-stock predictions at the store/product level [14].
The general data mining process is shown in Figure 1. The process begins with analyzing the data and choosing the right algorithm in order to build the model. The next step is model training over the sampled data. After that, the model is tested and, if satisfactory, the prediction is performed.
Figure 1. Data mining process

C. Inventory Predictive Modeling Process
Phase I consists of grouping together those stores that have similar aggregate sales patterns across the chain. Store clustering is accomplished by using the data mining Clustering algorithm. The dataset holds aggregate sales patterns, and the Clustering algorithm groups the stores into clusters. The modeling dataset is based on aggregate sales data that is derived from the data warehouse. The measure that is used to group together stores is computed over this aggregate sales data.
In phase II, cluster models were used to build more accurate out-of-stock forecasting models. This allows predictive algorithms such as Decision Trees and Neural Networks to use the results of the clustering process to improve forecasting quality. In essence, to make the predictions for a given spare part p in a given store s, the forecasting algorithms use the fact that the sales for the same spare part p in a similar store s may produce better results when determining whether or not a particular part will be out of stock in a particular store.
The modeling process consists of the following high-level steps:
1. Use the spare part hierarchy in the product information (dimension) portion of the data warehouse to determine the spare part category c(p) for part p. We assume that spare parts within the same category have similar aggregate sales patterns across the chain of stores, and the product hierarchy is used to identify the set of similar products c(p) for a given product p. Alternatively, a product clustering approach could be used to determine a data-driven grouping of spare parts similar to p by clustering parts based upon their sales across the chain of stores.
2. Prepare the modeling dataset Dcluster for store clustering to capture store-level properties and sales for category c(p).
3. Apply the Clustering algorithm to the dataset Dcluster to obtain k clusters (groups) of those stores that are similar across store-level properties and sales for category c(p).
4. For each cluster l = 1,…,k obtained in the previous step:
a. Let S(l) be the set of stores that belong to cluster l. These stores have similar category-level aggregate sales for the category c(p).
b. Create a dataset Dinventory(p,S(l)) consisting of historic and current weekly sales aggregates, and changes in weekly sales aggregates, for each store s in S(l). In addition, include Boolean flags indicating whether or not product p was in stock or out of stock one week into the future and two weeks into the future.
c. Apply the predictive modeling algorithms (in this case Decision Trees and Neural Networks) to the dataset Dinventory(p,S(l)). Use the historic and current weekly sales aggregates as input attributes and the one- and two-week out-of-stock Boolean flags as output or predict-only attributes. This instructs the data mining engine to generate a model that takes as its input the historic and current weekly sales, along with changes in weekly sales, and then makes a prediction of the Boolean flags that indicate whether or not spare part p will be out of stock one and two weeks into the future.
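The mining-structure terminology above suggests a Microsoft Analysis Services implementation; as a minimal sketch of the same two-phase flow, assuming an open-source stack (scikit-learn) instead, the process could look as follows. All table and column names (store_features, weekly_panel, oos_1w, etc.) are hypothetical stand-ins for the warehouse attributes, and all columns apart from the keys are assumed to be numeric.

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

def two_phase_oos_models(store_features, weekly_panel, k=5):
    # Phase I: cluster stores on category-level aggregate sales
    # (one row per store; all columns except store_id are aggregates).
    X = StandardScaler().fit_transform(store_features.drop(columns="store_id"))
    store_features = store_features.assign(
        cluster=KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X))

    # Phase II: one out-of-stock classifier per store cluster, trained on the
    # weekly sales aggregates, with the one-week Boolean flag as the target.
    panel = weekly_panel.merge(store_features[["store_id", "cluster"]], on="store_id")
    feature_cols = [c for c in panel.columns
                    if c not in ("store_id", "week", "cluster", "oos_1w", "oos_2w")]
    models = {}
    for label, group in panel.groupby("cluster"):
        clf = DecisionTreeClassifier(max_depth=6, random_state=0)
        clf.fit(group[feature_cols], group["oos_1w"])
        models[label] = clf
    return store_features, models

The per-cluster loop mirrors step 4 above: stores with similar category-level sales share one model, which is what lets the sales of part p in a similar store inform the prediction.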
D. Phase I: Store clustering
The goal of store clustering is to obtain groups of stores that have similar sales patterns, focused on sales of the spare parts in the category c(p) to which part p belongs. Phase I begins with constructing the dataset that will be used for store clustering.
The dataset used for store clustering consisted of store-level aggregate sales over a time period of four years. Typically, the dataset consists of a single table with a unique key (StoreID) that identifies each item (store in the chain). The creation of this table can be automated by designing an appropriate ETL package. However, we decided to take advantage of the UDM and defined the data source view against it. This way, a denormalized data source view is created over a normalized set of fact and dimension data, without worrying about the underlying data sources.
The store clustering task is to group together stores based upon the similarity of aggregate sales patterns. Firstly, we had to identify a set of aggregate sales attributes relevant for this project. The attributes were aggregated over the fact data in the data warehouse. These attributes are category-specific (total_sale_quantity, total_sale_amount, quantity_on_order, discount_amount, etc.) and store-specific (total_sales, total_weekly_on_hand, total_weekly_on_order, etc.).
After the initial business understanding phase, data cleaning and transformation, and data warehouse construction and loading, the next step is clustering mining model construction. Cases (i.e. stores) within the same group have more or less similar attribute values. The mining structure defines the column structure that will be used to construct the store-clustering model. All attributes are selected as input attributes except Category_Fraction_Sales (the fraction of total non-discount sales coming from parts in category c(p) in the given store) and Category_Total_Sales_Quantity (the total quantity of spare parts in category c(p) that were sold during the non-discount period), which are selected as predictable attributes.
Two clustering algorithm parameters were tuned in order to get a better outcome. The Cluster_Count parameter specifies the maximum number of clusters to search for in the source data. In order to produce distinct clusters that sufficiently capture the correlations in store properties and aggregate sales/inventory values, the Cluster_Count parameter was altered and tested with different values to obtain the desired results. The other parameter, Minimum_Support, instructs the clustering algorithm to identify only those clusters that contain the given number of cases (stores in our case) or more. After setting the parameters for the Clustering algorithm, the mining structure is processed, thereby creating and populating the mining model. Figure 2 shows the store-clustering mining structure and algorithm parameters.
Figure 2. Store-clustering mining structure
In the model-building stage, we build a set of models using different algorithms and parameter settings. After the store-clustering models have been constructed, they are evaluated by using the Cluster Browser to determine if the clusters are distinguished by category sales patterns.
The store clusters tend to be discriminated primarily by the total_sales, category_sales_quantity, category_weekly_sales, category_weekly_on-hand, and on-order values. Figure 3 shows the derived store clusters, shaded with different density consistent with the population values, and also the link density relationships.
Figure 3. Store clusters
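Cluster_Count and Minimum_Support are parameters of the clustering engine used here; a rough open-source analogue of the tuning loop described above is sketched below, rejecting cluster solutions whose smallest cluster falls under a minimum-support threshold and preferring the best silhouette score. The candidate range and threshold values are illustrative assumptions, not values from the paper.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def tune_cluster_count(X, counts=range(2, 11), minimum_support=3):
    best = None
    for k in counts:  # analogue of varying Cluster_Count
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        if np.bincount(labels).min() < minimum_support:
            continue  # analogue of Minimum_Support: reject too-small clusters
        score = silhouette_score(X, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best  # (silhouette, chosen cluster count, labels), or None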
E. Phase II: Inventory predictive modeling
The dataset used for the inventory predictive modeling task takes into account weekly sales data for a given spare part across all stores in the supply chain. We used a sliding window strategy to create the dataset used for predictive modeling. The sliding window strategy is typically a good data preparation strategy when the data has a temporal nature (for example, when predictions are made into the future) and the type of the predictable quantity is discrete (such as Boolean out-of-stock indicators). If there is sufficient temporal data and the predictable quantity is inherently numeric, time-series modeling may be a preferred strategy.
Typically, there are very few out-of-stock events that occur for a single store and single product. To obtain accurate predictive models, the training data needs to include a sufficient number of out-of-stock events and in-stock events to identify trends differentiating the two. The following data preparation strategy was aimed at achieving a sufficient number of out-of-stock and in-stock events by considering a given product p over the entire chain of stores. We included the store cluster label (derived from the store-cluster model) to allow the predictive modeling algorithms to identify trends in out-of-stock behavior that might differ between store clusters.
For each store s in the retail chain a unique key (store/week identifier) is generated. Some of the attributes which describe the entity are: current_week_on_hand, one_weeks_back_on_hand, one_week_back_sales, current_week_sales, cluster_label (from the store-clustering model), four_weeks_back_sales, five_weeks_back_on_hand, two_weeks_back_sales, first_week_sales_change, one_week_oos_boolean, two_week_oos_boolean, etc.
The data mining algorithms will attempt to identify the pertinent correlations for making accurate predictions. Since the pertinent correlations are not known in advance, we have included all possible attributes in the training dataset. The first/second/third week sales change attributes help to approximate the change in sales week over week. Typically, these types of attributes can be very useful in improving a model's predictive accuracy.
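As a minimal sketch of this sliding-window preparation, assuming a weekly panel with hypothetical store_id, week, sales and on_hand columns (mirroring the attributes listed above), and assuming that "out of stock" means zero on-hand units, the lagged features and Boolean targets could be derived as follows:

import pandas as pd

def build_sliding_window(weekly, lags=3):
    # weekly: one row per store and week, with columns
    # store_id, week, sales, on_hand (hypothetical names).
    out = weekly.sort_values(["store_id", "week"]).copy()
    g = out.groupby("store_id")
    for lag in range(1, lags + 1):
        out[f"sales_back_{lag}w"] = g["sales"].shift(lag)
        out[f"on_hand_back_{lag}w"] = g["on_hand"].shift(lag)
        # week-over-week change attributes (cf. first_week_sales_change)
        out[f"sales_change_{lag}w"] = out["sales"] - out[f"sales_back_{lag}w"]
    # Boolean predict-only attributes: out of stock 1 and 2 weeks ahead
    out["oos_1w"] = g["on_hand"].shift(-1).le(0)
    out["oos_2w"] = g["on_hand"].shift(-2).le(0)
    return out.dropna()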
To more objectively evaluate the predictive accuracy of the models, it is common practice to hold out a subset of the data and call it the testing set. The remainder of the dataset is called the training set. The data mining models are constructed using the training set. Predictions from the models are then compared with the actual values over the testing set.
First, a data source is created that specifies the database server instance storing the training and test tables for the spare parts under consideration. After the data source view is added, a new mining structure is created for the inventory predictive modeling process.
Decision Trees and Neural Network models are built to determine which algorithm produces the most accurate models (as measured by comparing predictions with actual values over the testing set). After an initial mining structure and mining model are built (specifying the input and predictable attributes), other mining models can be added.
In Figure 4, part of the mining structure and the mining algorithms are shown. Input indicates that the attribute value will be used as an input into the predictive model. PredictOnly indicates that these values should be predicted by the data mining model. Key indicates the column that uniquely identifies the case of interest.
Figure 4. Out-of-stock mining structure with mining models
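A minimal sketch of this holdout evaluation, with scikit-learn classifiers standing in for the Decision Tree and Neural Network mining models (the split ratio and hyperparameters are illustrative assumptions, not values from the paper):

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

def evaluate_models(X, y_oos_1w, test_size=0.3):
    # Hold out a stratified testing set and compare both algorithms on it.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y_oos_1w, test_size=test_size, stratify=y_oos_1w, random_state=0)
    candidates = [
        ("decision_tree", DecisionTreeClassifier(max_depth=6, random_state=0)),
        ("neural_network", MLPClassifier(hidden_layer_sizes=(32,),
                                         max_iter=500, random_state=0)),
    ]
    results = {}
    for name, model in candidates:
        model.fit(X_tr, y_tr)
        results[name] = accuracy_score(y_te, model.predict(X_te))
    return results  # e.g. {"decision_tree": 0.98, "neural_network": 0.97}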
F. Predictive Modeling Results
The predictive accuracy of the mining models was evaluated by examining them over the testing set. There are a few popular tools to evaluate the quality of a model. The most well-known one is the lift chart. It uses a trained model to predict the values of the testing dataset. Based on the predicted value and probability, it graphically displays the model in a chart. The lift chart compares the predictive performance of the mining model with an ideal model and a random model. Figure 5 shows the lift chart for Boolean two-week out-of-stock predictions for the front bulb spare part. The task is to predict a true/false value as to whether the part will be in stock or out of stock two weeks into the future at any store in the chain. The overall predictive accuracy of this model is close to the ideal model.
Figure 5. Lift chart for two-week out-of-stock predictions
Table 1 summarizes the predictive accuracies for the products that were considered in this task. On average, the data mining models can predict whether or not a product will be out of stock one week into the future with 98.52% accuracy. Predictions on whether or not the product will be out of stock two weeks into the future are, on average, 86.45% accurate.

TABLE I. OUT-OF-STOCK PREDICTIVE ACCURACIES FOR FOUR SPARE PARTS

PRODUCT         Out-of-Stock Week 1   Out-of-Stock Week 2
Product 1       98.26%                93.31%
Product 2       99.10%                94.12%
Product 3       97.65%                89.48%
Product 4       99.70%                92.93%
AVG ACCURACY    98.68%                92.46%

Sales opportunity
By using the developed data mining predictive models we can analyze sales opportunities. The lost sales opportunity for each spare part was computed by multiplying the total number of out-of-stock store-weeks by the two-week Boolean predicted value. Multiplying the out-of-stock predicted values by the percentage of actual sales for the year and by the respective retail sale price generates the total sales opportunity. The sales opportunity formula is:

Yearly increase in sales = (# of total OOS weeks for all stores) × (2-week Boolean predicted accuracy) × (% of actual sales across all stores) × (retail price)

Additionally, it is possible to generate profit charts which use input parameters such as population, fixed cost, individual cost and revenue per individual.
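A hypothetical worked example of this formula (all input values are invented for illustration and are not taken from the paper's data set; the percentage of actual sales is interpreted here as the share of out-of-stock store-weeks that would have converted into sales):

oos_store_weeks = 120      # total out-of-stock store-weeks for the part (assumed)
pred_accuracy_2w = 0.9246  # two-week Boolean predictive accuracy (assumed)
sales_fraction = 0.80      # % of actual sales across all stores (assumed)
retail_price = 25.0        # retail price of the spare part (assumed)

yearly_increase = (oos_store_weeks * pred_accuracy_2w
                   * sales_fraction * retail_price)
print(f"Estimated yearly sales opportunity: {yearly_increase:.2f}")
# -> Estimated yearly sales opportunity: 2219.04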
IV. BUSINESS INTELLIGENCE WEB PORTAL
The capability to deliver analytical information to the end user via standard Web technologies, as well as enabling decision makers to access this information in a unified way, has become a critical factor for the success of data warehousing and data mining initiatives. An enterprise information portal serves as a virtual desktop providing transparent access to information objects (reports, cubes, spreadsheets, etc.), as described in [15].
In order to provide a better user experience, we have designed the business intelligence (BI) Web portal as an integrated, Web-based online analytical processing (OLAP) solution that enables employees throughout the entire supply chain to create and share reports, charts and pivot tables, based on online OLAP services, cube files, relational databases and web services.
BI applications often require specialized proprietary client tools, and the process of maintenance and modification is time-consuming and difficult. The designed BI Web portal offers a standard user interface to easily create a centralized place for business analytics. The portal is modular (made of many web parts) and enables up to four data views in different formats. The main modules are the BI Tree web part, which organizes content using a tree structure, the BI Viewer web part for creating views on data, and the BI Data Analysis web part to further analyze or manage the data displayed in a view. Figure 6 shows the BI portal with two data views that present two reports for presenting data mining results. The reports are stored in a separate report server and integrated in the portal using standard XML web service technologies.
Figure 6. Business Intelligence Web Portal
The portal can be saved as a template and implemented with out-of-the-box functionality in many locations. Security is enforced through SSL encryption together with authentication and authorization. User roles (reader, contributor, and administrator) are also supported. These security mechanisms are especially important in the context of the supply chain, where many different companies cooperate.
By implementing the presented BI portal it is possible to deliver data mining results to the right person, at any time, via any browser and in a secure manner. Personalization and filtering capabilities enable end users to access information relevant to them. All these features allow supply chain partners to make more informed decisions collaboratively.

V. CONCLUSION
The goals of modern SCM are to reduce uncertainty and risks in the supply chain, thereby positively affecting inventory control, planning, replenishment and customer service. All these benefits contribute to increased profitability and competitiveness.
In practice, organizations face many challenges regarding inventory management, including uncertainty, data isolation, problems with data sharing, and local decision making.
In this paper, we propose a unified supply chain intelligence model to integrate and consolidate all inventory-relevant data, to use business intelligence (BI) tools like data warehousing and data mining to perform accurate forecasts, and finally to deliver the derived knowledge to business users via a web portal.
An approach with a data warehouse enables data extraction from different sources and the design of integrated data storage optimized for analytical tasks such as data mining.
The presented out-of-stock prediction model was tested with a real automotive data set and demonstrated excellent accuracy for one-week and two-week forecasting. This information can be very useful when making inventory planning and replenishment decisions, which can ultimately result in more sales, decreased costs and an improved customer service level.

ACKNOWLEDGMENT
Research presented in this paper was supported by the Ministry of Science and Technological Development of the Republic of Serbia, Grant III-44010, Title: Intelligent Systems for Software Product Development and Business Support based on Models.

REFERENCES
[1] N. Stefanovic, D. Stefanovic, "Supply Chain Business Intelligence – Technologies, Issues and Trends", IFIP State of the Art Series; Lecture Notes in Computer Science; Artificial Intelligence: An International Perspective, Max Bramer (Ed.), Springer-Verlag, 2009, pp. 217-245.
[2] H. L. Lee, V. Padmanabhan, S. Whang, "The Bullwhip effect in supply chains", Sloan Management Review, 1997, 38:93-102.
[3] S. P. Sethi, H. Yan, H. Zhang, Inventory and supply chain management with forecast updates, Springer Science, 2005.
[4] A. A. Syntetos, J. E. Boylan, S. M. Disney, "Forecasting for inventory planning: a 50-year review", Journal of the Operational Research Society, Vol. 60, 2009, pp. 149-160.
[5] A. A. Syntetos, K. Nikolopoulos, J. E. Boylan, "Judging the judges through accuracy-implication metrics: The case of inventory forecasting", International Journal of Forecasting, Vol. 26, No. 1, 2010, pp. 134-143.
[6] C. Chandra, J. Grabis, "Application of multi-steps forecasting for restraining the bullwhip effect and improving inventory performance under autoregressive demand", European Journal of Operational Research, Vol. 166, No. 2, 2005, pp. 337-350.
[7] W-Y. Liang, C-C. Huang, "Agent-based demand forecast in multi-echelon supply chain", Decision Support Systems, Vol. 42, No. 1, 2006, pp. 390-407.
[8] J. E. Boylan, A. A. Syntetos, "Spare parts management: a review of forecasting research and extensions", IMA Journal of Management Mathematics, Vol. 21, No. 3, 2010, pp. 227-237.
[9] Z. S. Hua, B. Zhang, J. Yang, D. S. Tan, "A new approach of forecasting intermittent demand for spare parts inventories in the process industries", Journal of the Operational Research Society, Vol. 58, 2007, pp. 52-61.
[10] P. K. Bala, "Purchase-driven Classification for Improved Forecasting in Spare Parts Inventory Replenishment", International Journal of Computer Applications, Vol. 10, No. 9, 2010, pp. 40-45.
[11] A. Dhond, A. Gupta, V. Vadhavkar, "Data mining techniques for optimizing inventories for electronic commerce", Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000, pp. 480-486.
[12] A. L. Symeonidis, V. Nikolaidou, P. A. Mitkas, "Exploiting Data Mining Techniques for Improving the Efficiency of a Supply Chain Management Agent", IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2006, pp. 23-26.
[13] B. Larson, Delivering Business Intelligence with Microsoft SQL Server 2012, 3rd Ed., McGraw Hill, 2012.
[14] N. Stefanovic, D. Stefanovic, B. Radenkovic, "Application of Data Mining for Supply Chain Inventory Forecasting", in Applications and Innovations in Intelligent Systems XV, Eds. Richard Ellis, Tony Allen and Miltos Petridis, Springer London, 2008, pp. 175-188.
[15] N. Stefanovic, D. Stefanovic, "Supply Chain Performance Measurement System Based on Scorecards and Web Portals", Computer Science and Information Systems, Vol. 8, No. 1, 2010, pp. 167-192.
Ontology based framework for collaborative business process assessment
Maroua Hachicha*, Néjib Moalla*, Yacine Ouzrout*
*University of Lyon 2, DISP laboratory. 160, Boulevard de l’université, 69676, Bron, France
{Maroua.Hachicha, Nejib.Moalla, Yacine.Ouzrout}@univ-lyon2.fr

Abstract— To remain competitive and agile, modern A. Business Process


organizations invest several business and IT enablers in
order to supervise their change management initiatives. In In this section, we present the abstraction levels and the
this perspective, research in enhancing business process importance of the analysis of the business process.
management (BPM) capabilities (maturity, risk assessment, 1) Business Process abstraction levels
etc.) presents relevant guidelines in order to adapt IT Johansson [1] defined a business process as a set of
solutions when business requirements evolve. related activities that transform an input to create an
In this context, this research proposes an ontology based output with added values.
framework for business processes assessment. The first
objective of this research covers the definition of a
In order to provide different views on process models,
functional category of ontological concepts related to the
business process model abstraction appeared.
qualification of business process dimensions. Hence, we Scheer [2] developed the method ARchitecture of
identify concepts related to processes lifecycle, process Integrated Information Systems (ARIS) which identifies
maturity, task related risk, etc. The second category covers three levels of abstraction of business process:
the non-functional aspects related to the QoS, event • Requirements Definition: describe only the concepts
management, etc. and business logic, without using any technical
The second objective targets the definition of assessment realization.
model based on the instantiation of the proposed ontology.
• Design Specification: gives more details of the
The application of this approach aims to identify the stark
concepts, and an initial translation to a solution to the third
nodes of business processes and to validate the alignment
level.
with evolutionary business processes requirements.
• Implementation Description: implements the proposed
I. INTRODUCTION solution at the Design Specification level.
In recent years, companies aim to setup flexible and Weske [3] classified business process, based on the
adaptable information systems to support their strategies strategic level of the involved business processes. The first
and business processes. They need to align their business level represents the business strategy, which does not
strategy with IT to facilitate the adaptation of the relate to business processes, but rather means strategies to
organization to the requirements and the changes of the which business processes should be modeled. At the
environment. Thus, Alignment Business/IT is becoming second level, the business strategy transforms into goals.
more and more important, especially when setting inter- The third level depicts organizational business processes
enterprise collaborations. Therefore, many companies start and presents interact with suppliers or customers. The
internally to improve their business process continuously fourth level refines the third level with more details
in different aspects to respond the changes in their specifying activities and their relationships. Finally, the
environment. In this context, we propose an ontology fifth level concerns the implementation of business
based framework for business process assessment. The processes.
aim of this approach is to supervise the evolution of Silver [4] is interested in the Business Process Model
collaborative business processes and improve their quality and Notation (BPMN) and he suggested another
and productivity. classification of business process:
This paper is organized as follows: Section 2 presents a •Descriptive BPMN: shows the business people and
review of business process alignment approaches and structuring the business process.
business process assessment. Section 3 is divided into two •Analytical BPMN: details the descriptive business
parts. In the first part, we define our process analytic process model.
model. The second concerns an ontology based process •Executable BPMN: adds all details in order to execute
assessment model and its application rules. Section four the business process (data, services, messages, human task
concerns a conclusion and a presentation of future works. assignment, etc).
II. RELATED WORKS Dhamen [5] considered that a business process is
generally modeled into 4 views:
Before introducing our proposed assessment model, we
highlight various issues related to the domain of Business -The business level: defines a set of business existing in
Process. Many researchers have attempted to address the company.
these issues by incorporating various models. -The functional level: represents the formalization of
the interactions between different functional participants
in the process.

Page 40 of 478
ICIST 2014 - Vol. 1 Regular papers

- The application level: defines the link between the activities/participants modeled in the functional level and the applications/services. Processes are assessed at this level.
- The technical level: the IT infrastructure that supports the applications.
2) Business Process Analysis (BPA)
Business Process Management (BPM) consists of managing the business processes of the company end to end in order to get a better view of them [6]. Indeed, it allows companies to analyze, model, execute, control, automate, measure and optimize their business activities.
Certainly, the objective of BPM is to support and maintain the lifecycle of business processes in organizations, but the most important part of the BPM lifecycle is the analysis of deployed processes. BPA can provide organizations with the knowledge to understand how their processes are currently being performed, in order to detect gaps between guidelines and actual practices [7]. In addition, BPA aims to assess processes and to determine a track of improvement. In fact, business process analysis consists in defining, computing and analyzing metrics concerning business activities in order to evaluate their performance and, thereafter, the flexibility of the whole company [8].
Many works emphasize the benefits and potential uses of ontologies in the field of analysis and assessment of business processes. Ontologies are relevant for measuring the performance of processes, because an ontology provides a set of concepts for modeling processes and reasoning about their characteristics.
Pedrinaci and Domingue [8] proposed the Metrics Ontology model. It is a domain-independent ontology that supports the seamless definition of business metrics, together with a corresponding engine which can interpret and automatically compute these metrics over domain-specific data. Pedrinaci et al. [9] proposed the Core Ontology for Business pRocess Analysis (COBRA) to analyze business processes by offering a core terminology onto which business practitioners can map their domain-specific knowledge. They considered that BPA is typically structured around three views: the process view, the resource view and the object view. COBRA has been structured around these views in an attempt to enhance BPA.
To remain competitive and to be more reactive to the changes imposed by the current market, modern organizations also need to align their business with their IT. In the next sub-section, we explain Business/IT alignment.

B. Business/IT alignment approaches
Alignment can be defined as an internal coherence in the organization between the components of the IT field (the computer system architecture) and the components of the business field (the competitive strategy and processes of the organization) [10]. In fact, IT is an operational enabler that allows the actors of the company to accomplish their activities, which are often grouped into processes, in order to achieve strategic objectives. Therefore, the need to align business and IT has become a top priority for companies: they must adapt internally to external constraints in order to remain competitive, reactive and flexible. This alignment allows companies to increase their performance and to ensure their sustainability [11].
The problem of alignment first appeared in the late 1970s, and since then numerous research efforts, methods, techniques and tools have been proposed to address alignment concerns. Indeed, business and IT performance are tightly coupled; they are considered inseparable partners whose influence is mutual, and enterprises cannot be competitive if their business and IT strategies are not aligned [12]. Henderson and Venkatraman [13] proposed the Strategic Alignment Model (SAM). This model is divided into two areas: Business and Information Technology (IT). These two areas are further subdivided into two domains: the external domain (strategy) and the internal domain (structure). The first domain concerns the strategies reflecting the environment of the company; the second domain concerns the IT infrastructure and the business processes. The strategic fit describes the relation between the external domain and the internal domain, while the functional integration represents the horizontal relation between two elements of the same domain [11].
Many contributions emphasize the importance of strategic Business-IT alignment, such as [14], [15], [10] and [16].
Several researchers have suggested extensions of the SAM model. For example, the model of Javier [10] aims to support the specificities of alignment with the strategy and with the environment. Other studies proposed new approaches, for instance the Alignment Correction and Evolution Method (ACEM) proposed by Etien [17] and the INtentional STrategic ALignment (INSTAL) method proposed by Thevenet et al. [18].
Lemrabet et al. [19] proposed a model named EMT (Economic, Technological and Methodological) to demonstrate that SOA and BPM are complementary disciplines which both offer a competitive advantage to organizations by allowing them to improve their agility. These two approaches play an important role in the continuous optimization of business processes by allowing a complete implementation of the alignment between business and IT. On the one hand, the BPM discipline provides the ability to implement business processes centered on the customer and aligned with business requirements; on the other hand, SOA ensures an efficient and capable infrastructure able to respond quickly to changing business processes.
Walsh et al. [20] proposed the Translated Strategic Alignment Model (TSAM). It is a conceptual and non-functionalist model which integrates several streams of literature. This model may drive toward a critical level of alignment that appears necessary to clear the path toward competitive advantage.
In recent years, there have been several research efforts regarding business models based on ontologies for monitoring the alignment between business and IT. Ghedini and Gostinski [21] proposed a methodological framework using ontologies to understand the effects generated between business and IT purposes and, thereafter, to ensure the alignment between business and IT. Similarly, Brocke et al. [22] created an Ontology for Linking Processes and IT infrastructure (OLPIT) to model the relationship between IT resources and business processes for the purpose of measuring the impacts of IT


infrastructure changes on business processes, and vice versa.
In the next sub-section, we discuss the aspects of business process assessment.

C. Business Process assessment
To evaluate a business process, many aspects can be considered, such as maturity and risk.
1) Maturity
Maturity models describe the evolution of a specific entity over time [23]. They have been confirmed as an important tool for allowing a better understanding of the organization's situation and for helping it find the best way to change [23]. The goal of a maturity model is to improve the efficiency of the company by identifying and analyzing its processes and making them more efficient. Maturity models provide an assessment tool which compares process performance to that of established best practices [24].
The Capability Maturity Model (CMM) was developed at the Software Engineering Institute (SEI) in 1989. CMM introduced the concept of five maturity levels defined by cumulative requirements [25]. The aim of CMM is to present sets of recommended practices in a number of key process areas that have been shown to enhance software development and maintenance capability. The Software CMM has since been retired in favor of CMMI® (Capability Maturity Model® Integration). CMMI is an integrated, extensible framework for improving process capability and quality across an organization. It has become a cornerstone in the implementation of continuous improvement for both industry and governments around the world [26]. One distinctive feature of CMMI is its use of levels to measure both the capability of an organization in individual process areas (from capability level 0 to capability level 5, with capability level 5 being best) and an overall organizational process maturity rating (from maturity level 1 to maturity level 5, with maturity level 5 being best). The CMMI model is organized in a hierarchic structure: process areas, generic and specific objectives, and generic and specific practices.
To improve the practices of the organization and to achieve the desired maturity, the CMMI model offers two representations: staged and continuous. The staged representation defines successive levels of maturity, from 1 to 5, that are associated with process areas. The purpose of this representation is to provide the organization with a homogeneous roadmap for the implementation of improvements. The continuous representation is designed to give maximum flexibility in choosing the process areas for improvement which will achieve the business objectives of the organization.
Many international standards have been created to treat the issues of quality management:
- ISO/IEC 15288 and ISO/IEC 12207: international standards on system and software life cycle processes.
- ISO/IEC 15504: an international standard that defines the requirements for performing process assessments.
Another model, developed by Luftman [27], can provide the organization with a roadmap that identifies opportunities for improving the harmonious relationship between business and IT.
Santana Tapia [28] developed the ICoNOs MM model, which is based on CMMI, for assessing and improving the maturity of business/IT alignment in collaborative networked organizations.
Guédria et al. [29] created the MMEI model, which allows an enterprise to estimate the probability that it can support efficient interoperations and to detect precisely the weaknesses which can be sources of interoperability problems. The MMEI model defines five levels of interoperability maturity (unprepared, defined, aligned, organized, adapted). Cuenca et al. [30] considered that MMEI is centered on the interoperability barriers (conceptual, technological and organizational) and the enterprise concerns (business, process, service and data).
2) Risk assessment
According to the AS/NZS ISO 31000 standard, a business process risk is the chance of something happening that will have a negative impact on the process objectives, and it is measured in terms of likelihood and consequence [31]. An understanding of the operations of the organization is required to completely understand the business risks that can lead to material misstatement, as well as the business-process-level internal controls intended to address business risks and the risk of material misstatement [32].
Taylor et al. [33] proposed a simulation environment based on the jBPM Process Definition Language (jPDL) workflow language. In this environment, a process model characterized by some risk information (key risk indicators, KRIs; key performance indicators, KPIs; and risk events) can be simulated to evaluate the effects of risk events on some pre-defined KPIs and KRIs.
Kaegi et al. [34] simulated a BPMN process model through an agent-based modelling technique to analyze business-process-related risks.
Jallow et al. [35] proposed an approach to analyze risks in business processes. This approach identifies a set of risk events and their occurrence probabilities, in order to assess and quantify the impact/consequences of those risk events on each process and on the overall process.

III. CONTRIBUTION
With the aim of defining an assessment model for business/IT alignment, we propose a process analytic model as well as a metric model. The application of these models is expected to supervise the evolution of collaborative processes and to decide about their quality. The evaluation doesn't cover the ROI objective.

A. Process analytic model
The analysis of any process lifecycle makes it possible to identify the following stages:
 The specification: the stage where we address the strategic and business objectives
 The adaptation: the stage where we define what is possible to implement
 The use: the stage where the process is used
 The optimization: the stage where the process is re-engineered


 The dissemination: the stage where the process no longer addresses any business or strategic objective and should be frozen for revision.
The following figure (Figure 1) illustrates the connection between these stages and the possible evolution schemas.

Figure 1: Process Lifecycle (stages: Specification, Adaptation, Use, Optimization, Dissemination)

On the other hand, the concepts of enterprise architecture provide several decompositions into viewpoints. We identify in our work:
• The business viewpoint: formalizes the validated vision of the collaboration objectives at the business level.
• The functional viewpoint: defines the necessary details in order to ensure the feasibility of the business process.
• The applicative viewpoint: concerns the implementation of the functional level.
• The technical viewpoint: concerns the definition of the infrastructure facilities that support the applicative processes. This abstraction level is not covered in our research; we consider only the execution environment in terms of its geographic location (on premises / distributed or shared / in the cloud).
In the development of our processes, we chose BPMN as the modeling language. The correspondence between process lifecycle stages and process abstraction levels is summarized in the following table (Table I). Only the most relevant projections between stages and levels are considered.

TABLE I. PROCESS STAGES AND VIEWPOINTS

            | Specification | Adaptation | Use | Optimization | Dissemination
Business    |       X       |     X      |  X  |              |       X
Functional  |               |     X      |  X  |      X       |
Applicative |               |            |  X  |      X       |

In the next sub-section, we propose an evaluation model at the applicative level (implementation), with aggregation mechanisms up to the business-level assessment.

B. Process assessment model
On the applicative side, it is useful to measure and evaluate the quality of deployed processes. At this level, we characterize service tasks through two sets of concepts:
Functional concepts: related to the running environment of each service task. These are quantitative indicators characterizing the data input/output, the assigned organization role, and the implementation type of the task (i.e. service task, user task, etc.).
Non-functional concepts: related to the appreciation of contextual and qualitative concepts like maturity, availability, risk level and use frequency.
Both the functional and the non-functional indicators are defined from the instance tracking data provided at the server level. Several execution environments (e.g. the Oracle BPM/SOA server suite) provide structured data in databases (not just log files) in order to facilitate the exploitation of process tracking data and related payloads.
In order to facilitate the exploitation of the collected information, we propose a first level of aggregation allowing us to define a common value for the assessment of each applicative task. We are building a notation model for each measurement concept. In fact, the definition of the implementation type quotation level is more important when a task is implemented as a service task. On the non-functional side, the risk gravity is related to the failure rate and the task-use frequency.
For the aggregation at the applicative process level, we apply a serialization mechanism to the process tasks in order to resolve parallel branches by a maximum value rule. After that, we assume the following composition rules:
• Each applicative task is a sub-class of a functional task
• Each functional task is a sub-class of a business task
We illustrate, in the following ontological model (Figure 2), the connection between all the proposed concepts.
In order to provide a relevant interpretation model at the business process level, we are developing a classification model through a supervised learning process.
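As an illustration of the aggregation just described, the following sketch serializes a process by resolving parallel branches with the maximum value rule and then combines the task-level assessments. It is a minimal sketch: the representation of a process as a list of steps and the use of a simple mean for the serialized sequence are assumptions made for the example, not the paper's exact formulas.

# Minimal sketch of the applicative-level aggregation. Each step is
# either a single task score or a tuple of scores for parallel branches.

def serialize(process):
    """Resolve each parallel branch by the maximum value rule."""
    serialized = []
    for step in process:
        if isinstance(step, (list, tuple)):   # parallel branch
            serialized.append(max(step))      # maximum value rule
        else:                                 # single applicative task
            serialized.append(step)
    return serialized

def aggregate(process):
    """Combine serialized task assessments into one process-level value."""
    scores = serialize(process)
    return sum(scores) / len(scores)          # assumed mean aggregation

# Example: three steps, the second being a parallel branch of two tasks.
process = [0.91, (0.75, 0.88), 0.95]
print(aggregate(process))  # 0.913..., under the assumptions above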


Figure 2: Ontological model for process assessment

Table 2: Example of performance quotation (periodicity: 1 month)

Group          | Concept             | Concept details                                                                                                                      | Avg. of previous values | New values | Performance
Non-functional | Maturity            | CMMI                                                                                                                                 | 3    | 3    | 100%
Non-functional | Availability        | Number of successful calls / total number of calls                                                                                   | 0.93 | 0.83 | 93%
Non-functional | Use frequency       | Number of calls (instances)                                                                                                          | 30   | 36   | 83%
Non-functional | Risk level          | Gravity: 1 to 16 => service failure event, connected error events, execution location (premises, distributed, cloud), use frequency | 8    | 12   | 50%
Functional     | Implementation type | User or manual interface: weight = 1; service: weight = 3                                                                            | 3    | 3    | 100%
Functional     | T-input             | Number of parameters                                                                                                                 | 3    | 4    | 100%
Functional     | T-output            | Number of parameters                                                                                                                 | 1    | 1    | 100%
Functional     | Role                | Internal: weight = 1; external: weight = 3                                                                                           | 1    | 1    | 100%
ATask          |                     |                                                                                                                                      |      |      | 91%
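To show how a task-level quotation such as the one in Table 2 could be aggregated, here is a hedged sketch. The per-concept performance values are taken from the table; the unweighted mean used for the final ATask value is an assumption for illustration, since the paper's exact notation model is not fully specified.

# Hedged sketch of the per-task aggregation behind Table 2. The
# unweighted mean is an illustrative assumption.

concept_performance = {
    "maturity": 1.00,
    "availability": 0.93,
    "use_frequency": 0.83,
    "risk_level": 0.50,
    "implementation_type": 1.00,
    "t_input": 1.00,
    "t_output": 1.00,
    "role": 1.00,
}

def a_task(performances):
    """Aggregate per-concept performances into a single task quotation."""
    values = list(performances.values())
    return sum(values) / len(values)

print(f"ATask = {a_task(concept_performance):.1%}")
# Prints 90.7%, matching the 91% of Table 2 up to rounding.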

The exploitation of the proposed classification model aims to define the pathway of possible process reengineering actions. Figure 1 provides the potential evolution steps starting from the process Use stage.
We propose in Table 2 an example of performance calculation results at the applicative task level. The most impactful concept in our assessment model is related to the risk aspect. The notation aspect is subject to revision until the whole assessment process matures. When the decision classification is based on a learning process, the same metrics do not always produce the same decision. The appropriation of the results on the industrial side needs additional development efforts.
At the business level, the results generated from the proposed assessment model are compared with the results from the evaluation of key business performance indicators. The sensitivity of our approach makes it possible to point directly to the bottlenecks of collaborative business processes at the task level.

IV. CONCLUSIONS AND PERSPECTIVES
We proposed in this paper an ontology-based model for process assessment. The process analytic model identifies the process lifecycle stages as well as the common viewpoints. The metric model measures the performance level of each applicative task. The business performance metric is aggregated at the business level in order to estimate the


quality of collaborative business processes. The validation of the proposed models is based on 6 collaborative business processes for customer and supplier relationship management.
Our future work concerns the classification of metrics and their association to process events. With more tracking data, we aim to further refine the learning and classification processes. As results, we expect to propose event impact models and to define more evaluation nodes (functional and non-functional) and new concepts related to the temporal aspect, in order to monitor the evolution of the business process. The improvement of the existing ones could enhance the stability of our decision system.

ACKNOWLEDGMENT
This work was funded by the European projects FITMAN (FP7-604674) and EASY-IMP (FP7-609078).

REFERENCES
[1] Johansson H. (1993) Business process reengineering: breakpoint strategies for market dominance. Wiley, New York. <http://books.google.fr/books?id=wA7tAAAAMAAJ>.
[2] A.-W. Scheer, ARIS—Business Process Modeling (3rd ed.), Springer (2000).
[3] M. Weske, Business Process Management: Concepts, Languages, Architectures, Springer-Verlag, pp. 17-18 (2007).
[4] B. Silver, BPMN: Method and Style, Cody-Cassidy Press, Aptos, CA, USA (2009).
[5] Karim Dahman, Gouvernance et étude de l'impact du changement des processus métiers sur les architectures orientées services : une approche dirigée par les modèles, Université de Lorraine, 2012.
[6] Alexander Samarin, Business Process Management - concepts de base, 2008.
[7] Weske, M., van der Aalst, W.M.P., Verbeek, H.M.V. (Eds.): Advances in Business Process Management. Special Issue, Data and Knowledge Engineering, Volume 50, Issue 1, Elsevier, 2004.
[8] Pedrinaci, Carlos and Domingue, John (2009). Ontology-based metrics computation for business process analysis. In: 4th International Workshop on Semantic Business Process Management (SBPM 2009), Workshop at ESWC 2009, 1 June 2009, Crete, Greece.
[9] Pedrinaci, C., Domingue, J. and Medeiros, A. (2008) A Core Ontology for Business Process Analysis, 5th European Semantic Web Conference 2008, Tenerife, Spain, eds. Sean Bechhofer, Manfred Hauswirth, Joerg Hoffmann, Manolis Koubarakis.
[10] Oscar Javier Avila Cifuentes, Contribution à l'Alignement Complet des Systèmes d'Information Techniques, Université de Strasbourg, 2009.
[11] Jean-Stéphane Ulmer, Approche générique pour la modélisation et l'implémentation des processus, Université de Toulouse, 2011.
[12] Lerina Aversano, Carmine Grasso, Maria Tortorella (2013) A Literature Review of Business/IT Alignment Strategies, Enterprise Information Systems, Volume 141, 2013, pp. 471-488.
[13] Henderson, J. C. and Venkatraman, N. (1993) Strategic alignment: leveraging information technology for transforming organizations. IBM Systems Journal, 32(1): 4-17.
[14] Broadbent, M. & Weill, P. (1993) Improving business and information strategy alignment: Learning from the banking industry, IBM Systems Journal, Vol. 32, No. 1, pp. 162-179.
[15] Luftman, J., Papp, R., Brier, T. (1999) "Enablers and Inhibitors of Business-IT Alignment," Communications of the Association for Information Systems, (1) 11.
[16] Cuenca Ll., Ortiz A., Boza A., 2010. Business and IS/IT strategic alignment framework. Emerging Trends in Technological Innovation. Doctoral Conference on Computing, Electrical and Industrial Systems. IFIP AICT 314, 2010, pp. 24-31, Springer.
[17] Etien A., 2006, La méthode ACEM pour l'alignement d'un système d'information aux processus d'entreprise, Université Paris I.
[18] L. H. Thevenet, C. Rolland, C. Salinesi, Alignement de la stratégie et de l'organisation : Présentation de la méthode INSTAL, Ingénierie des Systèmes d'Information (ISI), Revue Ingénierie des Systèmes d'Information, Special Issue on Information System Evolution, Hermès, pp. 17-37, 6:2009.
[19] Youness Lemrabet, Nordine Benkeltoum, Michel Bigand, David Clin, Jean-Pierre Bourey, 9e Congrès International de Génie Industriel, Québec, Canada, 2011.
[20] Isabelle Walsh, Alexandre Renaud, Michel Kalika, 2013, The Translated Strategic Alignment Model: A Practice-Based Perspective, Systèmes d'information & management, 2013/2 (Volume 18), p. 172.
[21] C. Ghedini, R. Gostinski, A methodological framework for business-IT alignment, in: Proceedings of the Third IEEE/IFIP International Workshop on Business-Driven IT Management (BDIM), 2008, pp. 1-10.
[22] Jan vom Brocke, Alessio Maria Braccini, Christian Sonnenberg, Paolo Spagnoletti, Living IT infrastructures — An ontology-based approach to aligning IT infrastructure capacity and business needs, International Journal of Accounting Information Systems (2013).
[23] L. Cuenca, A. Boza, A. Ortiz, An enterprise engineering approach for the alignment of business and information technology strategy, International Journal of Computer Integrated Manufacturing, 24 (1) (2011), pp. 974-992.
[24] Nydia Gonzalez Ramirez, Contribution à l'amélioration des processus à travers la mesure de la maturité de projet : application à l'automobile, École Centrale des Arts et Manufactures, 2009.
[25] M.C. Paulk, "A History of the Capability Maturity Model for Software," ASQ Software Quality Professional, Vol. 12, No. 1, December 2009, pp. 5-19.
[26] Dennis M. Ahern, Aaron Clouse, Richard Turner, CMMI® Distilled: A Practical Introduction to Integrated Process Improvement, Third Edition, Addison Wesley Professional (2008).
[27] Jerry Luftman, Assessing business-IT alignment maturity, School of Management, Stevens Institute of Technology (2000).
[28] Santana Tapia, R.G. (2006), IT Process Architectures for Enterprises Development: A Survey from a Maturity Model Perspective.
[29] Wided Guédria, David Chen, and Yannick Naudet, A Maturity Model for Enterprise Interoperability, 2011.
[30] Llanos Cuenca, Andrés Boza, M.M.E. Alemany, Jos J.M. Trienekens, Structural elements of coordination mechanisms in collaborative planning processes and their assessment through maturity models: Application to a ceramic tile company, Computers in Industry, Volume 64, Issue 8, October 2013, pp. 898-911.
[31] Standards Australia and Standards New Zealand (2009) Risk Management: Principles and Guidelines, third edition (AS/NZS ISO 31000:2009), Sydney, Australia; Wellington, New Zealand.
[32] Carla Carnaghan, Business process modeling approaches in the context of process level audit risk assessment: An analysis and comparison, International Journal of Accounting Information Systems 7 (2006) 170-204.
[33] P. Taylor, J.J. Godino, B. Majeed, Use of fuzzy reasoning in the simulation of risk events in business processes, Intelligent Systems Research Centre, 2008.
[34] Kaegi, M., R. Mock, R. Ziegler, and R. Nibali (2006) "Information Systems' Risk Analysis by Agent-based Modelling of Business Processes" in Soares, C., and E. Zio (eds.), Proceedings of the Seventeenth European Safety and Reliability Conference (ESREL'06), London, UK: Taylor & Francis Group, pp. 2277-2284.
[35] Jallow, A., B. Majeed, K. Vergidis, A. Tiwari, and R. Roy (2007) "Operational Risk Analysis in Business Processes", BT Technology Journal, (25)1, pp. 168-177.


SolarEnergo – New way to bring renewable energy closer
Dr. Matej Gomboši
Municipality of Beltinci, Slovenia
[email protected]

Abstract— In recent years people are increasingly opting for energy self-sufficiency of their residential and commercial buildings. The application of solar energy is one of the most attractive options, since the installation of photo-voltaic systems is fairly easy. The question is often on which parts of the roof it is best to put such systems to achieve the most optimal use. The new public e-service SolarEnergo visually shows how suitable a roof is for the installation of photo-voltaic systems and gives a detailed calculation of the amount of solar energy throughout the year. The calculation also takes into account shading from other buildings, the terrain within a radius of 15 km, and high vegetation. The project has successfully acquired funds from the Ministry of Education, Science and Sport in Slovenia and the European Regional Development Fund for innovative IT solutions. For the development, Municipality Beltinci is cooperating with the Faculty of Electrical Engineering and Computer Science, where the experts have the necessary knowledge and experience to develop advanced solutions in the field of geographic information systems (GIS). The accuracy of the topology information and of the information about high vegetation is achieved using data from the airborne laser scanning technology LiDAR (Light Detection and Ranging), which scans terrain and buildings at a resolution of 20 points/m2. For the development of the web and mobile e-service, open source GIS platforms are used. Thus, we have shown that even with less expensive technology, reliable and advanced applications can be developed. The result is a new web and mobile e-service, SolarEnergo, that allows citizens to easily select a building, also with the help of Android smart phones and GPS positioning. With this, Municipality Beltinci offers a new e-service which can provide, directly in the field or from the office, all the necessary information to decide on the use of solar energy. In the next step, SolarEnergo can easily be upgraded and extended to other municipalities or to whole countries.

I. INTRODUCTION
Solar energy is today an indispensable source of renewable energy, converted into electricity by photo-voltaic (PV) systems. Therefore, the use of such systems for different purposes is increasing. In most cases, the systems are installed on the roofs of buildings. In doing so, the problem is to provide an optimal installation of PV systems with respect to the received solar radiation on the surface of the roof, since it is necessary to take into account the various factors which affect the strength of the received solar radiation. Such measurements are carried out by experts in this field, but they are very expensive and much less accurate. Thus, in the last decade solutions were developed which allow the user a more precise and automated calculation of the solar potential (daily solar radiation) over the surface of the selected roof. These solutions give the users a detailed overview of the received solar potential on the roof surface throughout the year, as well as an overview of the cost of installation of PV systems from different manufacturers.
Through e-services, a municipality may give its citizens, tourists and investors a variety of information and services. In this way, municipalities can adapt to the trend of increasing communication via modern information technologies. We can find many good practices in the use of e-services for citizens in some EU countries. Even the Slovenian e-government portal "E-Uprava" is a fine example of a national-level e-service for citizens.
In the current e-service, calculations of the solar potential are used which are based on an analysis of several years of meteorological data on solar radiation and on shading simulation over classified LiDAR (Light Detection and Ranging) data. They also take into account the roof's topographical features (orientation and slope). The results of the calculations are stored on the server part of the geographic information system (GIS) and are accessed by the application on the client side.
LiDAR technology consists of a laser transmitter that emits laser pulses to determine the position of objects on the surface [1]. The result of LiDAR air scanning is a set of unstructured 3D points called a point cloud.
A geographic information system enables us to manage geo-referenced data which are given in the form of geographical coordinates in a specific geographic coordinate system (e.g. GPS). The GIS presented in this article consists of a database, server applications and applications on the client side. The database is of crucial importance in the given system. The Shape format (SHP) is used for the geographical data. Open source GIS platforms are used for the development of this e-service. Thus, we have shown that even with less expensive technology, a reliable and advanced e-service on the Internet and smart phones can be developed.

II. THE SUN POTENTIAL AND GIS APPLICATIONS
The sun potential is calculated using the method presented in [2], where multi-resolution shading, long-term daily radiation measurements and vegetation shading are taken into account. The LiDAR data is first put into a grid of 1 m2 cells. The space- and time-dependent shading also depends on the accurate position of the sun, which is calculated using the SolPos algorithm [3].
Geoserver [4] is an open-source server platform written in Java, which has excellent support for the WMS and WFS standards according to the OGC specification [5]. PostGIS [6]


is responsible for the data part and enables support for geographical objects in the database, which is managed by PostgreSQL [7]. These technologies are used on the server side of the presented e-service. The land cadaster data of Municipality Beltinci, together with the sun potential, are imported into the database. The gradient color scale with which the cells are painted consists of 6 colors (black, blue, cyan, green, yellow and red). It is constructed in such a way that we first find the lowest and the highest value of the calculated solar potential of the cells from the LiDAR data. Then, we determine the initial and final color. The other values, at 20% steps of the difference between the highest and lowest values, are interpolated (see Fig. 1).
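The construction of such a gradient scale can be sketched as follows. This is a minimal Python illustration: the RGB anchor values and the linear interpolation between neighboring colors are assumptions, since the text only names the six colors and the 20% step.

# Minimal sketch of the 6-color gradient scale, assuming linear
# interpolation between neighboring anchor colors. The RGB values for
# black, blue, cyan, green, yellow and red are conventional choices.

ANCHORS = [(0, 0, 0), (0, 0, 255), (0, 255, 255),
           (0, 255, 0), (255, 255, 0), (255, 0, 0)]

def potential_to_color(value, lowest, highest):
    """Map a solar potential value onto the gradient scale."""
    t = (value - lowest) / (highest - lowest)   # normalize to [0, 1]
    t = min(max(t, 0.0), 1.0)
    pos = t * (len(ANCHORS) - 1)                # 5 segments of 20% each
    i = min(int(pos), len(ANCHORS) - 2)
    f = pos - i                                 # position inside segment
    c0, c1 = ANCHORS[i], ANCHORS[i + 1]
    return tuple(round(a + (b - a) * f) for a, b in zip(c0, c1))

# Example with the range from Fig. 1 (0.28 to 0.93):
print(potential_to_color(0.28, 0.28, 0.93))  # black
print(potential_to_color(0.93, 0.28, 0.93))  # red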
Figure 1. Example of color interpolation from 0.28 to 0.93 with step 0.13 (20%).

In the next step we create a uniformly distributed network of 1 m2 cells over the outlines of the buildings. The process is shown in Fig. 2.

Figure 2. Example of color grid creation (left: building outline; middle: unevenly distributed cells; right: uniformly distributed cells).

For each cell, we write the value of the solar potential and the FID (FeatureID) of the building outline into the file, where each record represents a single cell. New attributes that represent the sun potential and the potential of solar PV are added to the building outlines. The created files are imported into the database on the server side, which is accessed by the web application through servlets. These servlets serve as an intermediary (proxy) to connect to other services on the server. Servlets are used, for example, for obtaining information about the selected building. An example can be seen in Fig. 3.

Figure 3. Example of a graph showing the sun potential of a building.

The SolarEnergo e-service is composed of a number of applications that are needed for accurate operation. The web application is implemented using the OpenLayers library, which makes it easy to work with standards like WFS and WMS [8]. To display orthophoto images and geometry, a WMS request is sent, which returns raster data. The application allows us to select and view the details of each object by clicking on it, upon which the server sends a WFS request that returns the result in the form of a GML (Geography Markup Language) document. This is converted into an OpenLayers list of geometric objects that contain all the information describing the object (in the case of building outlines we obtain the address and street number). The application also enables search by address: as you type, valid addresses that match the entered text appear, and selecting a result on the map shows the details of the solar potential for the requested building.
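As an illustration of this interaction, the sketch below builds a WFS GetFeature request of the kind sent when a building outline is clicked. The server address, layer name and feature id are hypothetical placeholders, not the actual SolarEnergo configuration.

# Hedged sketch of a WFS 1.1.0 GetFeature request against a GeoServer
# instance. "solarenergo:buildings" and the feature id are hypothetical.
from urllib.parse import urlencode

params = {
    "service": "WFS",
    "version": "1.1.0",
    "request": "GetFeature",
    "typename": "solarenergo:buildings",
    "featureid": "buildings.42",
}
url = "http://example.org/geoserver/wfs?" + urlencode(params)
print(url)
# The server answers with a GML document describing the selected outline
# and its attributes (address, sun potential, PV potential), which
# OpenLayers converts into a list of geometric objects on the client.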
The web application uses JQuery, OpenLayers and RGraph technologies. A PDF report is also generated for each building. In addition to the basic graph of the sun potential, it also includes graphs of the potential for the amorphous and polycrystalline silicon PV module types. The graphs are displayed as stacked bar charts, where the lower value is the sum over the cells that are above the average solar potential value, while the upper part is the difference between the total and the above-average solar potential.
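The two stacked values per month could be derived from the per-cell potentials as in the following sketch, which is an illustrative reading of the description above rather than code from the paper:

# Illustrative computation of the two stacked-bar values for one month.
# `cells` holds the monthly solar potential of each 1 m2 roof cell.
cells = [0.41, 0.77, 0.93, 0.55, 0.88, 0.62]

average = sum(cells) / len(cells)
total = sum(cells)
# Lower bar: contribution of the cells above the average potential.
lower = sum(c for c in cells if c > average)
# Upper bar: the remainder up to the total potential.
upper = total - lower

print(average, lower, upper)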
Today's mobile devices are increasingly powerful at both the hardware and the software level. That is why we have also developed a mobile SolarEnergo application supporting the Android operating system. The application has the same functionality as the web application, with a streamlined search ability using GPS (Global Positioning System) technology.
The mobile application works on Android OS version 2.3.3 and newer. Because of the smaller displays, it is pointless to show the solar potential map and the information about the selected building at the same time. This is why we divided the screen into three tabs using the "TabView" widget. To see the map in the first tab, the "WebView" widget is used, through which the page is loaded. This page has been simplified for use on mobile devices. Using the "JSInterface" Application Program Interface, which is offered by the Android platform, we connect the mobile and the web applications. In this way, we can call, from JavaScript, methods implemented in our mobile app, which display data in the "Data" tab.
The mobile application enables us to find the nearest building using the current GPS coordinates. Of course, we also support searching for a building through the address entry box, which has the same functionality as in the web application.
Fig. 4 shows the different layouts. Searching by address is shown on the left. The middle one shows a map that is the same as in the web application. The right side shows the "Data" tab with all the information regarding the sun and PV potential for the whole year.

III. RESULTS
In cooperation with the Faculty of Electrical Engineering and Computer Science at the University of Maribor in Slovenia, a new SolarEnergo e-service has been developed to allow citizens a graphical overview of the solar potential of buildings in the local environment. Using the LiDAR data of the municipality, the calculation of


the solar radiation and the PV potential of each building is done. The calculation method also takes into account data on the direct and diffuse half-hour solar radiation over the last 10 years. Data from the nearest weather station, in the town of Rakičan, was used. Fig. 5 shows a model of the sun potential using LiDAR data, which is then transformed into the SHP format.

Figure 4. Mobile application on Android. Search (left), map (middle), building data (right).

Figure 5. Example of sun potential for the same location as a WMS map (left) and as a model of LiDAR data (right).

When you click on a building, its characteristics are displayed, such as: address, street, house number, solar potential throughout the year, and the yearly PV potential for two different materials (amorphous and polycrystalline silicon). The results, shown in the graph, are the solar potential for each month for the entire roof surface. To see the graph, a web browser with support for HTML5 technology is needed. The user has the possibility to view the PDF report using the link located in the lower left corner of the window. When using the search by address, the view moves to the center of the selected building.
Fig. 6 shows the interface of the web application. The lower part shows the scale and the gradient scale of the solar potential. In the upper right corner there is the text box into which the user enters the desired address for which to display information about the solar radiation and the PV potential. Basic mapping functions, such as zoom and pan, are available. Fig. 6 also shows that the roofs facing south have a higher expected solar potential. Fig. 7 shows the solar potential of well-known buildings in Beltinci.

Figure 6. Final look of the web application.

The end result is therefore a new SolarEnergo e-service for citizens, which makes it easy to select a building on a map with the help of Android smart phones and GPS. At the same time, it is also possible to calculate the sun potential for the municipality as a whole, which shows the amount of energy that can be obtained on the whole territory and how it can be converted into electricity. Municipality Beltinci thus offers a new e-service which enables everybody to get, directly on the spot or in the office, all the necessary information to decide on the usage of solar energy. The SolarEnergo e-service can easily be upgraded and extended to other municipalities or to the whole country.


Figure 7. Sun potential of the town hall (top-left), castle (top-right), primary school (bottom-left) and kindergarten (bottom-right).

IV. CONCLUSIONS
In this paper we described the development and deployment of the SolarEnergo e-service, which allows users to visually view the solar potential of buildings in the local environment. The solution provides an innovative tool for municipalities which allows each citizen or investor to get, at no cost, all the necessary information concerning an investment in photovoltaics on the roofs of their buildings. The SolarEnergo e-service visually and numerically shows how suitable a roof is for the installation of photovoltaic systems and gives a detailed calculation of the amount of solar energy throughout the year. The calculation also takes into account shading from neighboring buildings, the terrain within a radius of 15 km, and high vegetation. The project has successfully acquired a financial grant from the Ministry of Education, Science and Sport in Slovenia for the development of innovative IT solutions. For the development of SolarEnergo, Municipality Beltinci is cooperating with the Faculty of Electrical Engineering and Computer Science in Maribor, where the Laboratory of Geometric Modeling and Multimedia Algorithms has a long history of the necessary knowledge and experience to develop similar solutions in the field of geographic information systems (GIS). The system consists of free open source tools. The web application is complemented by the mobile version, which uses GPS coordinates when selecting the building nearest to our position in the field. In this way we offer our citizens the possibility of the increasingly widespread use of e-services on smart phones and their technical abilities.
The presented SolarEnergo e-service is a new step in the services which a municipality can offer to its citizens. It can be easily transferred to other municipalities or to the whole country. It is an e-service for all citizens with a great added value in usability, which is what every municipality tries to give to its citizens.

ACKNOWLEDGMENT
We thank the Ministry of Education, Science and Sport in Slovenia and the European Regional Development Fund for financial support through the tender which enabled the development of this innovative SolarEnergo e-service. We are also grateful to the Surveying and Mapping Authority in Slovenia for all the GIS data and to the Slovenian Environment Agency for the solar radiation measurements.

REFERENCES
[1] Petrie G., Toth C.K., Airborne and spaceborne laser profilers and scanners. In: Shan J., Toth C.K. (editors), Topographic laser ranging and scanning: principles and processing. Boca Raton: CRC Press; 2008.
[2] Lukač N., Žlaus D., Seme S., Žalik B., Štumberger G., Rating of roofs' surfaces regarding their solar potential and suitability for PV systems, based on LiDAR data. Applied Energy, 2013.
[3] Reda, I., Andreas, A. (2004) Solar position algorithm for solar radiation applications. Solar Energy 76(5).
[4] GeoServer. http://geoserver.org/display/GEOS/Welcome.
[5] Open Geospatial Consortium. http://www.opengeospatial.org/standards.
[6] PostGIS - Spatial and Geographic objects for PostgreSQL. http://postgis.net/.
[7] PostgreSQL Database. http://www.postgresql.org/, accessed 12.04.2013.
[8] OpenLayers. http://openlayers.org/.


Development of distributed hydro-information system for the Drina river basin
Vladimir Milivojević*, Nikola Milivojević*, Milan Stojković*, Vukašin Ćirović*, Dejan Divac*
* Jaroslav Černi Institute, Belgrade, Serbia

[email protected], [email protected], [email protected],


[email protected], [email protected]

Abstract — Hydro-information systems in Serbia have been in constant development since the early 1980s. Considerable advancements have been made in the application of modern methodologies and technologies, with practical use in hydro power and water resources management. In order to provide support to the management of complex hydropower systems and their digital simulations, it is necessary to establish communication between measurement systems and computational models, although these were not primarily designed with mutual integration and reusability in mind. This paper presents a solution for distributed hydro-meteorological and hydropower data acquisition, simulation and design support that is based on service-oriented architecture (SOA) principles. The complex and heterogeneous information flow used by these systems is partitioned into separate data and computational functions that use common open protocols for mutual communication. Based on these techniques, the paper presents the design of a SOA hydro-information system and its application in the case of Drina river basin management.

I. INTRODUCTION
The contemporary approach to water resources management requires the formation of high-speed computational systems that can provide support to short- and long-term planning of water resources exploitation; they are based on measured data reflecting the state of a system (meteorological and hydrological values, water consumption etc.) and on the existing mathematical models (flow in river channels, transformation of rainfall into runoff, etc.). A comprehensive discussion of this subject is presented in the paper by Divac et al. [1].
The support to the management and simulation of complex hydropower systems demands the connectivity and interaction of measurement systems and computational hydrological models, both of which are not designed primarily with the goal of integration and reusability in mind.
For the time being, automatic measurement systems and the remote management of objects within hydropower systems are mostly carried out in the form of SCADA (Supervisory Control And Data Acquisition) systems that were previously built on a monolithic concept, i.e., each of the systems was independent from the others and there was no communication between them. Along with the development of SCADA systems, distributed systems have been created that operate on multiple workstations grouped together into a local area network (LAN). Besides the automatic measurements covered by the SCADA systems, which are mostly related to hydropower and hydraulic values that change rapidly, there are also ancillary measurement systems for slow-changing values, like certain meteorological or hydrological values, etc. It is a common practice to realize these measurement systems simultaneously, but in separate information systems, which makes the use of the obtained measurement data as inputs into simulation models much more complicated.
As was already demonstrated in the paper by Divac et al. [2], collected data are often used as input data for simulation models; each of the models has its own data definition methods and requires data from diverse measurement systems (including forecasted values). Simulation models are used for planning, operational management and optimization, and it is of the highest importance that the data are available and accurate.
The concept of a hydro-information system has been developed recently; it connects measurement systems and simulation models for the purpose of water resources planning and management. Although hydro-information systems are not necessarily designed on service-oriented principles, everyday practice demonstrates that it is highly desirable for the software solution of such a system to be defined as an open service-oriented architecture, due to the diversity of measurement systems and simulation models. Such an approach simplifies the complexity of a solution and improves the re-usability of the software, along with the possibility of a dynamic integration of different components and the simplification of their maintenance and development.
This paper presents a solution for the distributed acquisition of hydro-meteorological and hydropower data, based on the principles of the service-oriented architecture (SOA), which is at the same time open to different simulation models and future components. This solution is used in various hydro-information systems, and it is presented here in the case of the Drina hydro-information system, as the Drina is the most complex river basin belonging to Serbia, Montenegro and Bosnia and Herzegovina.

II. HYDRO-INFORMATION SYSTEMS
It can be stated that the strategic goal of the implementation and application of distributed hydro-information systems (HIS for short) is the creation of conditions that can help the optimum management of water resources, as well as the solving of the existing and potential conflicts within a particular catchment or region in relation to the conflicting interests or development projects of different countries, local communities, firms and other legal or physical bodies.
Because of the complexity of water resources exploitation, both of natural and artificial origins, hydro-information systems have been designed as platforms with broad extension possibilities, whose structure is frequently modified as a result of changes in the mode of exploitation. Regardless of a HIS system or the goals of its


development, the most important part of such a system is its hydrologic component, because the natural water flow is the most important factor in water resources management. For this reason, the two terms mentioned above are usually used as synonyms, but it has to be stated that a hydrologic information system is only a component that is common to most HIS systems and generally has a monolithic structure, while hydro-information systems represent potentially broader systems that also include electricity generation, irrigation, water supply and other artificial activities within a system. The use of complex hydrological models is common in hydro-information systems, as is the case in the HIS for the Drina river basin.
The role of the hydrological model in hydro-information systems is to provide a reliable assessment for the planning and development of the water management, energy and agricultural sectors. Based on these objectives, hydrological tools have been developed that transform meteorological variables into the runoff from the actual catchment. The forecasted runoff is then used to manage the catchment through water facilities, such as reservoirs, gates, etc. By using various optimization algorithms together with the hydrological model, it is also possible to provide decision support to management.
Usually, hydrological models are designed as empirical, but with advances in numerical procedures and computer hardware, more and more efficient deterministic models are in use. The advantage of deterministic hydrological models is that, due to the number of input parameters, these models are more accurate than empirical models, provided that the parameters are determined based on the actual characteristics of the basin. In addition, the advantage is reflected in the higher temporal and spatial discretizations, which are limited only by hardware. Today, hydro-information systems use distributed hydrological models such as WaterWare [3], MIKE SHE [4] and Ribasim [5]. The hydrological model WEAP21 [6] combines the requirements of the users in the basin and the hydrological modelling process, with the possibility of adding a system of reservoirs and canals. There is also an example of a complex water management information system that uses a hydrological model [7].
In the HIS for the Drina river basin, a deterministic model with disaggregated parameters is used, where the parameters are estimated on the basis of the soil and hydrogeological composition, as well as the vegetative cover.
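To give a flavor of what such a rainfall-runoff transformation looks like, the following sketch implements a generic single linear reservoir model in Python. It is a textbook illustration under assumed parameter values, not the actual distributed model used in HIS Drina.

# Generic linear-reservoir rainfall-runoff sketch (illustrative only;
# the HIS Drina model is a distributed deterministic model whose
# parameters come from soil, hydrogeological and vegetation data).

def simulate_runoff(rainfall, k=0.3, storage=0.0):
    """Transform a rainfall series [mm] into a runoff series [mm].

    k       -- storage outflow coefficient per time step (assumed value)
    storage -- initial catchment storage [mm]
    """
    runoff = []
    for p in rainfall:
        storage += p              # rainfall fills the storage
        q = k * storage           # outflow proportional to storage
        storage -= q
        runoff.append(q)
    return runoff

daily_rainfall = [0.0, 12.5, 3.2, 0.0, 0.0, 8.1]  # example input [mm]
print(simulate_runoff(daily_rainfall))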
III. SERVICE-ORIENTED ARCHITECTURE
In software-related terminology, service-oriented architecture (SOA for short) denotes a method of software development that fulfils user requests by supplying resources in the form of standardized services. Within the SOA architecture, each party offers its resources by means of services that can be accessed in a standardized way [8]. Most often this architecture relies upon the implementation of Web services (using SOAP or REST), although an implementation is also possible by means of any other standard [9].
Unlike traditional architectures, SOA consists of weakly integrated components that, however, display a high degree of interoperability in supplying services. This interoperability is based on the formal definition of a service, which is platform- and programming-language-independent. The service-oriented architecture is also independent from any development platform (such as .NET or Java). This enables software components to become extremely autonomous, because the service-providing logic is completely detached from its development environment.
The SOA architecture usually implies the use of web services, both within the Internet and within an Intranet environment. The standards that form the base of web services are XML, HTTP (or HTTPS), SOAP, WSDL and UDDI. It has to be stressed again that SOA can also be based on any other accepted standard.
Finally, SOA is by no means visible to its consumers and, furthermore, it does not even exist if there are no consumers who make use of its services. The greatest advantages of the SOA architecture are its flexibility and extensibility, as well as the possibility for different programmers' profiles to work on the development of solutions; in addition, there is also the usability of the existing heterogeneous resources instead of a continuous investment into the redesigning of the system. SOA can be labelled rather as an evolution in architecture than as a revolution, because it represents the implementation of the best experiences from the different technologies that have been used so far.
The basic structural components of a service-oriented HIS are shown in Fig. 1. The solutions that are based on SOA are in fact scalable, secure, easy to monitor, interoperable and independent from any particular operating system. Such an approach to development conceals from the programmer and the user the complexity of the process of overcoming the incompatibility between the components used; at the same time, it detaches a certain processing logic from a certain client application. In other words, different users in the form of browsers, desktop applications like Matlab, Excel or ArcGIS, or other user components, can access the single data source in the same way.
According to their purpose, the presented services can be grouped in the following way:
 Basic services – the controlling and monitoring of a server that represents an interface to the already existing systems, databases, applications etc. that are being connected to a HIS. To this group belong also the updating services, replication services etc.
 Services for access to external resources – they make possible the access to different external sources that dispose of hydrologic data. They are used for browsing and data acquisition, as well as for the periodic regeneration of the metadata storages.
 Services for access to measurement devices – represent a way of communication between separate devices, i.e., sensors that can be directly integrated into a HIS.
 Data filtration services – represent the necessary component of the processing of raw measurement data from the sensors, because such data is usually afflicted by errors, inaccuracies and periodic breakdowns of communication and of the working process [10].
 Services for access to resources – perform the collecting, browsing, updating and acquiring of data that is already on HIS servers (a minimal sketch of such a service contract is given after this list).
 Application services – represent a large group of services that support the manipulation of objects of a higher rank, like a virtual catchment. These services include: the compilation of lower-rank objects into the virtual catchment object, reliability analysis and the filling-in of missing data in time series, data transformation, and the synchronization of the services' operation.
 Services for authentication and authorization – connected to the process of user application and the assessment of data access privileges.
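To make the granularity of these services concrete, here is a minimal sketch of what a data-access service contract could look like. All names (HisDataService, get_series, the station and variable identifiers) are hypothetical illustrations of the service granularity, not the system's real API.

# Hedged sketch of a HIS data-access service contract. All names are
# hypothetical placeholders.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Sample:
    timestamp: datetime
    value: float
    validated: bool  # set by the data filtration/validation services

class HisDataService:
    """Service for access to resources already stored on HIS servers."""

    def __init__(self, repository):
        self._repository = repository  # e.g. the central database layer

    def get_series(self, station_id: str, variable: str,
                   start: datetime, end: datetime) -> list[Sample]:
        """Return the stored time series for one station and variable."""
        return self._repository.query(station_id, variable, start, end)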


Figure 1 Structural components of a service-oriented HIS

Within the described architecture, the services are grouped into the following sections according to their logical hardware implementation:
 Central server,
 Acquisition server,
 Application server and
 Specialized HIS applications.

A. Central server
The central server has the role of coordinating, distributing and synchronizing data storage, as well as of controlling the access to HIS data and services. The main function of the central hydro-information server is to put together all the relevant data and store them into the central database, as needed. The descriptions and references of all relevant data are to be found in the metadata database on the central hydro-information server. The HIS services catalogue is also to be found on the central hydro-information server. The central server has several functional sections: the data layer, online services and offline tools that are necessary for the proper functioning of the system.
The data layer of the central server represents a complex functional entity that has the role of controlling the archiving of data and of coordinating users' requests related to the acquiring of accessible data from any server within a HIS. In order to fulfil these tasks, the data layer of the central server unites the following elements: the central database, the metadata database, the service catalogue, the data transfer service, and the data replication service.

B. Acquisition server
The role of the acquisition server within a hydro-information system is to collect and process the data on the measurements within the physical system. An acquisition server consists of the following parts: the data acquisition layer, the data processing and validating service, the data transfer service and a separate service for monitoring the processes that take place on the server. The data that the server acquires can originate from automated measurement systems or other information systems, or they can be entered manually through specific services. The processed and verified data is sent by the acquisition server to the central server, and a limited set of the data can be acquired by the acquisition server from the central server in order to perform the procedure of validation and data processing.
In order to secure the full functionality, the acquisition server has to include all the previously mentioned data sources, as well as the procedures for validation and communication with the central database. The acquisition server consists of the following elements:
 the data acquisition layer (which includes the manual data entry service, the service for data acquisition from other information systems, and the service for direct communication with measurement systems),
 the data processing and validation service (a minimal validation sketch is given after this list),
 the data transfer service,
 the data monitoring service and
 the acquisition repository.

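As a sketch of the kind of checks such a validation service performs, the following applies generic range and spike checks to raw sensor samples. The physical limits and the spike threshold are assumed example values; the actual HIS validation rules are not published in this paper.

# Generic validation sketch for raw sensor samples, illustrating the
# data processing and validation service. Thresholds are assumed values.

def validate_series(values, lower=-40.0, upper=60.0, max_jump=15.0):
    """Flag samples that fall outside physical limits or jump abruptly."""
    flags = []
    previous = None
    for v in values:
        ok = lower <= v <= upper
        if ok and previous is not None and abs(v - previous) > max_jump:
            ok = False                      # suspected spike
        flags.append(ok)
        if ok:
            previous = v                    # only trust validated samples
    return flags

print(validate_series([12.1, 12.4, 55.0, 12.8, -99.9]))
# [True, True, False, True, False]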

by the acquisition server from the central server in order to perform the procedure of validation and data processing. In order to secure full functionality, the acquisition server has to include all the previously mentioned data sources, as well as the procedures for validation and communication with the central database. The acquisition server consists of the following elements:
• a data acquisition layer (which includes the manual data entry service, the service for data acquisition from other information systems, and the service for direct communication with measurement systems),
• a data processing and validation service,
• a data transfer service,
• a data monitoring service and
• an acquisition repository.

C. Application server
The application server is designed to control the access to data by different users, as well as to find and acquire the available data from the distributed HIS, which allows a particular user universal access to all the available data of a certain HIS. The structure of the application server is very similar to the structure of the central hydro-information server. The main differences in comparison to the central server are a smaller number of off-line tools, adapted to the processing of local data, and the omission of on-line tools.

IV. IMPLEMENTATION OF HYDRO-INFORMATION SYSTEMS IN A DISTRIBUTED ENVIRONMENT

In the process of implementation of the presented software platform of the hydro-information system, the latest technologies were used for the design of the system, as well as for the application creation process and the database implementation.

The system architecture design and the choice of adequate software technologies were performed with the goal of creating an open and scalable platform that can be equally productive in a single-processor environment and in a distributed environment. Since this is a complex system, with a tendency towards further extension and an increase in complexity, the scalability of the application is of great importance, with the intention from the start to allow a large number of users to exploit the system simultaneously. This is the reason why an object-oriented approach was chosen for the development of the software and the simulation models. The interoperability of a model and of its methodologies and research results has become a priority of international co-operation, and this is one of the main reasons why it is necessary to apply an open architecture [11].

A. Architecture of distributed systems
In the process of implementation of distributed systems, it is possible to apply numerous hardware and software solutions. At the basic implementation level, a certain number of separate processors are joined, either through connection on the same motherboard into multi-processor devices, or through the creation of LAN networks of different complexity levels. At a higher implementation level, it is necessary to provide protocols for the communication between processes that run on the separate microprocessors. Some of the standardized architectures are the following: client/server architecture, three-layer and multi-layer architecture, clusters, grid, peer-to-peer (P2P), mobile code architecture and the service-oriented architecture.

As the service-oriented architecture has already been described, only an outline of the client-server architecture, which was also applied for the creation of the user-related components, is presented here. In the case of a HIS, clients are the tools that users employ to access the system's data and parameters, and the content that they work on is supplied by a server in the system or is transferred to the database by means of a service on the server.

B. Object-oriented development of a simulation environment
The object-oriented development of a simulation environment involves the use of object-oriented design and object-oriented programming tools. The result of such development is an environment that functions on the basis of object-oriented simulation [12]. The features of an object-oriented language involve a fundamentally different modelling process compared with conventional modelling and simulation. The application of object-oriented concepts to simulation is nowadays considered to be the key factor that provides efficiency of modelling and software development in the implementation of simulation models that should be modular and flexible.

The computational methods are based on mathematical system theory, object-oriented theory and a large number of mathematical principles. Interestingly, there is a natural connection between a development environment defined in this manner and system theory [13]. The entities of the environment are defined in accordance with system theory, and their relations are formulated in order to achieve morphism over system descriptions. On the other hand, the abstraction that characterizes system theory requires an actual implementation in order to be applicable to real systems.

V. HYDRO-INFORMATION SYSTEM "DRINA"

The basin of the River Drina represents the most important unused hydropower potential in the Balkans. The area of the River Drina basin is circa 19,570 km² (30.5% of this area belongs to Serbia, 31.5% to Montenegro and 37% to Bosnia and Herzegovina). So far, 9 hydropower plants have been built within the River Drina basin, with a total installed power of 1932 MW and an average annual generation of 6350 GWh. Within the Drina river basin it is possible to build substantial additional hydropower capacity, which would allow for additional annual electricity generation of more than 7000 GWh.

The "Drina" hydro-information system (HIS Drina for short) is a distributed hydro-information system designed according to the previously presented service-oriented architecture, created for decision-making support in the management of the waters of the river Drina basin. It consists of various data management services, databases and numerical modules.

The system is built on commercial technologies, using a SQL Server database and the .NET Framework. Services in the system are published as Web Services.
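As an illustration of how a client might consume one of these published services, the following Python sketch requests a measurement time series over HTTP. The endpoint URL, parameter names and response layout are hypothetical, since the actual HIS Drina service contracts are not given here.

```python
# Minimal sketch of a HIS data-service client (hypothetical endpoint and
# payload layout; the real HIS Drina service contract is not specified here).
import json
import urllib.parse
import urllib.request

HIS_DATA_SERVICE = "http://his-drina.example.org/api/timeseries"  # hypothetical

def fetch_series(station_id: str, variable: str, start: str, end: str) -> list:
    """Request a measurement time series as [(timestamp, value), ...]."""
    query = urllib.parse.urlencode({
        "station": station_id, "variable": variable,
        "from": start, "to": end,
    })
    with urllib.request.urlopen(f"{HIS_DATA_SERVICE}?{query}") as response:
        payload = json.load(response)
    return [(p["t"], p["v"]) for p in payload["points"]]

if __name__ == "__main__":
    series = fetch_series("ST-001", "discharge", "2013-01-01", "2013-12-31")
    print(f"received {len(series)} points")
```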

A. Data management
The full use of HIS Drina relies on the availability of data gathered from monitoring networks. The data required as simulation input are mostly meteorological measurements, while the data necessary for state updating procedures [14] also include hydrological measurements, power plant production and other information important for interpretation of the current state of the river basin.

Acquisition of these data is mostly automated, performed by importing from other information systems and SCADA systems through specially designed services. Some data are measured manually and acquired through a manual entry application by staff. A considerable amount of data has been transferred from old archives and databases, which readily provides data for long-term analyses.

In total, over 200 measurement stations are included in the acquisition and represented with real-time and historic data. These data are readily available for use in user applications and numerical modules. Selected data can be monitored through the HIS Drina web portal (Figure 2), which provides data for all interested parties within the river basin.

Figure 2. HIS Drina web portal

B. Simulation model
The simulation model is the principal part of the complex software and represents the core of the distributed system for support to the integral management of waters in the river Drina basin. The model covers water flow and exploitation in the broad and complex area of the whole river Drina basin. Water enters the system in the form of precipitation, and there is a system of user demands (demands regarding electricity generation as a function of time, or demands related to the capturing of certain quantities of water as a function of time). The model includes the formation of runoff from rainfall [15], taking into account the influence of snow [16], relief and soil, as well as all other linear flow forms [17], i.e., flows through natural watercourses in accordance with their morphological characteristics and flows through artificial objects (dam spillways and outlets, hydropower plants, tunnels, channels, pipelines etc.). The large number of parameters in the runoff formation model for complex catchment areas requires the implementation of state-of-the-art estimation methods [18], in order to achieve the best possible match between the calculated and measured values of discharge on a hydro-profile where a representative hydrologic station with reliable discharge measurement exists. It is also very important to model the changes of flow conditions as a function of time, due to management decisions.

The realization of an integral algorithm (which includes natural and artificial watercourses, as well as users' requests, supply priorities etc.) has motivated the use of system models in which discrete changes of state within the system or its environment occur discontinuously in time. For this reason, a library of models has been developed, along with a corresponding simulation platform based on the application of the Discrete Event System Specification (DEVS), as presented in [19]. On the basis of the developed library, it is possible to define a large number of scenarios and to manipulate a dynamic and flexible model that allows all sorts of modifications, both to its parameters and to the structure of the model itself.
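The DEVS formalism used by such a library specifies an atomic model through its time-advance, external transition, internal transition and output functions. The following Python sketch is only an illustration of that structure (not the model library from [19]): a controllable outlet gate that opens on a management command and reports the released volume when it closes. All names and numbers are invented for the example.

```python
# Illustrative DEVS atomic model (a sketch of the formalism only, not the
# actual model library from [19]): a reservoir gate that stays open for a
# commanded duration and emits the released volume when it closes.
INFINITY = float("inf")

class GateModel:
    """DEVS atomic model of a controllable outlet gate."""
    def __init__(self):
        self.phase = "closed"
        self.sigma = INFINITY          # time until the next internal event
        self.discharge = 0.0           # m3/s while the gate is open
        self.open_time = 0.0

    def ta(self):                      # time-advance function
        return self.sigma

    def delta_ext(self, elapsed, command):
        # external transition: a management decision opens the gate
        duration, discharge = command
        self.phase, self.sigma = "open", duration
        self.discharge, self.open_time = discharge, duration

    def output(self):                  # output, produced just before delta_int
        return ("released_m3", self.discharge * self.open_time)

    def delta_int(self):               # internal transition: the gate closes
        self.phase, self.sigma = "closed", INFINITY

# Toy event-driven run: open the gate at t=2 for 5 time units at 100 m3/s.
gate, t = GateModel(), 0.0
for ev_time, cmd in [(2.0, (5.0, 100.0))]:
    t = ev_time
    gate.delta_ext(t, cmd)
next_internal = t + gate.ta()
print(gate.output(), "at t =", next_internal)   # ('released_m3', 500.0) at t = 7.0
gate.delta_int()                                # complete the internal transition
```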
C. User applications
The previously presented system components need user interfaces to communicate properly with the users of HIS Drina. Several user applications were designed to enable full use of HIS data and simulation models: software for data access and visualisation, software for data series analysis, and optimization and simulation software (Figure 3).

Thanks to the user-friendly interface and the coupling of the rainfall/runoff model, the DEVS simulation and the optimization algorithms, operators of hydropower plants and reservoirs can apply information from HIS Drina to reservoir operation in near real-time. At the same time, HIS Drina provides local and state authorities with additional information on the state of the Drina river basin, such as estimated snow coverage with water equivalents (Figure 4), what-if analyses (e.g. flood risks) and other valuable data for basin management and planning.

Figure 3. User interface of DEVS simulation model
Figure 4. Estimated snow coverage with water equivalents
VI. CONCLUSIONS

The presented service-oriented architecture of a HIS itself represents a platform for future development of components and functionality for each particular case. This does not indicate that the platform functioning is influenced by its actual application, but that the openness of the presented solution allows specific components to be included effortlessly into a system, in a much easier way than before. This fact also allows the broader application of a HIS in support of the management of water resources.

A HIS should provide the necessary components, protocols and objects that would, by means of the hierarchical integration of data and entities within water management systems, allow an integral system analysis and support water resources management. The application of interdisciplinary procedures, algorithms and techniques to the observed data can allow the expansion of a HIS beyond its use in the exploitation of water management and hydropower facilities towards use in the fields of ecology, economy and social issues.

One of the main goals of the implementation and use of a HIS is the creation of a virtual hydro-meteorological and hydropower observatory [20, 21]. This term implies an overall survey of the information that describes the natural environment of a catchment area, hydro-meteorological and hydropower measurements, simulation models of processes and phenomena, and a conceptual frame for the formulation of new hydrologic perceptions. The virtual hydro-meteorological and hydropower observatory can be achieved by the implementation of a service-oriented HIS within the limits defined by the catchment area.

The presented architecture, applied to the Drina river basin in the form of HIS Drina, offers a sound basis for further development of relevant algorithms and models that could allow the creation of a virtual hydro-meteorological and hydropower observatory as an important part of a contemporary system for the integrated management of water resources. This is one of the most important goals set for the further development of HIS Drina and other HIS systems in Serbia and the region.

ACKNOWLEDGMENT

The development of the software was supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia as part of the project TR 37013 "System development to support optimal sustainability of the high dams in Serbia".

REFERENCES
[1] D. Divac, N. Grujović, N. Milivojević, Z. Stojanović, Z. Simić, Hydro-Information Systems and Management of Hydropower Resources in Serbia, Journal of the Serbian Society for Computational Mechanics, Vol. 3, No. 1, 2009.
[2] D. Divac, N. Milivojević, N. Grujović, B. Stojanović, Z. Simić, A Procedure for State Updating of SWAT-Based Distributed Hydrological Model for Operational Runoff Forecasting, Journal of the Serbian Society for Computational Mechanics, Vol. 3, No. 1, 2009.
[3] D.G. Jamieson, K. Fedra, The 'WaterWare' decision-support system for river-basin planning 1. Conceptual design, Journal of Hydrology, Vol. 177, Issue 3-4, pp. 163-175, 1996.
[4] K. Christiaens, J. Feyen, Constraining soil hydraulic parameter and output uncertainty of the distributed hydrological MIKE SHE model using the GLUE framework, Hydrological Processes, Vol. 16, Issue 2, pp. 373-391, 2002.
[5] Y.B. Sun, H.D. Ji, Heihe River Water Allocation Model Based on RIBASIM, Proceedings of the 1st International Yellow River Forum on River Basin Management, Vol. V, pp. 130-141, 2004.
[6] D. Yates, J. Sieber, D. Purkey, A. Huber-Lee, WEAP21 - A demand-, priority-, and preference-driven water planning model Part 1: Model characteristics, Water International, Vol. 30, Issue 4, pp. 487-500, 2005.
[7] E.A. Zagona, T.J. Fulp, R. Shane, Y. Magee, H.M. Goranflo, RiverWare: A generalized tool for complex reservoir system modeling, Journal of the American Water Resources Association, Vol. 37, Issue 4, pp. 913-929, 2001.
[8] Reference Model for Service Oriented Architecture 1.0, OASIS Standard, October 2006.
[9] M.P. Papazoglou, W.J. van den Heuvel, Service-Oriented Computing: State-of-the-Art and Open Research Issues, IEEE Computer, Vol. 40, Issue 11.
[10] N. Branisavljević, D. Prodanović, M. Arsić, Z. Simić, J. Borota, Hydro-Meteorological Data Quality Assurance and Improvement, Journal of the Serbian Society for Computational Mechanics, Vol. 3, No. 1, 2009.
[11] M.W. Blind, B. Adrichem, P. Groenendijk, Generic Framework Water: An open modeling system for efficient model linking in integrated water management - current status, EuroSim 2001, Delft.
[12] B.P. Zeigler, D. Fulton, P. Hammonds, J. Nutaro, Framework for M&S-Based System Development and Testing in a Net-Centric Environment, Arizona Center for Integrative Modeling and Simulation, ITEA Journal, 2005.
[13] H.S. Sarjoughian, R.K. Singh, Building Simulation Modeling Environments Using Systems Theory and Software Architecture Principles, ASTC, Washington DC, 2004.
[14] B. Stojanović, D. Divac, N. Grujović, N. Milivojević, Z. Stojanović, State Variables Updating Algorithm for Open-Channel and Reservoir Flow Simulation Model, Journal of the Serbian Society for Computational Mechanics, Vol. 3, No. 1, 2009.
[15] Z. Simić, N. Milivojević, D. Prodanović, V. Milivojević, N. Perović, SWAT-Based Runoff Modeling in Complex Catchment Areas – Theoretical Background and Numerical Procedures, Journal of the Serbian Society for Computational Mechanics, Vol. 3, No. 1, 2009.
[16] M. Stojković, N. Milivojević, Hydrological modeling with special reference to snow cover processes, Facta Universitatis, Series: Architecture and Civil Engineering, Vol. 11, No. 2, pp. 147-168, 2013.
[17] M. Stojković, N. Milivojević, Z. Stojanović, Use of information technology in hydrological analysis, "E-SOCIETY Research and Applications", December 2012.
[18] N. Milivojević, Z. Simić, A. Orlić, V. Milivojević, B. Stojanović, Parameter Estimation and Validation of the Proposed SWAT-Based Rainfall-Runoff Model – Methods and Outcomes, Journal of the Serbian Society for Computational Mechanics, Vol. 3, No. 1, 2009.
[19] N. Milivojević, N. Grujović, B. Stojanović, D. Divac, V. Milivojević, Discrete Events Simulation Model Applied to Large-Scale Hydro Systems, Journal of the Serbian Society for Computational Mechanics, Vol. 3, No. 1, 2009.
[20] P. Fox, Virtual Observatories in Geosciences, Earth Science Informatics, Vol. 1, No. 1, Springer, Berlin, 2008.
[21] J.J. McDonnell, M. Sivapalan, K. Vache, S. Dunn, G. Grant, R. Haggerty, C. Hinz, R. Hooper, J. Kirchner, M.L. Roderick, J. Selker, M. Weiler, Moving beyond heterogeneity and process complexity: A new vision for watershed hydrology, Water Resources Research, Vol. 43, W07301, pp. 1-6, 2007.
Information System for Dam Safety Management

Nikola Milivojević*, Nenad Grujović**, Dejan Divac*, Vladimir Milivojević*, Rastko Martać*
* Jaroslav Černi Institute, Belgrade, Serbia
** Faculty of Engineering, University of Kragujevac, Serbia
[email protected], [email protected], [email protected], [email protected], [email protected]
Abstract—Dams must be permanently maintained in a proper way because of the extremely grave consequences that may occur in case of a failure. Maintenance refers to monitoring the condition of a dam and its accompanying facilities, that is, to the identification and undertaking of all measures necessary for ensuring the safety and functionality of the facilities in a timely manner. Maintenance involves various factors concerning the facilities as structures: natural factors, environmental factors and human activities. With such complex issues, the question of optimality and of the use of available resources arises. This paper presents an information system for supporting the implementation of regulatory frameworks for the safety of dams, which serves to improve the maintenance system, safety and functionality of the existing dams and to provide protection of local communities. An integrated approach to all aspects of dam maintenance is essential for solving this issue. The development of the system involves comprehension of measurable indicators that are relevant for the decision-making process, modernization and extension of the system for monitoring relevant values, and development of physically-based mathematical models with the aim of analysing and predicting dam behaviour. The system is developed and implemented on Prvonek dam (a 90 m rock-fill, water supply dam, the latest facility of this kind constructed in Serbia). Some details on implementation and initial results are also presented.

I. INTRODUCTION

Regarding dam maintenance, in all countries that have large dams on their territory, the executive state authorities are obliged to monitor the behaviour of all facilities in use and coordinate all activities that need to be undertaken in order to maintain the security of these facilities at a satisfactory level [1]. Recognizing the importance of dams and reservoirs, which belong to the category of facilities of special social interest, as well as the importance and impact of their maintenance on the service life, functionality and safety of the facilities, and having in mind the level of actualization of legislation dealing with this field, it is extremely important to create technical conditions for the implementation of a strategy for maintenance of the existing dams, that is, to define the proper way of maintaining large dams, so that their security and functionality are improved in a feasible and rational way [2].

Many tools for dam safety management provide owners and experts with procedures and methods for assessment of certain dam parameters [3,4], or for periodic, more complex analysis [5,6,7]. Usually this means tight workflows and diverse and detailed data on safety management for specific use [8]. There are also examples of advanced monitoring systems [9] that provide valuable information on embankments' stability, landslide risks, etc.

An efficient system for dam safety management should manage not only information on safety-related activities, monitoring and analysis, but also their correlation, and provide feedback information on all interactions when required.

II. DAM SAFETY IN SERBIA

There are 69 large dams in use in Serbia, with a total volume of 6 billion cubic meters. According to their purpose, they can generally be classified in two categories – electrical energy production (23 dams within the Electric Power Industry of Serbia) and water resources management (water supply, irrigation, and protection from floods – 46 dams). Dams, as well as other construction facilities, have a limited working life. Over time, there is a growing need to monitor their condition, and to respond if necessary in order to reduce the probability of damage, failure and uncontrolled discharge of water. The average age of the large dams in Serbia is 36 years, and more than half of these dams (37 precisely) are over 30 years old.

There is a consensus in Serbia on the importance of dam maintenance, but the regulations in this area are not completely defined. Fortunately, no severe hazards have happened to this day, mostly due to good design and quality construction works.

However, despite the fact that no serious incidents have occurred which would have jeopardized the safety and stability of dams or lessened their usability, one must bear in mind that, especially with the ageing of dam constructions, the emergence of various kinds of problems can be expected in the future.

Dam maintenance and safety management in Serbia have recently come into general practice, based on the existing legal framework for dam monitoring [10]. In order to provide a basis for future regulations and tools for the application of these regulations, research is being carried out on information systems for dam safety management for dams in Serbia. Such a system integrates monitoring on the dam with numerical procedures and safety parameters. The architecture of this system is presented in this paper, along with an example of application on Prvonek dam.
III. DAM SAFETY MANAGEMENT INFORMATION SYSTEM

The information system for dam safety management is a complex service-oriented system which allows the use of near real-time data in dam safety analysis. The data and services are standardized, making it possible to use this architecture with various monitoring systems and numerical techniques. The system is scalable and robust by design, and capable of future expansion and of the application of new tools and numerical procedures. Standard technologies such as a SQL Server database, the .NET Framework and Web services are used. Most of the applications are desktop applications built with a rich and intuitive UI.

The dam safety management software consists of the following components: an interface to the technical monitoring system, a numerical module for statistical analysis, a numerical module for groundwater seepage simulation, a numerical module for stress/strain simulation, a numerical module for data assimilation, and applications. The architecture of this system is illustrated in Fig. 1.

Figure 1. The structure of the information system for dam safety management.

A. Interface to technical monitoring information system
To use the data stored in the technical monitoring information system, it was necessary to develop an adequate interface. Using this interface, numerical modules and user applications are able to request data in real time. A web service implements this interface, so that various clients may request and receive the data. Missing data are common in the monitoring of complex objects; therefore, the interface to the technical monitoring information system uses techniques for the estimation of missing data [11].

B. Numerical module for statistical analysis
When a sufficient amount of data from dam monitoring is available, it is possible to form a statistical model of dam behaviour using regression and numerical procedures [12]. As part of the presented information system, the numerical module for statistical analysis provides statistical modelling and analysis features, which can be used for estimation of dam behaviour, as well as for the adaptation of previously designed statistical models. The module is designed with scalability and robustness in mind, so that future measurements can easily be included in the modelling.
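Reference [12] describes a module of this kind as combining linear regression with genetic algorithms; as a rough illustration of the regression step only, the following Python sketch fits an ordinary least squares model on invented predictors (reservoir level, air temperature, dam age) and uses it to estimate a displacement reading. All variable choices and numbers are illustrative, not taken from the actual system.

```python
# Minimal sketch of a statistical dam-behaviour model: ordinary least squares
# on illustrative predictors. The real module [12] combines linear regression
# with genetic algorithms; this shows the regression step only.
import numpy as np

# Illustrative monitoring records: [reservoir level (m), temperature (C), age (years)]
X = np.array([[480.2, 5.1, 7.0],
              [478.9, 14.6, 7.2],
              [481.5, 22.3, 7.4],
              [483.0, 9.8, 7.6]])
y = np.array([12.1, 11.4, 12.9, 13.5])     # measured displacement (mm)

A = np.hstack([X, np.ones((len(X), 1))])   # add an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def estimate(level, temp, age):
    """Expected displacement under the fitted model."""
    return float(np.dot(coef, [level, temp, age, 1.0]))

print("estimated displacement:", round(estimate(482.0, 12.0, 7.8), 2), "mm")
```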
C. Numerical module for groundwater seepage simulation
This numerical module consists of a pre-processor and a FEM solver for groundwater seepage simulation. The pre-processor prepares the input data for the FEM solver. It also
serves as a communication layer between the FEM solver and the numerical module for data assimilation. Load cases and stress tests defined in the user interface of the FEM model are applied to the FEM model through the pre-processor. Model updated states obtained from the data assimilation module are applied to the FEM model through the pre-processor, and the FEM solver is then used to calculate the resulting states of the FEM model for a specific load case (potentials, velocities, gradients, seepage forces, etc.). The results are presented to the user through a graphical interface and printed reports.

D. Numerical module for stress-strain simulation
Similar to the numerical module for groundwater seepage simulation, the numerical module for stress/strain simulation consists of a pre-processor and a FEM solver for simulation of stress/strain processes. The pre-processor is designed to provide the FEM solver with an up-to-date FEM model (seepage forces, elastic modulus, etc.) and input data (load cases, stress tests). The FEM solver performs the calculation with the provided data and outputs the following results: stress fields, elastic and plastic deformation, safety factors, fracture modes, etc. The results are presented to the user through an interactive graphical interface and printed reports.

E. Numerical module for data assimilation
In order to perform consistent dam safety analysis, calculations should be performed upon an up-to-date state of the model. State updating algorithms are implemented in the numerical module for data assimilation [13]. Using data from the monitoring information system, this module performs data assimilation upon the FEM model. Data assimilation is achieved using optimization algorithms and stochastic methods. This module functions automatically by communicating with other numerical modules.
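As a loose illustration of such an assimilation loop — not the actual optimization and stochastic methods of [13] — the following Python sketch perturbs a single model parameter at random and keeps the value that best reproduces the measured data; the forward model is a trivial placeholder standing in for the FEM seepage solver, and all numbers are invented.

```python
# Toy stand-in for the data-assimilation step: stochastic search for a model
# parameter (here a single permeability coefficient) that best reproduces the
# measurements. The actual module [13] uses more elaborate optimization and
# stochastic methods; only the loop structure is sketched.
import random

measured_heads = [14.2, 13.1, 11.8]                 # illustrative piezometer data

def model_heads(permeability):
    """Placeholder forward model standing in for the FEM seepage solver."""
    return [h0 - permeability * i for i, h0 in enumerate([14.0, 13.5, 13.0])]

def misfit(permeability):
    sim = model_heads(permeability)
    return sum((s - m) ** 2 for s, m in zip(sim, measured_heads))

random.seed(1)
best_k, best_err = 0.5, misfit(0.5)
for _ in range(200):                                 # stochastic perturbation loop
    candidate = max(0.0, best_k + random.gauss(0.0, 0.1))
    err = misfit(candidate)
    if err < best_err:
        best_k, best_err = candidate, err

print(f"assimilated permeability ~ {best_k:.3f} (misfit {best_err:.4f})")
```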
F. Applications
Applications within the information system for dam safety management are designed to provide users with visualisation of real-time measurement data and estimations of dam status. According to the level of expertise required from users, the applications are grouped as expert-oriented, as shown in Fig. 2 (for periodic assessments), and staff-oriented, as shown in Fig. 3, for day-to-day use. Most of the applications are desktop applications, built with a rich and intuitive UI. These applications enable the users of the system to interact with the statistical and FEM models and with the processes within the system. Web applications are also used for better communication with other interested parties.

Interaction with the FEM model within the dam safety management system requires expert knowledge, and the application for model calibration and standard tests is the most complex in terms of UI. The expert is provided with tools for parameter control of the FEM model, seamless data assimilation with measured data, and advanced visualisation of results. A report on dam safety parameters is also generated from this application. The dam safety criteria are defined according to the shear strength reduction method, as given in [14].

Figure 2. Expert-oriented application for FEM analysis
Figure 3. Real-time dashboard

IV. PRVONEK INFORMATION SYSTEM FOR DAM SAFETY MANAGEMENT

Prvonek dam is located in Southern Serbia, near the city of Vranje, on the river Banjska, 9 km upstream from the confluence with the South Morava. Construction of this facility was completed in 2006. The dam (shown in Fig. 4) is rock-fill, 93 m in height, and is the latest multi-purpose facility of that kind constructed in Serbia. The main purposes of this facility are the water supply of the city of Vranje, flood protection downstream of the dam, and electrical energy production.

Figure 4. Prvonek dam

Dam safety implies such a state of the dam that the required functionalities can be fulfilled without adverse effects on the environment, the population, or other facilities. The existing Prvonek dam safety has been achieved through criteria defined in the design phase, as well as through the application of standards during the construction phase. Nevertheless, the facility's operational conditions are quite different from the design conditions, and dam material properties also change over time. Because of that, there is a necessity for permanent assessment of the most important indicators of dam safety. The information system for dam safety management has been implemented at Prvonek
dam with the goal of constant dam safety assessment in mind. The use of the dam safety management system consists of two main tasks:
• Everyday use
• Periodic assessment
Everyday use is performed by the dam staff through the web portal for dam assessment and the real-time monitoring dashboard. The main task is to verify whether there are unexpected values in the data, failures of measurement units or other problems. Significant help in performing this task is provided to the staff by the statistical model service. The statistical model service compares each real-time value to the estimated value obtained from the statistical model, and a warning is raised if the deviation is larger than the one defined within the model. If a warning occurs, there is a well-defined procedure that must be completed in order to verify the anomaly.
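The everyday check described above can be illustrated with a few lines of Python; the tolerance and readings below are invented for the example.

```python
# Sketch of the everyday-use check performed by the statistical model service:
# compare an incoming real-time reading against the model estimate and raise
# a warning if the deviation exceeds the tolerance defined within the model.
def check_reading(measured: float, estimated: float, tolerance: float) -> bool:
    """Return True (warning) if the reading deviates more than allowed."""
    return abs(measured - estimated) > tolerance

reading, estimate, tol = 13.9, 12.6, 1.0   # illustrative values
if check_reading(reading, estimate, tol):
    print("WARNING: deviation exceeds model tolerance - start verification procedure")
```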
Unlike everyday use, periodic assessment can only be done by experts from an accredited institution. The expert performs two main tasks:
• Generating the dam safety report;
• Updating the statistical models.
The dam safety report is the output of the information system for dam safety management and is created semi-automatically through guided steps (Figure 5). Firstly, the FEM model of the dam must be updated according to the available measurements, using the FEM model user interface and the numerical module for data assimilation. After that, a set of predefined tests (groundwater seepage and stress/strain) must be performed on the updated model.

The set of groundwater seepage tests consists of two load cases:
• Steady state seepage
• Emergency reservoir evacuation
Each test generates a large set of data that can be divided into several categories in order to simplify further analysis. Some of these categories are:
• Surface plot of potential in the whole dam
• Surface plot of pore pressure in the whole dam
• Surface plot of groundwater seepage velocity
The set of stress/strain tests consists of the following cases:
• Steady state seepage
• Emergency reservoir evacuation
• Typical earthquake (according to seismic data)
The results of this group of tests are fields of total displacement, plastic deformation and the global safety coefficient.

The results obtained from all of the tests have to be analysed by experts in order to draw conclusions about dam safety and the overall dam state. Finally, the dam safety report can be generated. Because of the complexity of the results, the generation of the report is a semi-automated process based on a report template. The expert is given choices in generating the report, so that each generated plot is verified and adjusted if required. Upon confirmation of the report elements, the application generates the report in Microsoft Word document format. This document is intended for use in obtaining dam permits and, if necessary, in justifying investments in rehabilitation and maintenance.

Figure 5. Dam safety report generation

V. CONCLUSION

The presented information system for dam safety management enables owners of dams to easily monitor the safety parameters of a dam using intuitive software tools. The use case presented in this paper demonstrates an application of such a system on a rock-fill dam. However, the same architecture is applicable to various sorts of dams (gravity dams, arch dams, etc.) with specifically designed numerical modules for FEM simulation.

The main benefit of the presented information system is the seamless integration of monitoring data with the numerical modules and applications. The use of web services and standard data formats provides a sound base for further development.

With the development of the legal framework for dam maintenance and safety management, the presented information system will provide efficient tools for owners of dams for the application of regulations. On the other hand, by applying such systems to all high dams in one country, a real opportunity could be created for the executive state authorities to standardize procedures and create a network of dam safety management systems.

ACKNOWLEDGMENT

The development of the software was supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia as part of the project TR 37013 "System development to support optimal sustainability of the high dams in Serbia".

REFERENCES
[1] D.D. Bradlow, A. Palmieri, M.A. Salman, Regulatory frameworks for dam safety: a comparative study (Law, Justice, and Development Series), World Bank Publications, USA, 2004.
[2] T. Schmidt, M. Grounds, Dam safety program management tools (DSPMT), United States Society on Dams: Future of Dams and Their Reservoirs, 2002, pp. 747-763.
[3] F. Farinha, E. Portela, C. Domingues, L. Sousa, Knowledge-based systems in civil engineering: three case studies, Advances in Engineering Software, Vol. 36, No. 11/12, pp. 729-739, 2005.
[4] B.C. Muller, D. Mayer, The evolution of reclamation's risk based dam safety program management and decision making, United States Society on Dams: Technologies to Enhance Dam Safety and the Environment, 2005, pp. 693-702.
[5] H. Su, Z. Wen, Z. Wu, Study on an Intelligent Inference Engine in Early-Warning System of Dam Health, Water Resources Management, Vol. 25, Issue 6, pp. 1545-1563, 2011.
[6] M. Rohaninejad, M. Zarghami, Combining Monte Carlo and finite difference methods for effective simulation of dam behaviour, Advances in Engineering Software, Vol. 45, Issue 1, pp. 197-202, 2012.
[7] D. Swiatek, M. Kemblowki, W. Jankowski, Application of the Bayesian Belief Nets in dam safety management, Annals of Warsaw University of Life Sciences, Land Reclamation, Vol. 44, Issue 1, pp. 25-33, 2012.
[8] J. Jeon, J. Lee, D. Shin, H. Park, Development of dam safety management system, Advances in Engineering Software, Vol. 40, Issue 8, pp. 554-563, 2009.
[9] L. Caldwell, J. Scannell, DamWatch – A Web-Based software system to monitor 12000 Watershed Dams, NACD Conference, 2014.
[10] Group of authors, Uputstva za tehničko osmatranje visokih brana (Guidelines for technical monitoring of high dams), Jaroslav Černi Institute, Belgrade, 1982.
[11] D. Divac, N. Milivojevic, A. Novakvic, V. Rankovic, N. Grujovic, Missing data estimation in dam structures using multiple imputation method, Proceedings of the 7th International Quality Conference, Faculty of Engineering, University of Kragujevac, Serbia, 2013.
[12] D. Divac, N. Milivojevic, B. Stojanovic, M. Milivojevic, M. Ivanovic, Adaptive system for dam behavior modeling based on linear regression and genetic algorithms, Advances in Engineering Software, Vol. 65, Elsevier, 2013.
[13] R.W. Schulze-Riegert, M. Krosche, O. Pajonk, H. Mustafa, Data Assimilation Coupled to Evolutionary Algorithms – A Case Example in History Matching, SPE/EAGE Reservoir Characterization and Simulation Conference, 19-21 October, Abu Dhabi, UAE, 2009.
[14] D. Divac, N. Milivojevic, D. Rakic, M. Zivkovic, S. Vulovic, R. Slavkovic, Embankment dam stability analysis using FEM, Proceedings of the 3rd South-East European Conference on Computational Mechanics (SEECCM), Institute of Structural Analysis and Anti Seismic Research, National Technical University of Athens, Kos Island, Greece, 2013.
Genetic Algorithm Based Energy Demand-Side Management

Nikola Tomašević, Marko Batić, Sanja Vraneš
University of Belgrade, Institute Mihajlo Pupin
[email protected], [email protected], [email protected]

Abstract—Application of demand-side management (DSM) plays an important role in energy management today, both in the industrial and the residential domain. This paper proposes a generic approach to DSM based on the genetic algorithm (GA), one of the powerful search heuristics inspired by the process of natural evolution. The proposed approach is defined flexibly enough to be capable of discovering an optimal load distribution (e.g. from the financial perspective) in practically any multiple energy supply/multiple load facility infrastructure. Optimization of the demand side was carried out by taking into account the forecasted energy demand and the applied tariff schemes. Furthermore, the performance of the proposed approach was verified on a multiple supply/multiple load use case scenario. Based on the optimization results, it was concluded that the proposed GA based solution could be successfully utilized to facilitate the decision making of energy managers regarding the selection of appropriate DSM measures.

I. INTRODUCTION

For many years, demand-side management (DSM) has been utilized in order to answer the needs of mostly large and predictable loads in the industrial domain [1]. Today, the DSM concept plays an important role in the management of the energy consumption of smaller commercial and residential end-users. The reason for this is strong energy supply constraints, which are forcing the utilities to take into account the DSM potential. In fact, it is more profitable and environmentally beneficial to perform energy demand management by investing into the facility infrastructure and applying corresponding load management measures, rather than by increasing the energy generation and transmission capacity. Moreover, DSM measures are aimed at lowering the energy supply and distribution requirements. At the same time, by applying DSM measures it is possible to answer the needs of a much higher number of end-users with the same energy supply capacity.

Various optimization paradigms have been utilized so far to solve the problem of discovering an optimal end-use load distribution and applying adequate DSM measures. One of the most popular and often cited approaches to optimal resource and load management is based on the use of the genetic algorithm (GA) [2], and many papers on this topic have been published in the literature [3]-[10]. For instance, online management of fuel cells was proposed in [3], [4] for onsite provision of the energy supply to residential loads. The proposed approach was based on artificial neural networks (ANN), while GA was used for the offline extraction of the training database using different load demands. In [5], GA was used to minimize the energy consumption for operating a refrigerated warehouse supported by wind energy. On the other hand, [6] proposes a modified GA approach for scheduling generator units in order to meet the forecasted load demand at minimum cost. Development of hybrid techniques was carried out in [7] for short-term generation scheduling based on the GA approach, in order to adjust the pre-scheduling results obtained by ANNs. Moreover, a matrix real-coded GA optimization module was described in [8] to perform the load management within a smart energy management system aimed at optimization of the micro-grid. The approach proposed in [9] was composed of master and slave GAs to optimize the scheduling of direct load control strategies. Finally, in [10] a GA based decision support information system was analyzed which would, apart from the equipment scheduling, facilitate dealing with various load management scenarios.

In this paper, a generic approach to DSM is proposed, based on the GA paradigm. The proposed GA based approach is defined in such a way as to discover an optimal load distribution profile (for instance, from the financial perspective) in practically any multiple energy supply/multiple load facility infrastructure. This is contrary to the already existing methods, which are in some cases closely tied to a specific facility infrastructure and given resources. The main idea of the proposed approach is to suggest the appropriate DSM measures to the energy manager, taking into account the forecasted energy demand and the applied tariff schemes. Having in mind the complexity of this task, the proposed solution leverages the GA paradigm as one of the powerful search heuristics, utilized here to search for the optimal energy demand. Both single and multiple load optimization is supported, depending on the given use case scenario and the facility infrastructure. Furthermore, DSM optimization can be performed for one time step, but also over the entire time span of a given time interval. Through the proposed approach, different DSM measures (such as curtailing and/or shifting the load) can be taken into consideration, solely or in combination, to avoid for instance peak hours or high tariff periods. By defining the desired optimization constraints of the proposed solution, it is possible to vary the degree of influence on the end-user operation, to preserve the total energy consumption per load, etc. In order to evaluate the performance of the proposed approach, GA optimization procedures were carried out for a simple use case scenario having two energy carriers at the supply
side, renewable energy sources (RES) and two loads at the demand side.

The remainder of this paper is organized as follows. Section 2 describes the concept of DSM measures and their potential application. Section 3 analyzes the proposed GA based approach to DSM; the same section tackles the definition of GA optimization objectives, constraints and parameters suitable for satisfying the demand side, having in mind the applied tariff schemes. Optimization results and the performance of the proposed approach are analyzed and discussed for a simple use case scenario in Section 4. Section 5 presents the final conclusions of this paper.

II. CONCEPT OF DEMAND-SIDE MANAGEMENT

The main objective of DSM is to change the energy end-use, i.e. to influence the energy consumption profile in order to reduce the overall cost of the consumed energy [11]. In other words, DSM represents a corresponding modification of the consumer's energy demand achieved by applying various mechanisms, mainly driven by financial incentives. DSM related measures are often undertaken by the end consumer, but can also be initiated by the distribution utility itself. DSM usually includes actions such as increasing or decreasing load demand, or shifting it from high to low tariff periods if a variable tariff scheme is applied (e.g. moving the energy use to off-peak periods such as nights and weekends). DSM can be applied through:
1) energy efficiency improvement and
2) load management.

In the first place, improvement of energy efficiency implies performing the same type of operations for less energy. These actions consider reduction of the energy use through implementation of energy efficient equipment (such as energy saving lighting devices, more efficient air conditioning units, circulation pumps etc.), and they are focused on reducing the energy consumption and, indirectly, the peak demand. On the other hand, DSM can be achieved through load management, which this paper is focused on.

Load management [1] includes all the measures intentionally undertaken with the aim of influencing the energy consumption/load profile in such a way as to alter (usually to reduce) the peak demand or the total energy consumption over a certain period of time. In other words, it includes redistribution of the energy demand in order to spread the energy consumption evenly throughout the given period (on a daily or seasonal basis). It is directly focused on reduction of the peak demand and may or may not result in a decrease of total energy consumption. Therefore, it could be stated that load management considers any reactive or preventive intentional modification of the energy consumption pattern with the aim of influencing the timing, the level of instantaneous demand or the total energy consumption [11]. It can be achieved by applying various actions of controlling, curtailing or shifting the load. In other words, the desired load shape influence can be achieved depending on the applied load management mechanism.

Apart from the load management related actions, DSM includes all the measures undertaken by the end consumer and/or utility in order to consume the energy more efficiently and subsequently to reduce the cost of the consumed power. Looking from the broader perspective, the DSM measures can also consider providing additional power supply, such as through implementation of RES elements, based on which the load demand could be met along with a reduced cost of the energy consumption.

From the perspective of DSM application, it is important first to identify and appropriately categorise the type of the end-use load. The end-use load (electricity, heating and cooling load) can be driven by various building systems, such as the air conditioning system, the lighting system or any other piece of equipment installed at the site. With respect to this, the load can be categorised as follows [11]:
• critical load – should not be influenced (typically the power supply of fundamental operations),
• curtailable load – could be reduced (the temperature set-point of the air conditioning system could be lowered during periods of high electricity price or if the contracted peak consumption is being approached),
• reschedulable load – could be shifted (forwards or backwards) in time (pre-cooling of a building can be performed early in the morning before there is an actual cooling demand).

Having in mind the above listed categories, identification of the curtailable and reschedulable load is a prerequisite in order to select and apply a suitable DSM measure.

III. PROPOSED GA BASED APPROACH

The objective of this paper is to propose corresponding DSM related measures to the facility/energy manager by taking into account the forecasted energy load profile of the building and the applied tariff scheme per energy carrier (such as electrical energy, natural gas, fossil fuel etc). The proposed solution is based on the GA [2], which is used in order to discover the optimal energy load profile over a given time window. GA optimization was chosen as one of the powerful search heuristics inspired by the process of natural evolution (with mechanisms such as inheritance, mutation, selection and crossover), often used for search problems and optimization tasks. As such, the GA was taken as the main paradigm of the proposed solution, facilitating the search for the optimal load distribution which should then be followed through the corresponding DSM related measures.

The proposed solution supports single as well as multiple load optimization, which considers redistribution of the demanded energy per load (such as per electricity load or heating/cooling load). Moreover, one time step, as well as multiple time steps optimization is supported, depending on the time span of the given window interval, i.e. the period of time within which DSM optimization should be performed. As mentioned, the main idea of the load management performed by the GA optimization procedure is to redistribute the energy demand in such a way that the lowest possible cost of the demanded energy is achieved under certain constraints. This task can be tackled through various methods of curtailing or shifting the load, among which load shifting was incorporated into the GA optimization in order to avoid the peak hours, as well as
reallocation of the load by taking into account the applied tariff scheme (to avoid high tariff intervals).

A. Optimization Objectives and Constraints
The GA optimization process was performed within a multidimensional space defined by the given constraints, such as the allowed deviation of the energy consumption from the forecasted one, the so-called energy consumption margins, which define the maximal and minimal energy consumption bounds. This constraint was taken into account to prevent significant disturbance of the regular end-user operations, which are usually not very flexible. On the other hand, the greater the allowed deviation from the forecasted/regular energy consumption profile (i.e. the more flexibility introduced into the end-user operation), the larger the space for the optimization algorithm, which consequently could yield a better overall result. Furthermore, one of the objectives of the GA optimization process was to preserve the total energy consumption per load. The assumption was made that the consumer operations, from the perspective of the total demanded energy, should not be significantly altered (reduced or increased within a given time window), but only redistributed. In other words, these actions only suggest reallocation of the end-user operations, not their cancellation.

Additionally, one of the frequently applied load management measures, load shifting, was taken into account; it is aimed at reducing the peak consumption and reallocating it to the off-peak periods within the given time window. Load shifting was applied through the definition of the maximal allowed energy consumption, which can be imposed per corresponding load for the desired time step or throughout the time span of the given window interval. In order to discover and propose the optimal energy demand profile (from the financial perspective, for instance) to the end user for the use-case scenario described in Section IV, all of the previously mentioned constraints and DSM measures were applied in combination.

B. GA Optimization Parameters
The aim of the GA optimization process was to find the optimal "individual" within a search space defined by the given (i.e. forecasted) energy load profile and the corresponding constraints (such as energy consumption margins, maximal peak energy consumption and total demanded energy preservation). GA optimization for the purpose of this paper was performed over a population of 100 "individuals". The population size was chosen having in mind that a large population enables a more thorough search for the optimal solution (possibly discovering a global minimum and avoiding local minima), but causes the algorithm to run more slowly. The "individual" in this context is a set of values (per one or multiple loads) indicating the consumed energy per corresponding load for one time step or over the time span of the given window interval.

The result of the GA optimization should be the optimal load profile (the "individual" in GA terminology), i.e. the optimal point in the multidimensional search space representing the proposed energy load distribution which yields the lowest possible cost under the applied tariff scheme. Each "individual" of the population was evaluated based on the predefined GA fitness function, which calculated its cost while taking into account the defined prices per corresponding energy carrier (as part of the given tariff scheme). In other words, the overall cost of the entire energy load profile was taken as the fitness function value, which was used to rate the "individuals" within the population. The fitness function value of the corresponding "individual" was calculated by determining the cost of the energy carriers required at the supply side to satisfy the demand side. By redistribution and reallocation of the demanded energy among different loads, i.e. by gene modification of the individual, the task of the GA was to discover the optimal load profile ("individual") in terms of the cost of the consumed energy indicated by the fitness function.

Due to the nature of the problem, all the individuals within the population were represented as sets of positive real numbers indicating the load demand. The initial population was randomly chosen and dispersed over the search space defined by the given energy load profile and margins. The number of individuals with the best fitness function values that were taken directly into the next generation, the so-called elite individuals, was set to 2. The number of elite individuals was intentionally set to a low value in order to avoid the fittest individuals dominating the population, which could make the search less effective. Apart from the mentioned elite individuals, the rest of the population was considered for crossover and mutation. The crossover fraction took a share of 60% of the remaining individuals, while 40% were taken for mutation, introducing random changes to the genes of the corresponding individuals while respecting the above-mentioned bounds and constraints. In such a way, mutation provided genetic diversity and enabled the GA to search a broader space.

Selection of the pairs of individuals, the so-called parents, that were combined and used for the production of new individuals of the next generation, i.e. for crossover, was performed based on their fitness function values. More precisely, the selection was performed by simulating a roulette wheel method, in which the selection area of the corresponding individual is proportional to the individual's expectation, i.e. its fitness function value. The GA optimization was run for 500 generations. In order to reduce the execution time of the GA optimization process, additional stopping criteria were implemented, which terminated the optimization process if the algorithm entered a stationary area. The stationary area was considered to be around the optimal point when the weighted average change in the fitness function value, i.e. the improvement of the individuals, was below a given threshold for a defined number of generations (the so-called stall generation limit), or when the fitness function value of the best individual was less than a defined limit.
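The configuration described in this section can be summarized in a compact, hedged Python sketch. The tariff values, margins and load profile below are illustrative, the roulette weights are derived from relative cost (since lower cost is better), the stall-based stopping criteria are omitted, and constraint handling is reduced to simple bound clipping (in particular, the total-energy preservation constraint is left out for brevity).

```python
# Compact sketch of the GA configuration described above: population of 100,
# 2 elite individuals, 60% crossover / 40% mutation, roulette-wheel selection,
# up to 500 generations. Data and simplifications are illustrative only.
import random

random.seed(0)
forecast = [8.0, 10.0, 12.0, 11.0, 9.0]          # forecasted load per time step (p.u.)
tariff   = [0.9, 1.4, 1.6, 1.2, 0.8]             # price per p.u. per time step (m.u.)
MARGIN, POP, ELITE, GENS = 3.0, 100, 2, 500

lo = [f - MARGIN for f in forecast]              # energy consumption margins
hi = [f + MARGIN for f in forecast]

def cost(ind):                                    # fitness: total energy cost
    return sum(e * p for e, p in zip(ind, tariff))

def random_individual():
    return [random.uniform(l, h) for l, h in zip(lo, hi)]

def roulette(pop, weights):
    return random.choices(pop, weights=weights, k=1)[0]

population = [random_individual() for _ in range(POP)]
for _ in range(GENS):
    population.sort(key=cost)
    worst = cost(population[-1])
    weights = [worst - cost(ind) + 1e-9 for ind in population]  # lower cost, larger slice
    children = population[:ELITE]                 # elitism: keep the 2 best
    while len(children) < POP:
        a, b = roulette(population, weights), roulette(population, weights)
        if random.random() < 0.6:                 # crossover fraction
            cut = random.randrange(1, len(forecast))
            child = a[:cut] + b[cut:]
        else:                                     # mutation within the margins
            child = [min(h, max(l, g + random.gauss(0, 0.5)))
                     for g, l, h in zip(a, lo, hi)]
        children.append(child)
    population = children

best = min(population, key=cost)
print("baseline cost:", round(cost(forecast), 2), "optimized:", round(cost(best), 2))
```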
IV. OPTIMIZATION PERFORMANCE AND RESULTS

In order to evaluate the performance of the proposed GA based approach to DSM, the optimization process was carried out for a use case scenario with two energy carriers at the supply side (for instance, electricity and natural gas), RES generation elements and two loads at the demand side (such as an electricity and a heating load). For this use case scenario, a simple facility infrastructure modelled with energy conversions, as shown in Figure 1, was taken into account. The conversion from electrical energy to the electricity load was modelled with an efficiency of 0.99, indicating that only a small part of the electrical energy is lost
Electricity
RES
(solar & wind)

Electricity Electricity
(power grid) load
0.99

Heating
Natural gas load
0.85 Figure 4. Renewable energy sources

Figure 1. Use case facility infrastructure

due to the distribution. On the other hand, conversion


from natural gas to supply heating demand was given with
the efficiency of 0.85. Additionally, RES generation
elements (solar and wind energy based) for electricity
production were included as well. Corresponding load
distribution of the defined facility infrastructure indicating
the forecasted energy consumption for the time interval
06-17h (with one hour resolution) was given as shown in
Figure 2. In addition, dynamic tariff scheme was taken
into the consideration applied per electrical energy, while
in the case of natural gas the fixed price scheme was
applied as it can be seen in Figure 3. Distribution of RES
generated energy (from photovoltaics and wind turbines)
is presented in Figure 4.
Previously described setup of the GA optimization
process was performed upon the defined use case
scenario. The task was to apply the DSM measures upon
Figure 5. Load distribution before and after applied DSM
the forecasted load profile for a given time interval 06-
17h, for two loads (in this case for the electricity and were intentionally chosen to be relatively small in order to
heating load) as shown in Figure 5. Based on the load avoid significant disturbance of the regular end user
profile, the search space for the GA was defined by energy operations.
consumption margins, which, in this scenario, were set to
Additionally, in order to tackle the peak energy
3p.u. (per unit of energy) for both loads (as indicated by
consumption instances, maximal allowed energy
the blue bars in Figure 5). Energy consumption margins
consumption was set to 10p.u., but only for electrical
energy. Based on the given load distribution it can be
noticed that consumption exceeded in certain time
intervals the allowed threshold (as presented in Figure 5).
As it was previously mentioned, additional constraint was
applied, related to the preservation of the total demanded
energy per load as compared to the forecasted load
distribution. For the purpose of this scenario, the fitness
function was calculated by taking into account the tariff
scheme per energy carrier as illustrated in Figure 3. It is
important to emphasize that the applied tariff scheme
indicated the price distribution per corresponding energy
Figure 2. Load distribution carrier at the supply side (in this case electricity and
natural gas) over a given time period.
The progress of the GA optimization process, i.e. the evolution of the best and mean fitness function value (calculated upon the entire population) over the 500 generations, is presented in Figure 6. Based on the presented results, it can be concluded that the GA optimization reached its optimal solution around the 350th generation, while further evolution of the population did not show any significant improvements.

By performing the GA optimization process upon the given load profile, under the mentioned constraints and DSM measures, the resulting load distribution is presented in Figure 5.
TABLE I. COMPARISON RESULTS

Approach                     Cost [m.u.]   Savings [%]
Without DSM (baseline)       419.61        0.00
With GA based DSM applied    369.72        11.89
Figure 6. GA optimization process (best and mean fitness value evolution)

As it can be noticed, the GA successfully managed to alter the given load distribution in order to comply with the defined constraints, revealing the optimal load profile with the lowest cost under the applied tariff scheme. For instance, the peak electricity consumption at 08h and the energy consumption above the maximal allowed limit from 14h were successfully curtailed. Electricity load from the evening hours was shifted to the period 12-14h, due to the increased RES generated energy, and to the morning hours, having in mind the lower electricity cost. On the other hand, the heating demand remained almost the same since the fixed price scheme was applied for natural gas with no additional constraints. The cost of the proposed load distribution (end-use load profile discovered by the GA) was 369.72 m.u. (in abstract monetary units), while the cost of the initial forecasted load distribution was 419.61 m.u., as shown in Table I. In other words, the proposed approach made cost savings of 11.89% compared to the baseline (without DSM applied). It is also important to emphasize that the savings which could be achieved by the proposed approach are specific to the analyzed use case scenario (i.e. facility infrastructure, applied tariff scheme, constraints etc.).

V. CONCLUSIONS

Both industrial and residential domains are forced today by stringent energy supply constraints to take into account the potential benefits of introducing DSM measures. One of the goals of performing the DSM measures is to lower the energy supply and distribution requirements, but at the same time to answer the needs of a much higher number of end-users with the same energy supply/distribution capacity. Various optimization paradigms were utilized so far to apply resource and load management. One of the most popular approaches is based on the GA paradigm. In this paper, a generic GA based approach to DSM was proposed and analyzed. The GA paradigm was chosen as suitable to solve the problem of discovering an optimal end-use load distribution and applying adequate DSM measures. Depending on the use case scenario and facility infrastructure of interest, both single as well as multiple loads optimization is supported. In addition, the proposed approach supports the optimization of given loads within one time step, but also over a predefined time interval.

The proposed solution includes the application of various DSM measures, such as curtailing or shifting the load, implemented solely or in combination, which depends on the way the optimization constraints are defined. In that manner, it is possible to reallocate the load from the peak hours or from the high tariff periods. The degree of influence of applied DSM measures on the end-user operation could be varied depending on the type of the load (critical or non-critical load). Preservation of total energy consumption per load was also taken into account by the proposed approach.

For evaluation of the proposed solution and its performance, a simple use case scenario of two energy carriers, RES generation elements and two loads was considered. More precisely, the GA optimization procedure was carried out upon the given forecasted load profile and applied tariff scheme. Additional constraints were defined representing the energy consumption margins and maximal allowed consumption. By analyzing the optimization results it was concluded that the proposed solution gave substantial improvements in terms of cost savings as compared to the forecasted energy load profile taken as a baseline.

ACKNOWLEDGMENT

The research presented in this paper is partly financed by the European Union (FP7 EPIC-HUB project, Pr. No: 600067), and partly by the Ministry of Science and Technological Development of Republic of Serbia (SOFIA project, Pr. No: TR-32010).

REFERENCES

[1] Maharjan I.K., Demand side management: Load management, load profiling, load shifting, residential and industrial consumer, energy audit, reliability, urban, semi-urban and rural setting, LAP LAMBERT Academic Publishing, 2010.
[2] Mitchell M., An Introduction to Genetic Algorithms (Complex Adaptive Systems), MIT Press, 1998.
[3] Azmy A.M., Mohamed M.R., Erlich I., “Decision tree-based approach for online management of fuel cells supplying residential loads”, 2005 IEEE Russia Power Tech, St. Petersburg, pp. 1-7, June 2005.
[4] Azmy A.M., Erlich I., “Online optimal management of PEM fuel cells using neural networks”, IEEE Transactions on Power Delivery, Vol. 20, Issue 2, pp. 1051-1058, April 2005.
[5] Yi Zong, Cronin T., Gehrke O., Bindner H., Hansen J.C., Latour M.I., Arcauz O.U., “Application of genetic algorithms for load management in refrigerated warehouses with wind power penetration”, 2009 IEEE Bucharest PowerTech, Bucharest, pp. 1-6, June 2009.
[6] Wong Y.K., Chung T.S., Tuen K.W., “GA approach to scheduling of generator units”, 2000 International Conference on Advances in Power System Control, Operation and Management APSCOM-00, Vol. 1, pp. 129-133, October 2000.
[7] El Desouky A.A., Aggarwal R., Elkateb M.M., Li F., “Advanced hybrid genetic algorithm for short-term generation scheduling”, IEE Proceedings - Generation, Transmission and Distribution, Vol. 148, Issue 6, pp. 511-517, November 2001.
[8] Chen C., Duan S., Cai T., Liu B., Hu G., “Smart energy management system for optimal microgrid economic operation”, IET Renewable Power Generation, Vol. 5, Issue 3, pp. 1752-1416, May 2011.
[9] Leehter Yao, Wen-Chi Chang, Rong-Liang Yen, “An Iterative Deepening Genetic Algorithm for Scheduling of Direct Load Control”, IEEE Transactions on Power Systems, Vol. 20, Issue 3, pp. 1414-1421, August 2005.
[10] Chih-Hsien Kung, Devaney M.J., Chung-Ming Huang, Chih-Ming Kung, “Power source scheduling and adaptive load management via a genetic algorithm embedded neural network”, Proceedings of the 17th IEEE Instrumentation and Measurement Technology Conference, IMTC 2000, Vol. 2, Baltimore, pp. 1091-5281, May 2000.
[11] Nörstebö V.S., Demiray T.H. et al., EPIC-HUB Deliverable D1.3 - Performance indicators, 2013.
Integrated Energy Dispatch Approach Based on Energy Hub and DSM
Marko Batić, Nikola Tomašević, Sanja Vraneš
University of Belgrade, Institute Mihajlo Pupin, Belgrade, Serbia
[email protected], [email protected], [email protected]

Abstract— Permanent increase of energy prices united with greater energy demand make the reduction of energy infrastructure operation costs a challenging task. Therefore, systematic application of energy management (EM) solutions became a necessity and the most viable approach for cutting down the energy costs. This paper proposes an EM solution that brings advancement of the state of the art related to multi-carrier energy dispatch by optimizing the energy flows within a generic energy infrastructure. The proposed advancement is achieved through the enrichment of the existing Energy Hub concept, leveraging on supply side optimization, with complementary optimization of the demand side, known also as demand side management (DSM). Compelling simulation results, justifying the merging of these two concepts, are reported. The results show the potential for saving up to 25% of energy costs, depending on the use case scenario, compared to a baseline scenario where no EM solution is applied. Considering that the proposed solution employs the existing energy infrastructure, and thus does not require expensive equipment retrofit, it is immediately applicable in both the residential and commercial domain.

I. INTRODUCTION

Current trends of increasing energy demand, present at both residential and industrial/commercial level, as well as the constant rise of energy prices, led to high energy related operation costs. This represents a great motive, apart from better preservation of the environment, for the introduction of energy conservation measures and cost reduction actions. Typically, this objective can be achieved either through introduction of energy efficient equipment (e.g. efficient boilers, pumps etc.), which represents a costly solution, or employment of an energy management (EM) solution aiming at optimization of energy flows upon existing equipment, requiring only additional ICT support. The objective of this paper is to propose such an EM solution which might be integrated over existing Supervisory Control and Data Acquisition (SCADA) systems and considers energy dispatch optimization of complex multi-carrier energy infrastructures. The existing Energy Hub (EH) concept offers the modelling of energy flows from different energy carriers while satisfying the requested user demand [1]. The concept leverages on the conversion potential of a specific, constrained, domain referred to as Hub, which serves as a point of coupling between existing energy supply infrastructures and energy end use. The Hub basically represents a set of energy converters and/or storages which is responsible for delivering the required energy by taking into consideration different conversion and/or storage options while meeting a desired optimization criterion. So far, many aspects of the EH have been thoroughly elaborated, thus emphasizing the optimization potential of the concept owing to its flexible modelling framework, diverse technologies and wide range of energy carriers [2][3]. The latest research efforts even considered generalization of this concept by introducing renewable energy sources, which was first mentioned in [4]. However, considering that the EH concept basically performs optimization of the supply side, without affecting the desired energy demand, this paper proposes strengthening the EH concept with the introduction of an additional, complementary, optimization of the demand side, which may create space for further energy cost savings in spite of all the mentioned advantages of the EH concept. This implies application of the well-known concept of demand side management (DSM), which consists of various techniques for modifying the energy end use profile, i.e. the demand side. Therefore, it should be emphasized that any further savings, compared to the EH approach, require a certain level of compromise from the user (changing the time schedules of equipment, reducing the demand etc.). Nevertheless, this is perfectly aligned with current trends in energy supply, as more and more energy providers offer significant economic benefits if, in return, the end user complies with some energy end use constraints (reducing loads in peak hours, improving power factor etc.).

The remainder of the paper starts with Section II describing the existing EH concept and its modelling framework. Section III introduces basic features of the DSM approach, depicting its benefits as well as application limits varying case-by-case. Merging of the two concepts is elaborated in Section IV, where a complete evolution process is described in four sub-sections. Starting from sub-section A, a baseline scenario, which represents the case in which no optimization is performed, is introduced. In this scenario the end use energy demand is satisfied directly from the energy carrier which offers the highest conversion efficiency. The following is sub-section B, depicting the scenario where only the DSM approach is applied. The corresponding optimization process is, therefore, performed only at the demand side through systematic modifications of requested loads with respect to given constraints corresponding to each energy carrier. Next is sub-section C which elaborates the scenario where the EH concept alone is involved. Contrary to the previous scenario, this one includes optimization of the supply side
exploiting the conversion potential of the particular entity. Finally, sub-section D reveals the fourth scenario, which merges the previous two and offers optimization of both supply and demand side. All four scenarios are simulated, using an example Hub configuration, and valuable simulation results are presented within Section V. Finally, the paper is concluded and the results are summarized in Section VI.

II. ENERGY HUB CONCEPT

As introduced, the existing EH concept models energy flows from different energy carriers aiming to satisfy the requested user demand by taking the advantage of the conversion potential of a specific Hub. The overall concept is presented in Figure 1, depicting both basic Hub elements and its renewable energy extension, extensively elaborated in the following. The EH concept has originally foreseen only downstream energy flows going from the inputs (left), i.e. energy supply infrastructures, towards the output (right), the energy end use, passing through the matrix of conversion and/or storage elements which enabled fulfilment of the loads from a wide range of energy carriers. However, considering the addition of renewable energy sources, meaning that there is uncontrollable energy generation, it is also important to enable energy export, i.e. the upstream of energy, which can have strong economical and/or environmental benefits attached to it. This is done by means of “neighbourhood loads”, which may represent a similar Hub-like structure in its vicinity or another complex energy infrastructure comprised of a wide range of energy carriers such as electricity (power grid), gas (gas network) etc.

Figure 1. Energy Hub with renewable energy

A Hub, from a mathematical perspective, is represented as a matrix which includes, in the most generic case, elements which enable conversion of all supply energy carriers into any of the load carriers. Moreover, in the case where storages are taken into account, each carrier is associated with its storage unit, which acts as an energy buffer at the cost of storage efficiency, and the corresponding storage matrix is considered as well. Considering the illustration of the Hub, the power input, comprising conventional (P) energy sources, such as the electricity power grid, natural gas, district heating, fossil fuels etc., is supplied to the Hub. The input power is then transformed using the conversion elements (C), allowing for conversion from electrical towards thermal energy and vice versa, and/or energy storages (Ė), such as batteries, ultra capacitors and fuel cells for electricity, or boilers and phase changing materials for thermal energy, while taking into account the storage efficiencies depicted with the coupling matrix (S). Passing through the Hub, depicted by the conversion and/or storage matrix, power from the supply side is fed to the demand, i.e. loads (L), typically represented with electricity and heating/cooling loads. However, with the introduction of renewable energy sources and the neighbourhood concept, additional energy flows (vectors) should be defined as well. Apart from the power input (P), a vector comprising all local energy production (R), such as photovoltaics and wind turbines for electricity and/or solar thermal and geothermal for thermal energy, is added at the input. The output of the Hub, previously depicting only the loads, is now extended with the neighbourhood loads (N), preserving the same distribution between electricity and heating/cooling loads, which allows the Hub to feed (export) the surplus of energy towards the neighbourhood, which is considered to be another similar entity or a piece of power infrastructure. Finally, the complete Energy Hub model equation, defined in [2], is given in the following:

(L + N) = C(P + R) − S·Ė = [C  −S] · [P + R; Ė]

where [P + R; Ė] denotes the stacked vector of input powers and storage flows. Considering the flexibility and generality of such a modelling approach, a Hub concept can be applied to an entity ranging from a single residence up to an entire city or country.
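A short numerical sketch of the hub equation may help; the two-carrier coupling matrix and efficiency values below are illustrative assumptions, not a configuration taken from this paper.

import numpy as np

# Minimal sketch of (L + N) = C(P + R) - S*Edot for a two-carrier hub
# (electricity, natural gas) without storage. All numbers are assumptions
# chosen only to show how the coupling matrix acts on the inputs.
C = np.array([[0.99, 0.00],   # electricity -> electricity load
              [0.35, 0.85]])  # electric heating / gas boiler -> heating load
S = np.zeros((2, 2))          # no storage coupling in this sketch
E_dot = np.zeros(2)           # no storage charge/discharge flows

P = np.array([10.0, 6.0])     # purchased input power per carrier [p.u.]
R = np.array([2.5, 0.0])      # local renewable production per carrier [p.u.]

L_plus_N = C @ (P + R) - S @ E_dot   # loads plus possible neighbourhood export
print(L_plus_N)                      # -> [12.375  9.475]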
III. DEMAND SIDE MANAGEMENT

Considering the increasing trend of cutting down energy purchase costs, a concept of demand side management (DSM), which aims at changing the energy end use, was introduced. Moreover, the DSM approach offers different mechanisms to alter the energy consumption profile and/or improve end use equipment efficiency, in order to reduce operation costs of the consumed energy [5]. The DSM measures are most often undertaken by the end user, but can also be initiated by an energy provider itself. It usually includes increasing or decreasing the requested loads, shifting them from high to low tariff periods in case a variable tariff scheme is applied (e.g. moving loads towards off-peak periods such as during the nights or weekends) etc. Finally, the DSM concept encompasses a set of actions which may be divided in two main categories:
i. energy efficiency improvement and
ii. load management.

Starting with improvement of the energy efficiency, it implies delivering the same quality of service (satisfying the requested loads) for less energy. Therefore, these actions consider reduction of the energy consumption through utilization of energy efficient equipment (such as energy saving lighting devices, more efficient air conditioning units, circulation pumps etc.). Hence, they are reducing the overall energy consumption and indirectly the demand peak, which is one of the main targets of DSM. On the other hand, DSM is also applied through the load management, upon which the methodology presented in this paper is leveraged.
The load management represents any intentional modification of the load profile aiming to reach a given objective. Usually, this objective considers reduction of the overall operation costs, taking advantage of dynamic energy pricing schemes (peak and off-peak hours), but lately more and more ecological and environmental criteria, such as reduction of the GHG footprint, are influencing the load management objectives. In order to reach a predefined objective, load management considers various actions ranging from shifting the load profile in time, changing its instantaneous levels or altering its cumulative sum. Considering the nature of energy production and supply processes, any occurrence of peaks (or sudden drops) in the load profile usually incurs additional costs (non-proportional to delivered energy) from the perspective of an energy provider. Having in mind that these costs are then forwarded towards the end user, successful tackling of peaks, encompassing control, curtailment and/or shifting of the load, may yield great savings in the operational costs. However, considering that it is rarely allowed to apply these actions over the entire load, which comprises electricity, heating and cooling yielded by different critical building systems, such as the air conditioning system, lighting system etc., it is first necessary to identify and appropriately categorise different types of loads.

With respect to the mentioned, and considering the perspective of the DSM application, loads can be categorised according to the following breakdown [5]:
• critical load – should not be influenced (typically power supply of fundamental operation),
• curtailable load – could be reduced (the temperature set-point of the air conditioning system could be lowered during periods of high electricity price or if the contracted peak consumption is being approached),
• reschedulable load – could be shifted (forwards or backwards) in time (pre-cooling of a building can be performed early in the morning before there is an actual cooling demand).

Having in mind the above listed categories, identification of the curtailable and reschedulable loads for each particular case is a prerequisite in order to select and apply a suitable DSM measure; a toy illustration of such measures is sketched below.
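The following toy example, with assumed hourly values, illustrates the curtailment and rescheduling actions on a load profile; it is not part of the referenced methodology.

import numpy as np

def peak_shave(load, cap):
    """Curtail every hour above `cap` (curtailable load)."""
    return np.minimum(load, cap)

def reschedule(load, src, dst, amount):
    """Shift `amount` p.u. from hour `src` to hour `dst` (reschedulable load)."""
    out = load.copy()
    out[src] -= amount
    out[dst] += amount
    return out

profile = np.array([4, 4, 5, 9, 12, 11, 7, 5], dtype=float)  # assumed loads [p.u.]
print(peak_shave(profile, 10.0))       # hours above 10 p.u. are clipped
print(reschedule(profile, 4, 0, 2.0))  # 2 p.u. moved from the peak to an off-peak hour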
When it comes to the implementation of DSM, various paradigms were used so far, ranging from manual analysis of load profiles and unit commitment allocation patterns towards automatic load profile search leveraging on artificial intelligence (pattern search, GA, PSO etc.). For the purposes of this paper a GA based DSM implementation was used to discover the optimal energy load profile over a given time window, described in detail in [6]. The selected approach takes into account the building load forecast, the applicable tariff scheme for each considered energy carrier (e.g. electrical energy, natural gas, fossil fuel etc.) as well as a set of constraints depicting the modification limits coming from the nature of a load.

IV. EXTENDING ENERGY HUB WITH DSM

The aim of this section is to try to evaluate, and eventually justify, the merging of the EH concept with another, well known, concept of DSM. Although the EH concept alone can reduce the operation costs significantly, owing to the optimization of entering energy flows, corresponding energy conversion and status of storage systems, it was believed that additional optimization of energy end use may achieve even greater savings.

The hybrid concept can, therefore, be represented as a single Hub with an additional DSM optimization engine which takes into account various load management techniques such as peak shaving, load shifting, valley filling etc. However, having in mind that the EH concept already takes into account the dynamic pricing for each energy carrier, it basically “moves” their consumption towards the time intervals with lower energy prices. Therefore, initially there was a reasonable doubt whether the introduction of DSM in the overall optimization would lead to the improvement of performance at all. Furthermore, another issue has been considered, i.e. whether this hybrid solution would bring enough improvement to justify the introduction of another optimization engine, which will certainly increase the computational efforts and extend the simulation process. However, this issue is highly dependent on the actual case and some figures for comparison will be presented in the following section.

In order to properly evaluate the impact of both EH and DSM optimization, a baseline energy dispatch scenario was set up, where no energy dispatch optimization was performed. This baseline scenario then gradually evolved passing through three characteristic steps, encompassing different energy dispatch optimization strategies, as follows:
a. no EH, no DSM – baseline scenario in which there is no energy dispatch optimization;
b. no EH, DSM – scenario in which only DSM optimization was performed, thus representing the “demand side optimization”;
c. EH, no DSM – scenario in which only Energy Hub optimization was performed, thus representing the “supply side optimization”;
d. EH, DSM – scenario in which both Energy Hub and DSM optimization were performed, thus achieving both supply and demand side optimization.

Validating the proposed concept, at least theoretically, requires simulation of the different optimization strategies, which is presented in the following section.

V. SIMULATION AND VERIFICATION OF CONCEPT

Figure 2. Use case scenario

The proposed hybrid concept was tested on the use case represented with a simple Hub with two supply energy carriers (P), two renewable energy sources (R) and two loads (L), together with the corresponding conversion elements (C) and with no storage elements, as depicted in Figure 2. All four optimization strategies were simulated under the same conditions, i.e. the same energy pricing
scheme (see Figure 3), renewable energy contribution (see Figure 4), and finally the same requested energy demand (see Figure 5-a.2), during a twelve hour period of the day (06-17 h). Although some procedures considered as part of the DSM concept suggest a decrease or even an increase of the overall energy demand, in the scenarios where DSM optimization was performed (b. and d.) it was considered that the total energy demand per carrier remained the same as in the other scenarios. In this way, different scenarios could be meaningfully compared and benchmarked. The simulation results are jointly depicted in Figure 5, representing the typical optimization output, which includes the time distribution of energy supply and demand. Each row represents the corresponding optimization strategy, whereas the left column reveals the supply and the right the demand profile.

Figure 3. Variable energy pricing

Figure 4. Renewable energy contribution

A. No EH, No DSM
Comparing the supply and demand, an unusual mismatch between electricity demand and electricity supply can be noticed at first sight. However, this difference comes from the fact that the renewable energy sources have a significant contribution in the overall electricity supply. The same effect can be seen in Figure 6, where the time distribution of total energy dispatch costs is depicted. Namely, considering the peak energy production, coming mainly from the PV plant, the costs around midday are minimal. Furthermore, it should be emphasized that in this particular setting of the Hub, the possibility of energy export (mainly renewable) was not taken into consideration. In spite of stimulating prices for energy export, described within a country's feed-in tariff scheme, the main objective of these simulations was to test different optimization procedures that leverage themselves mainly on the Hub's conversion capabilities as well as efficient demand management.

B. No EH, DSM
This scenario considers the sole application of DSM procedures for the optimization strategy, and represents the first step towards the finally proposed solution. These procedures are usually implemented through heuristics derived from an actual energy bill analysis, performed by an energy manager at a particular site, or through some kind of systematic search for an appropriate energy demand profile that will yield the minimum costs. It should be emphasized though that the level of freedom associated with this search is limited due to constraints related to the initial demand profile. These constraints actually reflect the nature of the particular demand profile, i.e. its breakdown into critical, curtailable and reschedulable loads. Therefore, before applying the proposed optimization, a careful demand profile characterization should be performed and the corresponding constraints should be defined.

For the purpose of this example the following were adopted:
- Total load per energy carrier remained the same.
- Electricity load was allowed to vary within relative margins of ±3 p.u. with an additional constraint of maximum absolute value of 10 p.u.
- Natural gas load was allowed to vary within relative margins of ±3 p.u. with no additional constraints for maximum absolute value.

Finally, a Genetic Algorithm (GA) based approach was used for the “systematic search” for the optimal energy profile, within the given constraints. This generic optimization framework delivers very good results and offers rather high flexibility when it comes to the constraints for a given optimization problem. The GA was implemented by setting the total loads as a single individual of the population. This includes all carriers throughout the desired time frame. A set of these individuals represents a population which evolves, through generations (namely the iterations), to an optimal solution.

For the purpose of this example simulation the following GA parameters were adopted:
- 2 elite individuals
- 60% population crossover
- 40% population mutation
- 100 individuals (each individual holding 24 values)
- 500 generations

The size of the population, as well as the number of generations, can be arbitrarily defined for each GA. They usually depend on the convergence nature of the actual optimization problem, i.e. the fitness function. Also, the two parameters defining the percentage of the population for crossover and mutation can steer the optimization process towards either a local or the global optimal solution. On the other hand, the greatest impact on the duration of the optimization process lies in the number of individuals as well as the number of generations. Naturally, the higher these numbers are, the better the final solution will be. Nevertheless, a high number of individuals and/or generations cannot always be justified by the achieved results. Therefore it is a good practice to run the optimization for a large number of individuals over many generations and to observe the evolution of the solution. Usually, the solution tends to reach a rather acceptable level after just a couple of generations, saving precious time and computational effort. Following the labelling considerations given in the baseline scenario, the simulation results are depicted in the second row of Figure 5 and Figure 6.
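To make the role of these parameters concrete, the following is a minimal, self-contained GA loop in the spirit of the setup described above (elitism, crossover and mutation rates, fixed population and generation counts). It is an illustrative sketch, not the implementation from [6]; the fitness function is assumed to be a cost-based one such as that discussed earlier.

import numpy as np

rng = np.random.default_rng(42)

POP, GENS, ELITE = 100, 500, 2   # 100 individuals, 500 generations, 2 elites
P_CROSS, P_MUT = 0.6, 0.4        # 60% crossover, 40% mutation
GENOME = 24                      # each individual: 24 hourly load values

def evolve(fitness):
    """Minimise `fitness` over load profiles encoded as 24-value individuals."""
    pop = rng.uniform(0.0, 10.0, size=(POP, GENOME))
    for _ in range(GENS):
        order = np.argsort([fitness(ind) for ind in pop])
        pop = pop[order]                              # best individuals first
        nxt = [pop[i].copy() for i in range(ELITE)]   # elitism
        while len(nxt) < POP:
            a, b = pop[rng.integers(0, POP // 2, 2)]  # parents from the better half
            child = a.copy()
            if rng.random() < P_CROSS:                # one-point crossover
                cut = rng.integers(1, GENOME)
                child[cut:] = b[cut:]
            if rng.random() < P_MUT:                  # random-reset mutation
                child[rng.integers(0, GENOME)] = rng.uniform(0.0, 10.0)
            nxt.append(child)
        pop = np.array(nxt)
    return pop[0]                                     # best load profile found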
Figure 5. Energy supply (left-1) and demand (right-2) distribution: a. No EH, No DSM; b. No EH, DSM; c. EH, No DSM; d. EH, DSM

It is obvious that the application of DSM resulted in the flattening of peaks in the demand profile as well as in the distribution of costs.

C. EH, no DSM
The third scenario leverages on the utilization of the EH concept, meaning that the optimization process is performed only at the supply side, while also taking into account different conversion possibilities and applying dynamic energy prices. Simulation results are presented in the third row of Figure 5 and Figure 6. Although not presented in the figures, part of the simulation output also represents the set of dispatch factors, determined at each time step of the simulation. These dispatch factors are represented by a matrix, for each time step, suggesting the breakdown of each energy carrier for satisfaction of each load.

The results show radically increased usage of electricity in the first half of the day compared to the baseline scenario. Considering that both electrical and thermal loads remained unchanged, this excess of electricity is actually used for satisfying the thermal loads instead of natural gas. This is the consequence of the difference between purchase prices in the corresponding time period, which makes the use of electricity the more viable solution, even considering the lower conversion efficiency.

D. EH, DSM
Finally, the fourth scenario combines the previous two approaches by performing the optimization both at the supply and demand side, meaning that both profiles are found as the result of the optimization. For the optimization of the supply side the EH concept was utilized, whereas for the demand side the previously mentioned GA algorithm, with the same definition of constraints, was used. Among others, the most important constraint, saying that the total energy delivered per load remained constant before and after the optimization, was applied in order to make a reasonable comparison with previous cases. The results, depicted in the fourth row of Figure 5
and Figure 6, show even without a detailed analysis how the demand profile is balanced, contrary to the supply profile, which basically follows the curve of energy prices for each energy carrier, thus achieving the lowest energy dispatch costs.

Finally, the simulation results from all four scenarios are offered in Figure 6, where a comparison of the time distribution of yielded operation costs is given for the strategies in parallel, and in Table 1, where the total energy dispatch costs are presented together with potential savings. It is immediately obvious that the joint optimization of both supply and demand side, proposed in this paper, is well justified. Moreover, the table suggests a decrease of dispatch costs of up to 25% for the considered scenario. Although these values may greatly vary depending on many parameters, such as the energy price profile per carrier, the desired energy demand (load profile) and the Hub architecture, the benefits coming from this joint optimization approach are undoubted.

The above table also suggests the order of performance for the different optimization strategies, which should be considered only conditionally. The first (a.) and last (d.) strategy yield the highest and lowest costs, respectively, regardless of the previously mentioned dynamic parameters. However, the relative order between the second (b.) and the third (c.) scenario depends heavily on these parameters.

TABLE 1. SIMULATION RESULTS COMPARISON

Optimization        Costs (m.u.)   Savings (%)
a. No EH, No DSM    419.61         baseline
b. No EH, DSM       369.72         11.89
c. EH, No DSM       357.82         14.73
d. EH, DSM          317.92         24.23

Figure 6. Comparison of total dispatch costs distribution

VI. CONCLUSION

The need for an integrated optimization of multi-carrier energy systems is greater than ever, considering high operation costs and availability of energy supply in peak time periods. The concept named Energy Hub was considered as one of the solutions offering multi-carrier optimization as well as a flexible modelling framework of the energy infrastructure. However, this concept is aiming to completely satisfy the end use energy demand, and thus leverages its optimization potential on better management of energy supply and conversion capacities of the system. This paper extends the proposed concept with complementary optimization of the demand side, offering higher saving potential. The presented simulation results show that the integrated solution may produce savings of up to 25%, depending on the layout of the system, applicable energy pricing and the demand profile, comparing to the baseline case where no optimization is used. These savings, however, do not come without a compromise from the user side, which is required to comply with certain changes in the demand profile.

ACKNOWLEDGMENT

The research presented in this paper is partly financed by the European Union (FP7 EPIC-HUB project, Pr. No: 600067), and partly by the Ministry of Science and Technological Development of Republic of Serbia (SOFIA project, Pr. No: TR-32010).

REFERENCES

[1] P. Favre-Perrod, M. Geidl, B. Klöckl and G. Koeppel, “A Vision of Future Energy Networks”, presented at the IEEE PES Inaugural Conference and Exposition in Africa, Durban, South Africa, 2005.
[2] M. Geidl, “Integrated Modeling and Optimization of Multi-Carrier Energy Systems”, Ph.D. dissertation, ETH Diss. 17141, 2007.
[3] B. Klöckl, P. Stricker, and G. Koeppel, “On the properties of stochastic power sources in combination with local energy storage”, CIGRÉ Symposium on Power Systems with Dispersed Generation, Athens, Greece, 13-16 April 2005.
[4] M. Schulze, L. Friedrich, and M. Gautschi, “Modeling and Optimization of Renewables: Applying the Energy Hub Approach”, IEEE conference, July 2008.
[5] Nörstebö V.S., Demiray T.H. et al., EPIC-HUB Deliverable D1.3 - Performance indicators, 2013.
[6] Nikola Tomašević, Marko Batić, Sanja Vraneš, “Genetic Algorithm based Energy Demand-Side Management”, in evaluation for ICIST 2014, Kopaonik 9.-12.3.2014, Belgrade, Serbia.
Server Selection for Search/Retrieval in Distributed Library Systems

Miroslav Zarić*, Branko Milosavljević*, Dušan Surla**
* University of Novi Sad/Faculty of Technical Sciences, Novi Sad, Serbia
** University of Novi Sad/Faculty of Science, Novi Sad, Serbia
{miroslavzaric, mbranko, surla}@uns.ac.rs

Abstract—This paper presents one approach to solving the server selection problem during search in distributed library systems.

Information retrieval systems are aimed at providing infrastructure and means of finding specific documents in a collection, such that they satisfy a specific information need. Distributed information retrieval systems aim at providing the same capabilities, but in an environment of distributed and heterogeneous document repositories. In such an information retrieval system, an important step is server selection, i.e. selection of the servers that will be included in a search operation, and to whom the query will be sent. Other problems that are specific to distributed information retrieval systems are query formatting for different servers, and result merging and ranking. These are a special class of problems, and are not the subject of this paper. Libraries were among the first institutions to massively adopt information retrieval systems. Currently, almost every library has an online search capability. Using these search capabilities, a client application can perform a search across a network of library servers. This paper is focused on a method for server selection in such a search scenario.

I. INTRODUCTION

Information retrieval systems are nowadays regularly used for almost every search we perform on the Internet, and in various business information systems. An information retrieval system needs to provide efficient processing of large collections of documents, an efficient search algorithm, and result ranking. Each information retrieval system implements some specific document representation model and search model [1].

Distributed information systems essentially represent a group of computer hardware components working together to fulfill a desired goal. In distributed information systems, typically, no hardware resources are shared, and each computer is managing its own resources, while cooperation is achieved on a logical level by entangling software components running on different computers (nodes). Distributed information systems are meant to be transparent for users, i.e. the user should not be aware whether his request is handled by the local computer or by the distributed system.

Distributed information retrieval systems are focused on providing information retrieval capabilities over a vast network of document servers – servers that hold document collections, as well as some tools for access, search and retrieval of documents in those collections – practically an implementation of a standalone information retrieval system.

But the existence of a large number of document servers presents new challenges. To successfully perform a search over a network of available document servers, the user should know, at least, the access URL of every relevant document collection, and furthermore the details of its operation (such as the query language implemented). Distributed information systems provide a new component, usually called a search broker, that is meant to work as an interface point between users and remote document servers. This component works as a specialized client module, receiving a user query in one notation, transforming it into an appropriate language for each document server, collecting results from them, and representing them to the user.

It is evident that any distributed information retrieval system needs to solve at least the following problems:
• Document server representation and server selection during queries
• Result retrieval, and duplicate removal
• Consolidated ranking

Each of these tasks presents a distinct research area. More on distributed information retrieval methods can be found in [2].

This paper concentrates on the first problem, specifically in an environment of library servers. Libraries have traditionally been one of the first adopters of information retrieval systems. Although the task of search and retrieval is similar, library information systems have some specifics.

Instead of holding full text documents, library information systems usually contain library records, most recently in MARC 21 [3]. These records contain all relevant data about some library item (such as bibliographic data and location data). Searching in library information systems is performed over a collection of these records. These records have a well defined structure that enables more guided search. In modern days, and with the advent of digital libraries, these distinctions are blurred, since they contain library records as well as full-text, electronic versions of documents.

Although there are different implementations, most library information systems use the standard Z39.50 protocol [4] for search and retrieval. But its use does not guarantee general compatibility, a problem that will be discussed later. In a recent period, an approach to adapt this commonly used protocol to the new, internet environment has given rise to the SRW/SRU protocol, also implemented by an increasing number of libraries.

But even in such an environment of library servers, where good and broadly adopted standards exist, there is
a need to perform the same, previously defined, steps as in any other distributed information retrieval system, although some problems will be less intensive (usage of a common query language and communication protocol simplifies communication from the query broker to a specific server).

II. SERVER SELECTION IN DISTRIBUTED INFORMATION RETRIEVAL SYSTEMS

Most server selection methods operate as methods for ranking information gathered about the servers. Usually distributed systems adopt one of the methodologies for document ranking, and adapt it to ranking servers. There is, obviously, a need to define how information about the servers is represented, and to adapt the chosen document ranking model to that representation.

In an analogy to document ranking, here we perform server ranking for every performed query. After an initial ranking is established, further searches, based on the query criteria, are routed only to those servers that are expected to return valuable results for the query. Usually, the search query is passed to the n best ranked servers in the list. This approach can greatly improve the performance of the overall system and lower the network traffic, as well as other resource requirements, while producing the same or an equivalent amount of relevant results.

The best known algorithms for server selection are CORI [5], CVV [6], bGlOSS, vGlOSS [7], and weak sampling techniques [8].

The CORI system was one of the first systems to introduce server selection. The server selection is based on an adaptation of the well known tf-idf norm for ranking documents. In this case a document frequency df and an inverse collection frequency icf are used to rank the collections.
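For orientation, the df and icf statistics are typically combined per query term into a belief score; the sketch below follows one commonly published formulation of the CORI collection score. The constants 50/150 and the default b = 0.4 are assumptions taken from that literature, not from this paper.

import math

def cori_belief(df, cw, avg_cw, cf, n_collections, b=0.4):
    """Belief that a collection satisfies one query term (one common CORI form).

    df: documents in the collection containing the term
    cw: term count of the collection; avg_cw: average term count per collection
    cf: number of collections containing the term
    """
    T = df / (df + 50.0 + 150.0 * cw / avg_cw)                                # df component
    I = math.log((n_collections + 0.5) / cf) / math.log(n_collections + 1.0)  # icf component
    return b + (1.0 - b) * T * I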
bGlOSS and vGlOSS were introduced as models for ranking servers implementing the Boolean and vector based retrieval model, respectively. Document frequency and collection size are used as a basis for calculating the server ranking. Since the exact collection size is usually unavailable, some estimation methods are used to evaluate the size of the collection.

CVV (Cue Validity Variance) evaluates query terms in such a manner that terms that are better discriminators between servers gain a higher weight. Therefore, using the weight of such a term, the importance of the server is weighted for a query containing that term.

With the weak sampling method, for any complex query, a short two-term sample query is sent to the server. Results of these sample queries are used to calculate the server rank. This method assumes that the server will provide additional data, alongside results, to allow for runtime rank calculation.

Server ranking can be enacted in:
a) a cooperative environment, when each queried server's response contains not only relevant documents, but also additional information, such as document ranking and total number of hits – information that can be readily used for server ranking;
b) a non-cooperative environment, when the only information available at run time is the list of results obtained from a specific server.

There are also some server selection methods that are based on query clustering or a relevant document distribution model. These methods are trying not to rank the servers themselves, but to predict the number of relevant documents on each server.

The server selection problem is not only inherent to information retrieval systems, but exists in other commonly used distributed systems, such as P2P networks, IPTV networks etc.

III. PERFORMING SEARCH IN DISTRIBUTED LIBRARY SYSTEMS

Search is the feature most commonly used in library information systems. Most of the library information systems provide an online access point through which the search of their catalogue can be performed. Library systems traditionally contain only library records used to describe and locate holdings in the library. In a classic library, a user will perform a search and get information whether some item exists or not in a chosen library. Some library systems are enhanced to support item reservations through some online tools. Digital libraries allow users not only to gather information about a specific item, but in some cases to download an electronic version of the document.

Generally there are two distinct types of users using library information system search capabilities. Ordinary users use the search to locate whether an item exists and whether it is available. Library staff use the search capabilities to perform various catalogue-maintenance related tasks. One of these tasks is very important for the overall performance of a library information system – the cataloguing task. It is important since it affects search capabilities for other users.

As the main intentions of these users are different, so are the queries they perform. While an ordinary user, searching for a book, may be well served by searching the local catalogue (after all, the user will primarily be interested to find an item in a local library), a librarian, performing cataloguing duty, will not be served well at all if the search is confined to the local catalogue only. During the cataloguing process, the librarian already has a copy of an item (a book for example) and knows that it is a new entry to the catalogue – one that needs to be properly described by an associated MARC record. In order to reduce the amount of time needed to populate MARC records, it is highly beneficial if the librarian can get hold of an existing MARC record describing the same item, presumably from another library system. In this case local search will yield no relevant result, and a search over a network of library catalogues should be performed. Such an operation – using an existing library record, amending it and incorporating it in the local catalogue – is called copy-cataloguing. Apart from time saving, this approach has the additional benefit of increasing the overall completeness of records. The quality of the MARC records is a debatable issue, discussed in many papers, and also part of the research conducted in [9].

Since there are many library servers available, librarians will tend to use those servers that are providing the most complete records, and most librarians will in
time develop their preferred list of target servers from which to retrieve records. However, initially, and later on some occasions when a rare or new book is catalogued, there is a need to perform a search over a larger number of library servers in order to obtain some records.

The Z39.50 protocol is used as a standard for communication between library information systems. It provides facilities to initialize a communication link, perform search, and present results to the client. The protocol allows for use of different query languages, with query language Type 1 as mandatory. As an alternative, newer systems also support use of the SRU protocol [10] which uses CQL as a query language. This protocol is also standardized under the guidance of the Library of Congress.

The system presented in this paper is part of the continuous development of the BISIS library system. As an integral part of the system, a client application for search and retrieval has been developed [11]. It allowed users to perform a search on one remote library system at a time using Z39.50. In a later upgrade, this client application was enhanced to allow simultaneous use of both the Z39.50 and SRU protocols, and to allow parallel querying of multiple servers. This change required query adaptation from the Type 1 query for Z39.50 compatible servers to CQL for SRU servers. This module is described in detail in [12].

IV. SERVER REPRESENTATION AND RANKING

Introduction of support for simultaneous queries to different servers has introduced several problems that, although foreseen, need to be addressed. One of the most important is the different level of support from different servers regarding different attributes (for example, some servers may support query by ISBN or ISSN, others do not). We will concentrate on Z39.50 servers since they represent the vast majority of servers available for querying.

If we do not know which use attribute is available on which server, sending the same complex query containing different attributes to multiple servers usually produces a large quantity of erroneous connections, i.e. the response from a server is an error message stating that the server is not capable of processing a request on some attributes. In order to obtain information on which server supports which attribute we have several choices:
- Incremental gathering of server description data. Initially we send all simple (one attribute) queries indiscriminately to all servers. If the server responded without an error – that attribute is supported on the server.
- Using the Explain service of Z39.50, as long as servers implement it.
- Using an existing Z39.50 target directory to get information about server capabilities.

The first option is simple enough, but requires a large number of queries to multiple servers before we can produce a knowledge base for further selection; a sketch of this probing loop is given below.
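The first option can be sketched as a simple probing loop: send a one-attribute query to every server and record which attributes come back without a protocol error. The `search` callable below is a placeholder for an actual Z39.50 client call; the names and the error signalling are assumptions for illustration.

def probe_attributes(servers, attributes, search):
    """Incrementally build a map of supported use attributes per server.

    servers: iterable of server identifiers
    attributes: Bib-1 use attributes to test, e.g. {4: "Title", 7: "ISBN"}
    search(server, attr): placeholder for a real one-attribute probe query;
        expected to raise an exception when the server rejects the attribute.
    """
    supported = {}
    for server in servers:
        supported[server] = set()
        for attr in attributes:
            try:
                search(server, attr)         # simple one-attribute probe query
                supported[server].add(attr)  # no error -> attribute is supported
            except Exception:
                pass                         # rejected or failed -> unsupported
    return supported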
The second option allows for automatic reconfiguration of the client application to fit the capabilities of the target server. That would enable some attributes to be automatically disabled if the target server is incapable of processing them. Although the Explain service of Z39.50 is exclusively intended to transmit information about server capabilities to the client side, the downside is that it is not required. So, a number of servers simply do not implement the Explain service. Table 1 gives an example of support for the Explain feature among 2052 servers listed in The Z39.50 Target Directory (http://irspy.indexdata.com/).

Table 1. Support for Explain facility

Explain Category       #Targets supporting
1. TargetInfo          173 (8.43%)
2. DatabaseInfo        173 (8.43%)
3. CategoryList        172 (8.38%)
4. AttributeDetails    169 (8.23%)
5. AttributeSetInfo    134 (6.53%)
6. RecordSyntaxInfo    120 (5.84%)
7. SchemaInfo          34 (1.65%)
8. TagSetInfo          34 (1.65%)
9. Processing          26 (1.26%)
10. TermListInfo       2 (0.09%)

The third option – an existing directory of available Z39.50 targets – is a good starting point, since such directories represent an aggregated list of server descriptions, with different levels of detail. There are several public directories of library servers available, such as:
• The Z39.50 Target Directory
http://irspy.indexdata.com/
• LOC – Gateway to Library Catalogs
http://www.loc.gov/z3950/
• MUN Libraries Library Z39.50 destinations - Queen Elizabeth II Library, Memorial University of Newfoundland
http://staff.library.mun.ca/staff/toolbox/z3950hosts.htm
• Directory of Z39.50 Targets in Australia
http://www.nla.gov.au/apps/libraries?action=ListTargets
• Z39.50 Hosts Available for Testing
http://www.loc.gov/z3950/agency/resources/testport.html

For the development of the system presented in this paper, the first directory has been used, since it provides an XML format for server descriptions (the same format that would be provided by the Explain facility), and has more than 2000 servers listed. The list is regularly updated. Additionally, alongside basic server description data, this directory also has a host connection reliability measure. This measure is calculated as the percentage of successful connections to target servers in the last three months.

Previous search sessions can be a source of valuable data about each server's performance. Even erroneous connections provide valuable input that can be used to calculate the usability of the server for future searches. Paper [13] takes into account network parameters for accessing each server.

Analysis of previous search sessions can provide the following data:
• Total number of queries in which a server connection was invoked
• Total count of successful connections
• Total number of errors
• Mean query response time of the server
• Total number of results returned V. IMPLEMENTATION AND TESTING


• Total number of results from server that has been The proposed system is implemented in a client part of
selected for further usage by the user BISIS application. An XML configuration file is used to
We can assume that N queries have been submitted in form the list of available servers. This XML file contains
total, and that M servers are available. all relevant server data, such as server name, URL, port,
For each server i one can record the total number of supported access points (attributes that can be used for
invoked communication sessions ni. Initially, without query terms matching). This XML file also contains data
prior knowledge, we can assume that all queries will be sent to all servers. As the number of search queries grows, the librarian will tend to restrict the server list to those servers that provided valuable information in prior sessions. Hence, the total number of requested communications can be used as an indirect measure of the importance one user gives to a specified server, regarding his common queries. Even if a server is automatically excluded from some search sessions, due to unsupported attributes, this notion of importance still holds, since the selected query attributes also represent the user's habits or preferences when forming the query. We can create a measure of the importance given to this server by a specific user: impi = ni / N.

Using the number of requested communications ni and the number of erroneous responses ei from any server i, the measure of overall relative reliability can be calculated as reli = 1 - ei / ni. As the starting reliability measure for servers, the one obtained from the server directory list is used.

Since, without prior knowledge, we cannot estimate the size of each collection, we can create an indirect relative measure based on the total count of results returned from the i-th server (ri) and the cumulative total count of results from all servers, R. The calculated value rset,i = ri / R represents the relative contribution of the given server to the total result set.

If we want to take into account the response time of a server, we can measure the total time tuk,i it took server i to complete ni search queries. The mean response time of the server in that case can be calculated as tsr,i = tuk,i / ni. To put this measure into relation with other servers, we can compare it to the cumulative mean response time over all M servers, calculated as

    Tsr = (∑i=1..M tuk,i) / (∑i=1..M ni).

The relative response time of a server (server speed), compared to the group of servers, can now be computed as trel,i = tsr,i / Tsr. We can further normalize this value to bring it into the range [0,1]; for this purpose, a sigmoid function can be used.

Finally, we can take into account the information about the number of records that have been copied into the local system (records retrieved for copy cataloguing). If the total number of retrieved records is Rr, and the number of records retrieved from server i is rr,i, then the relative contribution of server i to the total set of retrieved records is rret,i = rr,i / Rr. This measure is also an indirect measure of the "quality" of the records from that server, from the user's point of view (the user will tend to pick those records that are similar to his cataloguing needs).

Based on these measures, each server i can be represented with a performance vector si = {impi, reli, rset,i, trel,i, rret,i}, representing all information gathered about the server from all previous search/retrieve sessions. We can create a vector of a "maximum performing" server and compare all other servers to it. The standard measure of cosine similarity can be used, or any other method used in vector-based models. As more and more search queries are performed, the best performing servers will be ranked better.

Table 2 displays the servers' support for the use of different attributes in search queries. Since the server list provides information about supported attributes, the client module has been altered so that servers are now automatically excluded from the available server list if an unsupported attribute has been selected for the given query. This feature represents server list filtering based on the query formulation.

To test the effect of server filtering, a series of 50 queries on different attributes (and combinations) has been run, without server filtering and with server filtering. The list of 120 servers has been compiled from a full list of 2000+ available servers. The same servers were used in both runs. The results are given in Table 3. Although server capabilities are taken into account, this did not completely remove errors. The actual cause of an error may be different, and not restricted to supported attributes: it may be that a server is temporarily unavailable, or that the given address is no longer accessible. However, since the main goal of the server filtering (and of the ranking based on the proposed criteria) is to compile a list of best performing servers for future use, the actual causes of the remaining errors were not further investigated, but the existence of these errors was taken into account when calculating each server's performance vector.

Additionally, a low-level change in the communication module has been introduced. The original version of the communication module fires simultaneous connections to multiple servers. In the altered version, a new configurable property, maxActiveConnections, has been introduced.

    #   Attribute   Name                   # Targets
    1   4           Title                  1523 (73.18%)
    2   21          Subject heading        1494 (71.79%)
    3   1003        Author                 1487 (71.45%)
    4   7           ISBN                   1316 (63.23%)
    5   8           ISSN                   1261 (60.59%)
    6   5           Title series           1245 (59.82%)
    7   1016        Any                    1227 (58.96%)
    8   1           Personal name          1196 (57.47%)
    9   12          Local number           1175 (56.46%)
    10  13          Dewey classification   1088 (52.28%)
    11  2           Corporate name         1074 (51.6%)
    12  31          Date of publication    1067 (51.27%)
    13  3           Conference name        1048 (50.36%)
    14  54          Code--language         994 (47.76%)
    15  6           Title uniform          952 (45.74%)
    16  1007        Identifier--standard   926 (44.49%)
    17  33          Title key              922 (44.3%)
    18  16          LC call number         896 (43.05%)
    19  1004        Author-name personal   892 (42.86%)
    20  9           LC card number         853 (40.98%)
Table 2. Server support for different access points (search by specific attribute)
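As a concrete illustration of the ranking computation described above, the following minimal Java sketch (class, field and method names are ours, not taken from the actual client application) assembles a server's performance vector and compares it to a "maximum performing" reference vector using cosine similarity:

    /**
     * Hypothetical per-server statistics, as described above; not the
     * actual code of the client application.
     */
    public class ServerStats {
        long n;      // requested communications sent to this server
        long err;    // erroneous responses received from it
        long res;    // results it returned
        long ret;    // records retrieved from it (copy cataloguing)
        double tUk;  // total time spent answering the n queries

        /** Performance vector {imp, rel, rset, normalized trel, rret}. */
        double[] vector(long totalN, long totalRes, long totalRet, double tSr) {
            double imp  = n / (double) totalN;      // importance, imp_i = n_i / N
            double rel  = 1.0 - err / (double) n;   // reliability, rel_i = 1 - e_i / n_i
            double rset = res / (double) totalRes;  // result-set contribution, r_i / R
            double trel = (tUk / n) / tSr;          // relative response time, t_sr,i / T_sr
            double tNorm = 1.0 / (1.0 + Math.exp(trel - 1.0)); // sigmoid to [0,1]
            double rret = ret / (double) totalRet;  // retrieved-records contribution, r_r,i / R_r
            return new double[] { imp, rel, rset, tNorm, rret };
        }

        /** Cosine similarity between a server vector and the reference vector. */
        static double cosine(double[] a, double[] b) {
            double dot = 0, na = 0, nb = 0;
            for (int i = 0; i < a.length; i++) {
                dot += a[i] * b[i];
                na  += a[i] * a[i];
                nb  += b[i] * b[i];
            }
            return dot / (Math.sqrt(na) * Math.sqrt(nb));
        }
    }

Servers can then be sorted by their cosine similarity to the reference vector, so that the best performing servers appear at the top of the list.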


    Attribute used    Errors (server filtering off)   Errors (server filtering on)
    ISBN              46%                             35%
    Personal name     40%                             24%
    Title             33%                             24%
    Author            34%                             21%
    Subject heading   30%                             20%
    Author+Title      45%                             30%
Table 3. Effect of server filtering on the number of erroneous server responses

In the case when all 120 servers were included in the search, the original version would consume the following resources:
- Total number of threads: 131
- Peak memory utilization: 66132 KB
- Mean memory utilization: 58544 KB
The altered version, with maxActiveConnections set to 10, consumes the following resources:
- Total number of threads: 26
- Peak memory utilization: 51896 KB
- Mean memory utilization: 51004 KB

The total time required to perform the search and present the results to the user has not noticeably changed. With this optimization in place, a minor reduction in the number of errors has been registered: on average there were 3 errors fewer. This suggests that a minor number of errors were produced by too many communication threads running simultaneously.
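A minimal sketch of how such a limit can be enforced, assuming a standard fixed-size thread pool (the actual implementation inside the client is not shown in the paper, so this is only an illustration):

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class SearchDispatcher {
        // Configurable property described above; 10 was used in the measurement.
        private final int maxActiveConnections = 10;
        // At most maxActiveConnections search tasks run (and hold connections)
        // at the same time; the remaining tasks wait in the pool's queue,
        // giving the "delayed start" of communication threads.
        private final ExecutorService pool =
                Executors.newFixedThreadPool(maxActiveConnections);

        public void submitSearches(List<Runnable> perServerSearchTasks) {
            for (Runnable task : perServerSearchTasks) {
                pool.execute(task);
            }
        }
    }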
After the initial sets of 50 queries on the given attributes have been run, the server statistics are already formed, so the ranking of servers could be taken into account. To test whether the ranking presents relevant servers at the top of the list, additional queries were run.

Different queries have been run. Most of the queries used the ISBN attribute (since it is the most common query attribute used for searching records during copy-cataloguing). On average, if a query is run only on the servers ranked as 100% relevant, we get about 55% of all results returned by the full, non-filtered set of servers. However, if the servers ranked as 80% relevant or higher are selected (in our case a total of 39 out of 120 servers), we got on average 92% of all results returned by the non-restricted search. When the servers ranked as 70% relevant or higher were used, the same result set was returned as with the non-restricted search.

These results are promising, but further analysis, on data gathered in real-life usage scenarios, should be performed. However, these results show that the number of communications may be significantly reduced if only the best qualifying servers are used to submit the search, while the result set remains relevant to the query.

This notion was further strengthened by introducing a "quality" measurement of records. There is no prescription for how to judge record quality. Surely the completeness of the record must be taken into account, as well as its syntactic correctness, but from a copy-cataloguing viewpoint the best record is the one that requires minimal effort to bring it into concordance with local cataloguing practice. Therefore, not only does overall completeness matter, but also the existence of certain fields and even the style used to enter some data. This problem is further addressed in [9].

CONCLUSION
This paper presents one approach to solving the server selection problem, a common step in performing search in any distributed information retrieval system. In this case it is implemented in a client application for the Z39.50 and SRU protocols, commonly used in library information systems. Tracking of different performance measures, and ranking based on these measures, are proposed. Data gathered about a server's capabilities and its performance during previous search sessions are used to estimate its relevance for future searches, based on the attribute set used in the query. This approach indirectly gives an opportunity to tailor the server selection according to individual users' preferences, since some of the measures are directly affected by the choices the user has made on returned results. This enables the client application to be personalized to reflect the user's preferences.

Taking into account server capabilities, taken from the server directory list, a filtering of the available servers can be performed, thus reducing the number of communication links that would certainly result in errors. Additionally, the ability to set the number of active connections can reduce resource usage. Server ranking can be used to limit the number of servers that need to be queried, while still obtaining the results most relevant to the user. Furthermore, the delayed start of communication threads gives an opportunity to stop the search if some predefined number of records has already been retrieved from the best ranking servers. This server ranking system is further strengthened when used in combination with a record ranking algorithm.

REFERENCES
[1] Manning, Christopher D., Raghavan, Prabhakar and Schütze, Hinrich, Introduction to Information Retrieval (New York, NY, USA: Cambridge University Press, 2008).
[2] Nicholas Eric Craswell, "Methods for Distributed Information Retrieval" (2000).
[3] Gordana Rudic and Dusan Surla, "Conversion of bibliographic records to MARC 21 format", The Electronic Library 27, 6 (2009), pp. 950-967.
[4] ANSI/NISO, "Information Retrieval (Z39.50): Application Service Definition and Protocol Specification", Library of Congress.
[5] Callan, James P., Lu, Zhihong and Croft, W. Bruce, "Searching distributed collections with inference networks", in Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New York, NY, USA: ACM, 1995), pp. 21-28.
[6] Gravano, Luis, García-Molina, Héctor and Tomasic, Anthony, "GlOSS: text-source discovery over the Internet", ACM Trans. Database Syst. 24 (1999), pp. 229-264.
[7] Budi Yuwono and Dik Lun Lee, "Server Ranking for Distributed Text Retrieval Systems on the Internet", in Rodney W. Topor and Katsumi Tanaka, ed., DASFAA vol. 6 (World Scientific, 1997), pp. 41-50.
[8] David Hawking and Paul B. Thistlewaite, "Methods for Information Server Selection", ACM Trans. Inf. Syst. 17, 1 (1999), pp. 40-76.


[9] Zarić, M., "Model za distribuirano i rangirano pretraživanje u bibliotečkim informacionim sistemima" (A model for distributed and ranked search in library information systems), doctoral thesis, University of Novi Sad, Faculty of Technical Sciences, 2013 (in Serbian).
[10] Search/Retrieve via URL (SRU), Standard, Library of Congress, available at: http://www.loc.gov/standards/sru/index.html
[11] Boberić, D., "System for retrieval of bibliographic records" (2010).
[12] Miroslav Zaric, Danijela Boberic Krsticev and Dušan Surla, "Multitarget/multiprotocol client application for search and retrieval of bibliographic records", The Electronic Library, Vol. 30, Iss. 3 (2012), pp. 351-366.
[13] Carter, Robert L. and Crovella, Mark E., "Server Selection Using Dynamic Path Characterization in Wide-Area Networks", in Proceedings of INFOCOM '97, Sixteenth Annual Joint Conference of the IEEE Computer and Communications Societies: Driving the Information Revolution (Washington, DC, USA: IEEE Computer Society, 1997), p. 1014.


A Method for eGovernment concepts Interoperability Assessment


José Marcelo A. P. Cestari*, Mario Lezoche**, Eduardo R. Loures*, Hervé Panetto**, Eduardo P. Santos*
* Pontifical Catholic University of Parana, Industrial and Systems Engineering, Curitiba, Brazil
** Université de Lorraine, CRAN, UMR 7039, Vandoeuvre-lès-Nancy, France; CNRS, CRAN, UMR 7039, France
[email protected], [email protected], [email protected], [email protected],
[email protected]

Abstract — Since the late 1990s, with the rise of eGovernment concepts and the increasing use of ICT by public administration entities, the need for collaboration among these organizations is a reality with which systems, managers and other stakeholders must deal. In order to increase performance, to supply online services, and in the search for cost reductions, government paradigms focus, now more than ever, on how to better manage information. As the need for these 'inter-operations' is real, interoperability is a key factor for organizations faced with collaborative-cooperative environments. The modern architecture of information systems (ISs) is based on distributed networks, with the grand challenge of representing and sharing the knowledge managed by ISs and, consequently, of removing semantic interoperability barriers. This paper presents a literature review and a research method that defines the mechanisms for the creation of guidelines, attributes and an assessment methodology in the public administration domain. The presented research strategy identifies the basic phases and activities, proposing a structure for how to collect and compose the guidelines, and how to define an assessment method with the help of semantic technologies.

I. INTRODUCTION
At least in the last twenty years, organizations have been facing a competitive marketplace and they must, among other things, develop partnerships and work in an integrated way with other competitors and stakeholders. Interoperability takes into account dimensions such as concerns, barriers, degrees of maturity and types of assessment. Interoperability can be defined as the ability of two or more systems to share, to understand and to consume information [1]. When put together and analyzed, this set of views and perspectives can help to increase the level and quality of collaboration, interaction and transactions between organizations (public or private) and between areas (agencies) inside the same organization. This is not an exclusive concern of private administrations, since the increasing need for information exchange among government agencies, the supply of online services to citizens, and the cost reduction of public operations and transactions demand that government organizations be ready to provide an adequate interface to their users.

With the increasing use and importance of ICT in government institutions, a concept, known as eGovernment, rose in the late 1990s [2]. The terms eGovernment, e-gov, eGov and similar are abbreviations of "electronic government" and refer to the use of information and communication technologies to support the government business, providing or enhancing public services or managing internal government operations [3]. Considering the concepts in an integrated way, the eGovernment interoperability domain arises as the ability of constituencies or public agencies to work together attempting to meet interoperability requirements, which will be the focus area of this research.

An important aspect of interoperability is the assessment of adherence regarding some specific model or maturity degree; that is, the evaluation of how adherent (or how mature) an organization is in comparison with a baseline model and/or in comparison with other organizations. Enterprise Interoperability Assessment (EIA) provides an organization the opportunity to know its strengths and weaknesses and to prioritize actions to improve its performance and maturity. Characterization and measurement of different degrees of interoperability allow an organization to know its "as is" stage and plan the ways to achieve higher degrees ("to be"). The complexity present in the eGovernment context requires additional effort regarding influence factors such as legal, political and policy, and sociocultural issues. This scenario is particularly prominent in some emergent countries, providing a broad field for research in the eGovernment interoperability domain, since eGovernment interoperability frameworks focus almost entirely (90%) on the technical domain [4]. Bringing all of these concepts to the public administration domain is not an easy task, since the complexity, barriers and variables of a government organization are different from those found in private companies.

This paper has two main goals: (i) to present a literature review and analysis positioning the theme and exposing information regarding the countries engaged, the authors, the evolution through the years and other aspects, and (ii) to present a research strategy in order to identify attributes and guidelines to assess interoperability in public administration entities.

II. BACKGROUND
A. Interoperability, interoperability assessment and interoperability models
Terms such as integration, collaboration, cooperation and compatibility are frequently used to compose or explain
some aspects of interoperability, although they are not the same thing. Integration, for example, has a strong link with the concepts of coordination and consistency, in which the parts are tightly coupled, whereas interoperability has the meaning of coexistence and environment, characterizing two loosely coupled parts. Collaboration concerns sharing the work or the engagement of participants in a coordinated effort, whereas cooperation concerns the division of labor among participants, where each person is responsible for a portion of the solution. Compatibility is also related to interoperability, since in order to interoperate, systems must be compatible, i.e. capable of existing together and/or able to work with another part. Finally, according to [5], interoperability is "the ability of two or more systems or components to exchange information and to use the information that has been exchanged". In terms of a typology, interoperability has four major categories: semantic, organizational, technical [1] and governance elements [6].

• Technical interoperability: concerned with technical issues regarding computer systems, definition of interfaces, data formats and protocols.
• Semantic interoperability: related to ensuring that the precise meaning of exchanged information is understandable by any other application not initially developed for this purpose [1, 7].
• Organizational interoperability: concerned with modelling business processes, aligning information architectures with organizational goals and helping business processes to co-operate [1, 7].
• Governance interoperability: refers to agreements between governments and other stakeholders involved in the interoperability issues, including the ways of achieving and creating those agreements.

Interoperability has three main dimensions: barriers, concerns and approaches (adopted to solve the barriers and attack the concerns) [8, 9]. Table I shows more detail of these aspects:

TABLE I. INTEROPERABILITY ASPECTS AND DIMENSIONS

Barriers:
- Conceptual: related to the syntactic and semantic differences of the information to be exchanged.
- Technological: relates to the possible incompatibility of ICT and the use of software systems.
- Organizational: definition of responsibility, authority and other factors associated with human and organizational behaviors that can be obstacles to interoperability.
Concerns:
- Data: putting together different data models, different languages and heterogeneous bases.
- Services: putting together multiple services/applications by solving possible syntactic and semantic differences, as well as finding the connections to the various heterogeneous databases.
- Process: putting together multiple processes, connecting internal with external processes and creating common processes.
- Business: related to the creation of a harmonized way of working at the levels of an organization in spite of different modes, methods, legislations and cultures.
Approaches:
- Integrated: a common format exists. It is not necessarily a standard, but it must be agreed by all parties.
- Unified: a common format exists, but only at a meta-level. The related metamodel provides a means for semantic equivalence in order to allow mapping between models and systems.
- Federated: no common format. In order to interoperate, parties must accommodate "on the fly". There is no imposition of models, languages or methods of work by one of the parties.

Interoperability involves two (or more) organizations (or units) and, usually, these organizations have different systems, models or structures. Enterprise Interoperability Assessment (EIA) provides an organization the opportunity to know its strengths and weaknesses and to prioritize actions to improve its performance and maturity level. Assessing interoperability implies the establishment of measures to evaluate the degree of interoperability between organizations, and one of the measures that can be used and defined is the maturity level that is (intended to be) achieved. Table II exemplifies three interoperability maturity models (IMMs) presented in the literature.

TABLE II. EXAMPLES OF IMMS AND THEIR LEVELS

- LISI: Level 1 - Isolated (manual gateway: CDs, DVDs); Level 2 - Connected (homogeneous product exchange); Level 3 - Functional (heterogeneous product exchange); Level 4 - Domain (shared databases, collaboration); Level 5 - Enterprise (distributed information).
- OIMM: Level 1 - Independent (personal communication); Level 2 - Ad hoc (general guidelines, sharing basic data); Level 3 - Collaborative (general frameworks and some sharing); Level 4 - Combined (some shared culture oriented by headquarters); Level 5 - Unified (interoperating on a daily basis).
- MMEI: Level 1 - Unprepared (no capability for interoperation); Level 2 - Defined (limited; simple data exchange); Level 3 - Aligned (able to adopt common formats); Level 4 - Organized (heterogeneous partners); Level 5 - Adapted (shared domain ontologies).

B. eGovernment, eGovernment interoperability models and frameworks
According to [2], eGovernment is a relatively recent concept, formalized in 1999, when Al Gore, then Vice President of the U.S., opened the 1st Global Forum on
Reinventing Government, in Washington, attended by representatives of 43 countries. eGovernment works with models considering interactions such as government to citizens (G2C), government to business (G2B), government to employees (G2E), government-to-government (G2G), government-to-organizations (G2Org) and government-to-other-governments (G2OG) [10]. eGovernment is defined in [3] as the use of information and communication technologies to support the government business, such as providing or enhancing public services or managing internal government operations. Integrating the concepts of eGovernment and interoperability helps the "creation of systems that facilitate better decision making, better coordination of government agency programs and services in order to provide enhanced services to citizens and businesses, the foundation of a citizen centric society, and the one-stop delivery of services through a variety of channels" [10].

Although the models already presented in section II.A can be used (abstractly) in various types of organizations, there are few models regarding specifically government issues. According to [11], a Government Interoperability Framework (GIF) is a set of standards and guidelines that a government uses to specify the preferred way that its agencies, citizens and partners interact with each other, being one way to achieve eGovernment interoperability. A GIF includes context, technical content, process documentation and, among other things, the basic technical specifications that all agencies relevant to the eGovernment strategy implementation should adopt. In order to illustrate, three examples of government interoperability models and/or frameworks are presented in Table III.

TABLE III. EXAMPLES OF GOVERNMENT INTEROPERABILITY MODELS/FRAMEWORKS

- [12] Government Interoperability Maturity Matrix (GIMM). Provides a way for administrations to evaluate their status on eGovernment issues. There are maturity levels defining the characteristics of the degree of formalism and the way of exchanging data and information.
- [13] The e-PING is a Brazilian Government framework effort that defines a minimum set of premises, policies and technical specifications to regulate the use of ICT in the interoperability of services regarding eGovernment. It establishes conditions and interactions with other government agencies and the society, covering aspects such as interconnection, security, access and data interchange.
- [14] The Federal Enterprise Architecture Framework consists of a set of interrelated "reference models" that describe six sub-architecture domains in the framework: strategy, business, data, applications, infrastructure and security.

The implementation of interoperability in the eGovernment domain allows data compiled by different agencies to be used together to make better decisions, and increases transparency and accountability, enabling one-stop, comprehensive online services for citizens and businesses by linking the diverse services that are offered by different agencies [10]. Among the difficulties, it is important to mention items such as the variety of legacy systems, differences in standards, cultural differences between departments, legal and political issues, and managerial and jurisdictional ones [15].

III. LITERATURE REVIEW
A. Systematic review
In order to define the goals and concepts and to help identify and map the actual context, a literature review and content analysis was made. Table IV shows the attributes created to run the search, with the definitions of strings, databases and filters. The results are presented in Figure 1.

TABLE IV. ATTRIBUTES USED IN THE SEARCH FOR FILES WITHIN THE GOOGLE SCHOLARS DATABASE

- Database: Google Scholars, SciELO [16], Capes Database [17], Google, Bing and other search engines.
- Goal: Verify the publications regarding eGovernment in the context of interoperability (and vice versa), that is, research considering the interoperability aspects within government organizations. Provide this overview across the years. Create a research repository.
- Start/Finish dates: Considered from 1986 until May 2013.
- Criteria: Criterion 1: string "interoperability" and ("government" or "public administration") in the title and abstract of the publication. Criterion 2: string "interoperabilidade" and ("governo" or "administração pública" or "egov" or "e-gov") in the title of the publication. The words "governo", "administração pública" and "interoperabilidade" are Portuguese and stand for "government", "public administration" and "interoperability".
- Filters: Filter 1: documents before 2000 were removed. Filter 2: duplicated documents were removed.

The preliminary results identify some main sources of the publications and give a vision of the distribution across the years. The sum of all publications found is 432, without specific filters except those listed in Table IV. It is possible to notice that the period with more publications is around 2009 and 2010, followed by a decrease of publications found in all databases.

Figure 1: Results from all databases (without specific filters; series: Scholars, SciELO, Capes, Others)

As a next step, a content analysis of the documents was performed. The objective was to review all the 432 publications found and apply filters in order to select only those with a connection to the research field, and to remove those not related to the theme but which may use some of the same search strings (e.g. warfare papers also use the words government and interoperability, as do radio transmission themes and others). The method adopted to perform this task was the reading of the title and abstract (and when necessary the introduction) of all documents and applying a relevance and pertinence analysis. After this step, 150 documents were left, as shown in Figure 2. The 150 remaining publications are distributed according to Table V, considering the types of documents.


Figure 2: Distribution of publications considering all documents found in all search mechanisms (per-year counts: 0, 0, 0, 1, 5, 3, 10, 6, 15, 22, 23, 21, 22, 19, 3)

TABLE V. DISTRIBUTION PER TYPE OF PUBLICATION

- Papers in conferences: 47
- Papers in journals: 60
- White papers: 11
- Technical reports: 19
- Book chapters: 5
- Dissertations (masters and doctoral): 6
- Others (presentations, docs): 2

A brief author analysis was also made. The objectives were: (i) verify the dissemination regarding the number of authors related to the documents; (ii) verify the production of these authors (number of published documents) and (iii) try to detect if there is a main group of authors that is responsible for the major number of publications. The authors' analysis identified 240 different authors associated to the 150 documents. Most of the authors contributed with one document, and there are two authors with the maximum number of contributions detected (six). For those documents generated by a committee and/or government agency, it is considered that the author is the "committee". Therefore, from the 240 authors identified, there is one (the committee) that is responsible for 19 documents. Table VI does not consider the "committee" in its distribution.

TABLE VI. PERCENTAGE OF AUTHORS ACCORDING TO THE NUMBER OF CONTRIBUTIONS

    # Contributions   # Authors - % Authors
    1                 195 - 81.59%
    2                 27 - 11.3%
    3                 5 - 2.09%
    4                 6 - 2.51%
    5                 4 - 1.67%
    6                 2 - 0.84%

As Table VI shows, more than 81% of the detected authors contributed with only one document, followed by approximately 11% who have two contributions. At least two things come to attention: (i) the "small" number of maximum contributions and (ii) the significant number of researchers involved: 239 (plus the "committee").

A United Nations survey [18] presents eGovernment development rankings for 2012, analyzing how governments of the world are employing eGovernment policies and programs to support efficiency, effectiveness, and inclusiveness as the parameters of sustainable development efforts worldwide. The index can go from zero (no eGovernment) to one (high degree of eGovernment). According to the survey, the Republic of Korea is the world leader (0.9283) followed by the Netherlands (0.9125), the United Kingdom (0.8960) and Denmark (0.8889), with the United States, Canada, France, Norway, Singapore and Sweden close behind. Europe has the highest eGovernment development ranking, followed by the Americas (0.5403). Within the Americas, the United States is in the first position, followed by Canada and, considering South America.

Although the United Nations survey deals with general eGovernment maturity around the world, not focusing on specific themes such as interoperability, frameworks, and models, it gives a good idea of the world's adoption of the theme. In order to evaluate the country (and region) distribution of the research regarding eGovernment interoperability, an analysis was made considering the 150 documents retrieved from the literature review. The distribution considers where the authors are working (which university, laboratory, country), even though the research may be related to another country or organization (the authors' birthplaces are not considered). The information was collected from each of the 150 documents, considering that each country is counted only once per document, and the countries were grouped in regions (e.g. Asia, Europe, North America). The review detected 62 different countries in the 150 documents analyzed. Each of the 62 countries is associated at least once to a researcher, but some of them are cited more times (considering the different documents) and, because of this, there are 192 references to countries within the 150 documents (e.g. Greece appears 12 times, considering only one appearance per document).

IV. RELATED WORKS
From the literature review important issues emerged, such as the related works, the approach those works give to the subject and the specific concerns and difficulties of eGovernment interoperability. Among the detected difficulties it is possible to mention the great variety of legacy systems, collaboration between agencies, differences in data standards, cultural differences, issues of trust, timing, and legal and political issues. It is important to remember that, usually, the private sector suffers only a little politicization, while eGovernment (by its nature) is more government centric. At the end of the literature review process it was possible to identify at least seven major points: (i) distribution of the research domain across the years; (ii) existent models, frameworks, concepts, barriers and concerns; (iii) engaged countries in the research area; (iv) authors and their publications; (v) the majority of approaches are related to technical aspects; (vi) there is a gap regarding influence factors such as behavior, human aspects and political issues and (vii) there is a gap regarding the research methodology adopted to identify guidelines and attributes and perform assessments. As it is not a good idea to transfer concepts (without the proper adaptations) from the private sector into the public one, the following section presents a specific research strategy for public administration.


V. PROPOSED METHODOLOGY
Derived from the literature analysis, the related works and the existent gaps detected, this section presents a research strategy (Figure 3) to define the mechanisms for the identification of guidelines, attributes and an assessment methodology in the public administration domain. The presented method identifies the basic phases and activities, proposing a structure for how to collect and compose the guidelines, and how to define an assessment method, in order to fulfill the requirements and goals stated in Table VII.

Figure 3: Illustration of the methodological activities (A0: review literature quantitatively; A1: review literature by content analysis; A2: identify a preliminary set of guidelines; A3: expert evaluation; A4: validate and prioritise the guidelines; A5: update the set of guidelines; A6: propose an assessment method; A7: identify agencies; A8: execute assessments; A9: update artifacts; supported by techniques such as surveys, AHP, Delphi and the SCAMPI method)

TABLE VII. NEEDS AND GOALS

Research question: How to know the interoperability degree (or adherence) of a public administration entity regarding its business and organizational aspects?
Main goal (MG): Propose a model to assess the interoperability between public administration entities regarding their business and process, conceptual and organizational aspects.
Specific goals (SGs):
- SG1: gather main concepts and position the domain.
- SG2: formalize the research domain.
- SG3: propose a preliminary set of guidelines.
- SG4: validate the guidelines with specialists.
- SG5: propose an assessment method.
- SG6: execute an assessment.
- SG7: update the guidelines according to the results.
Brief description:
- SG1: Provide a background, literature and theoretical review.
- SG2: Define the research domain using concept mapping.
- SG3: The preliminary set of guidelines is generated after (and based on) the theoretical review.
- SG4: Execute a survey in order to review the preliminary set of guidelines. Update the guidelines according to the information gathered from the survey.
- SG5: Define a set of rules and procedures to assess the government interoperability entity, proposing methods to assess, rank and evaluate the adherence to the guidelines.
- SG6: Assess a government entity, collecting information about the adherence and suggestions for updating the model (guidelines and assessment method) according to the specific needs.
- SG7: Update the model (guidelines and assessment method), generating a final version.

After the definition of the guidelines and attributes, it is necessary to define a process that lets us fragment the activities' knowledge through the transformation of attributes into entities and relationships, and thus to emphasize some fine-grained knowledge atoms. In the proposed knowledge working process (Figure 4) of our general methodology, the starting point can vary: an application, a data model, a logical view, a model, etc. There are several reverse engineering methods through which a model can be derived. Then, the resulting initial model is enriched and corrected through the combined action of the domain expert and a knowledge extraction and matching application. Finally, the model is examined with the help of a domain expert or an end-user, who are the most qualified persons to describe the context of the particular domain and to put in evidence the contextual knowledge. According to the administration's best practices and its data, they would clean and better organize the knowledge represented in the derived model.

However, the obtained initial conceptual model, in the form of a UML class diagram, still has a major limit. In fact, its semantics is in a tacit form, because all the attributes are buried inside single classes and it is then difficult to make their semantics explicit.

Thus, the next step of our approach is a Fact-Oriented Transformation [19], through the application of a set of pattern rules for transforming the enriched conceptual model into a fact-oriented model (FOM) with its semantics completely displayed. The resulting fact-oriented model, displaying the finest-grained semantic atoms, is then used as an input for the last step of the process, a series of structural optimizations through formal concept analysis methods [20] (not presented in this paper).


Figure 4: Knowledge extraction process

After the definition of the explicit knowledge process, it is necessary to define the assessment method, which will describe how the assessment will occur, how to rank the items evaluated, what the steps are, and other issues. The evaluation method will be based on the Standard CMMI Appraisal Method for Process Improvement [21], which is designed to provide benchmark-quality ratings relative to the model. The method relies on an aggregation of information collected via defined types (interviews, questionnaires) and worked through the knowledge extraction activities process. The method is based on three stages (plan and prepare for assessment, conduct assessment, and report results), as illustrated in Figure 5.

Figure 5: Detailed activities of the assessment

In summary, the purpose is to define measurement attributes (guidelines and/or models) linked to domains not related entirely to technical aspects.

Considering the scenario proposed in Figure 6, the idea is to apply the research guidelines (assessment and other results achieved) to one (or more) aspects of the relationships involved in eGovernment (e.g. G2G, G2B), depending on the needs.

Figure 6: Scenario regarding the involvement of organizations

VI. CONCLUSIONS
Although there are basically three primary goals associated with achieving interoperability in any system (data exchange, meaning exchange, and process and business agreement), when it comes to government the context can be even more complex, because of the necessity of dealing with influencing factors such as legal, political and sociocultural issues. In government-related interoperability, the context is very important, since some major differences must also be addressed (e.g. poor infrastructures, dictatorial countries). In spite of that, the majority of government-related models deal with issues concerning eGovernment, whose objectives are generally to improve efficiency and effectiveness, offering (if pertinent) online services and information that can increase democratic participation, accountability, transparency, and the quality and speed of services [3]. The approach of such eGovernment models is similar to that of the "non-eGovernment" models; that is, the focus is basically the exchange of information, considering the availability of public services, the integration of agencies and others.


This paper presents results regarding the execution of a literature review on the eGovernment interoperability topic. The results show the distribution of the research around the world, characterizing the evolution of some countries in comparison to others. Besides that, it was possible to identify 150 documents from an initial database universe, with the identification of the authors and the quantity (and type) of publications. A series of important definitions were gathered from the database (including the confirmation that the growth of the subject began in 2000), helping to establish the problem and create a theoretical reference, exposing methodologies, frameworks and models. This paper also presented a methodology to structure a research that assesses public administration interoperability, covering from the initial activities up to the assessment itself. Gaps were detected in the research field in some aspects of eGovernment interoperability, especially when dealing with non-technical aspects. This paper considers that the development of models (guidelines or frameworks) that contribute to the processes of assessing semantic interoperability levels in the government (or eGovernment) context is relevant, both for the development of the research field and also for government organizations and public administration managers.

REFERENCES
[1] IEEE: Standard Computer Dictionary, 1990. A Compilation of IEEE Standard Computer Glossaries. NY, 610-1990. ISBN: 1559370793.
[2] Camargo, A. R., 2009, "Cidade e informática, contatos e interações: explorando manifestações de urbanização virtual," Pesquisa CNPq Proc 301383/2005-7. Escola de Engenharia de São Carlos, USP, São Carlos, SP. Retrieved from http://www.iau.usp.br/pesquisa/grupos/e-urb/Relatorios/RelTecCont2009.pdf
[3] Novakouski, M., and Lewis, G., 2012, "Interoperability in the e-Government Context," Technical Note CMU/SEI-2011-TN-014, Carnegie Mellon University, Pittsburgh, PA. Retrieved from http://www.sei.cmu.edu/reports/11tn014.pdf
[4] CSTRANSFORM, 2010, e-Government Interoperability. "A Comparative Analysis of 30 countries," Retrieved from http://www.cstransform.com/index.htm
[5] IEEE, 1990, "A compilation of IEEE standard computer glossaries," standard computer dictionary.
[6] CEPA (Comisión Económica para America Latina y el Caribe), División de Desarrollo Productivo y Empresarial (DDPE), 2007, "White Book of e-Government Interoperability for Latin America and the Caribbean," Version 3.0, 37 pp.
[7] Commission of the European Communities, 2003, "Linking up Europe, the importance of interoperability for eGovernment services," Ministerial conference on eGovernment, Como, Italy, 7-8 July.
[8] Guédria, W., Naudet, Y., and Chen, D., 2009, "A Maturity Model for Enterprise Interoperability," OTM 2009 Workshops, LNCS 5872, pp. 216-225.
[9] Panetto, H., 2007, "Towards a Classification Framework for Interoperability of Enterprise Applications," International Journal of CIM, Taylor & Francis. Retrieved from http://www.tandf.co.uk/journals
[10] Pardo, T. A., and Burke, G. B., 2008b, "Government worth having: A briefing on interoperability for government leaders," Albany: Center for Technology in Government, Research Foundation of State University of New York. Retrieved from http://goo.gl/QzWA68
[11] UNDP (United Nations Development Program), 2007, "e-Government interoperability: Guide," Bangkok, Thailand. Retrieved from http://www.unapcict.org/ecohub/resources/e-government-intero
[12] Sarantis, D., Charalabidis, Y., and Psarras, J., 2008, "Towards standardising interoperability levels for information systems of public administrations," Electronic Journal for e-Commerce Tools & Applications (eJETA).
[13] Ministério do Planejamento, Orçamento e Gestão, 2012, "e-PING: Padrões de Interoperabilidade de Governo Eletrônico - Documento de Referência," Retrieved from http://www.governoeletronico.gov.br/interoperabilidade
[14] US Government, 2013, "Federal Enterprise Architecture Framework - Version 2," Retrieved from http://goo.gl/4mOZR3
[15] Saekow, A., Boonmee, C., 2009b, "A pragmatic approach to interoperability practical implementation support (IPIS) for e-government interoperability," Electronic Journal of e-Government, vol. 7, issue 4, pp. 403-414.
[16] SciELO (Scientific Electronic Library Online), 2013, Homepage. Retrieved from http://www.scielo.org/php/index.php
[17] CAPES Periódicos, 2013, Homepage. Retrieved from http://www.periodicos.capes.gov.br
[18] United Nations (Department of Economic and Social Affairs), 2012, "United Nations e-government survey 2012: e-government for the people," Retrieved from http://unpan3.un.org/egovkb
[19] Halpin, T.A., 1991, "A fact-oriented approach to schema transformation," Proc. MFDBS-91, Springer Lecture Notes in Computer Science, no. 495, Rostock.
[20] Wille, R., Ganter, B., Formal Concept Analysis: Mathematical Foundations, 1st ed., 294 pp., Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1997, ISBN 3540627715.
[21] SCAMPI Upgrade Team, 2011, "Standard CMMI Appraisal Method for Process Improvement (SCAMPI) A, Version 1.3: Method Definition Document (CMU/SEI-2011-HB-001)," Retrieved from the Software Engineering Institute, Carnegie Mellon University website: http://goo.gl/Vb518E


Analysis of sentiment change over time using user status updates from social networks
Milica Ćirić*, Aleksandar Stanimirović*, Leonid Stoimenov*
*
Faculty of Electronic Engineering, University of Niš, Niš, Serbia
[email protected], [email protected], [email protected]

Abstract— Social networks' users usually post status updates with their opinions on and reactions to events almost instantly. Because of this, pre-announced events, especially ones that divide the public, lead to the generation of a large quantity of status updates in the days just before and after the event. Status updates published before the event usually contain speculations about what will happen, and those published after represent the reaction to what actually happened. In this paper we analyze the change of sentiment in tweets collected in the few days following a specific pre-announced event. We also discuss whether this analysis can be used for prediction, as an indicator of the extent to which this event will fulfill its designated long term goal.

I. INTRODUCTION
Text sentiment classification deals with identifying the sentiment expressed in a text of certain length. Usually, the goal is to assign one of two labels: positive or negative, although there is research that focuses on subjective and objective classes [1, 2], or has multiple degrees of positivity/negativity [3, 4, 5]. As such, it can be applied in many fields, some of which include: classifying product customer feedback [3], preprocessing input data for recommender systems [6] or determining the citizens' position on government policies and decisions [7].

Besides just determining sentiment, it is also interesting to analyze sentiment change over time. The effect of a certain event on public opinion can be explored by tracking sentiment change on social networks.

Social networks are especially convenient for tracking sentiment change due to the fact that their users post status updates very often, even up to several times an hour. Also, the option for sharing others' statuses enables fast spreading of their content. One of the most popular social networking platforms is Twitter [8]. It allows its users to post messages with a maximal length of 140 characters, also known as "tweets". Since its launch in July 2006, its popularity has rapidly grown and it now has over 600 million active users [9].

Tweets can contain all kinds of information, from daily events in the life of the user to live news from political events. A common form of a tweet is a reference to a person, event or product and a personal opinion about the referenced entity. The # symbol, called a hashtag, is used to mark keywords or topics in a tweet. It was created organically by Twitter users as a way to categorize messages. By analyzing tweets one could observe which are the popular entities and also the "public opinion" about them.

During the research presented in this paper we collected around 100944 tweets from a 4 day period following the release of Lady Gaga's single Applause and classified them by their sentiment. For classification we used the Naïve Bayes classifier, which is one of the mathematically simplest classifiers, but gives results in the rank of much more sophisticated classifiers. Rather than developing our own implementation of the Naïve Bayes classifier, we used the Mallet open source library and extended it with some new features. According to the findings from a previous research [10], we used a combination of predefined and newly developed features that produced the best results for Twitter sentiment classification.

II. RELATED WORK
The most popular domain for sentiment classification is certainly the classification of movie reviews [4, 11, 12] and product reviews in general [13, 14]. The length of reviews can range from a couple of sentences to several paragraphs, but the text is usually well formed. Another research direction concerning product review sentiment classification is domain adaptation [11, 14], i.e. managing the difference in domains of training and testing data.

Twitter sentiment classification is rather different from movie and product reviews, mostly due to the language style and the shortness of the tweets, but it is an increasingly popular topic [1, 5, 15, 16, 17]. The upside of the informal and slang-like language is the presence of emoticons or smileys that can be used as an indicator of the sentiment the author wants to express [5, 11, 15]. Automatically assigning sentiment based on emoticons greatly facilitates the creation of training and testing corpuses. In [18], instead of classifying the sentiment of a tweet, the authors determine the general sentiment about a specified hashtag.

Even texts of small length can express sentiments about multiple entities, and classifying the overall sentiment can give a false result. Therefore, some researchers focus on identifying the sentiment pointed at a specific entity, so-called target-dependent sentiment classification [1].

There have been multiple studies that analyze the connection between the sentiment in tweets addressing election candidates during the campaign and the outcome of the elections [19, 20, 21]. Some of them have discovered that prediction using sentiment data would have given good results [20], while in other cases the prediction wouldn't have been accurate [21]. In [22] the authors track the behavior streams of groups of users with similar biases and present time-dependent collective measures of their opinions. They tested the effectiveness of their system by tracking groups' Twitter responses to a common stimulus set: the 2012 U.S. presidential election debates.

Authors of [23] explore how public mood patterns, as evidenced from a sentiment analysis of Twitter posts published during a 5 month period in 2008, relate to fluctuations in macroscopic social and economic indicators in the same time period. They attempt to identify a quantifiable relationship between overall public mood and social, economic and other major events in the media and popular culture. Research presented in [24] utilized a stream of millions of tweets on Twitter to explore how people feel about Thanksgiving and Christmas through real-time sentiment analysis. With the help of the Twitter Streaming API, the author discovered the patterns of sentiment changes by hour before and after the two holidays.

III. DATA CORPUS AND CLASSIFIER
Since Twitter is currently one of the most popular social networking platforms and we already had experience with Twitter sentiment classification [10, 25], we decided to use tweets as our data corpus. Tweets can be obtained from Twitter with the use of the Search API [26] or the Streaming API [27]. The Search API performs a search according to the specified query and returns a collection of relevant tweets. The Search API is not meant to be an exhaustive source of tweets, and not all tweets are indexed or made available via the search interface. The Streaming API, on the other hand, gives low latency access to Twitter's global stream of tweet data. We decided to use the Search API and perform queries at regular intervals in order to collect tweets.

In order to monitor the change of sentiment over time, we needed an event that was announced and expected with anticipation. Choosing Lady Gaga and the release of her single Applause, which was heavily announced and promoted on Twitter, enabled us to get a lot of tweets during a 4 day period. Additionally, the fact that opinions about Lady Gaga and her actions are often very divided implied that we would probably get data with strong sentiments, making them easier to classify. The idea was to collect tweets in the days before and after the release of her single Applause. However, due to a leak incident, the single was released a week earlier than announced [28], and we were only able to start the collection of tweets after the release. This is also one of the reasons we decided to use the Search API, since it enables querying past data.

The Search API offers a wide range of parameters that can be defined for a query. For our purpose, we searched for tweets containing the key words "lady gaga", those that were directed to her (to:ladygaga) and those that were mentioning her (@ladygaga). Overall, we acquired 100944 tweets during the 4 day period following the release of the Applause single. These tweets were then classified by their sentiment.
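For illustration, such a query could be issued from Java with a client library such as Twitter4J; the paper does not state which client library was used, so the following is only a sketch under that assumption:

    import twitter4j.Query;
    import twitter4j.QueryResult;
    import twitter4j.Status;
    import twitter4j.Twitter;
    import twitter4j.TwitterFactory;

    public class TweetCollector {
        public static void main(String[] args) throws Exception {
            // OAuth credentials are read from twitter4j.properties
            Twitter twitter = TwitterFactory.getSingleton();
            // The three criteria described above, combined into one query
            Query query = new Query("\"lady gaga\" OR to:ladygaga OR @ladygaga");
            query.setCount(100); // tweets per page
            QueryResult result = twitter.search(query);
            for (Status status : result.getTweets()) {
                System.out.println(status.getCreatedAt() + "\t" + status.getText());
            }
        }
    }

Run at regular intervals, such a collector gathers the tweet texts and posting times used in the analysis.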
Classification was performed with the use of the Naive Bayes classification algorithm. The Naive Bayes classifier (NB) [29] is one of the simplest classifiers. It is a probabilistic classifier and it is based on Bayes' theorem. The premise is that each feature of an instance is completely unrelated to any other feature of the same instance, i.e. the probability that an instance belongs to a certain category is influenced by each feature independently.

The probability that instance i belongs to category c can be expressed with (1):

    P(c|i) ∝ P(c) · ∏ P(fk|c),  for 1 ≤ k ≤ ni    (1)

In (1), P(c) represents the prior probability that an instance belongs to the category c, P(fk|c) represents the probability of feature fk occurring in an instance that belongs to the class c, and ni is the number of features in instance i.

The assumption of feature independence is not very realistic for the real world, and it is the reason this classifier is called naïve. However, as simple and inefficient as the Naïve Bayes classifier may seem in theory, in practice it usually gives good results, often as good as much more complex classifiers. For example, in [10], in a comparison of Naïve Bayes, MaxEnt and Balanced Winnow, NB has slightly worse results than MaxEnt, but much better than Balanced Winnow.

In our research we used Mallet [30], an open source Java library for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. This gave us the opportunity to use the built-in functionalities of Mallet's Naïve Bayes implementation and also to extend the library to fulfill our specific needs or try a new approach.

The classifiers used were trained during the research published in a previous paper, dealing with context and sentiment classification [10]. For sentiment classification we collected tweets with positive, negative and objective sentiment. However, due to the small number of objective tweets, compared to the number of positive and negative ones, we decided to proceed with only these two categories. The collected tweets were selected according to the emoticons they contained. In order to decrease the influence of emoticons in classifying and increase the influence of the words used, we subsequently removed emoticons from about 90% of the collected tweets.

For training the classifiers we started with adding a feature for each word in the tweet and then added various data processing pipes for filtering features. These filtering steps can help in extracting features that carry more information and removing others that can be considered as noise. They are widely used in various studies [1, 5, 15, 17]. Some of the data processing pipes we used were built into the Mallet library, some we were able to get by customizing existing Mallet data processing pipes and some we had to implement ourselves. Analyzing the results obtained with different combinations showed that the best results for sentiment classification were produced when using the following pipes:
• Lowercase normalization
• Adding some application specific features (e.g. sport results, music ranking, …)

• Adding negation features
• Removing punctuation
• Some primitive stemming
• Adding bigram and n-gram features

One other conclusion based on the classification results was that for sentiment classification the mere presence of a feature is a better indicator of the class than the number of appearances of that feature. In other words, the multivariate Bernoulli model is better suited for sentiment classification than the multinomial model.
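To illustrate how such a training pipeline fits together in Mallet, the following minimal sketch (our own example, not the actual code used in the research) chains tokenization, lowercase normalization and binary (presence-only) feature vectors, which corresponds to the multivariate Bernoulli model, and then trains a Naïve Bayes classifier:

    import cc.mallet.classify.Classification;
    import cc.mallet.classify.Classifier;
    import cc.mallet.classify.NaiveBayesTrainer;
    import cc.mallet.pipe.*;
    import cc.mallet.types.Instance;
    import cc.mallet.types.InstanceList;
    import java.util.ArrayList;
    import java.util.regex.Pattern;

    public class TweetSentimentTrainer {
        public static void main(String[] args) {
            ArrayList<Pipe> pipes = new ArrayList<Pipe>();
            pipes.add(new Target2Label());                 // map class names to labels
            pipes.add(new CharSequence2TokenSequence(
                    Pattern.compile("\\S+")));             // simple tokenization
            pipes.add(new TokenSequenceLowercase());       // lowercase normalization
            pipes.add(new TokenSequence2FeatureSequence());
            pipes.add(new FeatureSequence2FeatureVector(true)); // true = binary features

            InstanceList training = new InstanceList(new SerialPipes(pipes));
            // Instance(data, target, name, source)
            training.addThruPipe(new Instance("love the new single", "positive", "t1", null));
            training.addThruPipe(new Instance("this song is awful", "negative", "t2", null));

            Classifier nb = new NaiveBayesTrainer().train(training);
            // classify(Object) pipes the raw text through the same pipeline
            Classification c = nb.classify("what an awful song");
            System.out.println(c.getLabeling().getBestLabel());
        }
    }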
As a result of this research we trained two classifiers with training sets of two different sizes (72 thousand and 120 thousand training samples), using the feature processing pipes mentioned above; we refer to these as the 72K and 120K classifiers from now on.

[Figure 1. Change of the percentage of positive tweets during the four-day period (y-axis: 50-95%; x-axis: days 0-4; one curve for each of the 72K and 120K classifiers)]

IV. RESULTS
The Twitter Search API returns tweet objects in JSON format and, besides the actual text of the tweet, they also contain various other properties, including the author of the tweet, whether the tweet is a retweet, the time of posting, the location if it is specified, etc. We preprocessed the Search API query results to extract the text and the posting time of each tweet. During classification we calculated the number of positive and negative tweets posted during each hour of the monitored period. Fig. 1 shows the change of the percentage of positive tweets during the four-day period following the release of the Applause single.
It can be seen that positive tweets make up the majority of the collected tweets during the whole period. It is reasonable to assume that Lady Gaga's fans are the most invested in tweeting about her new single, which is why these results are not surprising.
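A minimal sketch of this preprocessing step, assuming each tweet is a parsed Search API JSON object and a hypothetical classify(text) function returning "pos" or "neg":

    from collections import Counter
    from datetime import datetime

    def hourly_sentiment(tweets, classify):
        """Count positive/negative tweets per (day, hour) of posting, in UTC."""
        counts = {"pos": Counter(), "neg": Counter()}
        for tweet in tweets:
            # Search API timestamps look like "Mon Aug 12 06:15:32 +0000 2013".
            posted = datetime.strptime(tweet["created_at"],
                                       "%a %b %d %H:%M:%S %z %Y")
            label = classify(tweet["text"])
            counts[label][(posted.date(), posted.hour)] += 1
        return counts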
[Figure 2. Change of the number of total tweets over a 24-hour period of the day (y-axis: 0-4500 tweets; x-axis: hours 0-24, UTC)]
It would probably be interesting to examine the exact content of the tweets that were posted during the periods that correspond to the graph's extreme values. The extracted data presented here could most likely be used as one of the parameters for future sales estimation. Another possible application for such data would be estimating the chart position at which the single will peak, and the length of time it will hold a position on the chart.
The data shown in Fig. 2 represent the total number of tweets collected for each hour of the day. Since the Twitter API returns the time of posting in UTC, the data in the graph are presented accordingly. Three peaks stand out. The global maximum corresponds to the time period between 6h and 7h UTC, i.e. around midnight in North American time zones. There are also two smaller peaks, around 15h and 21h UTC. These data could be useful for determining the time patterns of Twitter use. Comparing these data to other determined time patterns of use could show the specificities of this particular case, for further analysis.
Of course, the classifiers are not perfect and there are misclassified tweets. Some examples that were misclassified as negative include:
• Of course we saw it, and we died beacuse it's to much perfect.
• sorry it looks like I cut your hand and neck hhh.. url
• Bow down.. Queen is back bitches haha!
It can be noticed that these examples include words with originally negative meanings. However, they are used in a positive context, which the classifier does not detect. Coordinating the perceived meaning of a word/phrase with the context of its use is a possible research topic to be considered in the future.

V. CONCLUSION
Social networks represent a good source of data that can be used to model public opinion. In that context, one of the most analyzed social networks is Twitter. Twitter status messages (tweets) have their own characteristics that distinguish them from data in other domains, which is why the classifier should also be trained with Twitter data. In Twitter sentiment classification, preprocessing the features in order to improve accuracy usually gives good results.
Tracking sentiment change over time can give good insight into shifts in public opinion. Also, it can probably be used as a parameter for the prediction of long-term outcomes, such as record sales or election results, and that is a research direction worth exploring. Additionally, it is possible to track sentiment over a period of time for multiple singles by the same artist, and compare their chart placements. Comparing such data for multiple artists is also a possibility.

ACKNOWLEDGMENT
Research presented in this paper was funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia, within the project "Technology Enhanced Learning", No. III 47003.

REFERENCES
[1] L. Jiang, M. Yu, M. Zhou, X. Liu, T. Zhao, "Target-dependent Twitter Sentiment Classification", Proceedings of the ACL HLT, 2011, pp. 151-160.
[2] B. Pang, L. Lee, "A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts", Proceedings of the ACL, 2004, pp. 271-278.
[3] M. Gamon, "Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis", Proceedings of the 20th International Conference on Computational Linguistics, 2004, pp. 841-847.
[4] A. Kennedy, D. Inkpen, "Sentiment Classification of Movie Reviews Using Contextual Valence Shifters", Computational Intelligence, vol. 22, no. 2, 2006, pp. 110-125.
[5] A. Pak, P. Paroubek, "Twitter as a corpus for sentiment analysis and opinion mining", Proceedings of the 7th LREC, 2010, pp. 1320-1326.
[6] A. Stanimirović, M. Ćirić, B. Džonić, L. Stoimenov, N. Petrović, "Sistem za davanje preporuka baziran na tehnologijama semantičkog web-a", Zbornik radova YU INFO 2012, 2012, pp. 147-152.
[7] G. Stylios, D. Christodoulakis, J. Besharat, M. A. Vonitsanou, I. Kotrotsos, A. Koumpouri, S. Stamou, "Deciphering the Public's Stance towards Governmental Decisions", Proceedings of the 10th European Conference on E-Government, 2010, pp. 376-381.
[8] Twitter, https://twitter.com
[9] Twitter statistics, http://www.statisticbrain.com/twitter-statistics/
[10] M. Ćirić, A. Stanimirović, N. Petrović, L. Stoimenov, "Naive Bayes Twitter classification", Proceedings of SAUM 2012, 2012, pp. 248-251.
[11] J. Read, "Using Emoticons to reduce Dependency in Machine Learning Techniques for Sentiment Classification", Proceedings of the ACL Student Research Workshop, 2005, pp. 43-48.
[12] B. Pang, L. Lee, S. Vaithyanathan, "Thumbs up? Sentiment Classification using Machine Learning Techniques", Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, vol. 10, 2002, pp. 79-86.
[13] H. Cui, V. Mittal, M. Datar, "Comparative Experiments on Sentiment Classification for Online Product Reviews", Proceedings of the 21st National Conference on Artificial Intelligence, vol. 2, 2006, pp. 1265-1270.
[14] J. Blitzer, M. Dredze, F. Pereira, "Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification", In ACL, 2007, pp. 187-205.
[15] A. Go, R. Bhayani, L. Huang, "Twitter Sentiment Classification using Distant Supervision", CS224N Project Report, Stanford, 2009.
[16] A. Agarwal, B. Xie, I. Vovsha, O. Rambow, R. Passonneau, "Sentiment Analysis of Twitter Data", Proceedings of the ACL 2011 Workshop on Languages in Social Media, 2011, pp. 30-38.
[17] A. Davies, Z. Ghahramani, "Language-independent Bayesian sentiment mining of Twitter", Proceedings of the 5th SNAKDD Workshop on Social Network Mining and Analysis, 2011, pp. 99-106.
[18] X. Wang, F. Wei, X. Liu, M. Zhou, M. Zhang, "Topic Sentiment Analysis in Twitter: A Graph-based Hashtag Sentiment Classification Approach", Proceedings of CIKM '11, 2011, pp. 1031-1040.
[19] H. Wang, D. Can, A. Kazemzadeh, F. Bar, S. Narayanan, "A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle", Proceedings of the ACL 2012 System Demonstrations, 2012, pp. 115-120.
[20] A. Tumasjan, T. O. Sprenger, P. G. Sandner, I. M. Welpe, "Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment", Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, 2010, pp. 178-185.
[21] M. Choy, L. F. M. Cheong, N. L. Ma, P. S. Koo, "A sentiment analysis of Singapore Presidential Election 2011 using Twitter data with census correction", Proceedings of CoRR, 2011.
[22] Y. R. Lin, D. Margolin, B. Keegan, D. Lazer, "Voices of Victory: A Computational Focus Group Framework for Tracking Opinion Shift in Real Time", 22nd International World Wide Web Conference, 2013.
[23] J. Bollen, H. Mao, A. Pepe, "Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena", Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, 2011, pp. 450-453.
[24] W. Hu, "Real-Time Twitter Sentiment toward Thanksgiving and Christmas Holidays", Social Networking, 2013.
[25] M. Ćirić, A. Stanimirović, N. Petrović, L. Stoimenov, "Comparison of different algorithms for sentiment classification", Proceedings of TELSIKS 2013, 2013, pp. 567-570.
[26] The Huffington Post, http://www.huffingtonpost.com/2013/08/12/lady-gagas-applause-early-release-single-leaks_n_3744177.html
[27] Twitter Search API, https://dev.twitter.com/docs/api/1.1/get/search/tweets
[28] Twitter Streaming API, https://dev.twitter.com/docs/api/streaming
[29] C. D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, New York, 2008.
[30] A. K. McCallum, "MALLET: A Machine Learning for Language Toolkit", http://mallet.cs.umass.edu, 2002.

Taking DBpedia Across Borders: Building the Serbian Chapter

Uroš Milošević, Vuk Mijović, Sanja Vraneš
Mihajlo Pupin Institute, Belgrade, Serbia
{uros.milosevic, vuk.mijovic, sanja.vranes}@pupin.rs

Abstract—With the emergence of Linked Data, DBpedia has steadily grown to become one of the largest and most important structured knowledge sources we know of today. Adopting Wikipedia's practice of entrusting the community with most of the work, the DBpedia internationalization committee has made a major step towards the move from unstructured to structured knowledge on the Web. Still, with new languages come new challenges. In this paper, we inspect some common obstacles that need to be tackled in order to add one more language to this popular data hub, but also some that haven't been encountered before in this domain. More specifically, we explore the digraphic nature of the Serbian language, analyze the state of the DBpedia Extraction Framework with respect to its support for languages that use multiple scripts, and propose solutions towards overcoming this problem. Finally, we deploy the first digraphic DBpedia edition, taking the leading position amongst all DBpedia versions in the percentage of all covered Wikipedia templates, and of all template occurrences in Wikipedia that are mapped, while adding a valuable new chapter to the DBpedia project and enriching the Linked Open Data Cloud even further.

The research presented in this paper is partly financed by the European Union (FP7 LOD2 project, Pr. No: 257943), and in part by the Ministry of Science and Technological Development of the Republic of Serbia (SOFIA project, Pr. No: TR-32010).

I. INTRODUCTION
The rise of the Linked Data Web has brought a paradigm shift to the world of information retrieval. We're no longer interested in the short answers to who, what, where or when, but would also like to know about the relations between, and the background behind, those answers. However, these connections imply structure, and structured knowledge is often hard to come by.
DBpedia leverages the existing efforts of the ever-growing Wikipedia communities worldwide by extracting the crowdsourced data and transforming them to RDF. Not only has this approach helped create a bridge between unstructured and structured data on the Web; it has pushed DBpedia towards becoming the most important data hub in the Linked Open Data Cloud (Figure 1). It is also one of the largest multilingual datasets on the Web of Data and, as such, expected to be able to cope with all the difficulties that go along with multilingualism.
In this paper, we give a detailed survey of the internationalization challenges standing in the way of enriching DBpedia with more languages, an analysis of the state of the DBpedia framework itself with respect to such challenges, and a report on the completed and ongoing efforts being put into the Serbian edition of this dataset.
In Section 2, we take a look at the current state of the DBpedia project, with a focus on its main components. In Section 3, we try to understand the issues related to the move from the Serbian Wikipedia to the very first Serbian DBpedia. Section 4 outlines the achieved results, and Section 5 concludes our findings and proposes future work.

II. DBPEDIA
Creating and maintaining a multilingual knowledge base can require an enormous (and, thus, expensive) amount of work. Crowdsourcing unstructured knowledge, on the other hand, is free, and has given birth to Wikipedia, one of the most important knowledge sources mankind knows of today. Being maintained by thousands of contributors from all over the world, it is bound to grow even larger.
Realizing the potential of this global effort, the Linked Data community has gathered around a joint project of their own to breathe life into the Linked Data Cloud by enriching it with crowdsourced knowledge straight from Wikipedia, in RDF form.
The English DBpedia alone describes 4.0 million things (http://wiki.dbpedia.org/About): 832,000 persons, 639,000 places, 372,000 creative works (music albums, movies, video games etc.), 209,000 organizations, 226,000 species, 5,600 diseases etc. All versions of DBpedia together contain descriptions of 24.9 million things (out of which 16.8 million are interlinked with the English DBpedia concepts), 12.6 million labels and abstracts, 24.6 million image links, and 27.6 million links to external web pages. Moreover, they contain 45.0 million links to other, external, LOD datasets, 67.0 million Wikipedia category links, and 41.2 million YAGO category links. Together, they make up a dataset of 2.46 billion RDF triples, out of which 1.98 billion were extracted from non-English language editions.

A. DBpedia Ontology
The DBpedia Ontology is a manually created directed acyclic graph based on the most commonly used infoboxes within Wikipedia. It is a shallow, cross-domain ontology that covers 529 classes, which form a subsumption hierarchy and are described by 2,333 different properties (http://wiki.dbpedia.org/Ontology).

[Figure 1. DBpedia at the heart of the LOD Cloud]

A downside to any mass community effort is often a lack of standardization and quality control. As such, the DBpedia Ontology, as detailed as it is, appears to be suffering from many issues caused by the ad hoc nature of the solutions contributed by its creators for their specific purposes. Some of these problems are reflected in missing classes and properties, as well as in the inadequate names, ranges and domains defined for some of those properties. For example, the constellation property is limited to the Galaxy domain. However, other CelestialBody instances often also have references to constellations, yet no other property exists for this purpose.
An even more detailed look, using an ontology evaluation methodology such as OntoClean [1], would reveal other suboptimal engineering choices. For instance, the subsumption hierarchy has many typical role-concepts [2] (e.g. Celebrity, Engineer, Criminal, Athlete etc.) subsumed by a basic concept, Person, which might not be appropriate for an ontology with such wide coverage (or any reusable ontology, for that matter). This, however, could be well outside the scope of this paper and will not be covered here.

B. Information Extraction Framework
The Wikipedia information is extracted and triplified using a flexible and extensible framework (DBpedia Information Extraction Framework, DIEF), written in Scala and structured into different modules, where the core module contains the essential components of the framework, and the dump extraction module is aimed at Wikipedia data extraction. The framework also relies on a growing set of extractors: mappings from page nodes to graphs of statements describing those nodes. The DIEF provides a total of 16 extractors, 7 of which are language (mostly English) specific. Two of them are aimed at the richest sources of structured knowledge on Wikipedia: infobox templates.
The Generic Infobox Extractor is tuned to create triples using the article URI as the subject, the infobox property name (in camel case form, appended to the http://dbpedia.org/property namespace) as the predicate, and the attribute value as the object.
With the introduction of the internationalization (i18n) filters, the Mapping-based Infobox Extractor has become the single most important extractor for any internationalization effort. It relies on manually created mappings from Wikipedia infobox properties to DBpedia Ontology properties (using a relatively simple syntax) to extract triples, binding any mapped properties to the http://dbpedia.org/ontology namespace. The template parameter values are parsed according to the data types specified in the DBpedia Ontology, and the Wikipedia resources are classified based on the used infobox template and additional classification rules specified in the mapping configurations. TABLE I shows a mapping example with a specified class and a single property representing the date of birth of a person.

TABLE I. MAPPING THE INFOKUTIJA BIOGRAFIJA INFOBOX
  Template mapping – Map to class: Person
  Property mapping – Template property: datum_rođenja; Ontology property: birthDate

C. Mappings Wiki
Although it may appear that the fact that the mappings need to be created manually takes the entire idea of exploiting the crowdsourcing efforts of the Wikipedia community one step back, the actual amount (and nature) of work is trivial in comparison with that of maintaining Wikipedia. Thanks to the Mappings Wiki, infobox templates can be collaboratively mapped to the corresponding ontology classes and properties, across different Wikipedia language editions [4]. Moreover, the Wiki can validate a single mapping without starting up the entire Extraction Framework, and also lets you preview the mapping result by triplifying the infoboxes of a small number of test articles on the fly.
There are currently 26 language-specific mappings in DBpedia's Mappings Wiki (http://mappings.dbpedia.org/).

III. BUILDING A SERBIAN DBPEDIA
The case of the Greek DBpedia brought attention to the many issues related to internationalization and paved the way for other languages that use non-US-ASCII encoded scripts.
The Serbian version of Wikipedia is by no means the largest, or the richest, Wikipedia amongst the 287 versions currently available (http://meta.wikimedia.org/wiki/List_of_Wikipedias). It is 26th in article count, with approximately 242,000 articles. However, it is one of very few editions supporting multiple writing systems, due to Serbian being the only European language with active synchronic digraphia (using two scripts for the same language).

A. Wikipedia in Serbian
We have mentioned earlier the problems that often go along with many mass community projects. Unfortunately, Wikipedia is no different, and the lack of a coordinated approach is reflected in data errors, noise and redundancy.
Take, for example, one of the most frequently occurring infobox properties, defining the Web page of a resource (foaf:homepage in RDF). A total of 36 variants of this property exist in the Serbian Wikipedia, the most common of which, веб-страна, alone is used in 108 infobox

templates. веб is found on 28 occasions, вебсајт on 16, страница on 10, website on 7, etc.
The digraphic nature of the language only doubles the potential for error. The same often happens with entire infobox templates. For instance, there are three different infoboxes used to describe an actor: Глумац, Кутијица за глумце, and Glumac-lat.
It is clear that Wikipedia itself is in need of a collective effort towards standardization and a common vocabulary.

B. Coping with Digraphia
The issue of a digraphic Wikipedia is best illustrated in the case of information retrieval. Most of the Serbian online communities rely on the Latin alphabet for communication/interaction on the Web. That means a large portion of the information available online is (and, often, expected to be) encoded in ISO 8859-2 (i.e. Latin-2). And, yet, most of the information in the Serbian Wikipedia dumps is encoded in Cyrillic. So, unless the information retrieval software performs transliteration (romanization or cyrillization) on the fly (at retrieval time, as in the case of Wikipedia), many attempts at information extraction will be doomed to fail. This directly affects common tasks such as keyword search, label-based SPARQL querying, named entity recognition, etc.
As it may be unrealistic to impose this requirement on the software developers, the only reasonable, yet, perhaps, not so elegant workaround is to have the knowledge base keep the information encoded in both character sets. Although such an approach would double the space requirements needed for storing any Cyrillic or Latin string literal, there is also the matter of perspective: one could argue that although the information being stored is essentially the same, the very fact that different character sequences are needed to describe the same piece of knowledge makes this problem fall into the domain of multilingualism.
In such a case, a single IRI would still be used, but two separate triples would be stored for any string literal in Serbian. For example:

    <http://sr.dbpedia.org/resource/парсер>
        rdfs:label "Парсер"@sr ;
        rdfs:label "Parser"@sr .

It is worth noting that the IANA Language Subtag Registry (http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry) contains separate tags for Serbian Latin and Cyrillic (sr-Latn and sr-Cyrl, respectively), but lists them as redundant.
As the current version of the DIEF doesn't provide the means for transliteration, let alone duplicating literals, we developed a small post-processor that first transliterates the strings in the DIEF dumps, and then merges them back with the original dump (Figure 2). It should be noted that, although it is safe to assume that all Cyrillic strings can be transliterated directly to their Latin counterparts, vice versa is not always straightforward. For instance, many resources on the Serbian Wikipedia have labels that reflect their original names and are not meant to be transliterated (e.g. ASCII would never be transliterated to АСЦИИ). The Serbian edition of Wikipedia has a number of syntactical constructs that can be used to keep the original encoding of a string. However, there are no mechanisms available for accomplishing the same based on a generated DBpedia RDF dump. Moreover, in cases where transliteration is possible, doing so is not always as easy as with romanization.

[Figure 2. DBpedia Extraction process (diagram showing the Wikipedia dump, DIEF, DBpedia dump, Transliteration, Fusion, and RDF Store components)]

TABLE II shows the nature of Serbian digrams (pairs of characters, each used to identify a single phoneme). The rule says that any њ character is always transliterated to nj. Ideally, the same should hold for the other way around, but there are exceptions. For instance, konjunkcija (en: conjunction) is not transliterated to коњункција, but конјункција, as that is the original form of the word (н and ј are treated as separate characters).

TABLE II. SERBIAN DIGRAMS
  Serbian Latin alphabet | Serbian Cyrillic alphabet | International Phonetic Alphabet (IPA) value
  Lj lj                  | Љ љ                       | /ʎ/
  Nj nj                  | Њ њ                       | /ɲ/
  Dž dž                  | Џ џ                       | /dʒ/

Therefore, we perform only romanization of Cyrillic string literals in our post-processing module.
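A simplified Python sketch of such a romanization step is given below; it is not the authors' actual post-processor. The character map is abbreviated, and the regular expression and function names are illustrative only.

    import re

    # Cyrillic-to-Latin map; digraphs (TABLE II) included as single entries.
    CYR_TO_LAT = {"Љ": "Lj", "љ": "lj", "Њ": "Nj", "њ": "nj",
                  "Џ": "Dž", "џ": "dž",
                  "А": "A", "а": "a", "Б": "B", "б": "b", "В": "V", "в": "v",
                  "П": "P", "п": "p", "Р": "R", "р": "r", "С": "S", "с": "s",
                  "Е": "E", "е": "e"}
    # ... plus the remaining Serbian Cyrillic letters (omitted for brevity).

    def romanize(text):
        return "".join(CYR_TO_LAT.get(ch, ch) for ch in text)

    LABEL_RE = re.compile(r'(rdfs:label\s+")([^"]*)("@sr)')

    def duplicate_labels(turtle_lines):
        """Yield each line; for Cyrillic @sr labels, also yield a Latin copy."""
        for line in turtle_lines:
            yield line
            m = LABEL_RE.search(line)
            if m and romanize(m.group(2)) != m.group(2):
                yield LABEL_RE.sub(
                    lambda mm: mm.group(1) + romanize(mm.group(2)) + mm.group(3),
                    line)

Running duplicate_labels over the example above would emit the "Парсер"@sr triple followed by its "Parser"@sr counterpart, leaving purely Latin or numeric literals untouched.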

Turtle (Terse RDF Triple Language) is a subset of, and compatible with, Notation 3, and a superset of the minimal N-Triples format. It is compact, human-readable, and UTF-8 based and, therefore, makes a great solution for internationalization [7]. Turtle is also part of the SPARQL query language, for expressing graph patterns.
TriX is an experimental format that aims to provide a highly normalized, consistent XML representation for RDF graphs, allowing the effective use of generic XML tools such as XSLT, XQuery, etc. Its form helps it retain expressivity, while providing a compact and readable alternative to RDF/XML [8].
RDF/JSON (JavaScript Object Notation) requires less overhead with respect to parsing and serializing than XML, and encodes text in Unicode, thereby making the use of IRIs possible. The percent character doesn't need special treatment; the only characters that need escaping are quotation marks, the reverse solidus and the control characters (U+0000 through U+001F). RDF serialization in JSON follows a non-standardized specification, but can be considered a good overall solution for internationalization [9]. This serialization, however, is experimental and available only for the DBpedia Live module.
As it provides the most compact (non-experimental) solution with full internationalization support, we serialize the Extraction Framework output in Turtle.

IV. RESULTS
As mentioned earlier, finding an appropriate mapping for every single Wikipedia infobox/property is not possible (nor needed). Still, using the Mappings Wiki, we have successfully mapped:
• 38.90% of all templates in Wikipedia (440 of 1131),
• 20.52% of all properties in Wikipedia (6042 of 29438),
• 96.92% of all template occurrences in Wikipedia (174594 of 180146),
• 67.52% of all property occurrences in Wikipedia (1485832 of 2200536).
The mapping statistics show that the Serbian DBpedia is on par with the best covered Wikipedias, taking the leading position in the percentage of all Wikipedia templates that are mapped, at 38.90%, and of all template occurrences in Wikipedia that are mapped, with a total coverage of 96.92% (Figure 3).

[Figure 3. Mapping statistics: percentages of mapped templates, properties, template occurrences and property occurrences, per Wikipedia language edition (ar, bg, bn, ca, cs, de, el, en, es, eu, fr, ga, hr, hu, id, it, ja, ko, nl, pl, pt, ru, sk, sl, sr, tr)]

Configured to use only the label and the mapping extractor, the Extraction Framework produces a dataset of 3,051,772 triples (TABLE III). This number can be further boosted by including other extractors, such as the Page Links, Disambiguation, Wiki Page and, especially, the Infobox extractor. As previously mentioned, the last module increases the triple count at the expense of data quality, by extracting all properties from all infoboxes. As such property types are not part of a subsumption hierarchy and there is no consistent ontology for the infobox dataset, the use of this extractor is advised only if an application requires complete coverage of all Wikipedia properties (and noisy data is not an issue). The Generic Infobox Extractor alone produces 2,795,427 triples.

TABLE III. EXTRACTION RESULTS
  Resource type          | Number of triples
  Instance types         | 979,022
  Labels                 | 553,368
  Mappings               | 1,519,382
  Transliterated literals| 415,406
  Total                  | 3,467,178

The mapped infobox properties hold 1,016,180 string literals (Cyrillic, Latin, purely numeric and other), 415,406 of which are encoded in Cyrillic. As previously described, the same number is transliterated to the Serbian Latin alphabet using our post-processor, and then fused back with the original dataset.

V. CONCLUSIONS AND FUTURE WORK
Thanks to the achieved results, we're certain the produced dataset will prove to be an invaluable resource for the Serbian Linked Data community.
Future updates should involve less manual work, as most of the mappings are in place, and some of the post-processing work is already being transferred to the DBpedia Extraction Framework. The Digraphic Extractor will be able to automatically transliterate both Cyrillic and Latin string literals, while skipping those that are not meant to be transformed (by detecting the MediaWiki syntax constructs and magic words (https://www.mediawiki.org/wiki/Help:Magic_words) that prevent them from being transliterated).
Furthermore, the findings concerning the ontology itself will be reported back to the DBpedia community with suggestions for improvements, such as using an evaluation methodology to make it theoretically sound, complete and, most of all, reusable. In order to give back to Wikipedia, the results will be announced to the Serbian Wikipedia community. To help better coordinate and standardize their efforts, the DBpedia ontology will be recommended as a common vocabulary for Wikipedia infoboxes and properties.
Finally, it is safe to assume the collective experience and findings we've come across on our way to producing the largest Serbian Linked Data knowledge source should not only lead to better future versions of this particular dataset, but to a better DBpedia in general.

REFERENCES
[1] J. Brank, M. Grobelnik, and D. Mladenic, "A Survey of Ontology Evaluation Techniques", Proceedings of the Conference on Data Mining and Data Warehouses, 2005.
[2] K. Kozaki, Y. Kitamura, M. Ikeda, and R. Mizoguchi, "Hozo: An Environment for Building/Using Ontologies based on a Fundamental Consideration of Role and Relationship", Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management, 2002, pp. 213-218.
[3] D. Kontokostas, C. Bratsas, S. Auer, S. Hellmann, I. Antoniou, G. Metakides, "Internationalization of Linked Data: The case of the Greek DBpedia edition", Web Semantics: Science, Services and Agents on the World Wide Web, vol. XV, pp. 51-61, September 2012.
[4] S. Hellmann, C. Stadler, J. Lehmann, and S. Auer, "DBpedia live extraction", Proceedings of the 8th International Conference on Ontologies, DataBases, and Applications of Semantics, vol. 5871 of Lecture Notes in Computer Science, 2009, pp. 1209-1223.

[5] U. Milošević, "I18n of Linked Data Tools with respect to Western Balkan Languages", Proceedings of the 3rd International Conference on Information Society Technology, 2013.
[6] W3C, N-Triples, retrieved from http://www.w3.org/2001/sw/RDFCore/ntriples/
[7] D. Beckett, T. Berners-Lee, Turtle - Terse RDF Triple Language, 2011, retrieved from http://www.w3.org/TeamSubmission/turtle/
[8] J. J. Carroll, P. Stickler, RDF Triples in XML, 2004, retrieved from http://www.hpl.hp.com/techreports/2003/HPL-2003-268.pdf
[9] I. Davis, T. Steiner, A. J. Le Hors, RDF 1.1 JSON Alternate Serialization (RDF/JSON), 2013, retrieved from https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-json/index.html

Statistical Composite Indicator for Estimating the Degree of Information Society Development

Marina Dobrota, Jovana Stojilković, Ana Poledica, Veljko Jeremić
University of Belgrade, Faculty of Organizational Sciences, Serbia
[email protected], [email protected], [email protected], [email protected]

Abstract - The development of information and communication technologies and the rapid development of the IT sector have greatly contributed to the development of society in general. In recent years, techniques for measuring the development of information and communication technologies have begun to rise. The subject of this study is to measure the development of countries' ICT infrastructure using a statistical composite index. In order to measure the development, we will use the composite IDI index, with special emphasis on the improvement of the index. In addition to the existing IDI method of ranking, we will use the I-distance. A comparative analysis of the created and the existing indexes shall be given.

Keywords – Information and Communication Technology, Composite I-distance indicator, Information society development

I. INTRODUCTION
The twentieth century was marked by a number of technological innovations. Information and Communication Technology (ICT) has changed the way of life, learning, and work, and an even larger transformation of the way that people, companies, and public institutions interact is expected in the future. Changes in all aspects of society through the use of ICT are developing the information society.
Every day, the number of people in the world that use information technology grows. Developed countries use ICT to improve their socio-economic development, since it represents a relevant productive and economic force. Countries are constantly evaluating their positions and perspectives within ICT development [1], making an effort to progress, and aiming to build an inclusive information society [2, 3]. Certain authors have introduced an ICT indicator as highly important in determining countries' welfare [4].
Numerous institutions continuously monitor developments related to the increased use of technology and the Internet. The year 2013 was marked by significant numbers: 95% of the world population uses mobile phones, 40% of the world population uses the Internet, etc. ICT is a basic structural part of modern society, and it has a wide social and economic impact. It plays an important role in strengthening economic growth and raising socio-economic development [5, 6]. Thus, for example, IT literacy, the skill of handling information, is becoming crucial for individuals' future success, while IT experts are an indispensable part of big business. IT literacy can be considered a 21st-century form of literacy [7]. In addition, ICT influences the industrial structure of regions and contributes to prosperity on many levels: productivity gains resulting from the development of ICTs, creating new business models and opportunities, and creating better educational performance. These are the main reasons to examine the issue of measuring countries' development rate from the informational perspective [8].
Recently, a group of authors made a study in order to measure information development, creating the ICT Development Index (IDI) [9]. It is a composite indicator that completely observes the information society, and whose main purpose was to measure the level of ICT development, progress in ICT development, and differences between countries with different levels of ICT development.
Regarding the issue of composite indicators, each multi-criteria performance measurement is formed as one, and its stability ensures the amount of safety of the observed system. The importance of securing the safety of complex systems has been recognized by various risk analysts in industrial and nonindustrial sectors [10, 11, 12, 13, 14, 15, 16]. The selection of an appropriate methodology is central to any attempt to capture and summarize the interactions among the individual indicators included in a composite indicator or ranking system [17, 18].
Paruolo, Saisana and Saltelli [19] consider that composite indicators aggregate individual variables with the aim to capture certain relevant, yet maybe latent, dimensions of reality. Composite indicators are often applied and constructed [20], and they have been adopted by many institutions, both for specific purposes and for providing a measurement basis for the issues of interest. Saltelli et al. [21] characterize the issue of composite indicators as follows: "Composite indicators tend to sit between advocacy (when they are used to draw attention to an issue) and analysis (when they are used to capture complex multidimensional phenomena)." The results and values of composite indicators significantly depend on the indicator weights, and are therefore often the subject of controversy [23, 19].
There are common conclusions from different studies that multi-criteria methodology definitions suffer from a ranking instability syndrome [14, 23]; i.e. different researchers offer conflicting rankings as to what is "best". According to Keung, Kocaguneli and Menzies [14], given different historical datasets, different sets of best ranking methods exist under various different situations, which is also one of the important directives for our research.
This paper primarily proposes an improved methodology for measuring information society development. Instead of subjectively assigned weights, the

basic idea is to statistically measure the weights for each indicator that the IDI consists of. This paper is organized as follows: in Section 2 a review of the IDI methodology is given, Section 3 describes the basic concepts of our proposed methodology, and Section 4 presents the results. In the final chapter, concluding remarks are given.

II. IDI INDEX
The ICT Development Index (IDI) is a tool used to monitor information society development and the development potential of ICTs, which combines 11 indicators related to ICT Access, Use and Skills into a single composite index [9]:
• ICT Access reflects the level of network infrastructure and access to ICTs, capturing its readiness. It includes five infrastructure and access indicators: fixed telephony, mobile telephony, international Internet bandwidth, households with computers, and households with Internet access.
• ICT Use reflects the level of use of ICTs in society, capturing its intensity. It includes three ICT intensity and usage indicators: Internet users, fixed broadband, and mobile broadband.
• ICT Skills reflects the result/outcome of efficient and effective ICT use, capturing its capability or skills as indispensable input indicators. It includes three proxy indicators: adult literacy, and gross secondary and tertiary enrolment.
The main data source used in this study is the set of the 11 aforementioned IDI indicators, of which the first five refer to ICT Access, the next three to ICT Use, and the last three represent ICT Skills. These indicators are [9]:
• Fixed Telephone Lines per 100 Inhabitants – Telephone lines connecting a subscriber's terminal equipment to the public switched telephone network (PSTN) and which have a dedicated port on a telephone exchange, though this may not be the same as an access line or a subscriber. The numbers of ISDN channels and fixed wireless subscribers are included in this indicator.
• Mobile Cellular Telephone Subscriptions per 100 Inhabitants – The number of subscriptions to a public mobile telephone service using cellular technology, which provides access to the Public Switched Telephone Network (PSTN). While post-paid and prepaid subscriptions are included in this indicator, it does not differentiate between subscriptions and subscribers (persons). Therein, as one subscriber may have multiple subscriptions, it would be useful to distinguish further between the number of mobile subscriptions and the number of individuals using a mobile phone.
• International Internet Bandwidth (bit/s) per Internet User – The capacity that backbone operators provide to carry Internet traffic. This is measured in bits per second per Internet user.
• The Proportion of Households with a Computer – A computer refers to a desktop or laptop computer. This does not include equipment that may have some embedded computing abilities, such as mobile cellular phones, personal digital assistants or TV sets.
• The Proportion of Households with Internet Access at Home – not assumed to be only via a computer. This may also be by mobile phone, game console, digital TV, etc. Access can be via a fixed or mobile network.
• Internet Users per 100 Inhabitants – The increasing use of the Internet through mobile devices is not necessarily reflected in these estimates.
• Fixed Broadband Internet Subscribers per 100 Inhabitants – Subscribers to paid, high-speed access to the public Internet (over a TCP/IP connection). Subscribers with access to data communications (including the Internet) via mobile cellular networks are excluded from this indicator.
• Mobile Broadband Subscriptions per 100 Inhabitants – Subscriptions to mobile cellular networks with access to data communications at broadband speeds, irrespective of the device used to access the Internet (handheld computer, laptop, or mobile cellular telephone). These services are typically referred to as "3G" or "3.5G".
• Adult Literacy Rate – The percentage of the population aged 15 years and over who can both read and write, as well as understand, a short simple statement regarding his/her everyday life.
• Gross Enrolment Ratio (Secondary and Tertiary Level) – The total enrolment at a specific level of education, regardless of age, expressed as a percentage of the eligible official school-age population that corresponds to the same level of education in a given school year.
The IDI methodology has a predefined pattern for measuring a country's ICT structure, which weights ICT Access and ICT Use at 40% each and ICT Skills at 20%. The detailed weights are given in Table 1.

TABLE I. IDI INDEX INDICATORS AND THEIR WEIGHTS
  Category         | Indicator                                                | % weight within category
  ICT access (40%) | Fixed Telephone Lines per 100 Inhabitants                | 20
                   | Mobile Cellular Telephone Subscriptions per 100 Inhabitants | 20
                   | International Internet Bandwidth (bit/s) per Internet User | 20
                   | The Proportion of Households with a Computer             | 20
                   | The Proportion of Households with Internet Access        | 20
  ICT use (40%)    | Internet Users per 100 Inhabitants                       | 33
                   | Fixed Broadband Internet Subscribers per 100 Inhabitants | 33
                   | Mobile Broadband Subscriptions per 100 Inhabitants       | 33
  ICT skills (20%) | Adult Literacy Rate                                      | 33
                   | Secondary Gross Enrolment Ratio                          | 33
                   | Tertiary Gross Enrolment Ratio                           | 33
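For illustration, a minimal sketch of this fixed-weight aggregation follows. It assumes indicator values already normalized to a common scale (the IDI methodology rescales indicators before aggregation), and the indicator key names are placeholders.

    # Sketch of the IDI's fixed-weight aggregation from Table 1.
    ACCESS = {"fixed_tel": 0.20, "mobile_sub": 0.20, "intl_bandwidth": 0.20,
              "hh_computer": 0.20, "hh_internet": 0.20}
    USE = {"internet_users": 1/3, "fixed_broadband": 1/3, "mobile_broadband": 1/3}
    SKILLS = {"adult_literacy": 1/3, "secondary_enrol": 1/3, "tertiary_enrol": 1/3}

    def sub_index(values, weights):
        return sum(values[name] * w for name, w in weights.items())

    def idi(values):
        """Composite IDI: Access and Use at 40% each, Skills at 20%."""
        return (0.4 * sub_index(values, ACCESS)
                + 0.4 * sub_index(values, USE)
                + 0.2 * sub_index(values, SKILLS))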

III. PROPOSED METHODOLOGY

A. I-distance method
A common problem with different ranking methods is that their bias and subjectivity can affect the measurements and evaluation. This problem can somewhat be surpassed using the I-distance method, a metric distance in an n-dimensional space, which has recently made a significant breakthrough [24, 25, 26, 27, 28]. It was originally defined by professor Branislav Ivanovic [29, 30], who devised this method to rank countries according to their level of development based on several indicators, where the main issue was how to use all of them in order to calculate a single synthetic indicator, which would thereafter represent the rank.
For a selected set of variables X^T = (X_1, X_2, \ldots, X_k) chosen to characterize the entities, the I-distance between two entities e_r = (x_{1r}, x_{2r}, \ldots, x_{kr}) and e_s = (x_{1s}, x_{2s}, \ldots, x_{ks}) is defined as

    D(r,s) = \sum_{i=1}^{k} \frac{d_i(r,s)}{\sigma_i} \prod_{j=1}^{i-1} \left(1 - r_{ji.12\ldots j-1}\right)    (1)

where d_i(r,s) is the distance between the values of variable X_i for e_r and e_s, i.e. the discriminate effect,

    d_i(r,s) = x_{ir} - x_{is},  i \in \{1, \ldots, k\}    (2)

\sigma_i is the standard deviation of X_i, and r_{ji.12\ldots j-1} is the partial coefficient of correlation between X_i and X_j (j < i) [29, 31].
In order to surpass the problem of a negative partial correlation coefficient, which can occur when it is not possible to achieve the same sign for all variables in all sets, it is suitable to use the squared I-distance, given as

    D^2(r,s) = \sum_{i=1}^{k} \frac{d_i^2(r,s)}{\sigma_i^2} \prod_{j=1}^{i-1} \left(1 - r_{ji.12\ldots j-1}^2\right)    (3)

The I-distance measurement is based on calculating the mutual distances between the entities being processed, whereupon they are compared to one another so as to create a rank. Using the I-distance methodology, it is necessary to fix one entity as a referent in the observed set. The ranking of entities in the set is then based on the calculated distance from the referent entity.
B. Composite I-distance indicator methodology in number of scientific papers [23, 26, 27, and 28]. Thus,
In order to create more stable ranking methodology we we were able to gain I-distance values for four
have modified the weights given by the original consecutive years. Subsequently, we have calculated the
methodology. The process of establishing adequate correlations between I-distance values and the whole set
weights shall be described in detail. Proposed of input indicators [26, 27] for all referred years.
methodology is referred to as Composite I-distance Since the results have shown to be quite stable, and
indicator (CIDI) methodology. there were no large oscillations between the correlations,
The study includes the analysis and data collection of we have calculated the new weights for each
the composite IDI index and ranks of the countries for compounding indicator which are based on the appropriate
2008, 2010, 2011 and 2012, as well as search for new correlations of these items. The proposed weights are
indexes to improve the existing one. As mentioned before, calculated by dividing the appropriate correlations with
the current structure of the composite indicator is used the sum of correlations, providing the sum to equals 1 (see
since 2008. Yet, the development of ICT is a continuous Section 3). Thus we have obtained the appropriate weights
process, so the composite index must constantly be for input indicators. Weights for years 2008, 2010, 2011
improved. and 2012, with mean values and standard deviations are
presented in Table 2.

Page 98 of 478
ICIST 2014 - Vol. 1 Regular papers

TABLE II. WEIGHTINGS OF IDI INDICATORS BASED ON I-DISTANCE METHODOLOGY
(CIDI weights per year and their mean, against the official IDI weights)

  Category   | Indicator                                                  | 2008  | 2010  | 2011  | 2012  | mean  | IDI weight
  ICT access | Fixed Telephone Lines per 100 Inhabitants                  | 0.106 | 0.110 | 0.098 | 0.092 | 0.101 | 0.08
  ICT access | Mobile Cellular Telephone Subscriptions per 100 Inhabitants| 0.086 | 0.082 | 0.088 | 0.080 | 0.084 | 0.08
  ICT access | International Internet Bandwidth (bit/s) per Internet User | 0.127 | 0.104 | 0.122 | 0.126 | 0.120 | 0.08
  ICT access | The Proportion of Households with a Computer               | 0.097 | 0.100 | 0.094 | 0.099 | 0.098 | 0.08
  ICT access | The Proportion of Households with Internet Access          | 0.099 | 0.101 | 0.096 | 0.101 | 0.099 | 0.08
  ICT use    | Internet Users per 100 Inhabitants                         | 0.097 | 0.100 | 0.091 | 0.096 | 0.096 | 0.133
  ICT use    | Fixed Broadband Internet Subscribers per 100 Inhabitants   | 0.084 | 0.082 | 0.100 | 0.097 | 0.091 | 0.133
  ICT use    | Mobile Broadband Subscriptions per 100 Inhabitants         | 0.071 | 0.074 | 0.089 | 0.096 | 0.083 | 0.133
  ICT skills | Adult Literacy Rate                                        | 0.072 | 0.079 | 0.072 | 0.073 | 0.074 | 0.067
  ICT skills | Secondary Gross Enrolment Ratio                            | 0.077 | 0.081 | 0.071 | 0.077 | 0.076 | 0.067
  ICT skills | Tertiary Gross Enrolment Ratio                             | 0.083 | 0.086 | 0.079 | 0.064 | 0.078 | 0.067

  Category totals (mean CIDI / IDI): ICT access 0.502 / 0.4; ICT use 0.27 / 0.4; ICT skills 0.228 / 0.2

The proposed weights are the mean values of the weights calculated for the period from 2008 to 2012. Table 2 also presents the differences between the weights proposed by the IDI index and the I-distance weights.
If we examine Table 2, we can notice significant aberrations from the officially defined weights. The IDI methodology has given the highest weights to the group of indexes for ICT use, while, compared to it, the I-distance method gave slightly higher weights to ICT access. We calculated the weights of the indicators for each year using the I-distance method. To some extent, the I-distance weights deviate from their means, but when the mean of the weights obtained by the I-distance method is compared to the official IDI weights, they are greater than or approximately equal for each indicator. The largest differences are with the indicators Mobile Broadband Subscriptions per 100 Inhabitants, Fixed Broadband Internet Subscribers per 100 Inhabitants, International Internet Bandwidth (bit/s) per Internet User, and Internet Users per 100 Inhabitants. International Internet Bandwidth (bit/s) per Internet User is weighted 8% according to the official IDI index, while our method calculates the share of this indicator to be 12%, giving it a larger weight. Mobile Broadband Subscriptions per 100 Inhabitants, Fixed Broadband Internet Subscribers per 100 Inhabitants, and Internet Users per 100 Inhabitants are weighted 13.3% according to the official IDI index, while our method calculates their shares to be smaller (about 9%), thus lowering their significance. The other values are more or less consistent with the official IDI weights, showing our proposed method to be meaningful and profound.
On the ground of these matters, Table 3 presents the results of our research, giving the composite I-distance indicator scores, the composite I-distance indicator ranks, as well as the comparison of our scores with the official IDI index scores. The results are shown for the 30 first-ranked countries, the 10 last-ranked countries, as well as for Serbia and its neighboring countries, such as Croatia, Hungary, etc. The whole set of results, for 156 countries, is available upon request.
As can be noted from Table 3, there are a lot of similarities between the IDI scores and the CIDI scores. This is due to the fact that the weights gained using our methodology correspond quite well to the official IDI index, yet are somewhat different because they are based on the correlations between the input indicators and the I-distance values. These differences, shown in Table 2, cause some differences in ranks, which can also be seen in Table 3.
In order to assess the composite I-distance indicator, a Pearson correlation coefficient between the composite I-distance indicator scores and the IDI index has also been calculated. The correlation is significant at the 0.001 level (p < 0.001), and very strong, r = 0.998. The fact that our indicator correlates so closely with the IDI index proves that it is equally suitable and greatly connected to the subject of interest. This validates the composite I-distance indicator as an acceptable measurement for evaluating information society development. As for the compared rankings gained by the two methods, a Spearman's rho statistic has additionally been calculated. The correlation is also significant, with rs = 0.999, p < 0.001.
As mentioned before, our principle provides methodologically and statistically justified weights, which are derived from the correlations calculated for 2008, 2010, 2011, and 2012. In addition, this method provides a different perspective on the importance of each input variable, and a correction of the weighting factor for each of the eleven input indicators. Moreover, not only does our methodology give a more accurate result, but it is also more stable than the official IDI index. If we calculate the weights using the I-distance method and according to the composite I-distance indicator methodology, we are able to gain more stable results and to decrease the entropy of the ranking system.

V. CONCLUSION
The results have shown that the approach to measuring information society development using the I-distance method is very important. I-distance uses the same basic indicators, but classifies them according to significance, thus creating the composite I-distance indicator (CIDI). This
paper briefly presented the results of a research that


TABLE III. introduces a new perspective on the measurement of
CIDI SCORES, CIDI RANKS, AND ITS COMPARISON WITH OFFICIAL IDI
SCORES AND RANKS
countries’ information development and compares it with
the already familiar ICT development index (IDI). Like
Country
CIDI CIDI IDI IDI IDI, CIDI used the same eleven indicators related to the
score rank score rank three ICT categories: Access, Use and Skills. Yet unlike
Korea 8.748 1 8.57 1 the IDI, it assigns different weights to those indicators.
Iceland 8.678 2 8.36 3 The important preference of our methodology is its
Denmark 8.568 3 8.35 4 correspondence to official IDI index. The results of the
Sweden 8.530 4 8.45 2 scores (r=0.998, p<0.001) as well as the results of ranks
Hong Kong 8.456 5 7.92 10 (rs=0.999, p<0.001) are in an intense agreement.
Netherlands 8.365 6 8 7 Thereunder, the weights that are assigned to appropriate
Finland 8.357 7 8.24 5 indicators are quite similar (Table 2). These facts ensure
Luxembourg 8.318 8 7.93 9
the recognition and acknowledgement of our proposed
method. Yet our methodology offers significant
Norway 8.260 9 8.13 6
improvements and updating.
Australia 8.260 10 7.9 11
United Kingdom 8.234 11 7.53 16
First important improvement is reflected in the
objectiveness of our methodology. By defining the
Switzerland 8.124 12 7.78 13
methodology of CIDI, we have overcome the
Japan 8.007 13 7.82 12 disadvantage of subjectively assigned weights to the set of
New Zealand 7.996 14 7.64 15 input indicators. Instead of using the bias weights, we
Singapore 7.961 15 7.65 14 have established the weighting system based on a
France 7.932 16 7.53 17 multivariate statistical and methodologically grounded
Germany 7.867 17 7.46 18 course. Furthermore, the weighting system that we have
United States 7.717 18 7.98 8 proposed in forming the CIDI is far more stable,
Canada 7.687 19 7.38 19 producing a high degree of confidence.
Ireland 7.674 20 7.25 22
REFERENCES
[1] Lee, H., & Hwang, J. (2004). ICT Development in North Korea: Changes and Challenges. Information Technologies and International Development, 2(1), pp. 75-88. doi:10.1162/1544752043971206
[2] Parker, S. (2011). The digital divide is still with us. Information Development, 27, pp. 84-84.
[3] Vicente, M.R., & Lopez, A.J. (2011). Assessing the regional digital divide across the European Union-27. Telecommunications Policy, 35, pp. 220-237.
[4] Jeremic, V., Markovic, A., & Radojicic, Z. (2011). ICT as crucial component of socio-economic development. Management, 16(60), pp. 5-9.
[5] Apostol, D.M. (2009). Knowledge, education and technological progress in the new economy. Metalurgia International, 14(SI5), pp. 78-81.
[6] Dimelis, S.P., & Papaioannou, S.K. (2011). ICT growth effects at the industry level: A comparison between the US and the EU. Information Economics and Policy, 23(1), pp. 37-50.
[7] Leung, L. (2010). Effects of Internet Connectedness and Information Literacy on Quality of Life. Social Indicators Research, 98(2), pp. 273-290.
[8] Dobrota, M., Jeremic, V., & Dobrota, M. (2012). ICT Development in Serbia: Position and Perspectives. Proceedings of the 18th International Conference on Information and Communication Technologies YU INFO (YU INFO 2012), compact disc, pp. 18-23. ISBN: 978-86-85525-09-4
[9] MIS (2013). Measuring the Information Society 2013. Retrieved on November 7th 2013, from http://www.itu.int/en/ITU-D/Statistics/Documents/publications/mis2013/MIS2013_without_Annex_4.pdf
[10] Arndt, S., Acion, L., Caspers, K., & Blood, P. (2013). How Reliable Are County and Regional Health Rankings? Prevention Science, 14(5), 497-502. doi:10.1007/s11121-012-0320-3
[11] Monferini, A., Konstandinidou, M., Nivolianitou, Z., Weber, S., Kontogiannis, T., Kafka, P., Kay, A.M., Leva, M.C., & Demichela, M. (2013). A compound methodology to assess the impact of human and organizational factors on the risk level of hazardous industrial plants. Reliability Engineering & System Safety, 119, 280-289. doi:10.1016/j.ress.2013.04.012


[12] Wainwright, H.M., Finsterle, S., Zhou, Q.L., & Birkholzer, J.T. (2013). Modeling the performance of large-scale CO2 storage systems: A comparison of different sensitivity analysis methods. International Journal of Greenhouse Gas Control, 17, 189-205. doi:10.1016/j.ijggc.2013.05.007
[13] Mahsuli, M., & Haukaas, T. (2013). Sensitivity measures for optimal mitigation of risk and reduction of model uncertainty. Reliability Engineering & System Safety, 117, 9-20. doi:10.1016/j.ress.2013.03.011
[14] Keung, J., Kocaguneli, E., & Menzies, T. (2013). Finding conclusion stability for selecting the best effort predictor in software effort estimation. Automated Software Engineering, 20(4), 543-567. doi:10.1007/s10515-012-0108-5
[15] Guttorp, P., & Kim, T.Y. (2013). Uncertainty in Ranking the Hottest Years of U.S. Surface Temperatures. Journal of Climate, 26(17), 6323-6328. doi:10.1175/JCLI-D-12-00760.1
[16] Saisana, M., D'Hombres, B., & Saltelli, A. (2011). Rickety numbers: volatility of university rankings and policy implications. Research Policy, 40(1), 165-177. doi:10.1016/j.respol.2010.09.003
[17] Saisana, M., & Tarantola, S. (2002). State-of-the-art Report on Current Methodologies and Practices for Composite Indicator Development, EUR Report 20408 EN, European Commission, JRC-IPSC, Italy.
[18] Saisana, M., & D'Hombres, B. (2008). Higher Education Rankings: Robustness Issues and Critical Assessment. How much confidence can we have in Higher Education Rankings? EUR23487, Joint Research Centre, Publications Office of the European Union, Italy. ISBN: 978 82 79 09704 1. doi:10.2788/92295
[19] Paruolo, P., Saisana, M., & Saltelli, A. (2013). Ratings and rankings: voodoo or science? Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(3), 609-634. doi:10.1111/j.1467-985X.2012.01059.x
[20] Bruggemann, R., & Patil, G. P. (2011). Ranking and Prioritization for Multi-indicator Systems. Dordrecht: Springer. ISBN: 978-1-4419-8477-7
[21] Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M., & Tarantola, S. (2008). Introduction to Sensitivity Analysis. In Global Sensitivity Analysis. The Primer. John Wiley & Sons, Ltd, Chichester, UK. doi:10.1002/9780470725184.ch1
[22] Saltelli, A. (2007). Composite Indicators between analysis and advocacy. Social Indicators Research, 81, 65-77.
[23] Jovanovic, M., Jeremic, V., Savic, G., Bulajic, M., & Martic, M. (2012). How does the normalization of data affects the ARWU ranking? Scientometrics, 93(2), 319-327. doi:10.1007/s11192-012-0674-0
[24] Jeremic, V., Bulajic, M., Martic, M., & Radojicic, Z. (2011). A fresh approach to evaluating the academic ranking of world universities. Scientometrics, 87(3), pp. 587-596. doi:10.1007/s11192-011-0361-6
[25] Radojicic, Z., & Jeremic, V. (2012). Quantity or quality: What matters more in ranking higher education institutions? Current Science, 103(2), 158-162.
[26] Jeremic, V., Bulajic, M., Martic, M., Markovic, A., Savic, G., Jeremic, D., & Radojicic, Z. (2012). An Evaluation of European Countries Health Systems through Distance Based Analysis. Hippokratia, 16(2), 175-179.
[27] Dobrota, M., Jeremic, V., & Markovic, A. (2012). A new perspective on the ICT Development Index. Information Development, 28(4). doi:10.1177/0266666912446497
[28] Jeremic, V., Jovanovic-Milenkovic, M., Martic, M., & Radojicic, Z. (2013). Excellence with Leadership: the crown indicator of SCImago Institutions Rankings IBER Report. El Profesional de la Informacion, 22(5), 474-480.
[29] Ivanovic, B. (1973). A method of establishing a list of development indicators. Paris: United Nations educational, scientific and cultural organization.
[30] Ivanovic, B., & Fanchette, S. (1973). Grouping and ranking of 30 countries of Sub-Saharan Africa: Two distance-based methods compared. Paris: United Nations educational, scientific and cultural organization.
[31] Ivanovic, B. (1977). Classification Theory. Belgrade: Institute for Industrial Economic.


System for modelling rulebooks for the evaluation of scientific-research results. Case study: Serbian Rulebook
Siniša Nikolić, Valentin Penca, Dragan Ivanović
University of Novi Sad/Faculty of Technical Sciences/Department of Computing and Automatics, Novi Sad, Serbia
{sinisa_nikolic, valentin_penca, chenejac}@uns.ac.rs

Abstract— The paper presents an example of a system for storing information about the rulebooks which are used for evaluation of researchers' scientific-research results. The system is based on the CERIF data model and was implemented as an extension of the current CRIS system of the University of Novi Sad (CRIS UNS), where it is actively used by the research community. A case study of modelling the rulebook issued by the Ministry of Education and Science of the Republic of Serbia is presented.

I. INTRODUCTION
Evaluation in the scientific-research domain is necessary, as in all other areas. Today we are witnessing many efforts aimed at objective and efficient evaluation of scientific-research results. Evaluation in the scientific-research domain [1] is a process based on critical analysis of information and data which leads to a judgment of merit. The main outputs of scientific-research activity are scientific-research results such as journal papers, monographs, etc. Evaluation of scientific-research results is done for the purposes of: the election of researchers into scientific and teaching positions within scientific-research institutions; ranking of researchers and scientific institutions; financing of scientific projects, etc.
Nowadays there are a number of research management systems for storing scientific-research information, and such systems play an important role in the development of science [2]. They usually contain information on publications, researchers and research institutions. An appropriate example of such a system is the Current Research Information System (CRIS), which is based on the Common European Research Information Format (CERIF) standard.
The construction of an information system is necessary for efficient evaluation of scientific-research data [3]. CRIS research management systems represent a good starting point for the development of a system for the evaluation of scientific results, because their metadata can be used for evaluating researchers' scientific-research results in accordance with national, regional and international rulebooks. Information about the authors, scientific publications and their relations is already stored in a CRIS system, so there is no need to create the system again from scratch. Since the University of Novi Sad has a system for storing data on scientific-research publications, named CRIS UNS, it is possible to extend the existing system and create a subsystem which enables defining rulebooks used in the process of evaluation of researchers' results.

II. RELATED WORK
There are many ways in which a researcher can contribute to the development of science: active participation in the management of research institutions/organizations, membership in organisational/programme committees of scientific events (conferences, workshops, seminars...), advising PhD theses, membership in scientific projects, authorship/editorship of publications, etc. In practice, the evaluation mainly relies on publications (the promotion of new scientific ideas). Publications can be evaluated individually (which is prone to subjective decisions) or as a part of a larger publication. Evaluating the source (e.g. the journal or conference) in which a paper is published is practical and objective; thus many evaluation rulebooks are based on this principle.
Three characteristic approaches can be distinguished in the evaluation process [4]:
• An expert group (commission) which evaluates the results based on defined rules.
• Usage of bibliometric indicators (impact factor, h-index, citations...).
• A combination of the previous two, for example an expert group which, in its final score, also takes into account the values of bibliometric indicators.
The author of the paper [5] made a comprehensive analysis comparing the first and the second approach and revealed their positive and negative sides. A combination of an expert group and bibliometric indicators is probably the best approach for the evaluation of research results.
CERIF [6] is a standard that describes a physical data model and XML message formats enabling interoperability between CRIS research management systems. The CERIF model [7] provides representation and exchange of various types of scientific-research data, and can be expanded and adapted to different needs. Papers [8] and [9] describe an extension which uses a formalized Dublin Core to describe published research results. In the paper [10] a CERIF-compatible data model based on the MARC 21 format is presented. An example of an extension of the CERIF model for the needs of storing bibliometric indicators is presented in [11]. A CERIF-based schema for encoding research impact in a structured way is stated in [12].


The paper [13] describes the CERIF data model extension which was created in order to satisfy the requirements of the IST World portal (ist-world.dfki.de). Examples of CERIF-based CRIS systems are: SICRIS (sicris.izum.si), Cristin (www.cristin.no), HunCRIS (nkr.info.omikk.bme.hu/HunCRIS_eng.htm), the Pure system of Royal Holloway (pure.rhul.ac.uk/portal), the IST World portal, etc. The usage of a CRIS system for the evaluation of researchers' results in Slovenia is described in [14]. The authors of [15] present an approach to evaluating a journal based on bibliometric indicators and the CERIF data model. Furthermore, in [16] an example of a CRIS service for the evaluation of journals and journal articles is given.
In the paper [17] a CERIF-compatible research management system, CRIS UNS, is presented, which can be accessed at http://www.cris.uns.ac.rs. Currently, the system stores over 10500 records of scientific publications (papers published in journals, conference proceedings, monographs, technical solutions, patents, etc.). The CRIS UNS system has been under development since 2008 at the University of Novi Sad in the Republic of Serbia. The earlier development of the system covered the implementation of a subsystem for entering metadata about scientific-research results [18]. Later phases in the development of CRIS UNS included the integration of various extensions that rely on the CERIF model. An extension of CERIF that incorporates the set of metadata required for storing theses and dissertations of the University of Novi Sad is defined in [19]. In [20] a CRIS search profile based on the SRU/W standard was created, which enables a unified and semantically rich search of the stored records. The paper [21] proposes a CERIF data model extension which is used as a basis for the future development of the CRIS UNS module for the evaluation of scientific-research results. The system for modelling evaluation rulebooks which is the subject of this paper strongly relies on that model. The model itself was changed in a certain manner, and this is explained in detail further in the paper.

III. EVALUATION ACCORDING TO THE SERBIAN RULEBOOK
The process of evaluation of scientific research is prescribed by law in the Republic of Serbia. The evaluation process is based on the Rulebook on Procedure and Aspects of Evaluation and Quantitative Expression of Scientific-Research Results (www.mpn.gov.rs/images/content/nauka/pravna_akta/PRAVILNIK_o_zvanjima.pdf), which was issued in 2008 by the Ministry of Education, Science and Technological Development. The Ministry uses this Rulebook for funding scientific programs in Serbia. Also, many scientific institutions use the same rulebook for the election of researchers into appropriate teaching and science positions.
At the University of Novi Sad a modified version of the Rulebook of the Ministry is applied. Researchers at the University are obligated to independently evaluate their own performance and to submit lists of their scientific results categorized according to the stated rulebook. In that manner, a possible outcome is that the same publication is evaluated differently by different co-authors of the paper. The problem of different categorization of the same results can be explained by the fact that researchers are often not familiar with the rulebook for evaluation and that they often lack the complete information necessary for the evaluation process. This problem can be overcome by using information support systems for evaluation, which aim to institutionalize the process. The main idea of using an information system is that researchers enter data about their scientific-research results by themselves and that a commission (a group of experts) evaluates those results according to some rulebook. The information system should allow the evaluation of scientific results by different rulebooks.
The Serbian rulebook sets down the list of researchers' results (entity types) that are subject to evaluation. The entity types that can be evaluated by the Serbian rulebook are publications (journals, conference proceedings, monographs, papers published in the listed publications, and theses and dissertations), patents and technical solutions. A part of the Serbian rulebook relies on the principle that the evaluation of an individual publication can be accomplished through the evaluation of the publication source. So, a paper published in a journal, conference proceedings or monograph is evaluated according to the evaluation value of the journal, conference or monograph (the result's source). The Rulebook prescribes the classification of researchers' results into various types (results' types with appropriate codes) which are organised in two hierarchical levels. That classification is used in the evaluation of both researchers' results and result sources. Hereinafter, the classification for the entity types journal and paper published in journal is explained. According to the rulebook, a journal (result source) can initially be classified into one of two main types of results: International journals (code M20) and National journals (code M50). The classification on the lower level of the hierarchy is explained for the category International journal, which is further subdivided into four types of results: Leading journal of international importance (code M21), Outstanding journal of international importance (code M22), Journal of international importance (code M23) and Specially verified international journal (code M24). All researchers' results are categorised with regard to the researcher's role in the creation of the scientific-research result. Currently, there are two distinct roles: Author and Editor (the latter applicable to publications). If we observe a journal classified as M22, the authorship of a paper in that journal is categorised as the result type Paper published in outstanding journal of international importance (code M22), while the editorship of an M22 journal is categorised as Editorial of outstanding international journal (code M27).
In the Serbian rulebook a quantitative measure is prescribed for every type of result, in accordance with the group of sciences to which the researcher belongs. There are three science groups: Mathematics and Natural Sciences (SG1), Technical and Technological Sciences (SG2) and Social Sciences (SG3). Table 1 presents a part of the hierarchy of results' types with the corresponding quantitative measures.


TABLE I. A PART OF RESULTS' TYPES CLASSIFICATION IN SERBIAN RULEBOOK
(quantitative measure per science group)

Results' type                                                                          Code   SG1   SG2   SG3
Papers published in scientific journals of international importance                   M20      -     -     -
Paper published in leading journal of international importance                        M21      8     8     8
Paper published in outstanding journal of international importance                    M22      5     5     5
Paper published in journal of international importance                                M23      3     3     4
Paper published in journal of international importance verified by a special decision M24      3     3     4
Scientific criticism in outstanding journal of international importance               M25    1.5   1.5   1.5
Scientific criticism in journal of international importance                           M26      1     1     1
Editorial of outstanding international journal                                        M27      3     3     3
Editorial of international journal                                                    M28      2     2     2
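To make the classification concrete, the following minimal Java sketch (illustrative only; it is not part of the Rulebook or of the CRIS UNS implementation, and the class and method names are ours) shows how the quantitative measures from Table I could be looked up for a given results' type code and science group:

    import java.util.HashMap;
    import java.util.Map;

    /** Illustrative sketch: quantitative measures from Table I, keyed by
     *  results' type code (M21-M28) and science group (SG1-SG3). */
    public class SerbianRulebookMeasures {

        private static final Map<String, double[]> MEASURES = new HashMap<>();
        static {
            // values per science group: { SG1, SG2, SG3 }
            MEASURES.put("M21", new double[]{8, 8, 8});
            MEASURES.put("M22", new double[]{5, 5, 5});
            MEASURES.put("M23", new double[]{3, 3, 4});
            MEASURES.put("M24", new double[]{3, 3, 4});
            MEASURES.put("M25", new double[]{1.5, 1.5, 1.5});
            MEASURES.put("M26", new double[]{1, 1, 1});
            MEASURES.put("M27", new double[]{3, 3, 3});
            MEASURES.put("M28", new double[]{2, 2, 2});
        }

        /** @param scienceGroup 1 for SG1, 2 for SG2, 3 for SG3 */
        public static double measure(String resultsTypeCode, int scienceGroup) {
            double[] m = MEASURES.get(resultsTypeCode);
            if (m == null) throw new IllegalArgumentException("Unknown code: " + resultsTypeCode);
            return m[scienceGroup - 1];
        }

        public static void main(String[] args) {
            // A paper in a journal of international importance (M23), Social Sciences (SG3):
            System.out.println(measure("M23", 3)); // prints 4.0
        }
    }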
The Serbian rulebook is, by its structure, not a unique example of a rulebook for evaluating researchers' results; some similar rulebooks are presented in the following text. The Ministry of Civil Affairs of Bosnia and Herzegovina in 2012 presented a rulebook which coincides to a large extent with the Serbian rulebook: the entity types, results' types and science groups of the Bosnia and Herzegovina rulebook can be mapped to the Serbian rulebook. The codes of the results' types are different, but the quantitative measures are alike. Montenegro has a rulebook similar to Serbia's, with the differences that there are no codes for results' types and there are no science groups for which researchers' results get distinct quantitative values (there is only one general science group). Also, Montenegro's rulebook includes two additional researcher roles: mentor and committee member. The Ministry of Science, Education and Sports of Croatia in 2013 presented a rulebook that sets down additional values for researchers' roles (committee member, coordinator, collaborator...) and entity types (projects, plenary lectures...). The science groups in that rulebook are more refined, with a total number of seven, and every science group has its own set of results' types with distinct codes.

IV. SYSTEM FOR MODELLING RULEBOOKS BASED ON THE CERIF DATA MODEL EXTENSION
The main purpose of the research presented in this paper was to develop an extension for modelling rulebooks in the existing scientific-research information system CRIS UNS. The data model for representing rulebooks was developed as an extension of the CERIF model. Relying on CERIF enables potential interoperability and simple implementation of the proposed extension within all systems that support the CERIF model. The implementation of the model is explained in the section Software architecture and implementation.
The data model and the system architecture are represented by UML (Unified Modelling Language) diagrams. The CASE tool PowerDesigner was used for modelling. The data model is represented with a physical diagram, while the system's architecture is represented by deployment and component diagrams.

A. Requirements specification
Based on the analysis of the Rulebook of the Republic of Serbia, the requirements of the evaluation commissions and the authors' experience in the implementation of information systems, a list of demands that must be met by the system implementation is stated. The list is established to support the definition of all rulebooks which are similar in structure to the rulebook of the Republic of Serbia:
• Enable defining of the basic rulebook information (code, name, short description, start and end date of the rulebook's application).
• Enable linking of the physical document that contains the original text of the rulebook to its electronic representation in the information system.
• Enable hierarchical linking of rulebooks. Hierarchical linking should be provided because a number of scientific institutions base their rulebooks on the rulebook of the Ministry.
• Create a classification of results' types and allow defining of their hierarchical links.
• Separate the results' type classification from the rulebook and organize the classifications so that they belong to a certain results' group. This is justified by the fact that a large number of rulebooks do not have their own classification of scientific-research results, so the classification is adopted (identical or modified) from other rulebooks. The advantage of this organization is that a classification of results' types is defined only once (sometimes over 100 results' types may be defined) and is then used by all rulebooks which support it (by selecting a particular results' group, a rulebook takes the classification from that group).
• Create classifications of scientific disciplines/groups as independent entities in the information system (regardless of the rulebook).
• Enable defining of quantitative values for results' types within a specific scientific discipline/group.
• Define the researchers' role classification and the classification of entity types as independent units of the information system, and allow linking of the rulebooks to the appropriate roles (creation of lists of researchers' roles supported by a rulebook) and entity types (creation of lists of entity types supported within the rulebook in focus).
• Enable defining of the sets of allowed values of results' types for the supported entity types.
• Provide mapping of different results' types with regard to the role of researchers and the types of entities. The mapping should also take into account the possibility of determining the results' type in accordance with the source of the scientific result (for publications).
• Support multilingualism for all entities that are prescribed by a rulebook.

B. Expanding the CERIF model for the needs of storing data about rulebooks
The initial idea of modelling a rulebook as an extension of the CERIF model is taken from the paper [21].


An expansion of the CERIF model is defined in a manner that matches the specification requirements defined in the previous section. The parts of the model are explained in more detail below:
• The entity RuleBook holds the basic information about a rulebook. Multilingual input of the name and a brief description of the rulebook is supported by the entities RuleBookName and RuleBookDescr.
• The CERIF entity cfMedium is used to represent the digital document that contains the original text of the rulebook. Binding the rulebook to the digital document is done via the RuleBook_Medium entity.
• The CERIF semantic layer is used to define the disciplines/groups, the researchers' roles, the entity types that are subject to evaluation, the results' types and the results' groups. For each of these entities a scheme is defined (an instance of the entity cfClassScheme), together with classes which represent the corresponding values of the mentioned entities (instances of the entity cfClass). The link between a group and the results' types which belong to it, as well as the hierarchical organization of the results, is achieved by using the CERIF entity cfClass_Class. Multilingual input of classification values is achieved by using the CERIF entity cfTerm.
• The RuleBook_Class entity is defined for the purposes of classification of the rulebooks. Linking a rulebook with a certain results' group is achieved by this entity.
• The entities RuleBook_ResultsType, RuleBook_EntityType and RuleBook_ResearchersRole contain information on the classifications of results' types, entity types and researchers' roles within the frame of a rulebook (creation of the lists of classifications which are supported by the rulebook).
• A list of quantitative measures for various types of results in different science groups is defined through the entity ResultsTypeMeasure.
• The EntityResultsType entity holds all permitted combinations of allowed values of results' types for the supported entity types.
• Determination of the results' type classification in relation to the researchers' role is provided by the entity ResultsTypeMapping. The classification is determined with regard to: the researcher's role in the forming of the scientific-research result (e.g. author, editor...), the type of entity (e.g. journal, paper), the source entity type (e.g. journal), the classification of the observed entity type (e.g. paper published in the leading journal of international importance, paper published in the outstanding journal of international importance...) and the classification of the observed source entity type (leading journal of international importance, outstanding journal of international importance...).
The extension of the CERIF model for defining rulebooks within a CRIS research management system is presented in Figure 1. The diagram contains entities depicted with different border styles: a thin line indicates the original components inherited from the CERIF data model, while the bolded components are those added for the needs of the proposed model and do not exist in the CERIF data model.

Figure 1. Physical diagram of extended CERIF model.


C. Software architecture and implementation
The application's architecture is a multi-layer client-server one, based on a set of open-source software components, mostly written in the Java programming language. The CRIS UNS system for entering metadata about scientific-research results uses the MARC 21 format [10] to store some of the metadata. The system for modelling rulebooks is integrated within the existing CRIS UNS system for the management of scientific-research data by extending some of the existing components described in [18]. UML deployment and component diagrams of the system are shown in Figure 2 and Figure 3, respectively. A thin line is used to indicate the original components of the CRIS UNS system, while a dashed line indicates the modified components.

Figure 2. System architecture deployment diagram
Figure 3. System architecture component diagram

Client-Web Browser. A standard web browser can be used as the client application. The communication between the client and the server side is done via HTTP.
Application Server. This component describes the server side of the application. The Interface component was modified by creating new xhtml web pages and ManagedBean instances (Java classes which control the execution of the application) in order to enable the browse, create, update, view and delete functions for all entities necessary for modelling the rulebooks. The Text server was changed to include rules for searching and retrieving metadata of rulebooks. In the DTO&MARC21 component, new Java classes (Data Transfer Objects - DTOs, object representations of MARC 21 records, and DTO to/from MARC 21 converters) that support the representation of rulebooks and the accompanying classifications (science groups, results' groups, results' types, ...) were created. The Database manager component was modified to include Java classes that enable CRUD (create, read, update, delete) operations for the extension of the CERIF model.
Database-MySQL DBMS. The database was altered by creating new tables which store the rulebook data.
From this point on, the implemented system related to entering data about rulebooks is described in more detail. The part of the system for storing classification data for science groups, results' groups and results' types is presented. The other parts of the implemented system, which are related to researchers' roles and entity types, are similar to the part associated with science groups.
Figure 4 presents the interface for managing science groups, where all records are presented in a table. The user can enter a new record, delete the selected record, view record details or edit record data by clicking the corresponding buttons. Clicking the add button opens the form for adding a science group (Figure 5). The user is obligated to enter a unique classification code and name and to define the main language of the classification's textual attributes. Optional values are the description, dates and translations. The same form, in different operation modes, is reused for the view and edit actions (a create/read/update form - CRU form). All values for records are entered in Serbian script.

Figure 4. Interface for managing Sciences groups.
Figure 5. Form for create/read/update of Sciences group.

The interface for managing results' groups is similar to the interface for science groups. The CRU form for a results' group differs from the other CRU classification forms because it contains an addition for managing results' types. Management of results' types is only possible through the management of results' groups. To depict the hierarchical organisation of results' types, all records are represented in a tree interface component.
Management of rulebooks is shown in Figure 6. Rulebooks that are created on the basis of another rulebook are presented as its child elements. A part of the CRU form for entering basic data is shown in Figure 7. The user is obligated to enter a unique code, name, dates and results' group and to define the main language of the textual attributes. Optional values are the description, translations and an attached digital document containing the rulebook text. The principle for managing all advanced data can be shown on the example of quantitative measures for various types of results in different science groups (Figure 8). The panel Quantitative measures of Results types contains the buttons Adding new entities and Save changes in the table, and a table. The columns of the table represent the entities (different classifications) which are combined in the creation of a table row. Every row has a remove action button. The form for adding new records to the table is designed to contain a combination of elements that represent the columns of the table.
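Since the Interface component pairs each xhtml page with a ManagedBean, the interplay of the table view and the CRU form can be outlined with a minimal, assumption-laden sketch (the class, field and method names below are invented for this example and do not appear in the paper):

    import java.util.ArrayList;
    import java.util.List;

    /** Illustrative JSF-style managed bean for the science group CRU form;
     *  all names here are assumptions made for the sketch. */
    public class ScienceGroupBean {

        public static class ScienceGroup {
            public String code;        // unique classification code
            public String name;        // name in the main language
            public String language;    // main language of textual attributes
        }

        private final List<ScienceGroup> groups = new ArrayList<>(); // table rows
        private ScienceGroup selected = new ScienceGroup();          // CRU form backing

        public void create() {                     // the "add" button
            groups.add(selected);
            selected = new ScienceGroup();
        }

        public void delete(ScienceGroup g) {       // the "remove" button
            groups.remove(g);
        }

        public List<ScienceGroup> getGroups() { return groups; }  // table data source
        public ScienceGroup getSelected() { return selected; }    // form binding
    }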


Figure 6. Interface for managing rulebooks.
Figure 7. Part of CRU rulebook form for entering basic data
Figure 8. Example of rulebook's advanced data

V. CONCLUSION
Efficient evaluation of scientific-research data is only possible when the evaluation process relies on an information system. The evaluation of researchers' results is usually done in accordance with some rulebook, so it is essential to enable the modelling of rulebooks inside the information system.
The article presents a system for storing and managing rulebooks for the evaluation of researchers' scientific-research results. The system is based on an extension of the CERIF data model, and it is integrated into the existing CRIS UNS system for storing scientific-research data at the University of Novi Sad. Potential interoperability with all CERIF-like research management systems is a benefit of modelling rulebooks within the CERIF model.

ACKNOWLEDGMENT
Results presented in this paper are part of the research conducted within the Grant No. III-47003, Ministry of Science and Technological Development of the Republic of Serbia.

REFERENCES
[1] Committee for the Evaluation of Research - CIVR, "Guidelines for Research Evaluation," Ministry of University and Research (MIUR), Jan. 2006.
[2] E. Zimmerman, "CRIS-Cross: current research information systems at a crossroads," in Proceedings of the 6th International Conference on Current Research Information Systems, University of Kassel, Kassel, 2002, pp. 11-20.
[3] J. Russell and R. Rousseau, "Bibliometrics and institutional evaluation," Encycl. Syst. EOLSS Part 1, 2002.
[4] J. Bar-Ilan, "Informetrics at the beginning of the 21st century—A review," J. Informetr., vol. 2, no. 1, pp. 1-52, Jan. 2008.
[5] J. Iivari, "Expert evaluation vs bibliometric evaluation: experiences from Finland," Eur. J. Inf. Syst., vol. 17, no. 2, pp. 169-173, Apr. 2008.
[6] A. Asserson, K. Jeffery, and A. Lopatenko, "CERIF: past, present and future: an overview," in Proceedings of the 6th International Conference on Current Research Information Systems, University of Kassel, Kassel, 2002, pp. 33-40.
[7] J. Dvořák and B. Jörg, "CERIF 1.5 XML - Data Exchange Format Specification," 2013, p. 16.
[8] K. G. Jeffery, "An architecture for grey literature in a R&D context," Int. J. Grey Lit., vol. 1, no. 2, pp. 64-72, 2000.
[9] K. Jeffery, A. Asserson, J. Revheim, and H. Konupek, "CRIS, grey literature and the knowledge society," in Proceedings CRIS-2000, Helsinki, ftp://ftp.cordis.lu/pub/cris2000/docs/jeffery_fulltext.pdf, 2000.
[10] D. Ivanovic, D. Surla, and Z. Konjovic, "CERIF compatible data model based on MARC 21 format," Electron. Libr., vol. 29, no. 1, pp. 52-70, 2011.
[11] S. Nikolić, V. Penca, and D. Ivanović, "Storing of bibliometric indicators in CERIF data model," presented at ICIST 2013 - International Conference on Internet Society Technology, Kopaonik mountain resort, Republic of Serbia, 2013.
[12] R. Gartner, M. Cox, and K. Jeffery, "A CERIF-based schema for recording research impact," Electron. Libr., vol. 31, no. 4, pp. 465-482, 2013.
[13] B. Jörg, J. Ferlež, E. Grabczewski, and M. Jermol, "IST World: European RTD information and service portal," in Proceedings of the 8th International Conference on Current Research Information Systems, Bergen, 2006.
[14] T. Seljak and A. Bošnjak, "Researchers' bibliographies in COBISS.SI," Inf. Serv. Use, vol. 26, no. 4, pp. 303-308, 2006.
[15] D. Ivanovic, D. Surla, and M. Rackovic, "Journal evaluation based on bibliometric indicators and the CERIF data model," Comput. Sci. Inf. Syst., vol. 9, no. 2, pp. 791-811, 2012.
[16] S. Nikolić, V. Penca, D. Ivanović, D. Surla, and Z. Konjović, "CRIS service for journals and journal articles evaluation," in Proceedings of the 11th International Conference on Current Research Information Systems, Prague, Czech Republic, 2012, pp. 323-332.
[17] D. Surla, D. Ivanovic, and Z. Konjovic, "Development of the software system CRIS UNS," in Proceedings of the 11th International Symposium on Intelligent Systems and Informatics (SISY), Subotica, 2013, pp. 111-116.
[18] D. Ivanovic, G. Milosavljevic, B. Milosavljevic, and D. Surla, "A CERIF-compatible research management system based on the MARC 21 format," Program Electron. Libr. Inf. Syst., vol. 44, no. 3, pp. 229-251, 2010.
[19] L. Ivanovic, D. Ivanovic, and D. Surla, "A data model of theses and dissertations compatible with CERIF, Dublin Core and EDT-MS," Online Inf. Rev., vol. 36, no. 4, pp. 548-567, 2012.
[20] V. Penca, S. Nikolić, D. Ivanović, Z. Konjović, and D. Surla, "SRU/W based CRIS systems search profile," Program Electron. Libr. Inf. Syst., 2014.
[21] D. Ivanović, D. Surla, and M. Racković, "A CERIF data model extension for evaluation and quantitative expression of scientific research results," Scientometrics, vol. 86, no. 1, pp. 155-172, Apr. 2010.


SRU/W service for CRIS UNS system

Valentin Penca*, Siniša Nikolić*, Dragan Ivanović*
* University of Novi Sad/Faculty of Technical Sciences/Department of Computing and Automatics, Novi Sad, Serbia
{valentin_penca, sinisa_nikolic, chenejac}@uns.ac.rs

Abstract— This paper describes an independent, modular software component that enables search and retrieval of scientific-research data from a CRIS system in accordance with the SRU/W standard. The component is implemented as an extension of the existing CRIS UNS system of the University of Novi Sad.

I. INTRODUCTION
The development of science has been accelerated by the appearance of information systems for managing data. Standardization of such systems is very important: a system that contains information on publications, events, research products, researchers and research institutions should be based on generally accepted standards. An appropriate example of such a system is the Current Research Information System (CRIS) [1], which is based on the Common European Research Information Format (CERIF) [2] standard. The CERIF standard defines a physical data model [3] and provides the exchange of XML messages between communicating systems [4]. Today, any respectable scientific-research institution should use some form of CRIS system.
A CRIS can include various scientific-research data from digital libraries, institutional repositories (IR) and other research information systems. It is evident that CRIS systems store diverse types of scientific data, so it is necessary to provide an efficient search mechanism based on a standard. The standard should make the search process independent of the scientific-research data models.
The necessity for a standardization of search is described in a series of papers. The most widely used standard in the area of searching digital content is certainly Z39.50 [5]. One of the many examples of using the Z39.50 standard is described in [6], where it is used for searching and connecting the Iranian libraries. However, Z39.50 has certain drawbacks that the new-generation standard SRU tries to overcome. The SRU standard tries to keep the functionality defined by Z39.50 while providing its implementation using currently available Internet technologies. One of the main advantages of SRU compared to Z39.50 is the ability of SRU to exchange XML messages, which is not allowed by Z39.50.
Papers such as [7] and [8] confirm the use of the SRU/W standard. Zarić has described a client application that is able to connect to remote servers by the Z39.50 or SRU/W protocols and to simultaneously search the data of the remote servers. Apache OpenOffice started a new Bibliographic project (OooBib) [9]. The bibliographic project will design and build an easy-to-use and comprehensive bibliographic facility within OpenOffice: easy to use for the casual user, but meeting all the requirements of the professional and academic writer. The new bibliographic facility will utilise the latest open standards and will make the fullest use of the emerging XML, XSLT and SRU/W technologies. SRU (Search and Retrieve via URL) 2.0 has been approved as a standard by the Organization for the Advancement of Structured Information Standards (OASIS) [10].
The paper [11] states that in CRIS systems the search functionality is often neglected, which certainly reduces their usefulness. CRIS systems contain a large amount of data that can be interpreted differently in individual CRIS systems, which clearly reveals the problem of simultaneously searching information in several such systems. EuroCRIS proposes the introduction of CERIF as a uniform standard to overcome the problem of data search. It is emphasized that the most effective method to search a CRIS system should be based on metadata.
In [12] a list of records for CRIS systems is given, for which it is necessary to implement the search functionality. CRIS systems based on the CERIF standard should allow search of: researchers, organizational units (institutions), projects, publications, products, patents, equipment, events (conferences, workshops, etc.) and sponsors.

II. CRIS UNS
In 2009, at the Faculty of Sciences and the Faculty of Technical Sciences, the development of an information system for managing scientific-research data of the University of Novi Sad (CRIS UNS) [13] was started. The first phase of the development of this system was the modelling and data entry of published scientific results. Testing and verification on the results of researchers from the Faculty of Sciences was performed. At the beginning of 2013, the total number of records was over 73000.
The system for searching scientific-research data is integrated within the existing system for the management of scientific-research data. Integration of these systems is achieved by modifying existing components and adding new ones. The motivation for this research was to:
• provide public access and search for data about institutions/organizations, researchers and published scientific results within the University of Novi Sad;


• use common search standards as a precondition for interoperability with similar systems that contain scientific-research data.
Figure 1 shows the architecture of the system using a UML deployment diagram. Yellow color is used to show the modified components, while the green colored components are those added for the purpose of the search system.

Figure 1 - CRIS UNS Architecture (deployment diagram: a Web browser client and an SRU/SRW client communicate via HTTP/SOAP with the Application/Search Server on Apache Tomcat, which contains the Interface, SRU/W Server Side, Search mediator, Text server and DTO & MARC21 components, connected via JDBC to a MySQL DBMS Database)

Client SRU/SRW - SRU/W Interface: The client-side application is an external application (usually part of another system) which implements the client side of the SRU/W protocol version 2.0. Support for the context sets CQL version 1.2 [14] and DC version 1.1 [15] is the minimum requirement for client-side applications. Search of all available records of the system is possible if the client-side application supports the CRIS profile [16]. The communication between client applications and the server-side application is done via HTTP or SOAP.
SRU/W Server Side: This component executes the server side of the SRU/W protocol version 2.0. The Search/Retrieve, Scan and Explain services are implemented. The server side of SRU/W accepts e.g. Search/Retrieve queries via SOAP. Afterwards, the SRU/W Server Side component invokes the Search Mediator component, which processes the CQL query. The SRU/W server side forwards the search results provided by the Search Mediator to the appropriate SRU/W client.
Search Mediator: This component allows searching the system by using the CQL query language. An XML description of the context sets and the specification of the CQL language are used for verification of a CQL query (correct syntax and semantics of a query). The system uses the XML representation and the JAXB (Java Architecture for XML Binding) [18] library to create the object form of the context sets. A CQL query is transformed into the query language of the Text Server component, and the transformed query is then executed by the Text Server. The Search Mediator component accepts and processes the user's CQL queries from the SRU/W Server Side and Interface components. Afterwards, the search results are converted in accordance with the component which initiated the call of the Search Mediator. The Search Mediator uses the converters that are defined in the DTO & MARC21 component.
Text server: The Text Server is a component based on the Apache Lucene [19] library for text searching and indexing of scientific-research data. The component defines the searchable indexes for the scientific-research data from the Database.
DTO & MARC 21: DTO & MARC 21 is a component which provides conversions between DTO objects and object representations of MARC 21 [17] records. DTOs (Data Transfer Objects) are objects that are used to transport data between the application components. During the search, the Mediator component uses the DTO & MARC 21 component for conversion of results from MARC 21 object representations into the corresponding DTO objects or XML representations (MARC XML [20], Dublin Core Extended XML [21]). If the search is initiated by the Interface or an SRU/W client, the DTO & MARC 21 component will not load the content of DTO objects from the database, but will only take their existing textual representations from Apache Lucene. Therefore, the performance of the system is improved by avoiding the complete loading of the records stored in the database.
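The transformation performed by the Search Mediator can be illustrated with a deliberately simplified sketch that rewrites a single-clause CQL query into Lucene query syntax. Everything below is an assumption-laden toy, not the CRIS UNS code: the real component first validates the query against the XML context-set descriptions and supports full CQL, and the Lucene field names are invented for the example.

    /** Toy sketch: rewrite a single-clause CQL query (index = "term")
     *  into Lucene query syntax. Illustrative only. */
    public class CqlToLucene {

        public static String rewrite(String cql) {
            String[] parts = cql.split("=", 2);
            if (parts.length != 2)
                throw new IllegalArgumentException("Expected index=\"term\": " + cql);
            String index = parts[0].trim();            // e.g. dc.title
            String term = parts[1].trim().replace("\"", "");
            // Map a CQL context-set index to the Lucene field it is indexed
            // under (this naive mapping is an assumption for the example).
            String field = index.replace("dc.", "");   // dc.title -> title
            return field + ":\"" + term + "\"";
        }

        public static void main(String[] args) {
            System.out.println(rewrite("dc.title=\"service\"")); // title:"service"
        }
    }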
III. SRU/W STANDARD
The SRU/W search is based on indexes that describe different search resources. To make the search semantics and syntax unambiguous, the search query is defined by CQL (Contextual Query Language) [22] together with context sets that are organized in SRU/W profiles. CQL is a common query language for searching resources, where the context sets are concepts which define the allowed entities in a CQL query.
The SRU standard has two different implementations: the first one searches and retrieves data by sending messages via the HTTP GET and POST methods (SRU), while the other one uses the SOAP protocol (SRW) for the exchange of messages. The basic difference between the SRU and SRW versions is in the manner of sending messages [23]. The SRW version of the protocol packages messages in a SOAP Envelope element, while the SRU version defines the message on the principle of parameter/value pairs, where the parameters and values are included in the URL address. Another difference between the two versions is that the SRU protocol uses only the HTTP protocol for the transmission of messages, while SRW, besides the HTTP protocol, can also use the SSH (Secure Shell) and SMTP (Simple Mail Transfer Protocol) protocols.
Unlike the Z39.50 standard, which defines 11 different services, the SRU standard defines three services (operations), since it was observed that only a certain number of the services defined by the Z39.50 standard were actually used in practice. The services defined by the SRU standard are:


• SearchRetrieve - the service responsible for search and retrieval of data, where the client sends a searchRetrieveRequest and gets a searchRetrieveResponse message as the answer from the server.
• Scan - the service that allows the client to get all values from a particular index. Also, if supported by the server, one more optional feature is possible: for each value of the index, the number of hits obtained by searching that index can be given. The messages used in communication are scanRequest and scanResponse.
• Explain - the service that allows the client to send an explainRequest in order to find out which details of the standard are supported by the server, in the form of an explainResponse message.
In CRIS UNS, only the transport mechanism that involves the use of the SOAP protocol is implemented.

IV. JAX-WS
Java API for XML Web Services (JAX-WS) [24] is one of the sets of Java technologies used to develop Web services. JAX-WS is a programming model that simplifies application development through support of a standard, annotation-based model for developing Web service applications and clients. JAX-WS belongs to what Sun Microsystems calls the "core Web services" group and, like most of the core groups, it is typically used in conjunction with other technologies, which may also come from the core Web services group (JAXB, for example). JAX-WS represents remote procedure calls or messages using XML-based protocols such as SOAP, but hides SOAP's innate complexity behind a Java-based API. Developers use this API to define methods, then code one or more classes to implement those methods, and leave the communication details to the underlying JAX-WS API.
JAX-WS technology has been selected for the development of web services in CRIS UNS for the following reasons:
• JAX-WS is one of the leading technologies for the development of web services based on the Java programming language;
• to follow the good practice of developing CRIS UNS on open-source technologies based on the Java programming language.
There are two approaches to developing web services using JAX-WS technology:
• Developing a JAX-WS Web service from a JavaBean (bottom-up development). When developing a JAX-WS Web service starting from JavaBeans, a bean that already exists can be used to enable the implementation of the JAX-WS Web service. The use of annotations simplifies the enabling of a bean for Web services. It is not required to develop a WSDL file, because the use of annotations can provide all the WSDL information necessary to configure the service endpoint or the client.
• Creating a JAX-WS Web service from an existing WSDL file using JavaBeans (top-down development). This WSDL document could be obtained from another developer, a system architect, a UDDI registry, or it could be written from scratch.
In this paper, top-down development is chosen for creating the JAX-WS web service. WSDL is an XML document that is used for describing web service elements, operations and structures that are used in communication. A WSDL document consists of seven elements: types, message, operation, portType, binding, port and service.
The element types defined in the WSDL specification is used to describe the data types and structures used in the message exchange process. Since the document/literal wrapped pattern of the message is selected, each message is represented by a corresponding XML element specified within the correspondent XML schema. The schema defines six elements, two for every SRU/W service (operation): one for the request message and another for the response message. Therefore, the following messages are defined:
• searchRetrieveRequest
• searchRetrieveResponse
• scanRequest
• scanResponse
• explainRequest
• explainResponse
The element binding is used to define the protocol that will be used for message exchange and for defining the format and encoding of messages. For the SearchRetrieveOperation operation, whose binding element is shown in Listing 1, the HTTP protocol was selected to transport the SOAP messages.

<binding name="SRW-SoapBinding" type="SRWPort">
  <soap:binding style="document"
      transport="http://schemas.xmlsoap.org/soap/http"/>
  <operation name="SearchRetrieveOperation">
    <soap:operation soapAction="" style="document"/>
    <input> <soap:body use="literal"/> </input>
    <output> <soap:body use="literal"/> </output>
  </operation>
  ...
</binding>

Listing 1 - SearchRetrieveOperation binding

The element portType consists of a set of operations that are represented by the element operation, whereby for each operation the input and the output parameters are defined. Listing 2 presents the part of the WSDL document that describes the operations supported by the SRU/W standard.

<portType name="SRWPort">
  <operation name="SearchRetrieveOperation">
    <input message="SearchRetrieveRequestMessage"/>
    <output message="SearchRetrieveResponseMessage"/>
  </operation>
  <operation name="ScanOperation">
    <input message="ScanRequestMessage"/>
    <output message="ScanResponseMessage"/>
  </operation>
  <operation name="ExplainOperation">
    <input message="ExplainRequestMessage"/>
    <output message="ExplainResponseMessage"/>
  </operation>
</portType>

Listing 2 - portType operations


The element service defines the physical location of the web service. Listing 3 shows an example of a web service definition where the service is installed on the local computer.

<?xml version="1.0" encoding="UTF-8"?>
<definitions ... name="SRW"> ...
  <service name="SRWSampleService">
    <port name="SRW" binding="SRW-SoapBinding">
      <soap:address location="http://localhost:8080/"/>
    </port>
    <port name="ExplainSOAP" binding="Explain-SoapBinding">
      <soap:address location="http://localhost:8080/"/>
    </port>
  </service>
</definitions>

Listing 3 - Service location

Apache CXF [25] was chosen for the implementation of the web service from the WSDL. After creating a WSDL file, the development process of a JAX-WS service with the CXF library is divided into three steps:
1. Generate starting point code.
2. Implement the service's operations.
3. Publish the implemented service.
Generate starting point code. JAX-WS specifies a detailed mapping from a service defined in WSDL to the Java classes that will implement that service. The logical interface, defined by the wsdl:portType element, is mapped to a Service Endpoint Interface (SEI). Any complex types defined in the WSDL are mapped into Java classes following the mapping defined by the Java Architecture for XML Binding (JAXB) specification. The endpoint defined by the wsdl:service element is also generated into a Java class that is used by consumers to access the endpoints implementing the service.
The wsdl2java command automates the generation of this code. It also provides options for generating starting-point code for the implementation and an Ant-based makefile to build the application. wsdl2java provides a number of arguments for controlling the generated code. Table 1 presents the classes which are commonly generated based on the information from the WSDL.

Table 1 - WSDL generated classes

File                   | Description
portTypeName.java      | The SEI class. This file contains the Java interface that the service implements. This file should not be edited.
serviceName.java       | The endpoint class. This file contains the Java class which clients will use to make requests on the service.
portTypeNameImpl.java  | The skeleton implementation class. This class needs to be modified with the concrete implementation of the service.

Figure 2 - SRU/W Server Side (UML class diagram of the classes SRWPort, SRWPortImpl, SRWValidator and SRWException, with the operations searchRetrieveOperation(SearchRetrieveRequestType): SearchRetrieveResponseType, scanOperation(scanRequest): ScanResponseType, explainOperation(ExplainRequestType): ExplainResponseType, validateSRWRequest(Object): int and generateException(int): String)
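A minimal sketch of how the skeleton class from Table 1 can be completed, following the collaboration shown in Figure 2, is given below. The wiring is simplified and assumption-laden: the method bodies are placeholders, a return value of 0 from the validator is assumed to mean "request valid", and the stub classes stand in for the real SRWValidator and SRWException (whose diagnostic logic is not shown in the paper).

    import javax.jws.WebService;

    /** Simplified completion of the generated skeleton (Table 1),
     *  wired as in Figure 2; all bodies are placeholders. */
    @WebService(endpointInterface = "SRWPort")
    public class SRWPortImpl implements SRWPort {

        private final SRWValidator validator = new SRWValidator();
        private final SRWException diagnostics = new SRWException();

        @Override
        public SearchRetrieveResponseType searchRetrieveOperation(SearchRetrieveRequestType body) {
            int code = validator.validateSRWRequest(body);
            if (code != 0)  // invalid request: report a diagnostic message
                throw new RuntimeException(diagnostics.generateException(code));
            // delegate the CQL query to the Search Mediator and build the response
            return new SearchRetrieveResponseType();
        }

        @Override
        public ScanResponseType scanOperation(scanRequest body) {
            int code = validator.validateSRWRequest(body);
            if (code != 0) throw new RuntimeException(diagnostics.generateException(code));
            return new ScanResponseType();
        }

        @Override
        public ExplainResponseType explainOperation(ExplainRequestType body) {
            int code = validator.validateSRWRequest(body);
            if (code != 0) throw new RuntimeException(diagnostics.generateException(code));
            return new ExplainResponseType();
        }
    }

    // Stub collaborators for the sketch (roles as in Figure 2).
    class SRWValidator {
        int validateSRWRequest(Object request) { return 0; } // 0 = valid (assumption)
    }
    class SRWException {
        String generateException(int exceptionCode) { return "SRU/W diagnostic " + exceptionCode; }
    }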


Implementation and publishing of the web service are the steps that a developer performs by changing the previously generated classes.

V. SRU/W SERVICE
The class diagram of the SRU/W server side is given in Figure 2. The generated classes are part of the SRU/W Server Side architecture. The class SRWPort is a Java interface which contains only the declarations (prototypes) of the web service functions. The implementation of the service business logic is located in the class SRWPortImpl, where each particular SRU/W service (searchRetrieve, scan, explain) is implemented as a separate function. Whether the clients' requests are in accordance with the SRU/W standard is checked by the class SRWValidator. Depending on the request type and its disagreement with the SRU/W standard, the SRWException class generates an appropriate message.
Communication between clients and the web service in the CRIS UNS system is done by exchanging SOAP messages. One of the most common scenarios is where the client sends a valid CQL query inserted as a parameter of the searchRetrieveRequest element, as shown in Listing 4. The CQL query defines a request for the records whose titles contain the word "service". In the SOAP request message the version of the SRU/W protocol (2.0) is also specified. The parameter <maximumRecords> is the maximum number of records which the client can get in a response. The element startRecord defines the starting point in the record result set (for example, if the value of the element is 3, the client wants to obtain the records starting with the third record of the result set). There is an obvious constraint that startRecord must be less than or equal to the value of maximumRecords (startRecord <= maximumRecords).

<?xml version="1.0" ?>
<soapenv:Envelope
    xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <searchRetrieveRequest
        xmlns="http://www.loc.gov/zing/srw/"
        xmlns:ns2="http://www.loc.gov/zing/cql/xcql/"
        xmlns:ns3="http://www.loc.gov/zing/srw/diagnostic/">
      <query>dc.title=&quot;service&quot;</query>
      <version>2.0</version>
      <startRecord>1</startRecord>
      <maximumRecords>5</maximumRecords>
      <recordPacking>xml</recordPacking>
    </searchRetrieveRequest>
  </soapenv:Body>
</soapenv:Envelope>

Listing 4 - SRU/W SOAP request

Listing 5 outlines a response SOAP message. As expected, the complete response is located in a separate <searchRetrieveResponse> element. As a part of the response, the protocol version on the server (2.0) and the total number of records for the processed query (<numberOfRecords>78</numberOfRecords>) are set. The main part of the response is the records element, which holds all records matching the query; within it, a special element (record) represents each particular record. Every single record has a <recordData> element whose sub-elements are in accordance with the CRIS profile [16]. The sub-elements carry the concrete data of the record; for example, the title of the record is located in the element <dc:title> from the Dublin Core context set, which is a part of the CRIS profile. The XML schema of a record is stated within the <recordSchema> element (e.g. <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>). The SRU/W request and response are both located in the SOAP Envelope section of the SOAP message.

<?xml version="1.0" ?>
<soapenv:Envelope
    xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <searchRetrieveResponse>
      <version>2.0</version>
      <numberOfRecords>78</numberOfRecords>
      <records>
        <record>
          <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
          <recordData>
            <dc:title>Student Service Software System, version 2.0</dc:title>
            <dc:creator>Racković Miloš</dc:creator>
            <dc:creator>Škrbić Srđan</dc:creator>
            <dc:creator>Pupovac Biljana</dc:creator>
            <dc:creator>Bodroški Žarko</dc:creator>
            <dc:publisher>Novi Sad, Serbia</dc:publisher>
            <dc:date>2007</dc:date>
            <dc:type>Text</dc:type>
            <dc:identifier>http://www.cris.uns.ac.rs/record.jsf?recordId=6535</dc:identifier>
            <dc:language>English</dc:language>
          </recordData>
        </record>
        <record>
          <recordSchema>http://srw.cris.uns.ac.rs/contextSets/CRIS/1.0</recordSchema>
          <recordPacking>xml</recordPacking>
          <recordData>
            <srw_dc:dc>
              <dc:title>CRIS service for journals and journal articles evaluation</dc:title>
              <dc:date>2011</dc:date>
            </srw_dc:dc>
            <srw_cris:cris>
              <cris:type>JournalArticle</cris:type>
              <cris:firstAuthor>Sinisa Nikolic</cris:firstAuthor>
              <cris:abstract>This paper...</cris:abstract>
              ...
            </srw_cris:cris>
          </recordData>
        </record>
      </records>
    </searchRetrieveResponse>
  </soapenv:Body>
</soapenv:Envelope>

Listing 5 - SRU/W SOAP response
part of the response is records element which represents
all records in accordance with the query. Also within the
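The server-side classes from Figure 2 tie these two messages together. The following minimal sketch illustrates the described validation flow in Java; the class and method names follow Figure 2, while the simplified request type, the diagnostic code and the string-based response are illustrative assumptions rather than the actual CRIS UNS code (whose types are JAXB-generated):

class SearchRetrieveRequest {
    String query;
    Integer startRecord;
    Integer maximumRecords;
}

class SRWValidator {
    // Returns 0 for a request that conforms to the SRU/W standard,
    // otherwise a code describing the problem (e.g. the startRecord <=
    // maximumRecords limit discussed above).
    public int validateSRWRequest(Object request) {
        SearchRetrieveRequest r = (SearchRetrieveRequest) request;
        if (r.startRecord != null && r.maximumRecords != null
                && r.startRecord > r.maximumRecords) {
            return 61; // assumed code for a record position out of range
        }
        return 0;
    }
}

class SRWException {
    // Maps a code to an appropriate message for the client.
    public static String generateException(int exceptionCode) {
        return "Invalid SRU/W request, diagnostic code " + exceptionCode;
    }
}

public class SRWPortImplementation {
    private final SRWValidator validator = new SRWValidator();

    // Simplified stand-in for the searchRetrieve service function.
    public String searchRetrieveOperation(SearchRetrieveRequest request) {
        int code = validator.validateSRWRequest(request);
        if (code != 0) {
            // A non-conforming request is answered with a diagnostic
            // instead of a record list.
            return "<diagnostics>" + SRWException.generateException(code)
                    + "</diagnostics>";
        }
        // Here the CQL query would be forwarded to the text server and the
        // matching records packed into <searchRetrieveResponse>.
        return "<searchRetrieveResponse/>";
    }
}

A request with startRecord greater than maximumRecords would thus be rejected with a diagnostic before any query reaches the text server.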


VI. CONCLUSION

This article presents a service for the retrieval of information from the scientific research domain, based on the SRU/W standard.

The implementation continues the good practice of the CRIS UNS [26] by using only open source technology. The SRU/W search service is modular and allows particular components to be used and implemented in different ways; therefore it is possible to:
• Develop external applications for search, which would be based on the SRU/W library standard.
• Simply change the text server, which would not affect the SRU/W server-side component.
• Simultaneously search several text servers, whose implementation and physical location can be arbitrary. In this case, it is only necessary to define the appropriate mapping of the CQL language into the language of a text server.
• Achieve potential interoperability with various library systems, since SRU/W is a de facto standard in these systems.

In the future, it is planned to enable search of scientific research data using RESTful web service [27] standards.

ACKNOWLEDGMENT

Results presented in this paper are part of the research conducted within the Grant No. III-47003, Ministry of Science and Technological Development of the Republic of Serbia.

REFERENCES
[1] "euroCRIS | Current Research Information Systems | CRIS," euroCRIS. [Online]. Available: http://www.eurocris.org. [Accessed: 18-Jan-2014].
[2] "Common European Research Information Format | CERIF." [Online]. Available: http://www.eurocris.org/Index.php?page=CERIFintroduction&t=1. [Accessed: 18-Jan-2014].
[3] B. Jörg, K. Jeffery, J. Dvorak, N. Houssos, A. Asserson, G. van Grootel, R. Gartner, M. Cox, H. Rasmussen, T. Vestdam, L. Strijbosch, V. Brasse, D. Zendulkova, T. Höllrigl, L. Valkovic, A. Engfer, M. Jägerhorn, M. Mahey, N. Brennan, M. A. Sicilia, I. Ruiz-Rube, D. Baker, K. Evans, A. Price, and M. Zielinski, CERIF 1.3 Full Data Model (FDM) Introduction and Specification. 2012.
[4] J. Dvořák and B. Jörg, "CERIF 1.5 XML - Data Exchange Format Specification," 2013, p. 16.
[5] National Information Standards Organization (U.S.), Information retrieval (Z39.50): application service definition and protocol specification: an American national standard. Bethesda, Md.: NISO Press, 2003.
[6] M. A. Hafezi, "Interoperability between library software: a solution for Iranian libraries," Electron. Libr., vol. 26, no. 5, pp. 726–734, 2008.
[7] K. T. Anuradha, R. Sivakaminathan, and P. A. Kumar, "Open-source tools for enhancing full-text searching of OPACs: Use of Koha, Greenstone and Fedora," Program Electron. Libr. Inf. Syst., vol. 45, no. 2, pp. 231–239, 2011.
[8] M. Zaric, D. B. Krsticev, and D. Surla, "Multitarget/multiprotocol client application for search and retrieval of bibliographic records," Electron. Libr., vol. 30, no. 3, pp. 351–366, 2012.
[9] "Open Office Bibliographic project." [Online]. Available: http://www.openoffice.org/bibliographic/srw.html. [Accessed: 18-Jan-2014].
[10] "OASIS | Advancing open standards for the information society." [Online]. Available: https://www.oasis-open.org/. [Accessed: 18-Jan-2014].
[11] W. Sander-Beuermann, M. Nebel, and W. Adamczak, "Searching the CRISses," Maribor, Slovenia, 2008.
[12] K. G. Jeffery, "CRIS Architectures For Interoperation," Vienna, Nov. 2007.
[13] "Current Research Information System of University of Novi Sad." [Online]. Available: http://www.cris.uns.ac.rs/. [Accessed: 18-Jan-2014].
[14] "CQL Context Set, version 1.2 - SRU Version 1.2 Specifications (SRU: Search/Retrieval via URL -- SRU, CQL and ZeeRex, Standards, Library of Congress)." [Online]. Available: http://www.loc.gov/standards/sruBob/resources/cql-context-set-v1-2.html. [Accessed: 18-Jan-2014].
[15] "Dublin Core Context Set Version 1.1 (SRU: Search/Retrieval via URL -- SRU, CQL and ZeeRex, Standards, Library of Congress)." [Online]. Available: http://www.loc.gov/standards/sru/cql/contextSets/dc-context-set.html. [Accessed: 18-Jan-2014].
[16] V. Penca, S. Nikolić, D. Ivanović, Z. Konjović, and D. Surla, "SRU/W Based CRIS Systems Search Profile," Program Electron. Libr. Inf. Syst., 2014, in press.
[17] "MARC 21 Standard." [Online]. Available: www.loc.gov/marc/. [Accessed: 18-Jan-2014].
[18] "JAXB Reference Implementation — Project Kenai." [Online]. Available: https://jaxb.java.net/. [Accessed: 18-Jan-2014].
[19] "Apache Lucene." [Online]. Available: http://lucene.apache.org/. [Accessed: 18-Jan-2014].
[20] "MARCXML: The MARC 21 XML Schema." [Online]. Available: http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd. [Accessed: 18-Jan-2014].
[21] "Dublin Core Extended XML." [Online]. Available: http://dublincore.org/schemas/xmls/qdc/2006/01/06/dc.xsd. [Accessed: 18-Jan-2014].
[22] "CQL: the Contextual Query Language: Specifications (SRU: Search/Retrieval via URL, Standards, Library of Congress)." [Online]. Available: http://www.loc.gov/standards/sru/cql/. [Accessed: 18-Jan-2014].
[23] E. L. Morgan, "An Introduction to the Search/Retrieve URL Service (SRU)," Ariadne, no. 40, 2004.
[24] "JAX-WS Reference Implementation — Project Kenai." [Online]. Available: https://jax-ws.java.net/. [Accessed: 18-Jan-2014].
[25] "Apache CXF: An Open-Source Services Framework." [Online]. Available: http://cxf.apache.org/. [Accessed: 18-Jan-2014].
[26] D. Ivanovic, G. Milosavljevic, B. Milosavljevic, and D. Surla, "A CERIF-compatible research management system based on the MARC 21 format," Program Electron. Libr. Inf. Syst., vol. 44, no. 3, pp. 229–251, 2010.
[27] L. Richardson, RESTful web services. Farnham: O'Reilly, 2007.


Development and implementation of the public electronic service for managing open competitions for government grants: Case study Autonomous Province of Vojvodina

Milan Paroški*, Vesna Popović*, Zora Konjović**
* Government of the AP of Vojvodina/Office for Joint Affairs of Provincial Bodies, Novi Sad, Republic of Serbia
** Faculty of Technical Sciences/Department of Computing and Control, Novi Sad, Republic of Serbia
[email protected]; [email protected]; [email protected]

Abstract— Grants (non-repayable funds) are disbursed by provincial bodies in the Autonomous Province of Vojvodina to diverse recipients: local governments, vulnerable social groups, national communities, religious communities, nonprofit entities, educational institutions, churches, small and medium-sized businesses, or to individual citizens. Provincial administrative authorities grant funds for specific purposes by publishing calls for proposals. The purpose of this paper is to present the results achieved and the experiences gained in the development and implementation of an electronic service facilitating the non-repayable funds management process in a way that provides for horizontal integration of fund allocations in different sectors. Therefore, the paper presents the context of the development, the main functional and non-functional features, and the deployment of the government grants management electronic service serving both the applicants and the provincial government institutions.

I. INTRODUCTION

In this introductory part of the paper we present the context of the service development considering the e-Government concept and basic aims, but also the specifics of the Autonomous Province of Vojvodina that affect the development of the presented service.

A. The e-Government concept and basic aims

The concept of e-Government has been defined variously by countries, academic institutions, international organizations, etc. The definition of e-Government also depends on the viewpoint (technical, social, economic development...) [5-8].

"Simply put, 'electronic' or e-Government is the systemic use of information and communication technologies (ICTs) to support the functions that a government performs for its constituents, typically the provision of information and services." [7].

Development of e-Government encompasses several stages or phases, from web presence to seamless or fully integrated web services [9-15].

So far, a variety of frameworks, approaches and recommendations that assist in overcoming challenges and pave the road for developing a successful e-Government have been published [6-7, 16-18]. It can be concluded that interoperability and open standards are the necessary prerequisites for the development of e-Government. Many strategic documents paved the road for e-Government development in Europe [19-22].

The key components that should be considered in e-Government evaluation are [7, 24]: ICT and network infrastructure, enabling environment, information and communication ecology, organizational capacity, human capital, ICT prevalence of use, and web presence. "The e-Government Index presents a more inclusive and less subjective measure of a country's e-Government environment. It incorporates a country's official online presence, evaluates its telecommunications infrastructure and assesses its human development capacity." [18].

B. Specifics of the Autonomous Province of Vojvodina

In this subsection we present very briefly the social, legal and organizational aspects affecting the development and deployment of the presented service.

The Republic of Serbia has existed in its present form since 2006 and includes two autonomous provinces: Vojvodina and Kosovo. The disintegration of former Yugoslavia and the wars that took place in the 1990s caused obvious and huge devastation of the economy and all other areas in Serbia. Refugees, unemployment, poverty and illiteracy are only a few of the many serious obstacles and challenges that hamper development in many areas, including e-Government.

The Republic of Serbia, which is classified as a middle-income transition economy [1], is actively dedicated to economic development and substantial reforms in order to advance from a transition economy to developed status, where e-Government development plays an important role. The development of e-Government in the Republic of Serbia and APV was enabled by the adoption of many laws, regulations and strategic documents [23]. Following the governments worldwide that have been making significant attempts to improve efficiency and effectiveness by making government services and information available on the Internet, Serbia, in only a two-year period (2010-2012), advanced 30 positions to arrive at 51st in the world rankings by implementing the e-Government portal "eUprava" (eng. e-Government) in accordance with the "one-stop-shop" principle [25].

The Autonomous Province of Vojvodina (APV) is one of the two autonomous provinces in Serbia, characterized by an outstanding national and cultural diversity. In accordance with the Serbian Constitution, the Statute of the APV guarantees national equality, multiculturalism and interculturalism.


The Government of the APV is the bearer of the executive powers in the Province and accounts to the Assembly for its work. The Provincial administration is independent, performs affairs within its competencies in accordance with laws, the Statute and Provincial Assembly decisions, and accounts for its work to the Provincial Government.

II. E-GOVERNMENT IMPLEMENTATION IN THE AUTONOMOUS PROVINCE OF VOJVODINA

The Government of APV makes efforts to contribute to human, social and economic development in the Province by using ICT to improve government services, in order to achieve the effects of the use of ICT on various aspects of economic and all other kinds of development [2-4].

Following the example of other countries and the available recommendations for e-Government development, but considering all specificities, obstacles and the inheritance of the past, the Government of APV adopted an e-Government implementation plan. In Vojvodina, the development of e-Government was partially conducted through the Program eVojvodina and partly through the implementation of the Strategy of eAdministration of Provincial Authorities Action Plan. The main goal of the provincial level e-Government strategy in Vojvodina is to implement software support for the activities of provincial bodies and to establish all relevant online public electronic services needed. The Economic Development Program (Integrated Regional Development Program - IRDP) of APV for the period 2004-2008 was prepared with the support of the German organization Deutsche Gesellschaft für Technische Zusammenarbeit (GTZ). One of the 14 priority IRDP projects adopted by the Assembly of APV in 2004 was the eVojvodina program, carried out by the Provincial Secretariat for Science and Technological Development. eVojvodina was a pilot of program budgeting. The global economic crisis significantly affected the economy in Serbia, and for the implementation of the Program eVojvodina, as well as many other activities, there was not enough funding. By the end of 2012, the budget funds allocated for the Program eVojvodina were almost three times smaller than the initially planned amount. Funding and delivery of the Program eVojvodina was completed at the end of 2012. A number of strategic documents, infrastructure projects, software systems, electronic services and trainings in the field of information technology were carried out within the framework of eVojvodina, which laid the foundations for the further development of e-Government and the Information Society in the APV [26].

On the basis of the decision on the "Strategy of Provincial Administration Reform and Development" adopted by the Assembly of the APV, the Executive Council of the APV enacted the decision on the "Strategy of eAdministration of Provincial Authorities" in 2007. This strategic document is in accordance with republic and European standards. One of the basic goals of this document is to improve the quality and availability of information and services provided to users by provincial civil servants.

The Office for Joint Affairs of Provincial Bodies is responsible for the implementation, maintenance and development of e-Government projects according to the "Strategy of eAdministration of Provincial Authorities" and the Program eVojvodina. The Sector for Information Technologies (IT Sector) is responsible for the development and maintenance of the common ICT infrastructure of provincial bodies, and for the development of the e-Government projects at the provincial level. The basic characteristics of the implemented e-Government system at the provincial level are: interoperability, safety, openness, flexibility and scalability [26]. Considering the historical inheritance and economic obstacles, slow progress can be justified, but the IT Sector gave maximum effort with respect to these circumstances.

III. ELECTRONIC SERVICE ECOMPETITIONS

A. Grants allocation in Autonomous Province Vojvodina

One of the main responsibilities and goals of the provincial Government and administration is to plan, direct and foster the development of the Province in all areas. Thus, on an annual basis, provincial bodies provide financial resources for disbursing grants (non-repayable funds) to diverse recipients.

Grants are disbursed by provincial bodies to diverse recipients: local self-governments, nonprofit entities, educational institutions, churches, religious communities, small and medium-sized businesses, or to individual citizens. Provincial administrative authorities grant funds for specific purposes by publishing requests for proposals. Applicants write and submit proposals. Usually, some level of compliance with the competition terms is required. After evaluation and assessment of the applications by previously defined criteria, reviewers make a decision on the allocation of funds. The provincial body (grant maker - funder) signs contracts with the selected applicants. Contract realization monitoring and auditing is based on report analysis. These procedures are very similar at different provincial bodies.

In order to improve the efficiency, effectiveness and transparency of the fund allocation process, it was necessary to establish a secure common multilingual electronic public service of the provincial government, which horizontally integrates fund allocations to citizens and businesses across different sectors.

The main goals and expected benefits of introducing the electronic public service were:
• Reduction of the possibility of corruption;
• Transparency of the entire allocation process;
• Control of purposeful spending of allocated funds;
• Publishing of a "black list" (list of negative references) of applicants who have not submitted reports about the spending of allocated funds (i.e. a list of applicants who did not meet the contractual obligations).

B. The project eCompetitions

Within the framework of the "Strategy of eAdministration of Provincial Authorities" Action Plan, the Executive Council of the APV signed a contract with the Faculty of Technical Sciences (The University of Novi Sad) on the development of the project "Model of a software system for reporting and monitoring the competitions for grants provided by provincial authorities" [27]. The project was completed in September 2009.

The methodology used for project development relies on the following basic principles:


(1) Compliance with current standards that apply to the design and implementation of complex software systems;
(2) Capability of seamless integration into the e-Government system of APV and the Republic of Serbia.

The project defined a model of a software system that includes the functional, structural, and behavioral aspects of the business process. Model building was carried out through the following activities:
• Surveying the current state;
• Determining the functional requirements;
• Determining the non-functional properties;
• Specification of the model in UML.

The survey of the current state provides information on the computer and communication equipment and the application software that were used as IT support for fund allocation processes by provincial bodies before the introduction of the software system eCompetitions. Data were collected from the organizational units that implement the procedure of funds allocation, through verbal interviews with employees and through predefined questionnaires. The survey also encompassed the relevant organization structure, business processes and document architecture. The survey was conducted in all provincial bodies. On the basis of the survey results, the general assessment of the current situation was that the computer and communications equipment is sufficient for service implementation: each employee participating in the process of fund allocation has a computer with satisfactory characteristics, the local computer network of provincial bodies and access to standard Internet services were already set up (in the framework of the Program eVojvodina), and all administrative employees have attended the training and gained the ECDL - European Computer Driving Licence certificate (in the framework of the "Strategy of eAdministration of Provincial Authorities").

In order to specify the functional requirements, an analysis was performed that identifies, defines, systematically describes and unifies the actors, activities and documents appearing in the heterogeneous business processes aimed at reporting and monitoring competitions for grants that are carried out separately by provincial authorities. Core functional requirements were determined based on the contract requirements and the legislation relating to the allocation of funds. Detailed identification of requirements was carried out based on interviews with competent employees and the paper forms used in relevant business processes in different organizational units.

The project also encompasses analysis of the following non-functional characteristics: interoperability, software architecture, data management, user interaction and security.

The standard language for object modeling - the Unified Modeling Language (UML) - was used. The UML model is generated in Sybase PowerDesigner format.

The project has defined a software model (UML) for managing submissions, decision making, payment and reports in competitions for the award of funds, implemented in practice by provincial administrative bodies and organizations. The model includes the functional, structural, and dynamic aspects of the business processes. The dynamic model was constructed to express and model the behavior of the system. The Use Case Models describe the proposed functionality of the new system. The project also defined the communication schema between system components.

The project proposed an implementation based on open source software components. The project also specified detailed technical specifications for the procurement of the software system eCompetitions.

The project is based on fundamental principles (scalability, interoperability, use of open standards, etc.) and is fully compliant with the latest ICT trends.

C. Implementation of the software system eCompetitions

Due to its complexity, the software system eCompetitions was implemented in two phases. The Office for Joint Affairs of Provincial Bodies carried out an open procurement procedure for the implementation of a software system based on the eCompetitions project as an integral part of the tender requirements, at the end of 2009. The contract was signed with the company "PROZONE" (Novi Sad). The first phase of implementation was conducted during 2010 and the second phase was carried out in 2011.

Software support for making the decision on the competition announcement, the preparation and publication of the call for proposals, and the receiving and processing of applications was implemented in the first phase. The implementation comprises:
• model and CRUD (Create Read Update Delete) forms,
• process diagrams,
• module for user management and access control,
• module for announcement of open competition,
• module for submission to the open contest.

The second phase comprises implementation of the support for:
• evaluation of applications submitted for the competition,
• decision-making,
• contracting,
• monitoring of grant implementation.

Introduction, testing and adjustment of the software system were realized in the first half of 2011. Also in the second phase, integration with the existing software systems of provincial bodies was implemented. Training for users and administrators was carried out afterwards.

Verification (simulation of selected examples of business practices) was done by trained employees - provincial officials from the Provincial Secretariat for Education, Administration and National Communities.

D. Basic functional features

The system eCompetitions provides software support for a secure common multilingual electronic public service of the provincial government.

The software system eCompetitions supports the entire life cycle of a competition for funds in the provincial bodies: the entire process of preparing and publishing the call for proposals, receiving, checking and replenishing the applications, processing of submitted documents, establishment of commissions (reviewer selection) and defining criteria for evaluation, automatic selection of the best bids, making bidding decisions, informing applicants, preparation and signing of grant contracts and electronic monitoring of contract execution, disbursement of funds, submission and control of periodic and final reports, archiving, as well as closing the competitions (for the award of funds). In addition, the system eCompetitions provides provincial bodies (grant makers - funders) with information about the competitions (calls for proposals) that a particular applicant has applied to so far, a list of


negative references, and a list of applicants who did not meet the contractual obligations.

The software system eCompetitions, from the moment of public announcement of the call for proposals, through all stages until archiving, allows monitoring of the procedure status over the Internet. It provides electronic services aimed at information exchange with applicants, in particular assistance-powered submissions, as well as application status tracking.

The system enables multilingual support, archiving, defining roles and access rights for various users, and creating various reports. The digital signature feature guarantees signer authenticity and the integrity of electronic documents. The system allows for electronic capture of documents, their electronic distribution, full monitoring of workflows defined in accordance with all relevant business rules, the verification and control of documents according to the rights of authorized users of the system, central archiving and efficient search of the archived documents.

So far the system eCompetitions is integrated with:
• Portal of public services of APV
• Joint post managing office of provincial bodies
• Software system of Provincial Secretariat of Finance (treasury payments)
• Internal PKI system of provincial bodies (electronic signature implementation)
• Web service of the National Bank of Serbia.

E. Software architecture and nonfunctional features

The software system eCompetitions consists of two applications: the internal one (back office), intended for the users employed in the provincial administration, and the external one (front office) for applicants (citizens, small and medium enterprises, non-profit organizations, local governments, educational institutions, churches, religious communities, and the other institutions).

The software architecture and nonfunctional features are as follows:
• Multi-tier Web architecture;
• Multiplatform server-side (Windows and Unix);
• Database management system independence; runs on PostgreSQL, MySQL, Microsoft SQL Server and Oracle (current deployment is MS SQL Server);
• Multilingual user interface enabled by UTF8 encoding, which allows use of the scripts of all languages in official use in Vojvodina (Serbian – Cyrillic and Latin, Hungarian, Slovak, Romanian, Ruthenian and Croatian);
• Information security enabled by the secure client-server connection and data exchange through the HTTPS protocol, using SSL encryption; the Web server certificate is issued by the Serbian Post Certification Authority;
• Server and client side applications based on open-source technologies;
• Interoperability with other platforms enabled by open standards.

Both applications (back office and front office) are implemented as Java web applications based on a multi-tier architecture and open source Java technologies. The middle tier (web and application server) integrates the user interface (client) with the data management (database) system. The client tier is a web browser that processes and displays HTML resources, issues HTML requests, processes the responses and interacts with the web server using standard protocols.

The system architecture (Figure 1) encompasses the following components:
• Web server (IIS or Apache, in the demilitarized zone - DMZ) with a module (Enhydra Conductor) for communication with application servers.
• Java application server (Enhydra Application Server, [28]) supporting robust, multi-layered web applications.
• Client layer - HTML, displayed in a web browser.
• Application solution based on open source technologies that supports document and business process management.
• Workflow Engine (Enhydra Shark, [29]) that fully complies with the WfMC (Workflow Management Coalition, [30]) specifications and the XPDL 1.0 standard (XML Process Definition Language [31]).
• Database server.
• Document repository (file) server - supports multiple protocols for handling documents (local file system, the Universal Naming Convention - UNC protocol).
• Tool for modeling business process definitions (Enhydra JaWE, [32]) - the graphical workflow editor which fully meets the WfMC specification.
• Tool Agents that automatically perform certain actions (sending e-mail, filling the contents of the document, etc.).

Figure 1. eCompetitions: system architecture

Integration of the document management system and the business process management system is done in two ways: by calling methods in a Java library, as well as by iFrame integration of the document management module in the application, using the WebDAV (Web-based Distributed Authoring and Versioning, [33]) and XML (Extensible Markup Language, [34]) standards.

The Java standard for multilingual support, i18n (Java Internationalization, [35]), is used for the implementation of this solution.
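As an illustration of the i18n mechanism mentioned above, the following minimal sketch shows how per-language texts can be resolved with the standard Java ResourceBundle API; the bundle name, keys and locales are illustrative assumptions, not the actual eCompetitions resources:

import java.util.Locale;
import java.util.ResourceBundle;

public class LabelResolver {
    // Looks up a user-interface label for the requested language.
    // Properties files such as Messages_sr.properties or
    // Messages_hu.properties (hypothetical names) would hold the texts.
    public static String label(String key, Locale locale) {
        ResourceBundle bundle = ResourceBundle.getBundle("Messages", locale);
        return bundle.getString(key);
    }

    public static void main(String[] args) {
        // Serbian and Hungarian are among the languages in official use in APV.
        System.out.println(label("competition.title", new Locale("sr", "RS")));
        System.out.println(label("competition.title", new Locale("hu", "HU")));
    }
}

Switching the locale thus changes only which properties file is loaded, leaving the application code untouched.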


Database independence was accomplished using the Enhydra DODS project (Professional Open Source Workflow, Groupware and Document Management, [36]), which allows working with different databases without changing the application source code (only with changes in the application configuration file).

The server and client side applications are based on the open-source technologies Enhydra Shark (workflow engine), Enhydra JaWE (graphical workflow editor), Spring, Hibernate, Enhydra DODS, WebDAV servlet, XMLC, Jasper Reports, DWR, JOTM (Transaction Management), Log4J (logging), XMLBeans (working with XML documents), Apache Lucene, Enhydra Snapper (full text search), the Enhydra application server (based on Apache Tomcat), and Enhydra Director (a module that allows working with a cluster of application servers).

Interoperability with other systems is achieved through open XML standards and web services. Web services and the Simple Object Access Protocol (SOAP [37]) provide access to the document management functions and business process management from external applications. Hence, it is possible to exchange data with other systems, import documents into the system, or export the necessary information and documents from the system.

IV. DEPLOYMENT OF THE ECOMPETITIONS SERVICE

So far the eCompetitions service has been deployed in two Provincial secretariats of the Autonomous Province of Vojvodina: the Provincial Secretariat for Education, Administration and National Communities, and the Provincial Secretariat for Urban Planning, Construction and Environmental Protection.

Since 2011, the Provincial Secretariat for Education, Administration and National Communities has carried out 15 open competitions for funds using the eCompetitions system. The Provincial Secretariat for Urban Planning, Construction and Environmental Protection used the eCompetitions system in 2012 to conduct four open competitions for funds.

Depending on the type of competition, the number of online submissions was between 30 and 1,000 per competition. At the present time (2013), the system eCompetitions encompasses 1459 registered legal entities and 88 authorized employees registered for online access.

The current deployment of the service in two Provincial secretariats has demonstrated its usability by showing that two administrative entities in charge of two quite diverse government sectors were able to successfully carry out their competitions in accordance with their local practice. In both secretariats, significant improvements are reported concerning the quality of the supporting documentation and the decisions on funding allocations.

Full disaster recovery of the server environment, which applies to all applications and public services within the information system of the Government of APV, is applied to the eCompetitions service. Implementation of the Information Security Management System (ISMS) in compliance with the ISO 27001 Standard is in progress.

All provincial administrative bodies that carry out the allocation of funds (grants, subsidies, loans) are expected to start using the eCompetitions system during 2013.

We find it appropriate to mention here that the Office for Joint Affairs of Provincial Bodies received the prestigious international ICT award "Diskobolos 2011" in the category "Social activities" for the software system eCompetitions. The system eCompetitions has also been nominated for the World Summit Award 2013 as the best e-Content example in e-Government & Open Data from Serbia (http://www.wsis-award.org).

V. CONCLUSIONS

The purpose of this paper was to present the achieved results and the experiences gained in the implementation of an important and complex cross-sector multilingual electronic public service.

Based on the strategic recommendations related to ICT policy and standards and the annual operating action plans for e-Government implementation in Vojvodina, contemporary methodologies accompanied by international and domestic good practices related to the development of electronic services were used for the project design and implementation of the multilingual provincial electronic public service eCompetitions.

The system eCompetitions provides software support for a secure common multilingual electronic public service of the provincial government, which horizontally integrates different sector areas in the field of fund allocation. Improved efficiency in processing applications and in decision-making contributes to a more dedicated and purposeful allocation of funds. The system provides transparency of the work of the provincial officials responsible for processing applications and the allocation of funds, and a list of negative references (i.e. a list of applicants who did not meet the contractual obligations).

The main general conclusion from our experience is that a development process consisting of a comprehensive project matching a previously specified strategic framework for e-Government development, and strict management of implementation action plans, can result in the development and deployment of a complex electronic service within reasonable time limits, even under severe financial constraints.

In addition, we opine that the system eCompetitions has a capacity to contribute significantly to government and public administration thrift by providing support for efficient monitoring of the fulfillment of contractual obligations and the spending of funds.

Even though validation of the adoption, financial effects and benefits gained by the deployment of this electronic service has not yet been studied sufficiently, the deployment of the electronic public service eCompetitions done so far indicates that this service can contribute significantly to the modernization of the provincial administration in the framework of information society development in the Republic of Serbia and APV. Experiences gained will be used in the process of introduction of new electronic services in the APV within the framework of the "Strategy of eAdministration of Provincial Authorities with the Action Plan until 2015", which is "on the anvil".


REFERENCES
[1] European Commission, "Economic and Financial Affairs", 1995-2012, last update: 30/10/2010, available at: http://ec.europa.eu/economy_finance/international/non_eu/western-balkan/serbia_en.htm (accessed 14 Dec 2013).
[2] S. Papaioannou and P. Dimelis, "Information technology as a factor of economic development: Evidence from developed and developing countries", Economics of Innovation and New Technology, vol. 16, pp. 179-194, 2007.
[3] S. Qureshi, "Social and economic perspectives on the role of information and communication technology for development", Editorial Introduction, Information Technology for Development, vol. 15:1, pp. 1-3, 2009, DOI: 10.1002/itdj.20117, available at: http://dx.doi.org/10.1002/itdj.20117 (accessed 5 Dec 2013).
[4] K. Kriz and S. Qureshi, "The role of policy in the relationship between ICT adoption and economic development: a comparative analysis of Singapore and Malaysia", Policy, ICT Adoption, and Economic Development, 2009, available at: http://www.globdev.org/files/proceedings2009/24-FINAL-Kriz-ICT Transformation Study2009 Nov 21.pdf (accessed 10 Dec 2013).
[5] S. T. Kim, "Converging e-Democracy and e-Government model toward an evolutionary model of e-Governance: the case of South Korea", Global e-Policy e-Government Institute, Sungkyunkwan University, Seoul, South Korea, 2006, available at: http://www.apdip.net/projects/e-government/capblg/casestudies/Korea-Kim.pdf
[6] Z. Fang, "E-Government in digital era: concept, practice, and development", International Journal of The Computer, The Internet and Management, vol. 10, no. 2, pp. 1-22, 2002, available at: http://www.journal.au.edu/ijcim/2002/may02/article1.pdf (accessed 4 Dec 2013).
[7] E. C. Budge, "An emerging model-of-use for developing countries - foundations of e-Government, digital opportunities for development", 2002, available at: http://learnlink.aed.org/Publications/Sourcebook/chapter6/Foundations_egov_modelofuse.pdf (accessed 6 Dec 2013).
[8] T. A. Pardo, "Realizing the promise of digital government: it's more than building a web site", Information Impact, 2000, available at: http://www.netcaucus.org/books/egov2001/pdf/realizin.pdf (accessed 6 Dec 2013).
[9] C. Baum and A. Di Maio, "Gartner's four phases of e-government model", 2000, available at: http://www.gartner.com/DisplayDocument?id=317292.
[10] Deloitte and Touche, "The citizen as customer", CMA Manage, vol. 74, pp. 58-58, 2001.
[11] K. Layne and J. Lee, "Developing fully functional e-government: A four stage model", Gov. Inform. Quarterly, vol. 18, pp. 122-136, 2001.
[12] J. S. Hiller and F. Belanger, "Privacy Strategies for Electronic Government", Rowman and Littlefield Publishers, North America, 2001, pp. 162-198.
[13] M. J. Moon, "The evolution of E-government among municipalities: Rhetoric or reality", Public Administration Rev., vol. 62, pp. 424-433, available at: http://www.ingentaconnect.com/content/bpl/puar/2002/00000062/00000004/art00006 (accessed 7 Dec 2013).
[14] K. Siau and Y. Long, "Synthesizing e-government stage models - a meta-synthesis based on meta-ethnography approach", Indus. Manage. Data Syst., vol. 105, pp. 443-458, 2005, available at: http://www.emeraldinsight.com/journals.htm?articleid=1501596&show=html (accessed 7 Dec 2013).
[15] S. Jayashree and G. Marthandan, "Government to e-government to e-society", J. Applied Sci., vol. 10, no. 19, pp. 2205-2210, 2010, available at: http://scialert.net/fulltext/?doi=jas.2010.2205.2210&org=11#30873_an (accessed 7 Dec 2013).
[16] P. Gottschalk, "E-Government interoperability: frameworks for aligned development", Chapter II in: E-Government Development and Diffusion: Inhibitors and Facilitators of Digital Democracy, Dr. Sahu, Dr. Dwivedi, and Dr. Weerakkody, Eds. IGI Global publication, 2009, pp. 23-33, available at: http://www.semicolon.no/Gottschalk%20Chapter%20E-Government%20Interoperability.pdf (accessed 9 Dec 2013).
[17] Danish Board of Technology, "Open-source software - in e-government, analysis and recommendations drawn up by a working group under the Danish Board of Technology", 2002, available at: http://www.tekno.dk/pdf/projekter/p03_opensource_paper_english.pdf (accessed 9 Dec 2013).
[18] R. M. Reffat, "Developing a Successful e-Government", in the proceedings of the Symposium on e-Government: Opportunities and Challenge, Muscat Municipality, Oman, 2003, pp. IV1-IV13, available at: http://faculty.kfupm.edu.sa/ARCH/rabee/publications_files/03Reffat_eGov.pdf (accessed 9 Dec 2013).
[19] European Council, "Presidency Conclusions - Lisbon European Council", 2000, available at: http://www.consilium.europa.eu/ueDocs/cms_Data/docs/pressData/en/ec/00100-r1.en0.htm (accessed 14 Dec 2013).
[20] European Union, "Summaries of EU legislation - eEurope 2002", 2000, available at: http://europa.eu/legislation_summaries/information_society/strategies/l24226a_en.htm (accessed 14 Dec 2013).
[21] European Commission, "eEurope 2005 Action Plan", Europe's Information Society Thematic Portal, 2002, available at: http://collection.europarchive.org/dnb/20070405123415/ec.europa.eu/information_society/eeurope/2005/all_about/action_plan/index_en.htm (accessed 14 Dec 2013).
[22] European Commission, "European Commission", the official website of the European Commission, available at: http://ec.europa.eu/index_en.htm (accessed 9 Dec 2013).
[23] Z. Babovic et al., "Survey of e-government services in Serbia", Informatica, vol. 31, pp. 379–396, 2007, available at: http://www.informatica.si/PDF/31-4/04_Babovic-Survey%20of%20eGovernment.pdf (accessed 9 Dec 2013).
[24] R. O'Brien and N. Redman, "E-government in Central Europe - Rethinking public administration", A white paper from the Economist Intelligence Unit, 2004, available at: http://graphics.eiu.com/files/ad_pdfs/Central_Europe_egov.pdf (accessed 5 Dec 2013).
[25] United Nations Department of Economic and Social Affairs, "United Nations E-Government Survey 2012 - E-Government for the People", United Nations, 2012, available at: http://unpan1.un.org/intradoc/groups/public/documents/un/unpan048065.pdf (accessed 9 Dec 2013).
[26] M. Paroški, Z. Konjović, and D. Surla, "Implementation of e-government at the local level in underdeveloped countries: the case study of AP Vojvodina", Research paper: The Electronic Library, vol. 31, iss. 1, pp. 99–118, 2013.
[27] FTN (Faculty of Technical Sciences), "Model of a software system for reporting and monitoring the competitions for grants provided by provincial authorities", The University of Novi Sad, 2009 (in Serbian), available at: http://www.uprava.vojvodina.gov.rs/eCompetitions/Project_eCompetitons_Serbian.pdf (accessed 4 Dec 2013).
[28] Enhydra Application Server, http://www.together.at (accessed 4 Dec 2013).
[29] Enhydra Shark (TM) - Open Source Java native WfMC and OMG compliant XPDL and BPMN Workflow, http://www.together.at/prod/workflow/tws (accessed 4 Dec 2013).
[30] Workflow Management Coalition (WfMC), available at: http://www.wfmc.org (accessed 4 Dec 2013).
[31] XML Process Definition Language (XPDL), available at: http://www.xpdl.org (accessed 4 Dec 2013).
[32] Enhydra JaWE - Open Source Java Swing graphical native WfMC XPDL and BPMN workflow editor, http://www.together.at/prod/workflow/twe (accessed 4 Dec 2013).
[33] Web-based Distributed Authoring and Versioning (WebDAV), available at: http://webdav.org (accessed 4 Dec 2013).
[34] Extensible Markup Language (XML), http://www.w3.org/XML (accessed 4 Dec 2013).
[35] Java Internationalization, available at: http://java.sun.com/javase/technologies/core/basic/intl (accessed 4 Dec 2013).
[36] Professional Open Source Workflow, Groupware and Document Management, http://dods.enhydra.org (accessed 4 Dec 2013).
[37] Simple Object Access Protocol (SOAP), available at: http://www.w3.org/TR/soap (accessed 4 Dec 2013).


Effective tablet dashboard interface for innovative pipelined multi-teacher lab practicing management

Oliver Vojinović, Vladimir Simić, Ivan Milentijević
University of Niš, Faculty of Electronic Engineering, Computer Science department, Niš, Serbia
{oliver,vladimir.simic,ivan.milentijevic}@elfak.ni.ac.rs

Abstract— Lab practicing as a common form of experiential learning is important in engineering education. With the increase of the number of students, it becomes a challenge to achieve an efficient organization of lab practicing activities. To address the observed drawbacks of standard approaches, in this paper a pipelined scheduling of timeslots is described and applied in a multi-teacher classroom. Innovations in lab practicing policy require new software tools that support the new policies and make them possible and effective. A supporting tool for collaborative lab practicing management was implemented on Android tablets and the Google Drive Spreadsheet cloud platform. The tool was designed following the principles of dashboard interface design, to be efficient for use in a dynamic lab environment. Effective design solutions within the limitations of the spreadsheet platform are presented. The tool has been used and monitored for two semesters. Error detection and correction, as the main potential risk, was analyzed, and it is shown that users successfully solved with the tablet interface all but one type of error, which was corrected from the web interface. The main findings confirmed that, despite a number of potential risks and sources of errors, it is possible to build a dependable application on a spreadsheet platform for use in the complex environment of collaborative lab practicing management.

I. INTRODUCTION

Lab practicing, as a form of experiential learning in blended or in traditional, non-distant learning environments, is widely accepted in many fields. In engineering, lab practicing is of great importance to help students' adoption of new concepts and their ability to combine theory with practice [1], [2]. Having an opportunity to experiment in the presence of a teacher is especially useful for students in introductory courses, where basic skills should be developed in a proper manner.

Lab practicing takes different places in different courses. When designing lab practicing within a course, course authors decide about the number and scope of exercises, timeslot duration, the presence and policies of reporting and assessment, and the combination of those decisions creates a variety of lab practicing sets.

Collecting the students' results, answers and solutions as part of lab practicing activities can be performed in different ways. Depending on the nature of lab practicing, final solutions can take various forms. When written students' answers are simpler, the process of grading students can be done automatically, using some digital assessment and grade annotating tool. When students' answers or solutions are more complex, the process of checking the correctness usually involves domain specific solutions [3].

With permanent demands to increase the number of students in higher education, efficient organization of the teaching process, and of lab practicing as a part of that process, becomes more important.

A. Traditional lab practicing organization

The essence of the lab exercises execution methodology can in most cases be described with the same simple scenario. Typically, students are divided into groups of 10 to 20, and timeslots are scheduled for the groups sequentially. Assignments are administered in advance or on site at the beginning of the timeslot. The teacher's responsibilities are to manage administrative tasks, to provide instructions and assistance on the equipment, tools and the process, and to help students in achieving correct solutions. Lab practices are, by the instructional design of the course, usually graded, either during the timeslot or deferred. In the traditional, or human-to-human approach [4], the teacher is responsible to check the student's results and solution and grade the student. In this standard approach, lab practicing classes are managed by one teacher, where paper and pen are often used to write down students' grades, comments and additional data. The traditional lab practicing involving immediate human-to-human assessment by the teacher should be planned carefully, as its efficiency depends on many factors: the number of timeslots and students, the size of the student groups, the complexity and difficulty of lab exercises, the preparedness of students, etc.

For the sake of comparison, the described approach will be identified as Sequential, Single Teacher Lab Practicing Policy (S-ST-LPP). The main problems affecting lab practicing efficiency in this approach are:

Inefficient engagement of lab resources. Lab equipment, computers, or other apparatus are not fully utilized during a sequence of lab exercise timeslots. Student absence leads to an unutilized laboratory workplace for the entire timeslot. Students that finish earlier, or those that are insufficiently prepared to continue working, leave the lab, thus making part of the laboratory equipment unused until the arrival of the following student groups.

The research presented in this paper was supported in part by the Serbian Ministry of Education, Science and Technological Development [TR32012].


Imbalanced needs for teacher engagement during a timeslot. Due to the nature of lab practicing training and assessment policies, the teacher is usually mostly unallocated at the beginning of the lab class and gets overallocated towards the end of a class, often resulting in a delay of the start of the next timeslot and/or the students' feeling of unfair marking. For the same reason, students often have to wait longer for the teacher's attention. The severity of this problem is proportional to the number of students in a lab.

Unfair treatment of students. The source of the feeling of unfairness is that the scheduled timeslot may be too short for some students to fully comprehend all aspects of the given problem and to manage to reach a complete solution. S-ST-LPP is not suitable to deal with this issue, since leaving students to work longer would prevent some students from the next timeslot from starting, thus resulting in new unfairness.

II. PIPELINED LAB PRACTICING ORGANIZATION - VARYING THE INVARIABLE

A broad range of real life systems can be represented with a set of elements that provide a service (service facilities) and a number of input elements (customers) that use the service. The problems of analysis and modeling of such systems are established and considered by queuing theory [5]. Queuing theory considers different system characteristics or objectives such as the arrival rate, the service rate, the number of service facilities, the employed queue disciplines, mean waiting times, etc. With respect to queuing theory, the classroom could be considered as a service facility, while the students could be treated as customers. The student arrival rate describes how often students enter the classroom, and the service rate determines the speed of lab completion. In order to provide more efficient lab practicing activities, basic knowledge from queuing theory can be applied to the organization of lab practicing activities.

To improve lab practicing efficiency with the new approach, the student arrival rate and the number of teachers are changed. Instead of scheduling common arrival times for all seats in a laboratory, arrival times are spread in a pipeline manner. With the goal of obtaining optimal lab resources utilization, we expected to find a suitable correlation between the student arrival rate and lab resources availability. To achieve this, it was necessary to reference some earlier obtained data to get an average rate of students' lab completion and lab resources availability. The approximated completion rate in earlier lab practicing classes can be used to fine-tune student arrival times. Having in mind the characteristics of students' activities in the classroom, the increase in the number of teachers should bring a better service rate in the periods of burst activities. With these two modifications, we refer to this new approach as Pipelined, Multi-Teacher Lab Practicing Policy (P-MT-LPP).

S-ST-LPP and P-MT-LPP are illustrated in Figure 1. Bolded numbers at the beginning of timeslots TS1-4 represent the appointed number of students in the given timeslot, and the subsequent non-bolded numbers are the numbers of students that remain in the lab after the defined time intervals. In S-ST-LPP (Figure 1.a), lab workplace occupancy decreases during a timeslot, leaving equipment unused until the beginning of the next timeslot. The pipelined schedule approach is shown in Figure 1.b. In this approach, after the time interval within the timeslot, when some of the students have already left the lab, additional students are scheduled and introduced to the lab. In that way, the available working places inside the lab are rescheduled as fast as they are expected to be freed. Bar charts in Figure 1 represent the percentage of occupied lab seats in time. It can be seen that P-MT-LPP maintains high lab resource occupancy, except at the end of the day, when there are no new students. Furthermore, students start working in different moments, implying that demands for teacher attention will also be spread in time more evenly.

Figure 1. Scheduling of student arrivals to lab practicing classes, a. sequential, b. pipelined approach

With the introduction of the pipeline approach, it becomes possible to have students that begin and those that have completed the exercise in the same classroom. This sole fact brings difficulties in managing lab practicing activities using paper and pen, as it becomes very difficult to track student progress, and also to synchronize activities between many teachers present simultaneously in a lab. The solution to make P-MT-LPP possible is found in designing and implementing a collaborative application for managing lab practicing.

III. DESIGN REQUIREMENTS

To deal with the challenges of the newly introduced P-MT-LPP approach and to overcome the limitations of paper and pen, it was necessary to introduce a software tool that supports work in a multi-teacher environment.

The design of the management application and the platform choice had to be carefully planned to ensure that several critical aspects are satisfied. Usability - the application on the chosen device screen has to provide information and allow editing data for the currently active students with a minimum of interaction, e.g. scrolling, opening and closing of dialogs, etc.; the interface should provide means for searching and sorting of data. Performance has to be acceptable, because data are accessed and modified frequently, in a dynamic environment. Robustness - the teacher needs to be able to easily correct errors in data. Battery life - working with large groups of students and mobility inside the lab require that the portable device provide long enough battery life. Network connection and synchronous work - multiple teachers collaborate in a lab, demanding that data be synchronized between instances of the interface.
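Before moving to the tool itself, the scheduling policy of Section II can be made concrete with a toy calculation. The sketch below staggers arrival slots from an assumed average completion time and estimates seat occupancy; the seat count, durations and the fixed-completion-time model are illustrative assumptions, not measured values from the described labs:

import java.util.ArrayList;
import java.util.List;

public class PipelinedSchedule {
    public static void main(String[] args) {
        int seats = 15;              // assumed lab workplaces
        int classMinutes = 240;      // one lab day, in minutes
        int avgCompletion = 90;      // assumed average time to finish a lab
        int batch = 5;               // students admitted per arrival slot

        // Spread arrivals so a new batch arrives roughly when an earlier
        // batch is expected to free its seats (Section II: arrival times
        // fine-tuned from the approximated completion rate).
        int interval = avgCompletion * batch / seats;  // 90*5/15 = 30 min
        List<Integer> arrivals = new ArrayList<>();
        for (int t = 0; t + avgCompletion <= classMinutes; t += interval) {
            arrivals.add(t);
        }
        System.out.println("Arrival slots (minutes): " + arrivals);

        // Estimated seat occupancy over the day, assuming every student
        // keeps a seat for exactly avgCompletion minutes.
        for (int t = 0; t < classMinutes; t += interval) {
            int occupied = 0;
            for (int a : arrivals) {
                if (a <= t && t < a + avgCompletion) {
                    occupied += batch;
                }
            }
            occupied = Math.min(occupied, seats);
            System.out.printf("t=%3d min: %d/%d seats occupied%n",
                    t, occupied, seats);
        }
    }
}

With these assumed numbers the lab stays fully occupied through most of the day and empties only after the last arrival slot, mirroring the bar charts of Figure 1.b.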


Teachers were involved in all aspects of the design of the user interface, as well as in the identification of required data. Data are classified depending on the frequency of changes during the course of lab practicing classes. Constant data are determined once, in the lab planning. An example of constant data is student personal data, including name, ID, and scheduled arrival and finish times. There are data that change between classes, e.g. the student profile – data related to the student's previous performance during the lab practicing cycle or in other activities, thus easing assessment. Another example of such data is the lab exercise number. The ID of the workplace, as well as grades and comments, are data that are changed by the teacher at least once per class. Student grades are inserted only once, while comments are modified several times during the class. Data also include instrumentation data that are collected by the application and can be used for performance analysis.
Working with teachers, several aspects of additional useful functions emerged. The time students spend waiting for the teacher's attention may not be taken on account of the student, and the timer needs to be stopped during those periods. The interface should provide assistance to teachers in prioritizing students that need to be visited when multiple students ask for assistance, or in identifying students that should be visited even if they do not ask. The interface has to be flexible and with only an advisory role; all decisions must be left to teachers. Student–teacher communication is structured in advance, with checkpoints defined during the class. The application interface needs to provide multiple communication points with the possibility to insert and edit separate comments and grades. By implementing this feature, all students obtain equal attention of the teacher, and early checkpoints help the teacher to identify students with problems in the early phases of the class.

IV. DESIGN AND IMPLEMENTATION

A. Spreadsheet platform for dashboard interfaces

In designing the application, principles of dashboard interface design [6] were adopted, bringing together relevant information for the teacher to perceive the student profile, progress history and current status. Choosing the dashboard approach makes the interface efficient during dynamic interactions in lab practicing management. Relevant information needs to be arranged to fit mostly on a single screen and to be monitored clearly.
Spreadsheets are considered not suitable for building dependable applications, due to multiple risk factors, design limitations, performance issues, etc. [7]. Despite all evident flaws, spreadsheets became practically ubiquitous in specific fields such as financial reporting [8], and established themselves as a widely used platform for prototyping [9], development of applications by end-users [10], [11], and for rapid and low-cost development of applications for a small user base. Having in mind those common uses, and being fully aware of specific design principles of spreadsheet applications, the spreadsheet platform was chosen for designing the system for lab practice administration and marking. Although spreadsheet application design requires specific design principles identified and documented in the literature [12], [13], there are many unique details that have to be addressed in a particular implementation. The final design must be highly driven by the capabilities and features of the chosen platform.

B. Implementation

The dashboard interface of the lab practicing management application was implemented using the Google Drive Sheets platform. The application was tested and used on Android tablets Asus Transformer TF-700 with a 10.1" 1920x1200 pixel display, running the Google Drive Android application, version 1.1.592.10 and later updates. Although notebook computers were also considered for the purpose of managing students in a lab, tablets were chosen as they fit better in the dynamic environment of lab practicing and in the mobility level required for teachers.
Google Drive Sheets is a free cloud platform that supports essential programming elements (formulas, conditional formatting, scripting), as well as simultaneous collaboration and editing at the document cell level, using a variant of the WEBDAV standard protocol. The supported features qualified the platform for the development of the management application. Due to the cloud nature of the platform, an active internet connection over Wi-Fi has to be available.
According to the previously defined data model and runtime requirements, the spreadsheet document was extended with conditional formatting statements to control visual aspects of dynamic data presentation, formulas for simpler data manipulation, and JavaScript code for complex data manipulation and control. In total, for the implementation of the interface for one sheet, 332 lines of JavaScript code, 11 different formulas, two different conditional formatting statements, and one protected region were used. One sheet was created for each class/week in the lab practicing cycle, with the necessary modifications in formulas, and one additional hidden sheet for storing program variables.
The design of visual elements was driven by specific attributes of the spreadsheet interface, but also by limitations of the platform, i.e. the current feature set implemented in the Android Google Drive application [14]. For example, the sparklines feature is implemented in the Drive application for the web, but not supported in the Android application. For that reason, marks history was implemented as a numeric string, rather than as a sparkline. Column filtering and searching are not implemented in the Android Drive application, and sorting is of very limited usability. Alternative solutions for the implementation of numerous tasks (i.e. searching for active students, searching for the student asking for marking) needed to be developed.
The design, the data model and the final visual appearance of the application were adjusted to the specific needs of the particular course. The implementation was tailored for a lab practicing cycle where two checkpoints, i.e. subtasks, named A and B, are required during each class. Only a passing mark on subtask A qualifies the student to start subtask B. A student is not allowed to work on the next lab exercise until passing the previous one. The official timeslot for a student is 90 minutes, of which 22 minutes are for subtask A. Periods when the student is waiting for a teacher are not taken into account.
The application interface for one lab practicing class is shown in Figure 2. The interface is divided into sections by columns. Columns displaying student general data (A-F), student profile (G-H) and remaining time for a subtask (J) are generated and read-only. Columns K-L and N-Q are for data entry: Lab# (which is generated but can be overridden), PC#, and comments and marks for each subtask. A sketch of this row schema is given below.
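The following Python sketch models one datasheet row with the columns just described. The field names and the example values are our own illustrative assumptions; the actual system keeps these data in spreadsheet columns rather than in program objects.

    from dataclasses import dataclass

    @dataclass
    class StudentRow:
        """One row of the lab practicing sheet (columns A-Q, simplified)."""
        # generated, read-only columns (A-F, G-H, J)
        name: str              # student general data
        student_id: str
        arrival_min: int       # scheduled arrival, minutes from class start
        history: str           # compact attendance/marks history string (column H)
        remaining_min: int     # remaining time for the current subtask (column J)
        # data-entry columns (K-L, N-Q)
        lab_no: int            # generated, but can be overridden by the teacher
        pc_no: str = ""
        comment_a: str = ""
        mark_a: str = ""
        comment_b: str = ""
        mark_b: str = ""

    # illustrative row; the values are invented for the example
    row = StudentRow(name="A. Student", student_id="12345",
                     arrival_min=0, history="-|-|8|", remaining_min=22, lab_no=4)
    print(row.remaining_min)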


There are 12 additional columns on the interface for quantitative instrumentation, remaining time calculation, and status variables. These data are automatically generated, but also accessible to the user by panning the interface.
Students are sorted by scheduled time, and teachers call them in that order whenever there is an available seat. During the lab exercise teachers visit students on demand or after the deadline, and note comments and marks. When a student receives the final grade, the corresponding datasheet row is marked with a smaller font. Teachers can scroll or zoom through the student list, but typically all active students can be seen on a single screen.
Dashboard interface requirements on one hand, and the platform's limited set of features on the other, posed multiple challenges in designing and implementing the required functionalities. Some of the requirements and implementation solutions are listed below.
Remaining time. Cells with remaining time (column J in Figure 2) are color coded, showing how urgently the student should be visited. Cells with positive remaining times are colored in green, while cells for students that passed the deadline are colored in gradients of red.
Visualization of student progress. Students who complete the exercise are marked with a smaller font. The progress of students through checkpoints is visualized by a simulated progress bar – the background of non-empty cells in the columns for comments and marks is colored (Figure 2, columns L-Q), and a fail grade for a subtask is marked with red color. In this way, the teacher can visually scan for active students, determine their progress stage and choose the most critical student to visit.
Searching for a student record by PC#. This design requirement was left unimplemented. The teacher must visually scan for the PC# among active unfinished students. Active unfinished students can be distinguished from finished or absent ones by looking at the number in the PC# column written with normal font size.

Figure 2. Application interface on the tablet

Attendance and marks history (Figure 2, column H) was generated as a string. Previous classes/weeks are represented with a numeric mark, or with a "-" sign in case of absence. A passed mark is followed by the "|" character, a failed mark by a comma. According to this syntax, column H in Figure 3 shows that the first student was absent for the first 3 weeks and then passed the first lab practice on the second attempt, the second student passed 3 labs in 5 attempts, etc. During development, several different solutions for a compact marks history display were presented to users, including line and bar sparklines. Users have chosen the string with the described syntax as the most descriptive and easy to understand (see the decoding sketch below). For this reason, it turns out that the absence of sparkline support in the platform was not limiting for the usability of the application.
History of significant notes. To mark a note as significant, the teacher enters a "-" sign within the note. Significant notes are displayed in the note history field the next week. Notes for subtasks A and B are separated with the "|" character. In Figure 3, the student in the second row had a significant comment for subtask B only, while the third student had a comment for subtask A; the comment of the second student for subtask B was not recorded or not marked as significant, thus it is not shown.

Figure 3. Design solutions for attendance and marks history string and significant notes from previous class

V. EXPERIMENT

A. Sample

The new approach in lab practicing organization was employed within the undergraduate course Computer Systems, an introductory computer architecture and assembly language programming course held at the Faculty of Electronic Engineering, University of Niš, Serbia, during the last two school years. Two teachers managed lab practicing in a lab with 23 seats. In the second school year, one group of 26 students was used as a control group. The control group was scheduled in a separate timeslot, not overlapping with others, and a single teacher managed the control group. The lab location, specific tasks and assignments, software tools and topics were unchanged between control and test groups. The statistics of the application usage are shown in Table I. The duration of work in one week was calculated as the average over all observed classes in a cycle, while the total usage time was calculated as the sum of durations of all classes, for test groups multiplied by 2, since two teachers were using the application simultaneously.
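As an aside to the marks-history syntax described above, the following Python sketch decodes such a string. The parsing rules follow the description (numeric mark, "-" for absence, "|" after a pass, "," after a fail); the function name and the example string are our own, shaped after the description of Figure 3.

    def parse_history(history: str):
        """Decode the attendance/marks history string into weekly records."""
        weeks = []
        i = 0
        while i < len(history):
            ch = history[i]
            if ch == "-":
                weeks.append(("absent", None))      # '-' marks an absence
                i += 1
            elif ch.isdigit():
                sep = history[i + 1] if i + 1 < len(history) else ""
                status = "pass" if sep == "|" else "fail"
                weeks.append((status, int(ch)))
                i += 2 if sep in "|," else 1
            else:
                i += 1                              # tolerate unexpected characters
        return weeks

    # an invented string: absent three weeks, one failed attempt, then a pass
    print(parse_history("---5,8|"))
    # [('absent', None), ('absent', None), ('absent', None), ('fail', 5), ('pass', 8)]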


TABLE I. SAMPLE SIZE AND MONITORING STATISTICS

                                        Cycle 1       Cycle 2       Control group
Period                                  Dec 2012      Nov 2013      Nov 2013
                                        - Jan 2013    - Jan 2014    - Jan 2014
Number of students (N)                  139           153           26
Classes/weeks                           4             7             7
Timeslots per week                      19            24            1
Total duration of labs in one week
– average (std dev)                     6h53min       8h2min        1h44min
                                        (26min)       (23min)       (8min)
Total application usage time            55h           112h          12h

B. Method

To measure the performance of the lab practicing policy and the performance of the management application, three data collecting mechanisms were implemented: (1) instrumentation implemented within the management application, (2) a special monitoring tool installed on tablet devices, and (3) written structured reports by teachers.
The special monitoring tool was implemented as an Android application and installed on user devices. The tool logged the teacher's activity and inactivity periods by logging the periods of time the device screen is turned on and off. Teachers were instructed to keep the device screen on while they are communicating with students, and to turn the screen off when they are idle. The monitoring tool also provided a simple interface for the teacher to log occurrences of some common potential problems. The list of possible problem types was built into the tool, and users were instructed to log each occurrence of a problem, together with the problem type. With the built-in Help functions, the tool also reminds the user of the solution method for the detected problem. Teachers were also asked to write a short report after each class and to list observations on system performance and uncommon problem occurrences. Logs from the monitoring tool were collected and analyzed together with user reports during Cycle 2 and the control group.
Using the collected data, a detailed analysis of different aspects of the applied policy performance can be performed, and multiple measures can be established. We will define and analyze only two metrics, while a more detailed analysis is beyond the scope of this paper. Teacher engagement rate is defined as the percentage of time during lab practicing a teacher is engaged managing, helping or assessing students, demonstrating also the fraction of time teachers are idle. Equipment engagement rate is defined as the percentage of lab resources used during lab practicing, demonstrating the efficiency of resource usage.
To analyze the quality of the management application, logs from the monitoring tool were analyzed to find the occurrence ratio of presumed common errors and problems in monitoring tool usage. User reports were used to complement the problem list and to learn more about the user experience of using the monitoring tool.

C. Results

By analyzing log files exported from the monitoring tool and instrumentation data collected with the management application, lab resources and teachers' time engagement during two cycles of lab classes were measured. In the control group the teacher was active during 81% (std. dev. 8.8%) of the total time. In the test groups, where two teachers were present, one of the teachers was active during 89% (std. dev. 1.4%) and the second during 75% (std. dev. 14.8%) of the total duration time.
Regarding the equipment engagement rate, i.e. the PC usage rate, the control group average is 52% (std. dev. 39.29%), while the test group average is 85% (std. dev. 13.0%). Values are obtained by dividing the total duration of PC activity by the total duration of all classes (a small computation sketch for these rates is given just before Table II).
Detected problems with the management application, their occurrences and the actions that solved the problems are shown in Table II. The first 5 problems were anticipated as common, and teachers recorded their occurrences with the monitoring tool. Three more problem types were recorded in the written reports. In total, teachers reported 177 problem occurrences. For 124 hours of monitored application usage, one problem arises on average every 42.03 minutes. Only 5.65% of problem occurrences were not possible to solve using the tablet interface. The most frequently reported was the problem of finding a particular student record by visual scan (41.8%). Users stated in written reports that this problem emerges only later during the lab classes, and that they almost do not have that problem in the first half of the day. Teachers reported that they did not have problems with understanding the data on the interface or remembering the syntax for marking and viewing significant notes and reading lab history. Losing the network connection occurred only once, on a single device, and the connection was restored within less than a minute. The most severe reported problem appeared after the Google Drive application update to version 1.2.484.18, on Dec. 18, 2013. Performance was degraded significantly due to forced document reloading on each change by another user. An entire day of lab practice was affected. During the next week Google support was contacted, but they were unable to solve the issue. In order to mitigate the issue and restore the previous user experience level, additional work was invested to find a workaround and to make the necessary adjustments.

VI. DISCUSSION

Changing the lab practicing policy aimed to improve the efficiency of lab equipment usage, teacher time usage and overall lab practicing efficiency. The proposed pipelined scheduling and multiple simultaneously managing teachers, while very complex to perform, are possible to manage if supported with an effective software tool, and lead to better efficiency.
From the teacher perspective, the new approach to lab practicing management is much more intensive and harder to perform. This is reasonable with the pipelined approach, as laboratory resources, equipment and teachers' time are intended to be used most of the time. Comparing the tablet device dashboard interface with the earlier paper-based annotation, the application interface comprises all necessary data elements to support the teacher's activities during lab practicing classes and provides additional features not possible to implement otherwise. With the adaptation of simultaneous editing features, it became possible for many teachers to assess the same students and to have an insight into all notes made by any other teacher.
Benefits of the spreadsheet platform include flexibility, allowing invention and adding of new functionalities in the application 'on-the-fly'. For example, teachers at the end of the semester marked students that were given certification of lab practicing completion. For marking, they used the background color of the name cell.
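As a side note to the engagement metrics defined in Section V, the following Python sketch shows how both rates can be computed from logged screen on/off intervals. The interval data and the helper name are illustrative assumptions; the paper derives the same quantities from the monitoring tool's log files.

    def engagement_rate(active_intervals, total_minutes):
        """Fraction of class time covered by 'active' intervals.
        active_intervals: list of (start_minute, end_minute) pairs."""
        active = sum(end - start for start, end in active_intervals)
        return active / total_minutes

    # invented example: one 90-minute class
    teacher_log = [(0, 35), (40, 70), (75, 90)]   # screen-on periods
    pc_log = [(5, 85)]                            # PC activity for one seat
    print("teacher engagement: %.0f%%" % (100 * engagement_rate(teacher_log, 90)))
    print("equipment engagement: %.0f%%" % (100 * engagement_rate(pc_log, 90)))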


TABLE II. USAGE PROBLEMS, FREQUENCY AND SOLUTION METHODS

Problem type                                           N     per h   Solved*   Solution method
Wrong PC#                                              48    0.39    48        Entry edit
Comment for wrong student                              21    0.17    21        Cut/paste
Presence for wrong student                             9     0.07    9         Delete data in columns M, R, S
Mark to wrong student                                  11    0.09    11        a) delete data in O, U, V, AB; if mark 5, also select row, set font 10;
                                                                               b) delete data in columns Q, X, AC, select row, set font size to 10
Problem finding student's record                       74    0.60    74        More time spent visually scanning
Unresponsive application or device                     3     0.02    3         Wait or reboot device
Network connection lost                                1     0.01    1         Wait or contact tech support
Slow response from the app, reloading and focus lost   2     N/A     0         Bug in Google Drive, workaround had to be implemented
Note inserted in mark column for wrong student,
overwrites the final mark, not detected immediately    2     0.02    0         Web interface, previous versions
Student was late, allowed to work, but his record is
too far away from other active students                6     N/A     0         N/A (be aware of particular student) – limit of the design
TOTAL                                                  177   1.3
* Solved by the user on detection, using the tablet interface
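The headline figures in Section V.C follow directly from Table II; as a quick check (values from the table, arithmetic only):

    print(124 * 60 / 177)            # ~42.03 minutes between problem occurrences
    print(100 * (2 + 2 + 6) / 177)   # ~5.65% of occurrences not solvable on the tablet
    print(100 * 74 / 177)            # ~41.8% were record-finding problems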

Flexibility makes an application for a small user base cost effective when developed on a spreadsheet platform. Adding the feature of multiple simultaneous users with WEBDAV technology, provided by many cloud providers, additionally increases the usage possibilities.
The most serious risk of exposing all the data to the user with a spreadsheet platform is the risk of destroying data by accident. Only an immediately detected error can be solved with the undo command. Users usually solved these issues by manually entering the correct data again. In certain cases, it was necessary to make the correction inside the instrumentation columns. During our study, in two cases accidental errors resulted in the invalidation of a large amount of data. Users were able to recover the data using the Revision history mechanism available in the Google Drive web interface. The other most common errors were editing or inserting data for the wrong students, and users were able to successfully resolve them.
Other issues are not directly related to the application but to the underlying technology. After an update, referencing between sheets was not handled well in the Google Docs Spreadsheet application, which resulted in reloading of the whole spreadsheet on every edit of data. The required permanent internet connection can sometimes be a serious source of problems, as the platform is highly dependent on the continuity and quality of the Wi-Fi connection, but in our study there was only one short disruption in connectivity.
Having in mind the intensity of interactions between students and teachers during lab practicing and the complexity of teachers' duties, the overall recorded problem occurrence ratio can be considered very low, while the ratio of successfully resolved problems is extremely high, leading to the conclusion that the management application design and implementation succeeded in achieving a satisfactory level of robustness. We also found that user tiredness significantly affects performance when using the dashboard interface for weakly implemented functionalities that rely on visual scan. For that reason, in future development, alternative methods for potentially problematic tasks can be provided.
We have demonstrated in this study that, with careful design driven by the feature set limitations of the platform, a dependable and effective application can be developed on a spreadsheet platform, for use in the very complex environment of lab practicing.

REFERENCES

[1] B. Surgenor and K. Firth, "The Role of the Laboratory in Design Engineering Education," Proceedings of the Canadian Engineering Education Association, 2011.
[2] L. D. Feisel and A. J. Rosa, "The role of the laboratory in undergraduate engineering education," Journal of Engineering Education, vol. 94, pp. 121-130, 2005.
[3] I. Pascual-Nieto et al., "Extending computer assisted assessment systems with natural language processing, user modeling, and recommendations based on human computer interaction and data mining," Proc. of the Twenty-Second International Joint Conference on Artificial Intelligence, Barcelona (ES), vol. 3, 2011.
[4] Y. Rosen and M. Tager, "Computer-based Assessment of Collaborative Problem-Solving Skills: Human-to-Agent versus Human-to-Human Approach," Research & Innovation Network, Pearson Education, 2013.
[5] S. Stidham Jr, Optimal design of queueing systems. Boca Raton FL (US): CRC Press, 2010.
[6] S. Few, Information dashboard design. Sebastopol CA (US): O'Reilly, 2006.
[7] R. Abraham, M. Burnett, and M. Erwig, "Spreadsheet programming," in Wiley Encyclopedia of Computer Science and Engineering, Wiley, 2008.
[8] M. Madahar, P. Cleary, and D. Ball, "Categorisation of Spreadsheet Use within Organisations, Incorporating Risk: A Progress Report," The European Spreadsheet Risks Interest Group 8th Annual Conference – EuSpRIG, London (UK), 2007.
[9] A. Kumiega and B. Van Vliet, "A Software Development Methodology for Research and Prototyping in Financial Markets," The European Spreadsheet Risks Interest Group 7th Annual Conference – EuSpRIG, Cambridge (UK), 2006.
[10] A. J. Ko et al., "The state of the art in end-user software engineering," ACM Computing Surveys (CSUR), vol. 43, no. 3, 2011.
[11] R. Panko, "Spreadsheet errors: What we know. What we think we can do," The European Spreadsheet Risks Interest Group 1st Annual Conference – EuSpRIG, London (UK), 2000.
[12] T. A. Grossman and Ö. Özlük, "A Paradigm for Spreadsheet Engineering Methodologies," European Spreadsheet Risks Interest Group 5th Annual Symposium, Klagenfurt (AT), 2004.
[13] P. Booth, "Twenty Principles for Good Spreadsheet Practice", http://www.ion.icaew.com/itcounts/27870 (accessed: December 24, 2013).
[14] "Edit Google spreadsheets on Android", Google Drive support, https://support.google.com/drive/answer/2763167?hl=en&ref_topic=1361437 (accessed: January 10, 2014).


Toward More General Criteria of Conformity


Between Learner and Learning Objects
Eleonora Brtka*, Vladimir Brtka*, Vesna Makitan*, Ivana Berkovic*
* University of Novi Sad/Technical Faculty “Mihajlo Pupin”, Zrenjanin, Republic of Serbia
[email protected], [email protected], [email protected], [email protected]

Abstract— The paper deals with the IEEE 1484.12.1 – 2002 Standard for Learning Object Metadata (LOM) and the IMS Learner Information Package (LIP) specification. The main goal is to develop a Web-based Learning System characterized by a high level of adaptability. The LOM is used for the description of learning objects, making possible the creation of well-structured descriptions of learning resources, while LIP describes the learner's profile. Conformity criteria for best matching a learning object and a learner's profile are investigated. Previously introduced solutions are based on If-Then rules with a low level of adaptability. The original contribution of this work is the adoption of a similarity relation and more general logical operations in order to achieve greater adaptability. It is shown how the similarity relation, as well as general logical operations, could be incorporated in a Web-based Learning System.

I. INTRODUCTION

Nowadays, web-based learning systems (WLS) are widely accepted and used, covering a variety of scientific domains. WLSs are available in many different languages and intended to be used by users who differ in skill level, age, affiliation, competency, interests, etc. This variety requires a high level of adaptability of a WLS in order to be useful for every individual user. Nowadays, it is possible to develop a WLS which, to a certain extent, overcomes time–space constraints and provides a suitable environment for users. In addition to interactivity, there is also an adaptive component. A WLS that is capable of adaptation to specific user needs can create a user's profile and track profile changes in order to adapt its actions. In general, the actions that the system takes are related to the presentation of educational content in many different ways. An adaptation process involves the selection of content and of the way it will be presented to the user.
In many studies [1, 2, 3] a significant positive correlation between the success of students' achievements and self-regulation is confirmed. Self-Regulated Learning (SRL) is a type of WLS which allows the user to determine the level and manner of content presentation. These and similar research works were primarily related to the academic level of education. SRL is understood as the ability to adapt characteristics of the content that has been delivered by the WLS to the user. So, users' preferences (students' in this case) are the main input to SRL. It has also been observed that students with a low ability of self-regulation while learning were less successful at university courses, compared to students with a high ability of self-regulation while learning [4, 5, 6].
A more contemporary study, conducted by Chen [7], introduced two indicators reflecting the level of capabilities of self-regulation while learning:
• Self-Regulated Learning Competence Index.
• Self-Regulated Learning Performance Index.
The values of both indicators are calculated by an exact formula, based on:
• The time consumed by a student while learning.
• The accuracy of his/her answers to questions.
Chen has developed the PELS system (Personalized E-Learning System), which is an implementation of concepts that were introduced by Zimmerman in previous investigations in this domain. The user has an interactive interface that can be customized.
Based on Chen's results, a different approach was developed by Biletskiy [8]. He defined two entities:
• Learning Objects (learning content), whose conceptual model was defined by the IEEE Standard 1484.12.1 (Learning Object Metadata - LOM, IEEE LOM, 2002).
• Student (learner), whose conceptual model is based on the IMS Learner Information Package specification (IMS LIP, 2010).
Therefore, LOM provides metadata for learning content, while IMS LIP provides the user's metadata. Various relations between LOM instances and IMS LIP instances can be defined.
In [9] a system is presented for the automatic generation of IMS compliant courses based on explicitly expressed machine-readable instructional design templates. IEEE Learning Object Metadata (LOM) was used, while the binding of data was done by XML. A new domain-specific language named ELIDL is proposed. The system is used for generating the Web Programming course at the Faculty of Technical Sciences in Novi Sad, but the generated course is static because it represents a sequence of predefined learning activities.
A tutoring system named Protus that is used for learning basic concepts of the Java programming language is described in [10]. Protus uses ontologies and Semantic Web concepts for performing personalization.
The main goal of this work is to investigate possible differences in WLS implementation when a similarity relation is used. The proposed relation should estimate the similarity between the learner's profile and learning objects. This should result in WLSs with better adaptability.
This paper is organized as follows: Section II contains a brief description of the IEEE LOM standard and the IMS LIP specification. Section III deals with previously introduced conformity criteria, while Section IV deals with a more general approach to conformity criteria.


Section V describes a case study. Finally, Section VI gives some conclusions and guidelines for future work.

II. IEEE LOM AND IMS LIP

The IEEE 1484.12.1 – 2002 Standard for Learning Object Metadata (LOM) is an internationally recognized open standard. The LOM is used for the description of "learning objects", making possible the creation of well-structured descriptions of learning resources. As in [11], LOM enables:
• Sharing descriptions of learning resources between resource discovery systems.
• Tailoring learning resources.
• Labeling learning resources by LOM along with other specifications.
Therefore, LOM provides the description of learning resources, which results in cost reduction, customization and optimization, as well as easy accessibility and interoperability of learning resources. The LOM comprises a hierarchy of elements. The basic elements are given in Table I. Each basic element comprises sub-elements.
The IMS Global Learning Consortium created the Learner Information Package (LIP) as a specification for a standard means of recording information about learners. All data created according to LIP can be transferred between different software applications. As in [12], LIP can be used for:
• Moving learner information between institutions when students transfer courses.
• Producing a lifelong record of a learner's achievement.
• Providing information about a learner's achievement to employers.
• The personal development planning process.
• Storing information about learner preferences to support widening participation for learners with disabilities.
Rather than plain text, LIP uses XML to record information about learners. For purely illustrative purposes, a small portion of XML is taken from [12]:

    <language>
      <typename>
        <tysource sourcetype="imsdefault"/>
        <tyvalue>German</tyvalue>
      </typename>
      <contentype>
        <referential>
          <indexid>language_01</indexid>
        </referential>
      </contentype>
      <proficiency profmode="OralSpeak">Excellent</proficiency>
      <proficiency profmode="OralComp">Excellent</proficiency>
      <proficiency profmode="Read">Good</proficiency>
      <proficiency profmode="Write">Poor</proficiency>
    </language>

TABLE I. IEEE LOM AND IMS LIP ELEMENTS

IEEE LOM Elements                                             IMS LIP Elements
General: e.g. title, language, and keywords                   Identification: e.g. name, address, demographic, and agent
Life Cycle: e.g. version, and status                          Accessibility: e.g. language, disability, preferences, and eligibility
Meta-metadata: e.g. identifier, and contribution              Goal: learning, career and other objectives and aspirations
Technical: e.g. format, size, and location                    Qcl: qualifications, certifications, and licenses
Educational: e.g. interactivity level, and semantic density   Activity: e.g. educational program
Rights: e.g. cost, and copyright                              Competency: acquired learning competencies, e.g. awards
Relations: e.g. kind, and resource                            Transcript: summary records of academic performance
Annotation: e.g. entity, date, and description                Interest: hobbies and recreational activities
Classification: e.g. purpose, and taxon path                  Affiliation: membership of learned, professional, civic and recreational organizations
                                                              Security key: e.g. passwords, and public key
                                                              Relationship: the relationship to be established between the other core data structures
                                                              Extension: the extension facility for top-level "learner information"

Table I contains the LIP elements that were used in previous research works. IMS contributed to the drafting of the IEEE LOM. It is obvious that IMS LIP must be compatible with the IEEE LOM. Therefore, data binding should be used, and the obvious choice was XML.

III. CONFORMITY CRITERIA

As in [8], it is good to focus on those LOM and LIP elements that are useful as criteria of conformity of a learning object to a learner profile. Table I contains the selected elements which are considered to be the best match for conformity criteria. The goal is to find the best matching learning object for a specific learner. This task can be done by a ranking process with pre-defined parameters, but in such a case the adaptability is minimized or ignored. The generalized criterion Kj, estimating the conformity of learning objects to the specific learner, is introduced by (1):

    Kj = Σ(i=1..n) λi · kij                                    (1)

where λi ∈ [0, 1], j = 1, 2, …, P, kij ∈ {0, 0.5, 1}.
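For illustration, with n = 2 criteria, weights λ1 = 0.6, λ2 = 0.4 and conformity values k1j = 1, k2j = 0.5 (invented numbers, not taken from [8]), criterion (1) gives Kj = 0.6·1 + 0.4·0.5 = 0.8; the learning object with the largest such score would be delivered.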


As there are multiple learning objects that can potentially be delivered to the learner, a proper selection of a learning object is a challenging task. If there are P learning objects (LOM instances), then the conformity of the j-th learning object is calculated by (1), where λi is the coefficient of importance of the i-th criterion of the learner's profile, and n is the total number of criteria. The most important parameter is kij, the measure of conformity of the i-th criterion of the learner to the i-th criterion of the j-th LOM instance. In the end, the learning object with the maximal value of Kj is delivered to the learner.
The calculation of kij is done by multiple If-Then rules. In [8], 13 rules are proposed. Some of the proposed rules are:

    If (typical_age_range.min ≤ (date - birth_date) ≤ age_range.max)
    Then kij=1 Else kij=0

    If current_activity='active' ∨ activity_type=context
    Then kij=1 Else kij=0

The meaning of rule 1 is: if the learning object is appropriate to the learner's age, then kij is 1, but if this is not the case then kij is 0.
The meaning of rule 2 is: if the user's current activity is "active" or the activity type is the same as the context, then kij is 1, else kij is 0. Context and activity type are kinds of enrollment, e.g. "school", "training", "research", etc.
Formally, it is possible for the value of kij to be in the range [0, 1], but that raises the question of the foundation of such a decision. The only possible way to change the rules is to recalculate the values of the coefficients of importance. This research deals with a different definition of the If-Then rules and their application to the LOM-LIP ranking process.

IV. TOWARD MORE GENERAL CONFORMITY CRITERIA

By insight into the IEEE LOM standard, it is evident that most of the elements are textual and descriptive. The "Educational" element consists of 11 sub-elements, most importantly: Interactivity Level, Difficulty, Semantic Density and Typical Learning Time. These four sub-elements are chosen as the criteria for calculating the measure of conformity. It is possible for the values of these four criteria to be numerical, e.g. a percentage or some value from the [0, 1] range. Now it is possible to define the learner's profile as a 4-tuple <a, b, c, d>, where a corresponds to Interactivity Level, b corresponds to Difficulty, c corresponds to Semantic Density and d corresponds to Typical Learning Time. Learning objects are defined as 4-tuples, analogously to the definition of the learner's profile. There are many ways to define a more general measure of conformity, some of them being:
• It is possible to use some similarity relation instead of binary relations such as =, ≥, ≤.
• It is possible to use more general logical operators that deal with values from [0, 1], instead of logical AND, OR and NOT operators that deal with values from {0, 1}.
• It is possible to use multiple sets of pre-defined rules instead of one set of 13 pre-defined rules, as proposed in [8].
• It is possible to use logical implication on [0, 1] instead of Boolean logical implication on {0, 1}.
In this research we adopted the usage of a similarity relation, while logical operations are defined on [0, 1].

A. Similarity Relation

If the similarity relation between two objects is defined as a negation of the distance between them [13, 14], then we can use many types of distance measures, such as: Euclidean, Squared Euclidean, Chebyshev, Mahalanobis, Manhattan, Minkowski, City-Block, etc. According to [14], the distance between two values a1 and a2 can be calculated by (2):

    distance(a1, a2) = |a1 – a2| / (maxval – minval)            (2)

where maxval is the maximal possible value and minval is the minimal possible value. As the values a1 and a2 are from [0, 1], we have maxval = 1 and minval = 0. Now, (2) is transformed to the simple form:

    distance(a1, a2) = |a1 – a2|                                (3)

If the similarity of two objects is defined as the negation of their distance, and the negation of a value x ∈ [0, 1] is defined as 1 – x, then the similarity of two objects a1 and a2 is defined as:

    similarity(a1, a2) = 1 – α·|a1 – a2|                        (4)

where α is the coefficient of importance.

B. Implication on [0,1] and Rules

There are many definitions of the logical implication A→B (A implies B) when A and B take values from [0, 1]. Some of them are compatible with Boolean implication, but some are not. As in [15, 16], the simplest possible way to define the implication A→B is to use the min function (Mamdani implication):

    A → B = min(A, B)                                           (5)

The antecedent A of the implication defined by (5) is formed as an aggregation of similarities between the learner's profile criteria and the corresponding learning object criteria, while the consequent B is a constant related to the conformity criterion k. The aggregation of similarities is done by application of the logical AND operator, which is implemented as a t-norm [15, 16]. The most common t-norms are the min function and prod (the algebraic product). The logical OR operator is commonly implemented as the max function. If the measure of similarity of the learner's profile and a learning object is μ, then the form of the rule is: If μ Then k. Finally, there is no need for multiple rules; only one rule of the form If μ Then k is needed. Multiple sets of rules can be emulated by varying the value of k: if k=1 the rule is strict, but for k<1 we have relaxed rules.

V. A CASE STUDY

For the four criteria Interactivity Level, Difficulty, Semantic Density and Typical Learning Time, we have the learner's profile defined in Table II, and four learning objects defined in Table III (a small implementation sketch of the whole ranking procedure follows below).
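The ranking procedure of equations (3)-(5) is short enough to sketch in Python. The code below uses the values from Tables II and III; the function names and the data layout are our own, and α = 1, as in the worked example that follows.

    # learner profile <a, b, c, d> = (IL, D, SD, TLT) from Table II
    learner = (0.7, 0.5, 0.8, 0.4)

    # learning objects from Table III
    objects = {
        1: (0.0, 0.9, 1.0, 0.9),
        2: (1.0, 0.6, 0.8, 0.9),
        3: (0.3, 0.3, 0.5, 0.7),
        4: (0.8, 0.2, 0.1, 0.5),
    }

    def similarity(a1, a2, alpha=1.0):
        # equation (4): negated distance on [0, 1]
        return 1.0 - alpha * abs(a1 - a2)

    def prod(x, y):
        return x * y

    def aggregate(sims, t_norm=min):
        # logical AND over criterion similarities, implemented as a t-norm
        result = sims[0]
        for s in sims[1:]:
            result = t_norm(result, s)
        return result

    def rank(k=1.0, t_norm=min):
        scores = {}
        for no, obj in objects.items():
            sims = [similarity(l, o) for l, o in zip(learner, obj)]
            mu = aggregate(sims, t_norm)
            scores[no] = min(mu, k)   # equation (5): Mamdani implication
        return scores

    print(rank())              # object 3 has the largest score (0.6)
    print(rank(t_norm=prod))   # object 2 wins with the prod t-norm
    print(rank(k=0.5))         # relaxed rule: objects 2 and 3 tie at 0.5

The outputs (up to floating-point rounding) reproduce Table IV and the best matching objects reported below.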


TABLE II. LEARNER'S CRITERIA (LIP)

Interactivity Level (LIL)   Difficulty (LD)   Semantic Density (LSD)   Typical Learning Time (LTLT)
0.7                         0.5               0.8                      0.4

TABLE III. LEARNING OBJECTS CRITERIA (LOM)

No.   Interactivity Level (OIL)   Difficulty (OD)   Semantic Density (OSD)   Typical Learning Time (OTLT)
1.    0                           0.9               1                        0.9
2.    1                           0.6               0.8                      0.9
3.    0.3                         0.3               0.5                      0.7
4.    0.8                         0.2               0.1                      0.5

The strict rule is defined by:

    If LIL is similar to OIL AND LD is similar to OD AND LSD is similar to OSD
    AND LTLT is similar to OTLT Then k=1.

For the coefficient of importance α = 1, and for the first learning object, we can calculate the similarities by (4):

    LIL is similar to OIL:   1 – |0.7–0|   = 1–0.7 = 0.3
    LD is similar to OD:     1 – |0.5–0.9| = 1–0.4 = 0.6
    LSD is similar to OSD:   1 – |0.8–1|   = 1–0.2 = 0.8
    LTLT is similar to OTLT: 1 – |0.4–0.9| = 1–0.5 = 0.5

After application of the logical AND operator implemented by the min function, we have: min(0.3, 0.6, 0.8, 0.5) = 0.3. The measure of similarity between the learner's profile and the first learning object is μ = 0.3. An analogous calculation is done for all remaining learning objects (see Table IV). If the AND logical operator is implemented as prod, different measures of similarity are obtained (see Table IV, column three).

TABLE IV. MEASURES OF SIMILARITY

No.   Measure of similarity μ (min)   Measure of similarity μ (prod)
1.    0.3                             0.072
2.    0.5                             0.315
3.    0.6                             0.2352
4.    0.3                             0.1701

Now, we calculate the final result by the logical implication implemented as the min function, for the previously calculated μ and k=1: min(0.3, 1)=0.3, min(0.5, 1)=0.5, min(0.6, 1)=0.6, min(0.3, 1)=0.3. It is obvious that the best matching learning object for the specific learner is learning object No. 3. Analogously, if the logical AND operator is implemented as prod, the best matching learning object for the specific learner is learning object No. 2.
If the rule is relaxed, so that k=0.5, we have: min(0.3, 0.5)=0.3, min(0.5, 0.5)=0.5, min(0.6, 0.5)=0.5, min(0.3, 0.5)=0.3. Now, if the logical AND operator is implemented as min, then the best matching learning objects for the specific learner are objects 2 and 3, while if the logical AND operator is implemented as prod, then there is no change.

VI. CONCLUSION

The IEEE 1484.12.1 – 2002 Standard for Learning Object Metadata (LOM) and the IMS Learner Information Package (LIP) specification, combined together via XML data binding, enable the elimination of problems such as: moving learner information between institutions, producing a lifelong record of a learner's achievement, supporting the personal development planning process, and storing information about learner preferences. During the execution of the data binding process, it is of great importance to find the learning object which is the best match to the learner's profile. In the past, this was done by simple Boolean If-Then rules. In this paper, the introduction of a similarity measure, as well as general logic operations, is proposed. This led to more general criteria of conformity between the learner and learning objects. The generalization of logical operations is done by the introduction of t-norms, while the logical implication is implemented as the Mamdani logical implication via the min function. A case study shows that better adaptability of the Web-based Learning System is possible.
In [17] it is underlined that there are problems regarding: national educational environments, differences between beneficiary institutions, flexibility, teaching materials development, and delivery of the curriculum. By usage of IEEE LOM and IMS LIP joint data binding, it is believed that those and similar problems would be minimized.
Future work will investigate the impact of different logical implication implementations, such as the Kleene-Dienes implication, the Lukasiewicz implication, etc., on the measure of the criteria of conformity of the learner's profile and learning objects.

ACKNOWLEDGMENT

This research is financially supported by the Ministry of Education and Science of the Republic of Serbia under the project number TR32044 "The development of software tools for business process analysis and improvement", 2011-2014.

REFERENCES

[1] N. Dabbagh and A. Kitsantas, "Using web-based pedagogical tools as scaffolds for self-regulated learning," Instructional Science, no. 33, 2005, pp. 513-540.
[2] V. Kumar et al., "Effects of self-regulated learning in programming," Proceedings of the Fifth IEEE International Conference on Advanced Learning Technologies (ICALT), 2005, pp. 383-387.
[3] S. Narciss, A. Proske, and H. Koerndle, "Promoting self-regulated learning in web-based learning environments," Computers in Human Behavior, no. 23, 2007, pp. 1126-1144.
[4] D. H. Schunk and B. J. Zimmerman, Self-regulation of learning and performance – Issues and educational applications, Hillsdale, NJ, Lawrence Erlbaum Associates, 1994.
[5] B. J. Zimmerman and D. H. Schunk, Self-regulated learning and academic achievement: Theory, research, and practice, New York, Springer–Verlag, 1989.
[6] B. J. Zimmerman and D. H. Schunk, Self-regulated learning and academic achievement: Theoretical perspectives, Hillsdale, NJ, Lawrence Erlbaum Associates, 2001.
[7] C.-M. Chen, "Personalized E-learning system with self-regulated assisted mechanisms for promoting learning performance," Expert Systems with Applications, no. 36, pp. 8816-8829, 2009.
[8] Y. Biletskiy, H. Baghi, I. Keleberda, and M. Fleming, "An adjustable personalization of search and delivery of learning objects to learners," Expert Systems with Applications, no. 36, pp. 9113-9120, 2009.
[9] G. Savić, M. Segedinac, and Z. Konjović, "Automatic Generation of E-Courses Based on Explicit Representation of Instructional Design," DOI: 10.2298/CSIS110615005S, ComSIS, Vol. 9, No. 2, June 2012.


[10] B. Vesin, M. Ivanović, A. Klašnja-Milićević, and Z. Budimac, "Ontology-Based Architecture with Recommendation Strategy in Java Tutoring System," DOI: 10.2298/CSIS111231001V, ComSIS, Vol. 10, No. 1, January 2013.
[11] P. Barker, "What is IEEE Learning Object Metadata / IMS Learning Resource Metadata," JISC CETIS, The University of Bolton, UK, 2005.
[12] S. Wilson and P. Rees-Jones, "What Is IMS Learner Information Packaging," JISC CETIS, Bolton Institute, UK, 2002.
[13] S.-H. Cha, "Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions," International Journal of Mathematical Models and Methods in Applied Sciences, Issue 4, Volume 1, pp. 300-307, 2007.
[14] H. D. Burkhard, "Case Completion and Similarity in Case Based Reasoning," ComSIS, Vol. 1, No. 2, ISSN 1820-0214, pp. 27-55, 2004.
[15] J. Tick and J. Fodor, "Fuzzy Implications and Inference Processes," Computing and Informatics, Vol. 24, 2005, pp. 591-602.
[16] V. Brtka, Soft Computing, Technical Faculty "Mihajlo Pupin", Zrenjanin, Serbia, 2013.
[17] K. Bothe, Z. Budimac, R. Cortazar, M. Ivanović, and H. Zedan, "Development of a Modern Curriculum in Software Engineering at Master Level across Countries," UDC 004.41, DOI: 10.2298/CSIS0901001B, ComSIS, Vol. 6, No. 1, June 2009.


Increasing the lifetime of hexagonal deployed


Wireless Sensor Web Network
Mirjana Maksimović*, Vladimir Vujović*, Vladimir Milošević**, Branko Perišić**
* Faculty of Electrical Engineering, East Sarajevo, Bosnia and Herzegovina
** Faculty of Technical Sciences, Novi Sad, Serbia

[email protected], [email protected], [email protected], [email protected]

Abstract: As with Wireless Sensor Networks (WSN), there are two main goals of the Sensor Web: the optimal deployment of sensor nodes, and the maximization of sensor network lifetime. They are particularly important in critical event applications, like residential fire detection, because the topology of sensor nodes and the network lifetime have a dramatic impact on the overall network effectiveness and the efficiency of its operation. One of the possible solutions may be the development of an appropriate algorithm for minimizing the expected energy consumption as a function of a specific sensor node topology. In this paper we have analyzed a hexagonal sensor deployment scheme and proposed a specific algorithm for energy saving by scheduling a specific active-passive pattern of sensor nodes.

I. INTRODUCTION

A sensor node in a WSN is a small embedded computing device that interfaces with sensors/actuators and communicates using short-range wireless transmitters. It has limited battery resources, processing and communication capabilities. Sensor nodes form a logical network in which data packets are routed hop-by-hop towards management nodes, typically called sinks or base stations. Thus, a WSN comprises a potentially large set of nodes that may be distributed over a wide geographical area, indoor or outdoor. Recent advances in WSN technology and the use of the Internet Protocol (IP) in resource-constrained devices have radically changed the Internet landscape, creating a new form called the Internet of Things (IoT) [1]. The IoT will connect physical (analog) environments to the (digital) Internet, unleashing exciting possibilities and challenges for a variety of application domains, such as smart metering, e-health, logistics, building and home automation [2]. One of the most important building elements of the IoT are sensor nodes, more precisely, a Sensor Web. Traditionally, the Sensor Web is defined as a web of interconnected heterogeneous sensors that are interoperable, intelligent, dynamic, flexible and scalable. This definition implies that a Sensor Web is a hardware network of sensors. Alternatively, the Sensor Web can be defined as a universe of network-accessible sensors, sensory data and information [3]. In other words, the concept of the Sensor Web reflects a kind of infrastructure for automatically discovering and accessing appropriate information from heterogeneous sensor devices over the Internet.
The design of a wireless Sensor Web platform must deal with challenges in energy efficiency, cost, and application requirements. The choice of deployment strategy, which is the first step in forming any WSN, depends on applications, requirements and design goals [4]. Sensors can generally be deployed in an area of interest either deterministically or randomly. The choice of the deployment scheme highly depends on the type of sensors, the application and the environment that the sensors will operate in. In other words, the nodes' positions determine the functionality, life span and efficiency of the network. Using a deterministic deployment strategy, access to the monitored field must be granted, and the number of nodes required for full coverage can be determined. It is therefore suitable for optimal deployment, where the coverage and/or the sensor network lifetime are maximized. In addition, the number of sensors required to monitor a given area in a deterministic deployment is, in most cases, smaller [4, 5]. There are many deterministic deployment strategies presented in the literature: square [6], triangular [7, 8], strip [8, 9], hexagonal [10, 11, 12]. As WSNs are made up of tiny, energy-hungry sensor nodes, it is a challenging process to retain the energy level of those nodes for a long period. Therefore, it is necessary to extend the battery life of individual sensors so that the network can remain functional as long as possible. In the above mentioned topologies, the number of neighboring nodes determines the number of receivers and hence results in more overall power usage, even though the number of transmissions decreases. Thus, there is a fundamental trade-off between decreasing the number of transmissions and increasing the number of receptions [13]. In [14] the authors consider the topology that best supports communication among sensor nodes. They consider different routes and topologies, demonstrating the difference in performance and explaining the underlying causes. Proposing a power-aware routing protocol and simulating its performance, the authors show that their routing protocol adapts routes to the available power, which leads to a reduction in the total power used as well as more even power usage across nodes. Energy awareness, power management and data dissemination considerations have made routing in WSNs a challenging issue. Low-latency data delivery is an important requirement for achieving effective monitoring through WSNs.
WSNs. The paper [15] propose a forwarding protocol

Page 131 of 478


ICIST 2014 - Vol. 1 Regular papers

based on biased random walks where nodes use only local networks. In monitoring applications, like fire detection,
information about neighbors and their next active period sensors must be positioned so that every point in the
to make the forwarding decisions. This is referred as region of interest is monitored by at least one sensor. A
lukewarm forwarding. Analytical and simulation sensor is able to monitor all points within a certain
experiments make it possible to reduce the latency without distance of its location, i.e., a disk of radius “r”. This
increasing the number of transmissions needed to deliver paper is focussed on finding the deployment with
the message to destination. minimal number of sensor nodes whose centralized disk
In this paper the performance issues associated with completely covers given area. The underlining Hexagonal
hexagonal Sensor Web network topology applied in fire deployment of smoke sensors used is presented in Fig. 2.
detection are analyzed. Application in fire detection
requires accurate deployment of the sensors. In addition,
many parameters need to be considered during the
deployment process for efficient network operation.
Since the topology in this case is fixed and known, it is
assumed that the base station position can be optimally
determined. Thus, the power requirements for
communicating with the base station should be essentially
independent of the topology. The ultimate objective of
the practical design in this paper can be defined as Figure 2. Hexagonal deployment strategy of smoke sensors
follow: For a specific sensing task – early and accurate
smoke detection, determine how the number, deployment Fig. 3 shows that, with sensor nodes placement for r=5,
and scheduling of smoke detectors into hexagonal above mentioned requirements are fulfilled and 100%
structure influence on the network energy consumption coverage of the room is achieved meeting the crucial
and lifetime. For the purpose of the paper the variant of requirements in the case of fire presence.
active-passive scheduling is considered with the aim to
reduce energy consumption and thus increase network
lifetime.
II. HEXAGONAL DEPLOYMENT OF SENSOR NODES
A grid-based deployment is considered as a good
deployment in WSNs, especially for the coverage
performance. There are several grid based designs like as
unit square, equilateral triangle, regular hexagon etc [16].
In this paper the hexagon grid deployment pattern with
Figure 3. Coverage obtained by hexagonal deployment of smoke
“r” communication range is chosen for the evaluation sensors
purposes (Fig. 1).
III. OPTIMIZATION OF ACTIVE SENSOR NODES NUMBER
Acquisition of precise data and immediate transfer of
the data to sink node is very important, especially in fire
detection. Data processing and data transfer require more
power. When, the data has to be transferred and when, it
needs to be stored, depends on the state of the radio in the
node. To conserve energy, the radio can be switched to
sleep state when there is no data to send or receive. This
method of switching the radio to a sleep state and making
it active if any event is detected is well known as on-
demand scheme or event-based scheme. There is an
another method of scheduling based on regular time
interval for switching all the nodes either in sleep or
active mode known as synchronous scheme. There is no
Figure 1. Hexagonal deployment strategy [11] need to keep all the nodes active at the same time. WSN
can follow a scheduling pattern, accordingly, at any
The easiest way to detect a fire at residential places is by instant; only a limited number of nodes can be active
using the smoke sensors that are usually sensitive to [13].
ionization or obscuration. The room with dimensions The work presented in this paper considers two cases
50x19x4 m, with a flat ceiling and with the average fire based on the same principle and presented scheduling
risk and medium fire load is observed for simulation protocol will keep only a subset of nodes to be in active
purpose. It is assumed that there are no physical partitions state and keeping others inactive or in passive state. A
and barriers in the monitored room that may affect the scheduling protocol will be the best if it keeps only a
deployment process as well as the operation of sensor
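As an illustration of the duty-cycling idea, the following Python sketch contrasts the synchronous and event-based schemes for a single node. The timing constants and the event model are invented for the example; the paper itself only describes the two schemes qualitatively.

    import random

    def synchronous_schedule(slots, duty_cycle=0.5):
        """Node is active in a fixed fraction of time slots (synchronous scheme)."""
        period = 10                               # assumed schedule period, in slots
        on_slots = int(period * duty_cycle)
        return [t % period < on_slots for t in range(slots)]

    def event_based_schedule(slots, p_event=0.05):
        """Node sleeps until an event (e.g. smoke) is detected (on-demand scheme)."""
        random.seed(7)
        return [random.random() < p_event for t in range(slots)]

    slots = 1000
    for name, schedule in [("synchronous", synchronous_schedule(slots)),
                           ("event-based", event_based_schedule(slots))]:
        active = sum(schedule)
        print("%s: active %d/%d slots (%.0f%%)"
              % (name, active, slots, 100 * active / slots))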


A scheduling protocol is best if it keeps only a minimum number of nodes active at any instant. In this paper it is proposed that, for critical application monitoring, only half of the sensors should be active, according to Fig. 4 and Fig. 6.

Figure 4. 1st modified hexagonal deployment strategy (hex-1)

Figure 5. Coverage obtained by active sensors of the 1st modified hexagonal deployment of smoke sensors

Figure 6. 2nd modified hexagonal deployment strategy (hex-2)

Figure 7. Coverage obtained by active sensors of the 2nd modified hexagonal deployment of smoke sensors

From Fig. 5 and Fig. 7 it is obvious that in those two cases full coverage of the room is not achieved, which can lead to late detection of fire, especially if it is located outside the detection zones.
To avoid the situation of late detection in areas outside of the detection zones, it is necessary to use special algorithms that change the active and passive states of the sensors.

IV. A SIMPLE DYNAMIC SCHEDULE ACTIVE-PASSIVE NODES ALGORITHM

One of the main objectives in WSNs is to increase the lifetime of the network. Unlike traditional WSNs, where such an increase is usually based on the selection of an appropriate routing algorithm [14, 15], in the case of a Sensor Web network there is a need for a different solution. As already stated, the Sensor Web is traditionally defined as a web of interconnected heterogeneous sensors that are interoperable, intelligent, dynamic, scalable and flexible, but it can often be presented as a group of interoperable web services which all comply with a specific set of behaviors and sensor interface specifications [2], where the difference between an ordinary sensor and the Sensor Web should not be visible to the end user [17].
In the case considered in this work, a sensor node that can have two states is created, unlike the traditional sensor nodes mentioned earlier, which usually have four states (Transmit, Receive, Idle and Sleep). The two states that a sensor node can have are:
• active – all units are active (Fig. 8),
• passive – only the transceiver unit is active (a condition in which a sensor node maintains the minimum activities necessary to wake up the sensor).
In both states, the sensor node can communicate with the central unit, which is responsible for monitoring and controlling the sensor network.

Figure 8. Typical sensor node architecture

The main principle of saving energy and increasing the Sensor Web network lifetime can be based on scheduling [13] or on the Smart Sleep [18] approach. In both cases, the solution is primarily based on planning and setting a number of sensor nodes into the active or passive state. What is most important when applying this method is to determine the proper network settings, i.e., to define the following goals:
• The minimum number of active sensors (such that the sensor network performance is not significantly disrupted);
• Distance between the sensors vs. sensors' power;
• Time changes between active/passive states of nodes;
• The algorithm for controlling the active/passive sensor nodes;
• The actions taken at the time when a sensor detects a critical event (assuming that the existence of critical points is adopted).
The first two items are directly related to the topology of the sensor network and the area where the sensor network is deployed. Identification of these elements is usually done using complex mathematical simulation or empirical methods; a minimal coverage-check sketch in this spirit is given below.

The algorithm for controlling the active/passive sensor nodes operates on the complete set of sensor network nodes, divided into two groups: active sensor nodes and passive sensor nodes, together with all their adjacent nodes (in the hexagonal network every node has at most three adjacent nodes) – Fig. 9.

Figure 9. Hexagonal Sensor Web deployment: Non-alert state. Red dot – active sensor, black dot – passive sensor, red line – adjacent sensors

In Fig. 9 the red nodes are active while the black ones are passive. Connections between the nodes are shown only for a better understanding of adjacency (red marked connections are connections between adjacent nodes); they do not exist in the real system.

Figure 10. The activity diagram of the hexagonal Sensor Web network

The algorithm works as a combination of scheduling and the smart sleep algorithm. In a given time interval, only one group of sensors is active, while the second group is in the passive state. After the time interval elapses, the states of the active and passive sensors are exchanged. The activity diagram of the proposed algorithm is presented in Fig. 10. It is obvious that, upon the detection of a critical event in the network, it is necessary to activate all the neighbouring nodes immediately (the adjacent nodes that are not active – Fig. 9). Fig. 11 depicts the state of the adjacent sensor network nodes upon activation.

Figure 11. Hexagonal Sensor Web deployment: Alert state

In the case that one of the sensors has been activated by the presence of smoke, the adjacent sensors should be awakened immediately to avoid the possibility of a false alarm. If at least one of those sensors also detects the presence of smoke, the whole sensor network should become active. In this way, the entire sensor network is active only during the detection of a critical event, while in other cases the maximum number of active nodes is (n/2) + 3 (where n is the total number of sensors in the network), which represents a saving of 25-50%, depending on the size of the sensor network. For the 18-sensor network considered here, for example, at most 18/2 + 3 = 12 nodes are active at any instant.
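The alert-handling behaviour described above can be summarized in a few lines of Python. The adjacency map and the node identifiers below are illustrative assumptions; the sketch only mirrors the wake-the-neighbours-then-confirm logic.

ADJACENT = {            # hexagonal grid: at most 3 neighbours per node
    0: [1, 2, 3],
    1: [0, 4],
    2: [0, 5],
    3: [0],
    4: [1],
    5: [2],
}

active = {0, 4, 5}      # currently scheduled (active) subset
alerted = set()         # nodes that reported smoke

def on_smoke_detected(node):
    """React to a smoke report from one sensor node."""
    global active
    alerted.add(node)
    # Wake the adjacent nodes immediately to rule out a false alarm.
    active |= set(ADJACENT[node])
    # If an adjacent node also reports smoke, activate the whole network.
    if any(n in alerted for n in ADJACENT[node]):
        active = set(ADJACENT)          # all nodes become active
        print("critical event confirmed: full network active")

on_smoke_detected(0)    # first report: neighbours 1, 2, 3 are awakened
on_smoke_detected(2)    # adjacent confirmation: all nodes become active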
V. SIMULATION RESULTS

The aim of the simulations, performed using the Pyrosim software tool [19], is to make a comparative analysis of the proposed deployment schemes and to establish the influence of the number of active sensors and of each proposed scheme on the detection of fire and on the network lifetime in the presence of fire. As long as one sensor is alive, there is a possibility that information about the location and spread of the fire will be provided to the higher level.

For the simulation purposes five different positions of fire ignition are considered (Fig. 12).

Figure 12. Fire source positions (size 2 x 2 m)

During the simulation process it is assumed that in the first case all 18 sensors are active, while in the other two cases only 9 sensors are active in the presence of fire, according to Fig. 4 and 6. The simulation results are given in Fig. 13.

[Charts omitted: panels (a)-(e) plot, for fire source positions "1"-"5", the activation time of the first sensor and the activation time of at least two adjacent sensors, in seconds, for the hexagonal, hex-1 and hex-2 strategies; panel (f) plots the corresponding average activation times, including the average activation time of all sensors.]

Figure 13. Response sensor times for the five proposed fire source positions and the average activation times in the case of the three proposed deployment strategies

Fig. 13 (a)-(e) show that for fire source positions "1"-"5" there is no significant difference between the initial deployment of 18 sensors and the hex-1 and hex-2 deployments where only 9 sensors are active. The difference in reaction time is measured in tenths of a second, but in the event of a fast moving fire even these are precious. According to the results presented in Fig. 13 (a)-(e), and the average values of the activation times for all observed cases shown in Fig. 13 (f), it can be noted that the maximum delay in the activation times of the hex-1 and hex-2 models compared to the initial deployment scheme is 0.5 s.

VI. CONCLUSION

The choice of deployment strategy, being the first step in forming any WSN, depends on the application, requirements and design goals. WSN application in fire detection requires accurate deployment of the sensors and a long network lifetime. A hexagonal grid-based deployment, from which all known optimal patterns can be generated, is considered in this paper. The network lifetime is increased by the proposed solution, which consists in rotating the hex-1 and hex-2 schemes in certain time intervals. Thus, instead of all 18 sensors being active, one of the schemes (hex-1 or hex-2) is active periodically, which means that during the whole monitoring time only 9 sensors are active at any moment. This should lead to a significant reduction of energy consumption and an overall longer network lifetime. The simulation results have shown that, with only 9 active sensors, fire will be detected accurately and on time. In the case that one of the sensors has been activated by the presence of smoke, the adjacent sensors should be awakened immediately to solve the problem of false alarms. If any of those sensors also detects a presence of smoke, the whole network of 18 sensors should become active in order to obtain as much information about the fire as possible. An advantage of the Sensor Web is that it provides a mechanism for authorized professionals to access the sensed data remotely over an Internet connection. In the case of fire detection, the fire department will be provided with a constant stream of information about the location and spread of the fire, while the deployed firefighters will have information about the building plan, the initial location of the fire, its spread, the development of smoke, the presence of toxic gases and other factors which may affect them.

Improvement of the proposed scheduling algorithm and optimization of the transceiver unit energy consumption will be the main aims of our future research.

REFERENCES
[1] L. Atzori et al., "The Internet of Things: A survey," Computer Networks, pp. 2787-2805, October 2010.
[2] D. Liping, "Geospatial Sensor Web and Self-adaptive Earth Predictive Systems (SEPS)," Proceedings of the Earth Science Technology Office (ESTO)/Advanced Information System Technology (AIST) Sensor Web Principal Investigator (PI) Meeting, San Diego, USA, 2007.
[3] M. Rouached, S. Baccar, M. Abid, "RESTful Sensor Web Enablement Services for Wireless Sensor Networks," 2012 IEEE Eighth World Congress on Services, pp. 65-72.
[4] R. Ramadan and H. El-Rewini, "Deployment of Sensing Devices: A Survey." [Online]. Available: http://rabieramadan.org/papers/deployment%20survey.pdf
[5] H. Zhang and J. C. Hou, "Is Deterministic Deployment Worse than Random Deployment for Wireless Sensor Networks?" 25th IEEE International Conference on Computer Communications, INFOCOM 2006.
[6] N. Hadžiefendić, "Detekcija požara" (Fire Detection), Beograd, 2006. [Online]. Available: http://spec-nstalacije.etf.rs/predavan/glava_5/DojavaPozara.pdf
[7] M. Cardei and J. Wu, "Coverage in wireless sensor networks," in Handbook of Sensor Networks, M. Ilyas and I. Magboub, Eds., USA: CRC Press, pp. 19.1-19.10, 2004.
[8] C. T. Vu, "An Energy-Efficient Distributed Algorithm for k-Coverage Problem in Wireless Sensor Networks," Computer Science Theses, Paper 40, 2007.
[9] K. Kar and S. Banerjee, "Node placement for connected coverage in sensor networks," in Proceedings of the Workshop on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt'03), Sophia Antipolis, France, 2003.
[10] Z. Cheng, M. Perillo, W. Heinzelman, "General network lifetime and cost models for evaluating sensor network deployment strategies," IEEE Transactions on Mobile Computing, 2008, 7, pp. 484-497.
[11] Y. E. Aslan, "A Framework for the use of Wireless Sensor Networks in forest fire detection and monitoring," Master thesis, The Institute of Engineering and Science of Bilkent University, 2010.
[12] Z. Yun, X. Bai, D. Xuan, T. H. Lai, and W. Jia, "Optimal Deployment Patterns for Full Coverage and k-Connectivity (k ≤ 6) Wireless Sensor Networks," IEEE/ACM Transactions on Networking, vol. 18, no. 3, 2010.
[13] R. Rathna, A. Sivasubramanian, "Improving Energy Efficiency in Wireless Sensor Networks through scheduling and routing," International Journal of Advanced Smart Sensor Network Systems (IJASSN), Vol. 2, No. 1, 2012.
[14] A. Salhieh, J. Weinmann, M. Kochhal, and L. Schwiebert, "Power Efficient Topologies for Wireless Sensor Networks," in Proc. of Intl. Conf. on Parallel Processing (ICPP), 2001.
[15] K. D. Hanabaratti, R. Jogdand, "Design of an Efficient Random Walk Routing Protocol for Wireless Sensor Networks," IJECT, Vol. 2, Issue 4, 2011.
[16] U. Aeron, H. Kumar, "Coverage Analysis of Various Wireless Sensor Network Deployment Strategies," International Journal of Modern Engineering Research (IJMER), Vol. 3, Issue 2, March-April 2013, pp. 955-961.
[17] K. A. Delin, "Sensor Webs in the Wild," Wireless Sensor Networks: A Systems Perspective, Artech House, 2005.
[18] C. H. Lee, D. Y. Eun, "Smart sleep: Sleep more to reduce delay in duty-cycled wireless sensor networks," INFOCOM 2011, pp. 611-615.
[19] Pyrosim software tool. [Online]. Available: http://www.thunderheadeng.com/pyrosim/

SDN-based concept for Network Monitoring

Vassil Gourov*
* Sofia, Bulgaria
[email protected]

Abstract- The deployment of an increasing number of real-time services over communication networks raises essential issues for the assurance of quality of service, which requires a clear picture of the network performance. The availability of accurate statistics helps to estimate the traffic flows, to find service degradation due to congestion, as well as to optimize routing. Presently, various methods are applied for network measurement and monitoring, which require separate infrastructure and thus higher expenses. Most methods are not capable of meeting all monitoring requirements: some are not accurate or granular enough, others add network load or lack scalability. This paper provides a concept for using Software Defined Networking as a unified monitoring solution, by suggesting how to measure link utilization, packet loss and delay. Initially, some existing monitoring methods are presented, together with the opportunity of using an OpenFlow-enabled topology in Software Defined Networking for monitoring. The paper then proposes a monitoring concept for measuring link usage, packet loss and delay.

I. INTRODUCTION

In the last decades, the Internet has evolved into a huge structure interconnecting thousands of networks, and its number of users has grown exponentially. The network has also experienced deep changes in the services provided and in the usage patterns. Present trends towards the Internet of Things (IoT) and Factories of the Future (FoF), as well as the development of several new applications and services (such as video streaming, social networking, online gaming, e-banking, e-business, etc.), have raised not only issues of interoperability, but have also added new control requirements and have significantly increased the network complexity.

Present-day networks are growing large and need to support a lot of new applications and protocols. Subsequently, their management complexity is increasing, which reflects in higher expenses due to maintenance and operations [2], as well as due to human errors causing network downtime [3]. Certain new services (i.e. voice and video delivery) are not capable of operating in a best-effort environment, where resources are socially shared, resulting in delays and losses. On the other hand, there is no guarantee that any new architecture would not result in a similar problem a decade from now. Every time a new infrastructure capable of solving past problems is introduced, new problems emerge. A solution that is able to meet future requirements when they arise is needed, and it is considered that Software Defined Networking (SDN) may play an important role [1].

New Internet-based services need an environment capable of dynamically adjusting to changing demands and able to provide the best point-to-point connection. As the performance of applications depends on the efficient utilization of network resources, it is expected that the exponential growth of users and Internet traffic could create problems in the provision of high quality of service (QoS) and in meeting users' demands. Thus, accurate traffic measurement (TM) becomes a key aspect of network management, in order to reach QoS and to ensure network security and traffic engineering (TE). Due to the large number of flow pairs, the high volume of traffic and the lack of measurement infrastructure, it has become extremely difficult to obtain direct and precise measurements in Internet Protocol (IP) networks [4]. Current measurement methods use too many additional resources or require changes to the infrastructure configuration, thus bringing additional overhead. There is an obvious need for a network management solution able to provide an accurate, detailed and real-time picture of the network, while also being cheap and easy to implement.

The main problem addressed in this paper is how to monitor network utilization efficiently in real time. The aim is to use SDN as a lever to meet future networking demands by designing a monitoring solution capable of measuring network utilization, delay and packet loss, and to evaluate it by using the OpenFlow (OF) protocol. Subsequently, the main research questions are [1]: How can monitoring be achieved with Software Defined Networking? What kind of improvements could SDN bring compared to present solutions?

This paper focuses initially on SDN specifics and on the available network monitoring solutions. Next, a conceptual architecture and a prototype are presented, together with the implementation of a network monitoring solution in a real test environment using the OF protocol. Finally, some monitoring concept evaluation results are presented.

II. NETWORK MONITORING METHODS

Measuring the traffic parameters provides a real view of the network properties and an in-depth understanding of the network performance and the undergoing processes. Network monitoring is crucial for QoS and assures that the network functions properly. The ability to obtain real traffic data makes it possible to analyze network problems, generate traffic matrices, optimize the network using TE techniques or even upgrade it based on future predictions. Finally, a proper network overview allows the routing algorithms to take more appropriate decisions, increasing the resource utilization and decreasing the number of congested nodes/links [1].

Traditionally, different techniques are used for measuring the amount and type of traffic in a particular network. Generally, two distinct groups of measurement methods are applied: passive and active. The former counts the network traffic without injecting additional traffic in the form of probe packets, while the latter is achieved by generating additional packets. Both are useful for network monitoring purposes and for collecting statistical data.

Other methods focus on measurements at the application or network layers of the Open System Interconnection (OSI) model. Network layer measurements use infrastructure components (i.e. routers and switches) to get statistics, whereas application layer measurements operate on the upper layer and are easier to deploy, as they are application specific. The latter are more granular and could also be used for better service delivery; however, this method requires access to end devices, which Internet Service Providers (ISP) normally do not have [1]. It is important to note that OF provides means to implement any of the methods, or to combine them if needed, while traditionally every type of measurement requires separate hardware or software installations.

Today, different techniques are applied to measure link usage, end-to-end delay and packet loss. Some monitoring techniques use direct measurement approaches. For example, flow-based measurements such as NetFlow and sFlow [5] rely on packet sampling in order to ensure scalable real-time monitoring. This method, however, has some limitations linked to high overhead and unreliable data [6]. Deep Packet Inspection (DPI) is heavily used within network monitoring for security reasons and also for high-speed packet statistics. Unfortunately, few network devices support it, so very often additional hardware installations are required; using DPI also creates a network bottleneck point. Another method is based on port counters: Simple Network Management Protocol counters are used to gather information about packet and byte counts across every individual switch interface [7]. Some of the limitations of this method are linked to the switch query frequency (limited to once every 5 minutes), the overall resource utilization, and the lack of insight into flow-level statistics and host behavior, and thus the lack of granularity of the monitoring information obtained [1].

Today, delay and packet loss data are mainly obtained by application measurements, and a common practice is to use ping. It relies on the round trip time (RTT), sending a number of packets from a source node to a destination node and measuring the time it takes for them to return. For example, Skitter [8] uses beacons that are situated throughout the network to actively send probes to a set of destinations. The link delay is calculated by finding the difference between the RTT measures obtained from the endpoints of the link. However, using such a strategy to monitor the network delay and packet losses requires installing additional infrastructure, because every beacon is limited to monitoring a set of links. Using this method introduces additional inaccuracy and uncertainty [1].
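The RTT-difference idea can be illustrated with the short Python sketch below; the probe function is a stand-in (a real beacon would time ICMP echo exchanges), and the host addresses are purely illustrative.

import random

def measure_rtt(host, samples=5):
    """Placeholder probe: return the median RTT to `host` in seconds.
    Here the RTT is simulated; in practice this would time an ICMP
    echo (ping) exchange with the host."""
    rtts = sorted(0.010 + random.random() * 0.002 for _ in range(samples))
    return rtts[len(rtts) // 2]

def link_delay(near_endpoint, far_endpoint):
    """Estimate the one-way delay of a link as half the difference of
    the RTTs measured to its two endpoints."""
    return max(0.0, (measure_rtt(far_endpoint) - measure_rtt(near_endpoint)) / 2.0)

print(f"estimated link delay: {link_delay('10.0.0.1', '10.0.0.2'):.6f} s")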
Some of the limitations of this method are linked to the distribution, traffic count.
switch query frequency (limited to once every 5 minute),
the overall resource utilization, the lack of insight into the A network monitoring system should be able to observe
flow-level statistics and hosts behavior, and thus, the lack and display up-to-date network state. It is obvious that
of granularity of the monitoring information obtained [1]. several monitoring solutions are already capable to do that
in one or another way. However, in order to meet the
Today, delay and packet loss data are mainly obtained
specific challenges that ISPs face, the following design
by application measurements and a common practice is to requirements should be considered in a new monitoring
use ping. It uses the round trip time (RTT) by sending a concept [1]:
number of packets from a source node to a destination
node and measures the time it takes for it to return back.  Fault detection - Whenever a link or node failure
For example, Skitter [8] uses beacons that are situated happens, the network monitoring system should be
throughout the network to actively send probes to a set of warned as soon as possible.
destinations. The link delay is calculated by finding the  Per-link statistics - ISPs require statistics for
difference between the RTT measures obtained from the every link in order to assure QoS within the boundaries
endpoints of the link. However, using such a strategy to of their network, without bandwidth over-provisioning.
monitor the network delay and packet losses requires  Overhead - The proposed solutions should not
installing additional infrastructure, because every beacon add too much network overhead. The overhead should
is limited to monitor a set of links. Using this method scale no matter how big the network is (as long as the
accounts additional inaccuracy and uncertainties [1]. controller can handle them) or the number of active
Passive measurements are widely used for packet and flows at any moment. The component should be able to
delay monitoring. An example of passive monitoring is obtain statistics based on the routing information, thus,
given in [9] and consists of capturing the header of each sending a query requests only to those devices that are
IP packet and timestamp it before letting it back on the currently active.
wire. Packet tracers are gathered by multiple measurement  Accuracy - A big difference between the reported
points at the same time. The technique is very accurate network statistics and the real amount of used capacity
(microseconds), but requires further processing in a should be avoided.
centralized system and recurrent collecting of the traces,
which generates additional network overhead.  Granularity - The system should be able to
Furthermore, every device needs accurate clock account for different type of services. It should be able
synchronization between every node. Another similar to make distinction between flows that have specific
approach is used to measure packet losses [10]. It tags needs, i.e. require special care (bandwidth, delay, etc.).
uniquely each packet when it passes trough the source Furthermore, it should make distinction between
node and accounts if it was received in the end node. applications, as well as, clients.

Finally, the monitoring solution should reduce the amount of additional overhead generated in the network and on the devices as much as possible, without too much degradation of the measurement accuracy.

III. MONITORING IN SOFTWARE DEFINED NETWORKING

A. Emergence of Software Defined Networking

Presently, the communication network architectures rely on devices where the control plane and the data plane are physically one entity, the architecture is coupled to the infrastructure, and every node needs to be separately programmed to follow the operator's policies. In addition, the companies that provide network devices have full control over the firmware and the implementation of the control logic. Thus, the trends in network development face operators with the challenges of meeting market requirements and ensuring interoperability and flexibility. Generally, it can be summarized that the main constraints limiting network evolution include [1]:

• complexity: The definition of many new protocols results in difficulties for operators to configure thousands of devices and mechanisms in order to reflect any changes in the network topology or to implement a new policy [15].
• scalability: The exponential growth of data demands and the unpredictable changes of traffic patterns, as well as the emergence of the cloud computing concept and of several new applications, increase the demand for bandwidth. Scalability problems emerge as networks are no longer capable of growing at the same speed, and network providers cannot endlessly invest in new equipment [15].
• dependability: The dependence on equipment vendors and the insufficient inter-vendor operability face network operators with several difficulties in tailoring the network to their individual environment.

Taking into account the need for a network that uses simple, vendor-neutral and future-proof hardware [11], on the one hand, and the ability of software to support all present network requirements (e.g. access control, TE), on the other, the SDN concept emerged as an option for a more centralized control system of the whole network [1].

Figure 1. Basic SDN architecture [1]

The SDN approach decouples the control plane from the network equipment and places it in a logically "centralized" network operating system (OS), referred to as the controller. One way to achieve this is by using a protocol to interconnect the two separated planes, providing an interface for remote access and management. The SDN architecture (Fig. 1) varies with the implementation and depends on the type of network (i.e. data-centre, enterprise or wide area network) and its actual needs. The main idea behind SDN is to abstract the architecture and provide an environment which would reduce the development time for new network applications and allow network customization based on specific needs. The main goals behind this architecture are to ensure [1]:

• interoperability: using centralized control over the SDN-enabled devices from any vendor throughout the whole network;
• simplicity: to eliminate complexity issues and make the network control easier and finer grained, thus increasing reliability and security;
• innovativeness: with the abstraction of the network services from the network infrastructure, the entire structure becomes much more evolvable, and network operators can easily tailor the behavior of the network and program new services faster.

The OF protocol is one way to implement the SDN concept and to manage interconnected switches remotely. This protocol allows the controller to install, update and delete rules in one or more switch flow tables, either proactively or reactively, to interconnect the forwarding with the data plane, and to enable part of the control operations to run on an external controller.

Since the controller is the most important element of the SDN architecture, it attracts a lot of effort, and a number of new controllers have been released (examples are presented in Table I). Its main task is to add and remove entries from the switch flow tables.

Figure 2. Scheme of OpenFlow controller [1]

TABLE I. OPEN SOURCE OPENFLOW CONTROLLERS [1]

The controller (Fig. 2) interacts with a set of switches via OF using the Southbound interface. It is responsible for service and topology management, and can be enhanced with additional features or provide information to external applications. Currently, the Northbound communication is not standardized; some efforts are made to enhance the abstraction level by designing network programming languages on top of the controllers [16], [17]. Via the Westbound and Eastbound interfaces the controller is able to communicate with other controllers; several proposals for this interaction are available [18].

Despite SDN's ability to overcome some network problems, certain scalability limitations also exist as a result of the centralized SDN architecture and of the bottleneck that could be formed between the infrastructure and the controller. Some concerns are linked to a bottleneck at the switching equipment, in terms of forwarding capacity (table memory) and of the overhead that could be created by constant reactive invocation of the control plane.

B. Concept for OpenFlow monitoring architecture

Using Software Defined Networking could solve some of the current monitoring problems in IP networks. Since SDN is a new paradigm, some architectural aspects are still under investigation. In order to concentrate on the research problems already outlined, the following two architectural assumptions are made, based on a study of similar concepts [1]:

• First, one "centralized" controller manages all switches and handles all the control operations.
• Second, there are no scalability issues for the controller and the switches.

In order to illustrate and confirm the monitoring abilities, a prototype is implemented as a Python application for POX [19] (a Python-based OF controller that can be used for fast implementation of network control applications). The OF monitoring application (Fig. 3) works as a core component of the controller; therefore, it has access to all the available information, including routing decisions. It is also capable of directly interacting with each switch that supports OF. The discovery component is responsible for building a graph representation of the network topology (topology view). A virtual switch instance is created for every new switch that connects to the controller, and each instance stores switch-specific information.

Figure 3. OpenFlow prototype [1]

Figure 4. Basic diagram of the monitoring component [1]

How does it work? The first thing every OF switch does, once it is started, is to establish a connection with the designated controller. The switch provides its state and link information, which allows the controller to keep a global, up-to-date network view. Once a packet enters the network and no matching rule for it exists, it is forwarded towards the controller. The controller inspects the packet and determines how to handle it. Normally, the controller would install a new flow entry in every switch table that needs it and then return the packet to its source node. This means that the controller has a topology view of the network and information about the active flows (IP source/destination, port source/destination, etc.) and the routes they take through the network. Each switching device within the network contains activity counters; i.e., for OF there are separate table, port, flow and queue counters. The flow and route information should be used as input parameters of the monitoring component (Fig. 4), which is responsible for polling one or multiple network devices per flow; these in turn should return the requested information. Another option is to implement a passive measurement and wait for the switches to send statistics once a flow has expired: every time a flow is no longer active for some time, the switches may send a message indicating the utilization statistics for the flow. The author considers that the monitoring component should make use of both statistics gathering approaches. The final output should be data for link utilization [1].
controller, there is real-time view on the network status,
including links, nodes, interfaces, etc. Furthermore, it
provides sufficient granularity and it is capable to monitor
the utilization of every link within a given network
without sampling any packet or adding more overhead to
any of the switches.
OpenFlow allows granular view of the network, but this
is done by generating additional network/switch load.
Obtaining flow statistics is a task that requires polling for
information for every flow separately. The following ways
for its improvement are proposed [1]:
 Aggregate flows: Generate only one query per
set of flows that share the same parameters, for example
the same source destination path instead of polling
statistics for every flow separately.
Figure 3. OpenFlow prototype [1]  Data collection schemes: In case that there is no
packet loss between the source-destination devices, poll
different switches, thus reducing the overhead on a
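As a simple illustration of the adaptive polling item, the sketch below implements the back-off schedule used later in the implementation (1 s for the first five seconds of inactivity, then 5 s, then 15 s, then once per minute); the function name and the tested values are illustrative.

def polling_interval(seconds_since_last_flow):
    """Return the next polling interval for a path, in seconds."""
    if seconds_since_last_flow < 5:
        return 1
    if seconds_since_last_flow < 15:
        return 5
    if seconds_since_last_flow < 60:
        return 15
    return 60

for t in (0, 3, 7, 20, 90):
    print(f"idle for {t:3d} s -> poll again in {polling_interval(t)} s")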

According to the OF switch specifications [20], switches have to keep counters per port, flow table/entry, queue, group, group bucket, meter and meter band. Table II presents the per-flow-entry counters used. Furthermore, in order to follow the statistics of more than one flow, there is an option to bundle multiple flows into a group and observe their aggregated statistics.

TABLE II. COUNTERS [20]
Counter | Description
Received Packets | Counts the number of packets
Received Bytes | Counts the number of bytes
Duration (seconds) | Indicates the time the flow has been installed on the switch, in seconds
Duration (nanoseconds) | Counts the time the flow has been alive beyond the seconds in the above counter

The monitoring concept [1] implements new packet loss and link utilization methods, and known delay measurements. The main processes are depicted in Fig. 5.

Figure 5. Monitoring algorithm [1]

The monitoring component released in POX registers every "PacketIn" event and creates a unique identification based on the flow information. Additionally, a separate ID is used to distinguish between the network paths. Every flow is assigned to a certain path, and the monitoring component keeps track of every flow that enters the network and of the path it follows. Furthermore, every Switch object also records the flows that pass through it. This information is later used to determine the link utilization.
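For link utilization, the byte counters of two successive polls can be converted into a load figure as in the following sketch; the interval, capacity and counter values are illustrative assumptions.

def link_utilization(bytes_before, bytes_after, interval_s, capacity_bps):
    """Utilization of a link as a fraction of its capacity."""
    bits_sent = (bytes_after - bytes_before) * 8
    return (bits_sent / interval_s) / capacity_bps

# Example: 5 MB transferred in 5 s over a 100 Mbit/s link -> 8% load.
print(link_utilization(0, 5_000_000, 5.0, 100_000_000))  # 0.08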
In order to execute a piece of code in the future, or to assign a recurring event, the monitoring component uses the Timer class incorporated in POX. In case this is the first packet that uses a given route, the monitoring component starts a polling timer with a one-second period. Whenever the timer expires it fires an event, during which a data collection algorithm is used (Round Robin or Last Switch); these two algorithms present a trade-off between accuracy and overhead. Afterwards, a "StatusRequest" message is sent to the chosen switch. This is the query requesting statistics for all the flows that follow the same path. Every path has a separate timer.

When a switch receives a "StatusRequest" message, it generates a response. The "StatusReply" message contains the information obtained from the switch counters. At the flow level it gives the duration of the flow (in nanoseconds) and the packet and byte counts. Port statistics give more information about the state (both transmitted and received), such as the numbers of dropped packets, bytes, errors and collisions. The controller obtains information for every flow that follows the same path. The polling timer is also adjusted: the controller tracks the time that has passed since the last flow routed through this path was registered, and as this time increases, the polling interval also increases. In the implementation, the controller polls every second for the first five seconds, then every five seconds until the 15th second, moving to 15 seconds until the end of the first minute, and polling once per minute when there has not been any flow activity for over a minute.

When the switch removes a flow entry from its table, because it was deleted or has expired, it also raises a "FlowRemoved" event. Such an event means that this flow is no longer active and the monitoring component does not need to account for it anymore. The controller receives a message that indicates the whole duration of the flow, together with the data statistics for this particular flow. This is used to obtain packet loss information and to undertake actions for the upcoming flows. If needed, the controller may actively poll for packet loss statistics while the flow is still active; while this method generates additional overhead, it is useful for cases when it is required to measure packet loss during the data transfer.

For measuring link packet loss a novel approach is proposed, capable of eliminating the overhead and based on passive measurements. The main presumption is that packet loss metrics can be generalized on a per-class basis without loss of accuracy, and that measuring the packet loss for every single flow would not be viable. In order to estimate a stable and accurate link metric that does not fluctuate too much, a set of measurements is required; more specifically, a metric that represents most of the packets, without accounting for anomalous changes or statistical outliers. As in an active network flows terminate every second, the obtained measurements would still be real-time [1].

Figure 6. Calculating the packet loss percentage [1]
switch does not have any rules installed, the first packet is

Page 141 of 478


ICIST 2014 - Vol. 1 Regular papers

sent towards the controller. The controller is then  using the aggregate flow query method decreases
responsible to decide what to do with the packet and the overhead that is generated;
eventually install table rules on each switch on the path of  the adaptive polling gives better results in terms of
the flow. Once the flow has finished each switch indicates accuracy and overhead then recurrent polling.
this with another message to the controller. The flow is The new method for measuring the packet loss was
installed at time t0 with a "FlowMod" message sent from tested first in Mininet environment, and then repeated in
the controller towards every switch on the route of the the testbed. The results showed that the packet-loss varies
flow. At time t1, t2, up to tN (where N is the amount of from flow to flow. The packet-loss distribution results
switches), the controller receives "FlowRemoved"
were promising, an average of 0.99% losses per flow and
messages. Those messages indicate that the flow has
standard deviation of ±0.34 (due to the fact that NetEm
expired and give some specific statistics for the flow, such
uses normal distribution for packet loss emulation).
as: the number of bytes, packets and the flow duration.
Measuring the packet-loss relies on the fact that each In order to determine exactly how accurate the method
switch sends this information based on its own counters. is 18 flows were recorded (Iperf Server report) and then
compared with the measured packet loss. The first
Each switch has separate flow counters, but it counts
measurement consisted of sending flows worth of 64 Kbps
different amount of bytes. This is due to link losses,
for the duration of 195 seconds (average call duration).
congested links, etc. Receiving flow information from
The results obtained matched perfectly with the Iperf
every switch allows comparing counter statistics and
Server report [1]. The second set of measurements
calculating the number of bytes that were lost. Whenever
emulated short term Video connection using MPEG 2
messages that the flow has expired from the same flow are
with data rate of 10 Mbps, whereas 10 flows set to
received their recorded packet bytes are compared. This
continue each for 2 minutes were recorded. The results
comparison allows determining the packet losses for the
from both measurements prove that the proposed
particular flow. The technique is sufficient to determine
measurement method gives perfect accuracy.
what the current link state for this traffic class is. In case
there is a need for flow packet loss, the controller could The test results generally suggest that in order to reduce
poll for two or more node flow counters periodically [1]. the network overhead, a technique that aggregates all
flows that go through the same network route should be
IV. EVALUATION RESULTS used. In addition, for eliminating the need of trade-off
between overhead and accuracy, it is better to base the
The preliminary tests were done in two phases. First, polling decisions not on the recurrent interval, but on the
using a virtual environment Mininet 2 (a container-based occurrence of certain event.
emulator able to create realistic virtual topology) [21] used
Finally, the new measurement method for packet losses
hardware consists of Intel Core i5 nodes (the controller
has proven to be really accurate, and capable to determine
included) with four 2.53 GHz cores and 4 GB RAM. The
the exact percentage for each link and also for any path.
containers mechanism uses groups of processes that run
While it is a passive method, it does not impose additional
on the same kernel and yet use separate system resources,
network overhead, and it is not influenced by the network
like network interfaces and IDs. Thus, every emulated
characteristics like the active probing methods that
switch or host creates its own process. Network links can
currently exist. The method is capable to provide statistics
be assigned specific link properties such as bandwidth and
for every different type of service that passes trough the
packet-loss. However, like most emulators, Mininet has
network.
also some drawbacks, e.g. processes do not run in parallel,
instead they use time multiplexing, which may cause Possible extensions to the measurements schemes
delayed packet transmission, not suitable for time accurate suggested in this paper could be considered. The accuracy
experiments. could be improved based on a combination of past
statistics, link characteristics or weighted measurements
As the results from the preliminary tests showed
results without imposing additional overhead. The
promising results, the same experiments were repeated in
adaptive timer requires more tuning, therefore, more
a real topology. The physical testbed was installed on
research would be necessary on when more samples are
servers that have Intel(R) Xeon(TM) processor with four
needed and when less. More experiments in a real
3.00 GHz cores. Every switch uses separate physical
environment would help to fully proof the proposed
machine with Ubuntu 12.04.2 LTS operating system. The
measurement approaches. For the suggested packet loss
testbed uses Open vSwitch [44] as OF enabled switch.
method some questions need to be answered like: how
Traffic is generated by the Iperf application. This is a
much data are enough to take that the reported percentage
network testing tool capable to create TCP and UDP
of packet losses is not a random spike and how long
traffic between two hosts, where one is acting as client
before the data are too old to be considered valid [1].
and the other as server. It measures the end-to-end (either
uni- or bi-directional) throughput, packet loss and jitter.
NetEm [25] is used, in order to emulate link packet losses V. CONCLUSIONS
and delay. It is an Linux kernel enhancement that uses the This paper explores the concept of network monitoring
queue discipline integrated from version 2.6.8 (2.4.28) and implemented in SDN architectures. In terms of network
later [1]. monitoring, SDN allows to build a monitoring solution
Different tests were carried out for measuring link adjusted to the specific network needs. By using SDN the
utilization, comparing direct flow and aggregate flow monitoring system is capable to obtain a complete view of
statistics, adaptive and recurring polling, and for testing the network that includes nodes, links and even ports.
the proposed packet loss measurement method. The Furthermore, the solutions are capable to obtain fine
results for link utilization measurements show that [1]: grained and accurate statistics, for every flow that passes
trough the network.

Once there are suitable monitoring systems capable of providing the necessary performance and usage statistics, the next phase is the network optimization phase. A major goal of TE is to enhance the performance of an operational network, at both the traffic and the resource level. Network monitoring plays an important part in TE by measuring the traffic performance parameters. Additionally, today TE in service providers' networks works on a coarse scale of several hours. This gives enough time for offline TM estimation or for its deduction via regressed measurements. Unfortunately, this approach is not always viable: current IP traffic volumes change within seconds (or milliseconds), which could lead to congestion and packet losses at the most crucial moment.

Since SDN is a new architecture still gaining popularity, there are also some questions that need to be answered in terms of routing. Obtaining an accurate and real-time view of the network could bring more benefits and open more options. Monitoring the network is the first step towards an SDN forwarding protocol capable of providing sufficient QoS for all types of applications and traffic.

Finally, it should be stressed that all present trends towards IoT, cloud computing, FoF, etc. highly depend on the availability of high-speed networks with certain QoS. While researchers are working intensely on the interoperability of applications and new Internet-based services, if the present problems at the transport layer are not resolved in time, a real bottleneck for further developments could emerge.

ACKNOWLEDGMENT

The author gratefully acknowledges the MSc Thesis guidance provided by the Network Architectures and Services Group of Delft University of Technology.

REFERENCES
[1] V. Gourov, Network Monitoring with Software Defined Networking, Master's Thesis, TU Delft, Netherlands, August 2013.
[2] M. H. Behringer, "Classifying network complexity," in ReArch '09, New York, USA, 2009, pp. 13-18.
[3] Z. Kerravala, Configuration management delivers business resiliency, Technical report, The Yankee Group, 2002.
[4] Q. Zhao, Z. Ge, J. Wang, J. Xu, "Robust traffic matrix estimation with imperfect information: making use of multiple data sources," SIGMETRICS Perform. Eval. Rev., 34(1), 2006, pp. 133-144.
[5] sFlow, Traffic Monitoring using sFlow. [Online]. Available: http://www.sflow.org/sFlowOverview.pdf, July 2013.
[6] P. L. C. Filsfils, A. Maghbouleh, Best Practices in Network Planning and Traffic Engineering, Technical report, CISCO Systems, 2011.
[7] W. Stallings, "SNMP and SNMPv2: the infrastructure for network management," Comm. Mag., 36(3), 1998, pp. 37-43.
[8] B. Huffaker, D. Plummer, D. Moore, K. Claffy, "Topology discovery by active probing," in SAINT, Nara, Japan, 2002, pp. 90-96.
[9] C. Fraleigh, S. Moon, B. Lyles, C. Cotton, M. Khan, D. Moll, R. Rockell, T. Seely, C. Diot, "Packet-Level Traffic Measurements from the Sprint IP Backbone," IEEE Network, 17, 2003, pp. 6-16.
[10] Silver Peak Systems, "How to Accurately Detect and Correct Packet Loss." [Online]. Available: http://www.silver-peak.com/info-center/how-accurately-detect-and-correct-packet-loss, July 2013.
[11] A. Tootoonchian, M. Ghobadi, Y. Ganjali, "OpenTM: traffic matrix estimator for OpenFlow networks," in PAM'10, Berlin, Heidelberg, Springer-Verlag, 2010, pp. 201-210.
[12] J. R. Ballard, I. Rae, A. Akella, "Extensible and scalable network monitoring using OpenSAFE," in INM/WREN'10, Berkeley, USA, 2010, p. 8.
[13] C. Yu, C. Lumezanu, Y. Zhang, V. Singh, G. Jiang, H. V. Madhyastha, "FlowSense: monitoring network utilization with zero measurement cost," in PAM'13, Berlin, Heidelberg, Springer-Verlag, 2013, pp. 31-41.
[14] M. Yu, L. Jose, R. Miao, "Software defined traffic measurement with OpenSketch," in NSDI'13, Berkeley, USA, 2013, pp. 29-42.
[15] N. Foster, R. Harrison, M. J. Freedman, C. Monsanto, J. Rexford, A. Story, D. Walker, "Frenetic: a network programming language," SIGPLAN Not., 46(9), 2011, pp. 279-291.
[16] ONF, Software-defined Networking: The New Norm for Networks, White Paper, April 2012.
[17] A. Voellmy, H. Kim, N. Feamster, "Procera: a language for high-level reactive network control," in HotSDN'12, New York, USA, 2012, pp. 43-48.
[18] A. Tootoonchian, Y. Ganjali, "HyperFlow: a distributed control plane for OpenFlow," in INM/WREN'10, Berkeley, USA, 2010, p. 3.
[19] M. McCauley, About POX. [Online]. Available: http://www.noxrepo.org/pox/about-pox/, 2013.
[20] The Open Networking Foundation, OpenFlow Switch Specification v1.3.1. [Online]. Available: https://www.opennetworking.org/images/stories/downloads/specification/openflow-spec-v1.3.1.pdf, September 2012.
[21] N. Handigol, B. Heller, V. Jeyakumar, B. Lantz, N. McKeown, "Reproducible network experiments using container-based emulation," in CoNEXT '12, New York, USA, 2012, pp. 253-264.

Vehicle Classification and False Detection Filtering using a Single Magnetic Detector based Intelligent Sensor

Peter Sarcevic*, Szilveszter Pletl**
* University of Szeged, Department of Informatics, Szeged, Hungary
** University of Szeged, Department of Informatics, Szeged, Hungary
** University of Novi Sad, Novi Sad, Serbia
[email protected], [email protected]

This work was supported by TÁMOP-4.2.2.A-11/1/KONV-2012-0073.

Abstract— Vehicle detection and classification is a very topical problem, because vehicle count and classification data are important inputs for traffic operation, pavement design, transportation planning and other applications. Magnetic detector based sensors provide many advantages compared to other technologies. In this work a new vehicle detection and classification method is presented, using a single magnetic detector based system. Due to the relatively large number of false detections caused by vehicles with high metallic content passing in the neighboring lane, a technique for false detection filtering is also presented. Vehicle classes are determined using a feedforward neural network which is implemented in the microcontroller of the detector, together with the detection algorithm and the algorithm used for determining the neural network inputs. The gathering of training samples and the testing of the trained neural network have been done in a real environment. For the training of the neural network the back-propagation algorithm has been used, with different training parameters.

Keywords— vehicle detection, false detection filtering, vehicle classification, magnetic sensors, neural networks

I. INTRODUCTION

To provide speed monitoring, traffic counting, presence detection, headway measurement, vehicle classification, and weigh-in-motion data, the need for automatic traffic monitoring is increasing. This urges manufacturers and researchers to develop new technologies and to improve the existing ones. Vehicle count and classification data are important inputs for traffic operation, pavement design, and transportation planning. In traffic control, signal priority can be given to vehicles classified as a bus or an emergency vehicle.

In this work, a new detection and classification method for a single magnetic sensor based system is discussed, and a technique for filtering the false detections caused by vehicles passing in the neighboring lane is also presented.

Magnetic sensors can measure the changes in the Earth's magnetic field caused by the presence of metallic objects. Magnetic vehicle detectors use the changes generated by the metallic content of vehicles when they are near the sensor, as written in Reference [1]. Two sensor nodes placed a few feet apart can estimate speed, as described in Reference [2]. A vehicle's magnetic 'signature' can be processed for classification. The advantages and disadvantages of magnetic detectors are shown in Table I.

TABLE I. ADVANTAGES AND DISADVANTAGES OF MAGNETIC SENSOR BASED VEHICLE DETECTORS

Advantages:
• Insensitive to inclement weather such as snow, rain, and fog
• Less susceptible than loops to stresses of traffic
• Some models transmit data over wireless RF link
• Some models can be installed above roads, no need for pavement cuts

Disadvantages:
• Difficult to detect stopped vehicles
• Installation requires pavement cut or tunneling under roadway
• Decreases pavement life
• Installation and maintenance require lane closure

II. THE SINGLE MAGNETIC DETECTOR BASED INTELLIGENT SENSOR

The used magnetic sensor is an HMC5843 based unit developed by "SELMA" Ltd. and "SELMA Electronic Corp" Ltd., companies from Subotica, Serbia. Two types of magnetic detectors have been developed, one with cable and one with wireless communication.

For classification sample collection, and for detection and classification efficiency testing, a unit with cable communication has been mounted in Subotica, on the main road passing through the town. All vehicle classes can be found passing on the mentioned road, so the place is ideal. The sensor has been mounted 5 centimeters beneath the pavement surface. The direction of the axes is very important, because the network inputs are calculated per axis, and if the positioning is changed, the waveforms will not be the same. The X axis points in the movement direction, the Y axis points towards the neighboring lane, and Z is orthogonal to the pavement surface.

The Honeywell HMC5843 is a small (4x4x1.3 mm) surface mount multi-chip module designed for low field magnetic sensing. The 3-axis magnetoresistive sensors feature precise in-axis sensitivity and linearity, and a solid-state construction with very low cross-axis sensitivity, designed to measure both the direction and the magnitude of the Earth's magnetic field, from tens of micro-gauss to 6 gauss. The highest sampling frequency is 50 Hz.

Wireless magnetic sensor networks offer an attractive, low-cost alternative to inductive loops, video and radar for traffic surveillance on freeways, at intersections and in parking lots, as written in Reference [3].

low-cost alternative to inductive loops, video and radar


for traffic surveillance on freeways, at intersections and TABLE II.
EFFICIENCY OF THE DETECTION ALGORITHM DURING A ONE HOUR TEST
in parking lots as written in Reference [3].
Vehicle class Passed Detected Rate
III. VEHICLE DETECTION ALGORITHM Motorcycle 4 3 75%
As written in References [4] and [5], magnetic Car 168 168 100%
detectors are capable of very high, above 97 percent Van 10 10 100%
detection accuracy with proper algorithms. In Reference Truck 15 15 100%
Bus 6 6 100%
[6] 97% of detection accuracy has been achieved using
Other 2 2 100%
neural networks and fuzzy data fusion. Most of the False detections caused
algorithms use adaptive thresholds as used in Reference by vehicles passing in the 13
[7]. neighboring lane
Σ 205 217 94,15%
In Reference [4] the effect of temperature on HMC magnetic sensor measurements is described. The temperature on the pavement can change considerably in the course of a day, but the resulting changes in the measured values are very slow.

The developed vehicle detection algorithm therefore uses thresholds which can change when no detection is in progress, to avoid the effects of temperature changes. The principles of the algorithm, sketched in code after this list, are the following:
• A calibration process is run when the unit is turned on. Maximum and minimum values are determined over a period of time on all three axes (if, even on one axis, the difference between the maximum and minimum exceeds a previously defined width, the calibration starts from the beginning). After this stage, the range is equally stretched to the previously defined width, and the upper and lower thresholds are determined on all three axes. This method makes the further algorithm immune to noise.
• If the measures exceed the range determined by the thresholds on axes X and Z, a detection is generated (the detection flag is "1"). If only one axis exceeds the range, a vehicle is probably passing in the neighboring lane.
• In case of detection, the detection flag goes back to "0" if the measures on all three axes stay between the thresholds for a previously defined number of measures.
• If the measures on all three axes are within the range determined by the thresholds, and no detection is in progress, the algorithm calculates new thresholds.
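A minimal TypeScript sketch of this threshold logic is given below. The range width, release count and all identifiers are hypothetical placeholders; the sketch illustrates the principles listed above, not the authors' implementation (the startup calibration stage is assumed to have already centered the ranges once).

```typescript
// Hypothetical names and constants; a sketch of the described detection logic.
interface Sample { x: number; y: number; z: number; }

const RANGE_WIDTH = 40;    // assumed "previously defined width" (raw sensor units)
const RELEASE_COUNT = 10;  // assumed number of in-range measures before the flag clears

class AxisRange {
  lower = 0; upper = 0;
  center(v: number) { this.lower = v - RANGE_WIDTH / 2; this.upper = v + RANGE_WIDTH / 2; }
  outside(v: number) { return v < this.lower || v > this.upper; }
}

class Detector {
  private rx = new AxisRange(); private ry = new AxisRange(); private rz = new AxisRange();
  private flag = false;
  private inRangeRun = 0;

  onSample(s: Sample): boolean {
    const outX = this.rx.outside(s.x), outY = this.ry.outside(s.y), outZ = this.rz.outside(s.z);
    if (!this.flag && outX && outZ) {
      this.flag = true; this.inRangeRun = 0;        // X and Z out of range: vehicle in own lane
    } else if (this.flag) {
      this.inRangeRun = (!outX && !outY && !outZ) ? this.inRangeRun + 1 : 0;
      if (this.inRangeRun >= RELEASE_COUNT) this.flag = false;   // falling edge of the flag
    } else if (!outX && !outY && !outZ) {
      // No detection and all axes in range: recompute the thresholds,
      // which tracks the slow temperature drift described in the text.
      this.rx.center(s.x); this.ry.center(s.y); this.rz.center(s.z);
    }
    return this.flag;
  }
}
```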
The axis along the direction of travel can be used to determine the direction of the vehicle, as shown in Reference [8]. When there is no car present, the sensor outputs the background Earth's magnetic field as its initial value. As the car approaches, the Earth's magnetic field lines of flux are drawn toward the ferrous vehicle.

A. Efficiency

For testing the efficiency of the algorithm, a one-hour test has been done. The results, divided by vehicle classes, are shown in Table II. As seen, the algorithm is effective; only motorcycles can cause failures. The reason for failures in motorcycle detection can be the low metallic content and the distance from the detector.

TABLE II.
EFFICIENCY OF THE DETECTION ALGORITHM DURING A ONE HOUR TEST

Vehicle class | Passed | Detected | Rate
Motorcycle    |      4 |        3 |  75%
Car           |    168 |      168 | 100%
Van           |     10 |       10 | 100%
Truck         |     15 |       15 | 100%
Bus           |      6 |        6 | 100%
Other         |      2 |        2 | 100%
False detections caused by vehicles passing in the neighboring lane: 13
Σ             |    205 |      217 | 94.15%

As the results show, the number of false detections is high. This is caused by vehicles with high metallic content, usually trucks or buses, passing in the neighboring lane. A part of these detections could be filtered by increasing the width of the detection ranges, but this can affect the motorcycle detection efficiency, and the classification algorithm could lose important parts of the waveforms.

IV. SAMPLE COLLECTION

For neural network training and false detection filtering, samples have been collected using the mounted sensor. The measurement values and a detection number, which has been incremented at every rising edge of the detection flag, have been saved into a database.

To declare the classes (neural network targets) of the passing vehicles, and to separate the good and false detections, we used the images made by a camera mounted beside the road. The images have been saved at every falling edge of the detection flag and have been named using the detection number.

Altogether, measures of 11021 passing vehicles have been collected.

V. FALSE DETECTION FILTERING

The gathered samples have been divided into 3 groups using the images: good detections (10218 samples), false detections (345 samples), and vehicles passing between the two lanes (458 samples), which are also false detections.

The basic idea of the filtering algorithm was to generate different rules, and to find the optimal parameter values with which false detections can be identified. 18 rule types have been declared, and the optimization has been done for all types using specific parameters calculated from every sample.

The optimizations have been done using genetic algorithms. Every optimization has been run for 3000 generations with a population size of 50. The fitness functions determined the rate of misses over all samples. If the result of a rule is "true", the detection is declared false.

The used rules are shown in Table III, where X, Y and Z are measurement values in the samples, Ylth is the number of measures where Y was continuously not between the thresholds before the detection was declared, and YX, YZ, XZ, Xd, Yd, Zd and YL are the optimized parameters. In types where X/Z is used, the fitness function chooses between "<" and ">" using a further parameter. The rules which contain j have j identical parts with "or" between them, so if one part evaluates to "true", the output of the whole rule is "true". This way the functions were able to filter out several groups of samples with similar values. The optimization has been done for every type with j = {1, 2, …, 10}. A sketch of one rule evaluation and of the fitness function follows.
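The following TypeScript sketch illustrates how one such rule (type 10 from Table III) and the miss-rate fitness could be evaluated. The data layout and all names are assumptions, and the genetic algorithm itself (3000 generations, population 50) would simply call fitness() for every candidate parameter vector; this is an illustration, not the authors' code.

```typescript
// Per-detection features: here, the highest X, Y and Z distances from the ranges.
interface DetectionFeatures { X: number; Y: number; Z: number; }

// Rule type 10: OR over j conjunctions (X < Xd(i) and Y > Yd(i) and Z < Zd(i)).
// Returning "true" means the detection is declared false.
function ruleType10(f: DetectionFeatures, Xd: number[], Yd: number[], Zd: number[]): boolean {
  for (let i = 0; i < Xd.length; i++) {
    if (f.X < Xd[i] && f.Y > Yd[i] && f.Z < Zd[i]) return true;
  }
  return false;
}

// Fitness = rate of misses over all labelled samples (to be minimized by the GA).
function fitness(params: number[], j: number,
                 good: DetectionFeatures[], falseDet: DetectionFeatures[]): number {
  const Xd = params.slice(0, j), Yd = params.slice(j, 2 * j), Zd = params.slice(2 * j, 3 * j);
  let misses = 0;
  for (const s of good) if (ruleType10(s, Xd, Yd, Zd)) misses++;      // good filtered out
  for (const s of falseDet) if (!ruleType10(s, Xd, Yd, Zd)) misses++; // false kept
  return misses / (good.length + falseDet.length);
}
```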
TABLE III.
THE RULE TYPES USED FOR FALSE DETECTION FILTERING

Number | Rule
1  | ( Y/X > YX(j) and Y/Z > YZ(j) and X/Z <> XZ(j) )j
2  | ( Y/X > YX(j) and Y/Z > YZ(j) and X/Z <> XZ(j) and Ylth > YL(j) )j
3  | ( Y/X > YX(j) and Y/Z > YZ(j) )j
4  | ( Y/X > YX(j) and Y/Z > YZ(j) and Ylth > YL(j) )j
5  | (Y/X > YXb) or (Y/Z > YZb) or (Ylth > YLb)
6  | (Y/X > YXb) or (Y/Z > YZb) or (Ylth > YLb) or ( Y/X > YX(j) and Y/Z > YZ(j) and X/Z <> XZ(j) )j
7  | (Y/X > YXb) or (Y/Z > YZb) or (Ylth > YLb) or ( Y/X > YX(j) and Y/Z > YZ(j) and X/Z <> XZ(j) and Ylth > YL(j) )j
8  | (Y/X > YXb) or (Y/Z > YZb) or (Ylth > YLb) or ( Y/X > YX(j) and Y/Z > YZ(j) )j
9  | (Y/X > YXb) or (Y/Z > YZb) or (Ylth > YLb) or ( Y/X > YX(j) and Y/Z > YZ(j) and Ylth > YL(j) )j
10 | ( X < Xd(j) and Y > Yd(j) and Z < Zd(j) )j
11 | ( X < Xd(j) and Y > Yd(j) and Z < Zd(j) and Ylth > YL(j) )j
12 | (Y > Ydb) or (Ylth > YLb)
13 | (Y > Ydb) or (Ylth > YLb) or ( X < Xd(j) and Y > Yd(j) and Z < Zd(j) )j
14 | (Y > Ydb) or (Ylth > YLb) or ( X < Xd(j) and Y > Yd(j) and Z < Zd(j) and Ylth > YL(j) )j
15 | ( Y/X > YX(j) and Y/Z > YZ(j) and Y > Yd(j) )j
16 | ( Y/X > YX(j) and Y/Z > YZ(j) and Y > Yd(j) and Ylth > YL(j) )j
17 | (Y/X > YXb) or (Y/Z > YZb) or (Y > Ydb) or (Ylth > YLb) or ( Y/X > YX(j) and Y/Z > YZ(j) and Y > Yd(j) )j
18 | (Y/X > YXb) or (Y/Z > YZb) or (Y > Ydb) or (Ylth > YLb) or ( Y/X > YX(j) and Y/Z > YZ(j) and Y > Yd(j) and Ylth > YL(j) )j

The X, Y and Z values are the distances between the measurement values at a specific point and the ranges specified by the thresholds.

To try to declare false detections immediately, at the start of the detection, the optimizations have first been done with the X, Y and Z values calculated from the measurement values at the moment when X and Z first exceeded their ranges. The results showed that with this method almost none of the false detections can be filtered without declaring good detections as false. This is because the X and Z axes exceed their ranges too quickly for larger differences to show in Y. The range widths could be increased, but this would result in information loss for the classification algorithm.

As the false detections cannot be declared immediately, the optimizations have been done with the highest X, Y and Z values during the entire detection. The results are shown in Fig. 1. It can be seen that the results are very similar, but the optimizations could not reach any usable parameters for types 13 and 14, where the optimized values were very small and gave bad results for all good detections.

The best results in all cases have been reached with type 10, which achieved recognition rates of 95.9% on all detections, 98.85% on good detections, 75.94% on false detections and 44.98% on detections where the vehicle passed between the two lanes, which were also declared false. This means that 58.28% of the false detections have been filtered.

The loss of around 1% of good detections could be the result of cases when a vehicle with high metallic content was passing in the neighboring lane beside the vehicle which should be detected.

The results also showed that the types which were tested with different j values did not differ greatly with the added further parts; only small improvements could be noticed when j was bigger than 1.
[Bar chart omitted: hit rates for rule types 1-18, with series ALL, GOOD, FALSE and BETWEEN.]

Figure 1. Hit rates of false detection filtering after optimization of parameters for different rule types
VI. CLASSIFICATION ALGORITHM

The basic idea was to collect the measurement values while the detection flag is "1" and to calculate specific parameters from the magnetic signature, which can be applied to the inputs of a neural network.

A. Other classification algorithms

Classification stations with highly calibrated inductive loops are very popular. However, the infrastructure and maintenance costs of such a vehicle classification station are high.

In Reference [9] an artificial neural network based method was developed to estimate classified vehicle volumes directly from single-loop measurements. The authors used a simple three-layer neural network with a back-propagation structure, which produced reliable estimates of classified vehicle volumes under various traffic conditions. In this study four classes (by ranges of length) were defined, and each class had its own ANN. All networks had 19 nodes in the input layer: 1 node for the time stamp input and 9 pairs of nodes for inputting single-loop measurements (volume and lane occupancy). All networks had one output node (each was one class bin), but the number of hidden neurons differed for each class (35 for class 1, 8 for class 2, 5 for class 3 and 21 for class 4).

Sun, in Reference [10], studied the use of the existing infrastructure of loop detectors for vehicle classification with two distinct methods. A seven-class scheme was used for the first method because it targets vehicle classes that are not differentiable with current techniques based on axle counting. The first method uses a heuristic discriminant algorithm for classification and multi-objective optimization for training the heuristic algorithm. Feature vectors obtained by processing inductive signatures are used as inputs into the classification algorithm. Three different heuristic algorithms were developed, and an overall classification rate of 90% was achieved. The second method uses Self-Organizing Feature Maps (SOFM) with the inductive signature as input. An overall classification rate of 80% was achieved with a four-class scheme.

In the last few years a large number of studies have been made with classification algorithms using magnetic sensors.

The rate of change of consecutive samples was compared with a threshold in Reference [11] and declared to be +1 (-1) if it is positive and larger than (negative with magnitude larger than) the threshold, or 0 if the magnitude of the rate is smaller than the threshold. The second piece of information was the magnetic length of the vehicle. 82% efficiency was achieved, with vehicles classified into five classes.

Reference [12] achieved a vehicle detection rate better than 99 percent (100 percent for vehicles other than motorcycles), estimates of average vehicle length and speed better than 90 percent, and correct classification into six types with around 60 percent, when length was not used as a feature.

In Reference [13], with x and z dimension data and without vehicle length information, a single magnetic sensor system with a Multi-Layer Perceptron Neural Network reached 93.5 percent classification efficiency, but vehicles were only separated into two classes. In a double sensor system 10 classes were selected for development, and 73.6 percent was achieved with length estimation and a methodology using K-means Clustering and Discriminant Analysis.

B. The used neural network and the input parameters

A three-layer feedforward neural network has been used for vehicle class estimation. The neurons in the hidden layer have logarithmic sigmoid transfer functions, while the output layer neurons use saturating linear functions. The structure of the used neural network can be seen in Fig. 2. Bias values have not been used, because the network has to be implementable on the embedded sensor unit, and the bias values would need additional memory space.

The networks have been trained using the backpropagation algorithm. During the training, the weights have been modified after every sample. Using this network, the error of an output layer neuron can be calculated with the following formula (1):

δ_o = target_o − out_o   (1)

where δ_o is the error of output neuron o, target_o is the target value, and out_o is the current output of the neuron. The output neuron weights have to be modified in the following way (2):

W_h,o = W_h,o + η · δ_o · out_h   (2)

where W_h,o is the modified weight between hidden neuron h and output neuron o, and η is the learning rate. The error of a hidden layer neuron can be calculated using the errors of the output neurons, the weights between the hidden neuron and each output neuron, and the output of the hidden neuron (3):

δ_h = out_h · (1 − out_h) · Σ_o (δ_o · W_h,o)   (3)

The modifications of the weights between the input and the hidden layer can be done with the following formula (4):

W_i,h = W_i,h + η · δ_h · in_i   (4)

where in_i is the input of input neuron i.

Network training has been done with different numbers of neurons in the hidden layer and with different learning rates. The network has 6 outputs, because 6 vehicle classes had been defined to be classified: motorcycles, cars, vans, trucks, buses and other. The class with the biggest output is declared as the class of the passed vehicle. A sketch of one training step implementing formulas (1)-(4) is given below.

Figure 2. The structure of the used neural network
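The following TypeScript sketch implements one per-sample training step according to formulas (1)-(4), with the layer transfer functions named in the text and no bias terms; the weight layout (Wih[h][i], Who[o][h]) and all function names are illustrative assumptions, not the authors' code.

```typescript
const logsig = (v: number) => 1 / (1 + Math.exp(-v));       // hidden layer transfer
const satlin = (v: number) => Math.max(0, Math.min(1, v));  // saturating linear output

function trainStep(input: number[], target: number[],
                   Wih: number[][], Who: number[][],
                   etaHidden: number, etaOutput: number): void {
  // Forward pass (no bias terms, as in the paper).
  const outH = Wih.map(row => logsig(row.reduce((s, w, i) => s + w * input[i], 0)));
  const outO = Who.map(row => satlin(row.reduce((s, w, h) => s + w * outH[h], 0)));

  // (1) Output errors; (3) hidden errors, using the current output weights.
  const dO = outO.map((o, k) => target[k] - o);
  const dH = outH.map((oh, h) =>
    oh * (1 - oh) * Who.reduce((s, row, o) => s + dO[o] * row[h], 0));

  // (2) Update hidden-to-output weights; (4) update input-to-hidden weights.
  Who.forEach((row, o) => row.forEach((_, h) => { row[h] += etaOutput * dO[o] * outH[h]; }));
  Wih.forEach((row, h) => row.forEach((_, i) => { row[i] += etaHidden * dH[h] * input[i]; }));
}
```

Calling trainStep once per collected sample, for every sample in every epoch, matches the per-sample weight-update scheme described above; the two learning rates correspond to the output/hidden rate pairs used in the experiments.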
The input layer consists of 16 neurons. Their inputs are parameters calculated from the waveforms of each axis. The network inputs are the following:
• 1 input – the detection length (the number of measures made while the detection flag is "1").
• 6 inputs – the biggest differences between the measured values and the thresholds on each axis: the difference between the highest measured value and the upper threshold (5), and the difference between the lower threshold and the smallest measured value (6):

X_diff,max = X_max − X_th,upper   (5)
X_diff,min = X_th,lower − X_min   (6)

• 6 inputs – the number of local maximums (where the measured values are above the upper threshold) and of local minimums (where the values are under the lower threshold) on each axis.
• 3 inputs – the range changes on each axis. The thresholds define three ranges: one above the upper threshold, one under the lower threshold, and one between them.

A sketch of the computation of this input vector follows.
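One possible way of accumulating these 16 inputs during a detection is sketched below in TypeScript. The per-axis state, the local-extremum test and all names are illustrative assumptions; the thresholds are assumed frozen at the rising edge of the detection flag, and prev/prevPrev initialized from the first samples.

```typescript
// Illustrative per-axis accumulator for the network inputs.
interface AxisFeatures {
  upper: number; lower: number;   // thresholds at the rising edge of the flag
  maxDiff: number;                // (5) highest measure minus upper threshold
  minDiff: number;                // (6) lower threshold minus smallest measure
  maxima: number; minima: number; // local extrema outside the thresholds
  rangeChanges: number;           // transitions between the three ranges
  prev: number; prevPrev: number; // last two measures, for extremum detection
  lastRange: number;              // -1 below, 0 between, +1 above
}

function update(a: AxisFeatures, v: number): void {
  a.maxDiff = Math.max(a.maxDiff, v - a.upper);
  a.minDiff = Math.max(a.minDiff, a.lower - v);
  // Local maximum above the upper threshold / local minimum below the lower one.
  if (a.prevPrev < a.prev && a.prev > v && a.prev > a.upper) a.maxima++;
  if (a.prevPrev > a.prev && a.prev < v && a.prev < a.lower) a.minima++;
  const r = v > a.upper ? 1 : v < a.lower ? -1 : 0;
  if (r !== a.lastRange) { a.rangeChanges++; a.lastRange = r; }
  a.prevPrev = a.prev; a.prev = v;
}

// 1 + 6 + 6 + 3 = 16 inputs, in the order listed above (axes X, Y, Z).
const inputVector = (detectionLength: number, ax: AxisFeatures[]) => [
  detectionLength,
  ...ax.map(a => a.maxDiff), ...ax.map(a => a.minDiff),
  ...ax.map(a => a.maxima), ...ax.map(a => a.minima),
  ...ax.map(a => a.rangeChanges),
];
```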
C. Neural network training

Measurement data for 130 samples per class have been collected for network training.

The network training has been done in three series, depending on the learning rates of the layers. The used rate pairs were the following:
• 0.11 at the output and 0.1 at the hidden layer
• 0.08 at the output and 0.06 at the hidden layer
• 0.05 at the output and 0.04 at the hidden layer

Every learning rate pair has been tested with 1 to 25 hidden layer neurons. Each of the 75 trainings has been run for 1000 epochs. Of the 130 samples per class, 90 have been used for training and 40 for validation.

During the training process, the matching rates and the mean squared errors on the training and validation samples have been calculated after every sample in the epoch. The highest matching rates and the smallest mean squared errors have been saved. When a better value of a parameter was found, the current values of all other performance parameters were also saved, together with a matrix containing the current places of the misses. The current iteration number and sample number have also been recorded, to see whether more iterations are needed.

The highest matching rates for all learning rate pairs, depending on the number of hidden layer neurons, can be viewed in Fig. 3. The efficiency of the network on both training and validation samples improved when the learning rates were reduced. The highest matching rate on training samples, 88.44%, has been recorded with 18 hidden layer neurons. The highest efficiency on validation samples was 70.83%. The rate on training samples can be improved by increasing the number of iterations, because in most cases the biggest value was achieved near the end of the training process. But this is not true for the validation samples, where in most of the cases the maximums were recorded around 300 epochs.

Fig. 4 shows the smallest mean squared errors found during the trainings. As seen, the smallest values were also achieved with the smallest learning rate pair. Similarly to the matching rates, the values on training samples could have been improved by using a longer training process, but the smallest values on validation samples were also recorded around 300 iterations.

Figure 3. Highest matching rates achieved at training and validation samples with different number of hidden layer neurons and different learning rates. [Line chart omitted.]

Figure 4. Smallest mean squared errors achieved at training and validation samples with different number of hidden layer neurons and different learning rates. [Line chart omitted.]

The number of misses by classes, in the case when the highest matching rate on training samples has been found, is shown in Table V for training samples and in Table IV for validation samples. The places with most misses are very similar. Most misses were made between classes 2, 3 and 4. A possible reason can be that cars, vans and smaller trucks are of almost the same length, the number of axles is also the same, and the distance between the axles is also similar.

D. Neural network implementation and testing

During the implementation, a very important factor was not to stop the measuring for the time of the network output calculation. The network input calculation and updating has been done after every measurement while the detection flag was "1".

After a falling edge on the detection flag, the network outputs are calculated and the vehicle class is determined. This process is completed before the next measurement is made, so the class is determined within 20 ms, which is one measurement cycle.
TABLE IV.
PLACES AND NUMBER OF MISSES AT VALIDATION SAMPLES

Target \ Output | 1. | 2. | 3. | 4. | 5. | 6. | Rate
1. | 0 | 1 | 1 | 0 | 0 | 3 | 87.5%
2. | 1 | 0 | 9 | 9 | 0 | 0 | 52.5%
3. | 0 | 11 | 0 | 10 | 1 | 1 | 42.5%
4. | 3 | 7 | 5 | 0 | 9 | 1 | 37.5%
5. | 0 | 4 | 2 | 6 | 0 | 0 | 70%
6. | 2 | 0 | 0 | 0 | 0 | 0 | 95%

TABLE V.
PLACES AND NUMBER OF MISSES AT TRAINING SAMPLES

Target \ Output | 1. | 2. | 3. | 4. | 5. | 6. | Rate
1. | 0 | 1 | 1 | 0 | 0 | 0 | 97.33%
2. | 1 | 0 | 9 | 7 | 0 | 0 | 77.33%
3. | 1 | 7 | 0 | 6 | 1 | 0 | 80%
4. | 2 | 4 | 4 | 0 | 4 | 1 | 80%
5. | 0 | 2 | 1 | 0 | 0 | 0 | 96%
6. | 0 | 0 | 0 | 0 | 0 | 0 | 100%

For testing, the weights of the 18-hidden-neuron network which achieved the highest matching rate on training samples have been implemented. This network had 64.17% efficiency on validation data; the mean squared error was 0.041736 on the training samples and 0.096034 on the validation samples.

The network was tested on 300 detections, and the results are shown in Table VI. As seen, the recognition rates are very similar to the values calculated on the validation samples during training (Table IV). The testing was not ideal, because some classes had a very small number of vehicles passing in the testing interval. During this test period the false detection filtering has not been used.

TABLE VI.
TEST RESULTS OF THE IMPLEMENTED NEURAL NETWORK

Target \ Output | 1. | 2. | 3. | 4. | 5. | 6. | Σ | Recognition rate
1. | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 100%
2. | 18 | 99 | 48 | 55 | 5 | 4 | 229 | 43.23%
3. | 0 | 1 | 6 | 5 | 0 | 0 | 12 | 50%
4. | 0 | 4 | 1 | 13 | 2 | 0 | 20 | 65%
5. | 0 | 1 | 0 | 2 | 3 | 1 | 7 | 42.86%
6. | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0%
False detection | 11 | 1 | 1 | 6 | 0 | 11 | 30 |
Σ | 31 | 106 | 56 | 81 | 10 | 16 | 300 |

VII. CONCLUSION

In this work a detection method and a neural network based vehicle classification method were presented. The number of false detections was rather large, so a filtering algorithm has also been developed. The filtering algorithm can exclude almost 60 percent of the false detections.

Neural network training has been done with different numbers of hidden layer neurons and different learning rates. The training results showed that the recognition rates are not yet usable in real-life applications, but the results are promising. For better recognition rates, changes are needed. Increasing the number of training samples, dividing the vehicles into more classes, or dividing them by length and axle number could all lead to better results. Probably the most efficient modification would be to include the changes of the waveforms in time.

ACKNOWLEDGMENT

The publication/presentation is supported by the European Union and co-funded by the European Social Fund. Project title: "Telemedicine-focused research activities on the field of Mathematics, Informatics and Medical sciences". Project number: TÁMOP-4.2.2.A-11/1/KONV-2012-0073.

The authors would like to thank the companies "SELMA" Ltd. and "SELMA Electronic Corp" Ltd. for the technical resources and support.

REFERENCES

[1] A. Daubaras and M. Zilis, "Vehicle Detection based on Magneto-Resistive Magnetic Field Sensor", Electronics and Electrical Engineering, Kaunas, Vol. 118, 2012, pp. 27-32.
[2] X. Deng, Z. Hu, P. Zhang and J. Guo, "Vehicle Class Composition Identification Based Mean Speed Estimation Algorithm using Single Magnetic Sensor", Journal of Transportation Systems Engineering and Information Technology, ScienceDirect, 2010, pp. 35-39.
[3] A. Haoui, R. Kavaler and P. Varaiya, "Wireless magnetic sensors for traffic surveillance", Transportation Research Part C: Emerging Technologies, Vol. 16, ScienceDirect, 2008, pp. 294-306.
[4] S.-Y. Cheung and P. Varaiya, "Traffic Surveillance by Wireless Sensor Networks: Final Report", California PATH Research Report, 2007.
[5] M. Isaksson, "Vehicle Detection using Anisotropic Magnetoresistors", Thesis for the Degree of Master in Engineering Physics, Chalmers University of Technology, 2007.
[6] E. Jouseau and B. Dorizzi, "Neural networks and fuzzy data fusion. Application to an on-line and real time vehicle detection system", Pattern Recognition Letters, Vol. 20, Elsevier, 1999, pp. 97-107.
[7] W. Zhang, G.-Z. Tan, H.-M. Shi and M.-W. Lin, "A Distributed Threshold Algorithm for Vehicle Classification Based on Binary Proximity Sensors and Intelligent Neuron Classifier", Journal of Information Science and Engineering, Vol. 26, 2010, pp. 769-783.
[8] M.J. Caruso and L.S. Withanawasam, "Vehicle Detection and Compass Applications using AMR Magnetic Sensors", Honeywell Inc., 2007.
[9] G. Zhang, Y. Wang and H. Wei, "An Artificial Neural Network Method for Length-based Vehicle Classification Using Single-Loop Outputs", Journal of the Transportation Research Board, Vol. 1945, 2007, pp. 100-108.
[10] C. Sun, "An Investigation in the Use of Inductive Loop Signatures for Vehicle Classification", California PATH Research Report, 2000, pp. 499-512.
[11] S.-Y. Cheung, S.E. Coleri and P. Varaiya, "Traffic Surveillance by Wireless Magnetic Sensors", ITS World Congress, 2005.
[12] S.-Y. Cheung, S. Coleri, B. Dundar, S. Ganesh, C.-W. Tan and P. Varaiya, "Traffic Measurement and Vehicle Classification with a Single Magnetic Sensor", 84th Annual Meeting of the Transportation Research Board, 2005, pp. 173-181.
[13] H. Liu, S.-T. Jeng, J.C.A. Tok and S.G. Ritchie, "Commercial Vehicle Classification using Vehicle Signature Data", 88th Annual Meeting of the Transportation Research Board, 2009.
Motion Analysis with Wearable 3D Kinematic Sensors
Sara Stančin, Sašo Tomažič
University of Ljubljana, Faculty of Electrical Engineering, Ljubljana, Slovenia
[email protected], [email protected]

Abstract— Wearable motion sensors provide data that directly reflect the motion of individual body parts and that can enable the development of advanced tracking and analysis procedures. The light and small sensors widely available today make a wide range of practical measurements feasible. As these sensors are somewhat inaccurate, they are primarily suitable for monitoring motion dynamics. A number of studies conducted so far show that these sensors can be efficiently used for motion pattern identification and classification, enabling general motion analysis and evaluation. To take full advantage of the feasibility and widespread use of these sensors, it is necessary to provide calibration and data analysis procedures that are efficient in terms of lifetime and computational complexity.

I. INTRODUCTION

Natural human motion is a complex process that involves the entire psychophysical system. Motion analysis can contribute to a better and more comprehensive understanding of specific activities and of behavior in general. We can ascertain that motion evaluation is an important part of recreation, rehabilitation, injury prevention, and the objective determination of the level of functional ability of individuals.

In the context of motion analysis, we strive for the detection and recognition of different motion patterns. Essentially, motion pattern recognition is based on the capture and analysis of motion data of the different motion-involved body segments.

In modern methods for motion tracking, the relevant data are the starting point for a comprehensive motion analysis. Wearable wireless motion sensors [1-16] provide data that directly reflect the motion of individual body parts and can enable the development of advanced tracking and analysis procedures. The kinematic sensors available today, based on Microelectromechanical systems (MEMS), are small, light, widely affordable, and come with their own battery supply. These sensors cause minimal physical obstacles for motion performance and can provide simple, repeatable, and collectible motion data indoors. Moreover, because of their low energy consumption, MEMS sensors are a promising tool for tracking motion outdoors.

II. SENSOR DATA INTERPRETATION

A. The general 3D sensor model

A 3D sensor is a device that measures a physical quantity in three-dimensional space. As shown in Figure 1, the values measured with a 3D sensor represent the projections of the measured quantity on three mutually perpendicular sensitivity axes of the device. These axes form the sensor's coordinate system.

Figure 1. The illustration of the projections of the measured quantity vector q on the sensor sensitivity axes given with directions of vx, vy and vz. If the orientation of the sensor sensitivity axes is error free, these axes coincide with the coordinate system axes x, y and z.

Providing measurements along the sensitivity axes, 3D accelerometers, gyroscopes and magnetometers enable complete motion capture, including changes in position as well as in orientation.

A number of available sensors enable data capture with high sample frequencies. This is a great benefit when capturing data of rapid movements.

B. 3D accelerometer

A 3D accelerometer enables measurements of the acceleration caused by gravity and by self-accelerated motion along the three orthogonal sensitivity axes. As such, accelerometers have applications in several fields.

The result of the sensitivity to gravity is that, when at rest, the accelerometer shows 1 g of acceleration directed upwards along the sensitivity axis oriented along the direction normal to the horizontal surface. This makes it easy to determine the orientation with respect to the direction of the gravitational acceleration vector in the accelerometer coordinate system. It is a necessary condition that the accelerometer is stationary or moving with negligible acceleration in relation to the gravitational acceleration.

Although MEMS sensor technology is improving rapidly, MEMS accelerometers do not enable the estimation of the exact sensor position. When trying to estimate the orientation of an inaccurate sensor during its accelerated movement, specific problems of correct gravitational acceleration estimation appear. An improperly deducted gravitational acceleration is reflected in an incorrectly determined motion direction of the accelerated sensor. Since the position data is obtained with double integration of the acceleration, even small errors in the estimated direction of acceleration can cause a significant deviation of the calculated sensor position from its true position. Therefore, based on the measured acceleration, it is extremely difficult, if not impossible, to determine the exact position of a moving body.

Widely available low-cost and somewhat inaccurate MEMS accelerometers are hence primarily suitable for monitoring motion dynamics. Rather than determining the absolute values of the motion, when capturing motion dynamics we dedicate our attention to the relative changes, trying to create an effective framework for motion pattern identification. A minimal sketch of the gravity-based orientation estimate mentioned above is given below.
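Assuming the near-rest condition stated above, the inclination of the sensor with respect to gravity can be estimated from a single accelerometer sample, for instance as follows. Values are in g, and the tolerance and all names are illustrative assumptions:

```typescript
// Estimate sensor inclination from the gravity vector while (nearly) at rest.
function inclination(ax: number, ay: number, az: number): { roll: number; pitch: number } | null {
  const norm = Math.sqrt(ax * ax + ay * ay + az * az);
  // Accept the sample only if its magnitude is close to 1 g, i.e. the
  // self-acceleration is negligible compared to gravity (a stated condition above).
  if (Math.abs(norm - 1) > 0.05) return null;               // assumed 5% tolerance
  return {
    roll: Math.atan2(ay, az),                                 // rotation about the x axis
    pitch: Math.atan2(-ax, Math.sqrt(ay * ay + az * az)),     // rotation about the y axis
  };
}
```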
C. 3D gyroscope

3D gyroscopes measure angular velocity in an inertial space and, like accelerometers, have applications in many fields.

By providing angular velocity measurements, 3D gyroscopes are also used to determine orientation. In general, the orientation is treated as the position of the coordinate system of a rigid body observed relative to a reference coordinate system with the same origin. Orientation can be described using the rotations needed to bring the coordinate system of a rigid body, initially aligned with the reference coordinate system, into its new position. In gyroscope measurements, the gyroscope is considered the rigid body, and the inertial space coordinate system is the reference system. The measured angular velocity determines the rotation of the sensor to its new position.

Because the measured angular velocities represent simultaneous rotations, it is not appropriate to consider them sequentially. Rotations in general are not commutative, and each possible rotational sequence results in a different angular orientation. The three simultaneous angular velocities measured with the gyroscope can hence not be considered sequential. There are six possible different sequences of rotations around three axes. Each of these six sequences determines a different angular orientation, none of which corresponds to the result of the three simultaneous rotations.

Angular velocities can be represented as vectors that are oriented along the direction of the axis of rotation, with magnitude corresponding to the size of the angular velocity. However, these vectors cannot be unreservedly treated in the same manner as normal vectors. In general, the sum of two angular velocity vectors in 3D space does not correspond to their rotational sum. For this reason, angular velocity vectors cannot be regarded as Euclidean vectors. Hence, when analyzing data obtained with a 3D gyroscope, it is necessary to provide for the correct interpretation of the obtained angular velocity data.

To obtain the correct angular orientation, it is appropriate to consider that every angular orientation can be represented by a single rotation. The vector SORA (Simultaneous Orthogonal Rotations Angle) [17, 18] is a rotation vector whose components are equal to the angles of the three simultaneous rotations around the coordinate system axes. The orientation and magnitude of this vector are equal to the equivalent single rotation axis and angle, respectively. As long as the orientation of the actual rotation axis is constant, given the SORA, the angular orientation of a rigid body can be calculated in a single step, thus making it possible to avoid computing the iterative infinitesimal rotation approximation.

SORA is simple and well-suited for use in the real-time calculation of angular orientation based on angular velocity measurements derived using a gyroscope. Moreover, because of its simplicity, SORA can also be used in general angular orientation notation. Using the vector SORA provides for the correct interpretation of the values measured with the gyroscope: the measured values are equal to the projections of the measured angular velocity on the sensitivity axes of the gyroscope. This interpretation allows applying the general 3D sensor model to the 3D gyroscope. A sketch of a SORA-based orientation update follows.
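The following sketch performs one single-step SORA orientation update from a gyroscope sample. Representing the accumulated orientation as a quaternion is an illustrative choice made here, not something prescribed by [17, 18]:

```typescript
type Quat = { w: number; x: number; y: number; z: number };

function soraStep(q: Quat, wx: number, wy: number, wz: number, dt: number): Quat {
  // SORA: rotation vector whose components are the three simultaneous rotation angles.
  const phi = [wx * dt, wy * dt, wz * dt];
  const angle = Math.hypot(phi[0], phi[1], phi[2]);   // equivalent single rotation angle
  if (angle === 0) return q;
  const [ux, uy, uz] = phi.map(c => c / angle);       // equivalent single rotation axis
  const s = Math.sin(angle / 2), c = Math.cos(angle / 2);
  const r: Quat = { w: c, x: ux * s, y: uy * s, z: uz * s };
  // Hamilton product q * r: apply the rotation in the sensor frame.
  return {
    w: q.w * r.w - q.x * r.x - q.y * r.y - q.z * r.z,
    x: q.w * r.x + q.x * r.w + q.y * r.z - q.z * r.y,
    y: q.w * r.y - q.x * r.z + q.y * r.w + q.z * r.x,
    z: q.w * r.z + q.x * r.y - q.y * r.x + q.z * r.w,
  };
}
```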
D. 3D magnetometer

A 3D magnetometer provides magnetic field measurements. As such, the 3D magnetometer can be a useful tool for determining the orientation relative to the Earth's magnetic field. However, due to disturbances in the magnetic field and the influence of motion on the measurement error, 3D magnetometers are mostly used for intermediate motion phases, when the sensor is at rest, and not for capturing motion itself.

III. SENSOR CALIBRATION

When analyzing motion dynamics, just as in the case of absolute motion value estimation, accurate data are the basis for an effective and comprehensive analysis. The accuracy of the captured data is essential for relevant and comparable results. The first step in motion data capture and analysis is hence sensor calibration.

According to the generally adopted model, the accuracy of the values measured with a 3D sensor is influenced by the accuracy of the sensor axis sensitivity, zero level offset and orientation. The sensitivity of the sensor is the ratio of the measured change in value to the real change, assuming that the sensor characteristic is full-scale linear. The zero level offset is the sensor measurement output when the real measured value is equal to zero. For a 3D sensor, considering sensitivities and zero level offsets gives 6 calibration parameters.

Furthermore, because of imprecise manufacturing, the orientation of the sensor sensitivity axes may deviate from the sensor coordinate axes. The orientation of the 3D sensor sensitivity axes in the sensor coordinate system is fully defined with 6 parameters.

The aim of different calibration procedures is to compensate for the measurement errors that arise because of the listed inaccuracies. According to the presented model, a total of 12 parameters need to be estimated. If the listed inaccuracies are time-invariant, the calibration parameters are constant, and the calibration procedure is said to provide static compensation. On the other hand, if the listed inaccuracies are time dependent, dynamic procedures have to be implemented, and the calibration parameters are functions of time.

Because of their small dimensions, low weight and affordability, MEMS sensors allow a wide range of practical measurements that can be conducted by individuals without any prior special training. Time-, computation- and cost-consuming calibration diminishes the feasibility of the widespread use of these sensors to some extent. Accounting for the above-mentioned considerations, it is necessary to provide a calibration procedure that is efficient in terms of lifetime and computational complexity, does not require any additional expensive equipment, and is suitable for everyday practical use.
For calibrating a 3D sensor, a number of measurements are performed. The calibration parameters are estimated based on the known values and the measured values.

Most procedures for calibrating the 3D accelerometer exploit the fact that the value of the measured acceleration at rest is constant and equal to the gravity acceleration. Measured data are obtained for different orientations of the sensor on a level surface.

For calibrating a 3D gyroscope, the sensor is usually rotated with known angular velocities around known axes. As the rotation axis remains constant, considering the vector SORA, the measured angular velocity can be obtained by averaging the non-constant measured values during each calibration rotation. Considering this, it is possible to perform sensor calibration without the use of special equipment that provides for a constant rotation of the device.

To determine the zero level offset of the 3D gyroscope, it is sufficient to carry out a single measurement while the gyroscope is at rest.

In the presented general 3D sensor model, the influence of sensor noise is neglected. In practice, the present sensor noise causes errors in the estimated calibration parameters. It should be noted that for both sensors, the accelerometer and the gyroscope, the measured values can be obtained by averaging a large number of samples. When averaging, the power of the noise declines with the number of samples. With a sufficient number of samples it is thus possible to achieve that the noise affecting the calibration is substantially smaller than the noise affecting each individual measurement. During accelerometer calibration, the sensor can be at rest for an arbitrarily long time. For a given value of the gyroscope calibration angular velocity, a greater number of samples results in a higher rotation angle. A minimal averaging sketch follows.
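A minimal sketch of the rest-offset estimation by averaging is given below. For uncorrelated noise the power of the average falls roughly as 1/N, so a longer rest interval yields a cleaner offset estimate; all names are illustrative:

```typescript
// Estimate the zero level offset of a 3D gyroscope held at rest
// by averaging N samples of (x, y, z) angular velocity readings.
function estimateOffset(samples: Array<[number, number, number]>): [number, number, number] {
  const sum: [number, number, number] = [0, 0, 0];
  for (const [x, y, z] of samples) { sum[0] += x; sum[1] += y; sum[2] += z; }
  const n = samples.length;
  return [sum[0] / n, sum[1] / n, sum[2] / n];  // subtract from every later measurement
}
```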
IV. MOTION DATA SEGMENTATION

Measurements containing noise are typical for the affordable kinematic sensors in use today. It is therefore necessary to implement an adequate filtering technique to reduce the influence of noise in the obtained raw data. Sensors supporting high sampling frequencies are advantageous for this purpose.

The obtained filtered and re-sampled data are the basis for motion segmentation, pattern recognition, classification and clustering. The procedures used for this purpose are the ones used for general matching and similarity determination in the field of time series analysis. A quality tutorial on this topic can be found in [19].

Motion segmentation refers to the process of identifying motion sequences in the collected time series data. It is achieved by considering some similarity measure that is applied to the target and the query sequence.

Due to its simplicity and efficiency, the Euclidean distance is the most popular and most common time series similarity measure. However, it requires that both sequences are of the same length, and the measure itself is sensitive to distortions. In some time series, different subsequences can have different significance; a part of the series can be shifted or scaled in time; a part of the time series can be subject to amplitude scaling. For this reason, the Euclidean distance is not always the optimal distance measure.

In general time series similarity determination, elastic distance measures like Dynamic Time Warping (DTW) and its derivatives [19, 20], the Longest Common Subsequence (LCSS) [19, 21], and the Minimal Variance Matching (MVM) [21] can be implemented to solve the problem of time scaling.

DTW searches for the best alignment between two time series, attempting to minimize the distance between them [19]. DTW allows the alignment of dips and peaks with their corresponding points from the other time series. DTW requires that each point of the query sequence is matched to an element of the target sequence.

LCSS finds subsequences of two time series that best correspond to each other. When used for time-series analysis, it finds a match between two observations whenever the difference between them is below a given threshold. LCSS allows skipping elements of both the query and the target sequence and as such solves the problem of outliers.

The MVM algorithm [21] computes the distance value between two time series directly, based on the distances of corresponding elements. While LCSS optimizes over the length of the longest common subsequence, MVM directly optimizes the sum of distances of corresponding elements and does not require any distance threshold. MVM can skip some elements of the target series and is thus used when the matching of the entire query sequence is of interest.

Elastic measures are in general more robust than the Euclidean distance, but are computationally more intensive. Adaptations of DTW exist that, upon implementation of certain constraints, make the execution times of DTW and of the Euclidean distance comparable.

Elastic measures adapt well when parts of the compared time series have a different time scale. However, their efficiency and the reasonableness of their deployment for motion pattern recognition are yet to be fully investigated. A textbook DTW sketch is given below.
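For illustration, the following is a textbook dynamic-programming implementation of the DTW distance between two one-dimensional series; it is not tied to any of the cited works:

```typescript
function dtw(a: number[], b: number[]): number {
  const n = a.length, m = b.length;
  const INF = Number.POSITIVE_INFINITY;
  // cost[i][j] = best alignment cost of a[0..i-1] and b[0..j-1]
  const cost = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(INF));
  cost[0][0] = 0;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const d = Math.abs(a[i - 1] - b[j - 1]);
      cost[i][j] = d + Math.min(cost[i - 1][j],      // skip an element of a
                                cost[i][j - 1],      // skip an element of b
                                cost[i - 1][j - 1]); // match a[i-1] with b[j-1]
    }
  }
  return cost[n][m];
}
```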
V. MOTION EVALUATION

A number of studies conducted so far have been focused on identifying motion patterns and enabling motion evaluation based on the analysis of individual motion parameters obtained using wearable motion sensors [8, 10-14]. Most studies aimed at identifying different body postures are based on collecting data from sensors attached to the body and making a distinction between standing, sitting and lying down. In such stationary examples, the gravitational acceleration projections on the sensor's coordinate axes are relatively easy to identify. Recognition of walking periods and of transitions between different body postures using a kinetic sensor attached to the chest [11] is intended for the ambulatory monitoring of physical activity in the elderly population. The kinematic sensor here combines a gyroscope and two accelerometers. The analysis is based on the wavelet transform.

A sensor including only a single gyroscope is shown to be efficient for measuring transitions between the sitting and the standing position [8]. Such a sensor is designed to assess the risk of falling in the elderly population.

A system for pedestrian navigation in [9] is based on the use of the gyroscope to identify the intervals of rest.

In a study [14], a comparative analysis of different techniques for classifying human leg movement using individual parameters of the signals obtained with a pair of gyroscopes has been presented. The authors compare the results of different classification methods, including Bayesian decision-making, decision trees, the least squares method, the k-nearest neighbors, Dynamic Time Warping, Support Vector Machines, and neural networks. The comparison is based on the parameters of the relationship distinction, the data processing cost and the self-study requirement.

A comparative analysis of the classification of different human activities using sensors mounted on a moving body is presented in [10]. Human activities are classified using five sensor units mounted on the chest, shoulders and legs. Each sensor unit consists of a gyroscope, an accelerometer and a magnetometer. Characterizing parameters are extracted from the raw data using Principal Component Analysis (PCA).

Different studies also deal with the possibilities of using wearable sensor devices for identifying motion patterns in sports. In [15] the authors investigate motion during the golf swing. The purpose of this study is to determine the repeatability of the kinematics of the chest and pelvis during the backswing for different swing recurrences, between different players, days and locations (open and closed driving range). The results of the analysis indicate a high degree of repeatability in different conditions.

In [16] a simple and practical detection of improper motion during the golf swing is presented. Here, individual swing motion is explored. Acceptable deviations (i.e., those not having an effect on swing accuracy and consistency) are differentiated from those leading to unsuccessful shots using PCA. This enables the detection of an improper swing motion, as illustrated in Figure 2. To accomplish this task, multiple swing motion data were captured using a single wearable motion sensor consisting of a 3D accelerometer and a 3D gyroscope. The analysis itself can be performed using an arbitrary component of the measured kinematic data, e.g. acceleration or angular velocity. Each swing observation is labeled according to its performance. Along with objective outcome evaluations, subjective marks provided by the golfer are also considered for the overall performance evaluation. Reflecting the overall feeling and easiness of the swing motion, the subjective marks are very valuable when considering the player's individual swing characteristics.

The proposed method refers to a specific player, for his specific swing and with a specific club. According to this method, any portion of the golf swing (e.g., only the backswing) can be analyzed, which enables the detection of an improper motion in the early phases of the swing. With early improper motion detection, focus is given to the problem itself and not to the subsequent reactions.
conditions. With early improper motion detection, focus is given to
the problem itself and not to the subsequent reactions.

Figure 2. A demonstrative example of the efficiency of wearable motion sensors together with suitable analysis techniques for motion analysis:
improper motion detection during the golf swing. The left panel shows different observations of the golfer’s leading arm rotation around its intrinsic
longitudinal axis during the first 0.625 s of the backswing. All reference observations refer to properly performed swings and are used to establish the
acceptable deviations from the desired motion. The desired motion is obtained as the mean of the reference observations. Test observations 1-5 refer
to an improperly performed swing, and 6 refers to a properly performed swing. Note that not all improper swing motions could be detected by directly
comparing the test and reference observations. Test observations 1, 2, and 3 could eventually be detected. However, test observations 4 and 5,
although referring to improperly performed swings, could not be distinguished from the reference observations. By showing the acceptable and test
observation residual deviations in time domain, obtained using the PCA based procedure [16], the right side indicates errors in the performed swings.
Acceptable deviations residuals represent deviations in properly performed swings attributed to noise and/or different artefacts. The deviation
residuals for test observations 1-5, for which improper motion was detected, considerably exceed acceptable deviations residuals. Consistently
positive values in the second half of the considered swing interval for test observations 1-4 indicate a typical improper motion in the associated
swings.
VI. CONCLUSION

Wearable kinematic sensors cause minimal physical obstacles for motion performance. Together with proper data analysis techniques, these sensors provide for simple and practical motion analysis and evaluation.

It is possible to evaluate motion and detect improper motion in the early phases of its performance. This is essential for the offline improvement process.

Exploring the possibilities of developing biofeedback applications relying on early-phase improper motion detection for real-time motion supervision and training can motivate further study. If upgraded with sufficient processing power, wearable motion sensors can be used to perform well-designed real-time analysis of the collected data. If further equipped with adequate small and light hardware (for example, audio speakers), useful feedback applications could be enabled. Instantaneously providing feedback information and bringing it to consciousness could help to improve shot accuracy and consistency in real time. Providing efficient swing analysis and performance evaluation in real time and offering immediate information on the likely outcome of the performed motion could potentially transform the approach to instruction and practice.

REFERENCES

[1] McIlwraith, D., Pansiot, J., Yang, G.Z., Wearable and ambient sensor fusion for the characterisation of human motion, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5505-5510, 2010.
[2] Aminian, K., Robert, P., Buchser, E.E., Rutschmann, B., Hayoz, D., Depairon, M., Physical activity monitoring based on accelerometry: validation and comparison with video observation, Med. Biol. Eng. Comput., 37, pp. 304-308, 1999.
[3] Aminian, K., Najafi, B., Capturing human motion using body-fixed sensors: outdoor measurement and clinical application, Comp. Anim. Virtual Worlds, 15, pp. 79-94, 2004.
[4] Uiterwaal, M., Glerum, E.B.C., Busser, H.J., Lummel, R.C., Ambulatory monitoring of physical activity in working situations, a validation study, J. Med. Eng. Tech., 22(4), pp. 168-172, 1998.
[5] Heinz, E.A., Kunze, K.S., Gruber, M., Bannach, D., Lukowicz, P., Using Wearable Sensors for Real-Time Recognition Tasks in Games of Martial Arts - An Initial Experiment, 2006 IEEE Symposium on Computational Intelligence and Games, pp. 98-102, 2006.
[6] Roetenberg, D., Slycke, P.J., Veltink, P.H., Ambulatory Position and Orientation Tracking Fusing Magnetic and Inertial Sensing, IEEE Trans. Biomed. Eng., 54(5), pp. 883-890, 2007.
[7] Trifunovic, M., Vadiraj, A.M., Van Driel, W.D., MEMS accelerometers and their bio-applications, 13th International Conference on Thermal, Mechanical and Multi-Physics Simulation and Experiments in Microelectronics and Microsystems (EuroSimE), 16-18 Apr. 2012, pp. 1-7, 2012.
[8] Najafi, B., Aminian, K., Loew, F., Blanc, Y., Robert, P.A., Measurement of stand-sit and sit-stand transitions using a miniature gyroscope and its application in fall risk evaluation in the elderly, IEEE Transactions on Biomedical Engineering, Vol. 49, No. 8, pp. 843-851, 2002.
[9] Park, S.K., Suh, Y.S., A Zero Velocity Detection Algorithm Using Inertial Sensors for Pedestrian Navigation Systems, Sensors, 10, pp. 9163-9178, 2010.
[10] Altun, K., Barshan, B., Tunçel, O., Comparative study on classifying human activities with miniature inertial and magnetic sensors, Pattern Recogn., 43, pp. 3605-3620, 2010.
[11] Najafi, B., Aminian, K., Paraschiv-Ionescu, A., Loew, F., Bula, C.J., Robert, P., Ambulatory system for human motion analysis using a kinematic sensor: monitoring of daily physical activity in the elderly, IEEE Trans. Biomed. Eng., 50(6), pp. 711-723, 2003.
[12] Junker, H., Amft, O., Lukowicz, P., Tröster, G., Gesture Spotting with Body-Worn Inertial Sensors to Detect User Activities, Pattern Recogn., 41, pp. 2010-2024, 2008.
[13] Lementec, J.-C., Bajcsy, P., Recognition of arm gestures using multiple orientation sensors: gesture classification, Intelligent Transportation Systems, 2004, Proceedings of the 7th International IEEE Conference on, pp. 965-970, 2004.
[14] Tunçel, O., Altun, K., Barshan, B., Classifying Human Leg Motion with Uniaxial Piezoelectric Gyroscopes, Sensors, Vol. 9, pp. 8508-8546, 2009.
[15] Evans, K., Horan, S.A., Neal, R.J., Barrett, R.S., Mills, P.M., Repeatability of three-dimensional thorax and pelvis kinematics in the golf swing measured using a field-based motion capture system, Sports Biomech., 11(2), pp. 262-272, 2012.
[16] Stančin, S., Tomažič, S., Early Improper Motion Detection in Golf Swings Using Wearable Motion Sensors: The First Approach, Sensors, 12(6), pp. 7505-7521, 2013.
[17] Tomažič, S., Stančin, S., Simultaneous orthogonal rotation angle, Electrotech. Rev., 78, pp. 7-11, 2011.
[18] Stančin, S., Tomažič, S., Angle Estimation of Simultaneous Orthogonal Rotations from 3D Gyroscope Measurements, Sensors, 11(9), pp. 8536-8549, 2011.
[19] Lin, J., Wiliamson, S., Borne, K., DeBarr, D., Pattern Recognition in Time Series, Advances in Machine Learning and Data Mining for Astronomy, Eds. Kamal, A., Srivastava, A., Way, M., and Scargle, J., Chapman & Hall, 2012.
[20] Keogh, E.J., Pazzani, M.J., Derivative dynamic time warping, The 1st SIAM Int. Conf. on Data Mining (SDM-2001), Chicago, IL, USA, 2001.
[21] Latecki, L.J., Megalooikonomou, V., Wang, Q., Lakaemper, R., Ratanamahatana, C.A., Keogh, E.J., Partial elastic matching of time series, Fifth IEEE International Conference on Data Mining, IEEE, 2005.
QUALISYS WEB TRACKER – A WEB-BASED VISUALIZATION TOOL FOR REAL-TIME DATA OF AN OPTICAL TRACKING SYSTEM
Andraž Krašček and Jaka Sodnik
Faculty of Electrical Engineering, University of Ljubljana, Slovenia

Abstract – In this paper, we describe a web-based tool for the visualization and analysis of real-time output data of a professional optical tracking system – Qualisys. Optical tracking systems are used for capturing the positions and movements of various objects in space with the aid of several high-speed cameras and reflective markers. The positions of individual markers are represented as time-dependent 3D spatial points. Tracking software running on a single dedicated computer enables the visualization of data as an interactive 3D scene. The main goal of our work is to enable a real-time visualization of this 3D scene on multiple computers simultaneously with the aid of modern web technologies and tools. The framework is based on an interactive Node.js web server, which streams data from the Qualisys tracking software, reformats it and sends it to a web browser through a fast WebSocket protocol. The web browser enables the visualization of the 3D scene based on WebGL technology. The tool described in this paper enables a synchronized visualization of the tracking data on multiple computers simultaneously and thus represents an excellent teaching and presentation tool for optical motion capture techniques.

1. INTRODUCTION

The term "motion capture" refers to a technique for capturing and recording movements of various objects in space [1][2]. It is widely used in the fields of biomechanics, the game industry, movie and television production, etc. The majority of widely used motion capture techniques is based on optical methods that record the object's movements. A number of cameras are used to observe passive or active reflective markers, which have to be attached to the points of interest, i.e. the points that are then tracked in space. The system calculates the exact location of each marker in space by triangulation, based on the projection of the marker onto each camera's 2D image plane. Figure 1 demonstrates the basic principle of tracking 3D data based on multiple 2D images.

Figure 1. The basic principle of triangulation – reconstruction of 3D data from 2D images [3].

Several cameras are typically used in order to provide constant visibility of each marker by a minimum of three independent cameras. The high number of cameras provides redundancy, a lower possibility of marker loss and occlusion, and also a higher accuracy over the entire tracking volume.

The passive markers are made from simple, light and highly-reflective plastic materials and require an active source of infrared light within each camera. The infrared light is then reflected from the markers back to the cameras and provides good contrast between the markers and the surroundings. Active markers, on the other hand, can be simple LEDs, which do not require an additional source of light within each camera.

In our research, we deal with the professional motion capture system Qualisys [4], which consists of eight high-speed cameras, a set of passive markers and dedicated tracking software called Qualisys Track Manager (QTM) [5]. The latter is responsible for calculating the exact 3D location of each marker from the 2D images acquired by the individual cameras. It is a desktop application that runs on a normal PC and communicates with the cameras through a standardized Ethernet protocol. It is operated through a well-designed and intuitive GUI and enables a real-time visualization of all tracked points in a virtual coordinate system. It also enables control over all cameras, such as calibration, timing, capture rate, exposure and flash time, etc.

Each marker (3D point) is presented with three independent coordinates (X, Y and Z) in a Cartesian coordinate system. It is illustrated as a colored dot at the corresponding spatial position. A set of markers can be grouped into a "model". The model is based on a certain number of markers attached to different parts of the tracked object. The rigid parts of the object (the parts with a constant inter-distance) within the model can be presented with "bones". The bones can be built with the aid of the GUI by selecting individual markers in the model and connecting them with straight lines. Figure 2 shows an example of a set of markers attached to a human body.

Figure 2. The visualization of an AIM model in QTM software consisting of a set of markers attached to different parts of a human body.

The final model with a selected number of markers and bones can be saved as an "AIM model" (Automatic Identification of Markers), which is then applied in future measurements. This process helps the QTM software to increase the tracking accuracy with the aid of exact distances and relations between the markers.

The measurements of marker locations and model movements can be saved and processed locally or streamed to an additional computer through a predefined RT (Real-Time) protocol [6]. The "RT protocol" feature enables the QTM software to function as a TCP or UDP server from which various clients can acquire tracking data in real time.

The visualization and analysis of tracked objects and models can currently be performed only on the computer running the QTM software. In this paper, we propose a complete solution for the visualization and analysis of the tracked data on multiple computers simultaneously, in real time and even from a remote location. It is based on modern web technologies and protocols, which enable full-duplex client-server communication and real-time rendering of 3D models and objects in a browser. In the following sections we provide an overview of the key web technologies used for this experiment and describe all major components of the system and the communication protocols between them.

2. QTM REAL-TIME SERVER PROTOCOL

The RT protocol component enables the retrieval of processed real-time data from QTM over a TCP/IP or UDP/IP connection [6]. The protocol structure is well defined and provides all the basic features, such as auto-discovery, settings changing, streaming and error messaging. A client that connects to the server can require data in 2D, 3D or 6DOF (six degrees of freedom), with or without the marker labels, etc. In our case, we use the server to stream the 3D points of a set of predefined markers in real time. An illustrative connection sketch is given below.
data in real time.
The visualization and analysis of tracked objects and 3. WEB SOCKET
models can currently be performed only on the computer
running the QTM software. In this paper, we propose a The web was originally designed around a
complete solution for the visualization and analysis of the request/response model of a HTTP protocol, which then
tracked data on multiple computers simultaneously, in evolved to asynchronous XHR request (part of AJAX)
real time and even from a remote location. It is based on making the web more dynamic. However, the updates
modern web technologies and protocols, which enable full from the server are still only received upon client request
duplex client-server communication and real time triggered by user interaction or periodic polling. This
rendering of 3D models and objects in a browser. In the carries big data overhead and higher latencies. WebSocket
following sections we provide an overview of the key API is a part of HTML5 specification and introduces a
web technologies used for this experiment and describe socket-like persistent full-duplex connection between the

Page 156 of 478


ICIST 2014 - Vol. 1 Regular papers

server and the client. The data can be sent from the server implementation. Additional functionalities can be added
to the client and vice versa without prior request. After as modules via NPM (Node Package Manager).
the initial handshake, the data is sent in frames with a
frame header in the size from 48 bits to a maximum of 5. SYSTEM ARCHITECTURE OF QUALISYS WEB
112 bits, depending on the payload data size. The TRACKER
maximum frame size depends on the implementation of
the protocol. [7] Our system consists of three main entities: QTM
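As a brief illustration of the client end of such a connection, the sketch below (our own, not code from the described system; the endpoint URL is a placeholder) opens a persistent connection and reacts to pushed messages:

// A minimal browser-side WebSocket client sketch; the URL is a placeholder.
var socket = new WebSocket('ws://localhost:8080');
socket.onopen = function () {
  console.log('Connection established');
};
socket.onmessage = function (event) {
  // Messages arrive as pushes from the server, with no polling involved.
  var frame = JSON.parse(event.data);
  console.log('Received frame', frame);
};
socket.onclose = function () {
  console.log('Connection closed');
};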
4. WEBGL

WebGL is a royalty-free, cross-platform API that brings OpenGL ES 2.0 to the web. It is implemented as a 3D drawing context within HTML, exposed as a low-level Document Object Model interface. It uses the OpenGL shading language, GLSL ES, and can be cleanly combined with other web content layered on top of or underneath the 3D content. It is ideally suited for dynamic 3D web applications in the JavaScript programming language, and has been fully integrated in all leading web browsers [8].

Rendering one frame in WebGL can be computationally expensive and can block the JavaScript event loop; therefore, rendering should not be implemented by using setInterval or an infinite loop, but rather with the requestAnimationFrame method. This method executes the rendering script at the first available time slot, without blocking the user interface or other scripts.
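A minimal render loop built on this method can be sketched as follows (our own illustration; drawScene is a stub standing in for the actual WebGL drawing code):

// A render-loop sketch based on requestAnimationFrame.
function drawScene() {
  // Placeholder: the actual WebGL drawing of the marker scene goes here.
}
function renderLoop() {
  drawScene();                              // redraw with the latest data
  window.requestAnimationFrame(renderLoop); // cooperatively schedule the next frame
}
window.requestAnimationFrame(renderLoop);   // start the loop

Unlike setInterval, the browser invokes the callback only when it is ready to repaint, so the loop never starves the user interface.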
5. NODE.JS

Node.JS was first introduced in 2009 by Ryan Dahl at a JavaScript conference in Berlin. It is a JavaScript-based web server. Thanks to Google's fast JavaScript interpreter and virtual machine called V8, used by Google in its Chrome web browser, Node.JS can outperform traditional server stacks. With classic scripting languages, such as PHP or ASP, the programmer writes scripts that are executed by a separately installed server. In the case of Node.JS, the programmer writes code that represents a part of the server itself.

Node.JS executes the code in the same process, on a single thread called the event loop. One event is handled in each pass of the loop, so all the I/O operations must be written in an asynchronous, non-blocking way. A good example of such code is a database query. Node.JS sends a query to a database that takes a considerable amount of time to process. In the blocking approach, the server would wait for the database response and only then continue with the execution. A non-blocking script, on the other hand, does not wait for the database response, but handles other events instead. The response is pushed into the event queue as soon as it is returned by the database. In this way, Node.JS is capable of handling a high number of simultaneous connections.

The core of Node.JS is just a set of low-level APIs, known from JavaScript and the V8 JavaScript implementation. Additional functionalities can be added as modules via NPM (Node Package Manager).
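The non-blocking pattern can be made concrete with a core-module call (our own sketch; the file name is a placeholder and stands in for any slow I/O source, such as the database query above):

// Non-blocking I/O in Node.JS: the callback is queued until the read completes.
var fs = require('fs');
fs.readFile('data.txt', 'utf8', function (err, data) {
  if (err) { console.error(err); return; }
  console.log('File contents:', data);
});
console.log('This line runs first; the event loop was never blocked.');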
6. SYSTEM ARCHITECTURE OF QUALISYS WEB TRACKER

Our system consists of three main entities: the QTM software, a Node.JS server and multiple clients with ordinary browsers. The coordinates of all the tracked markers are transferred from QTM to the Node.JS server over the RT protocol. The server handles the data translation from the raw stream to JavaScript Object Notation (JSON) and broadcasts it to all connected clients. Each individual client is responsible for the reconstruction and visualization of the 3D scene from the received data. The following two subsections briefly describe the basic functionalities of the server and the client.

Figure 3. The basic architecture of the Qualisys Web Tracker software.

Server side

The application's server side consists of three main modules: the HTTP module, the WebSocket module and the RT protocol module. The HTTP module is a basic Node.JS module which handles HTTP requests. In our application, it is mainly used for the delivery of static files (CSS, JS, etc.) and of the control parameters in the administrator view. The other two modules are used to transfer the data from QTM to the clients through the server. The RT protocol module handles the communication between the QTM and the server application, while the WebSocket module communicates with multiple clients simultaneously.

Before broadcasting any data to potential clients, an administrator has to connect to the QTM through a special administrator view. The administrator view is a simple web application running on a separate port and requiring user authentication. The administrator specifies the corresponding QTM IP address and port. After connecting, the administrator requests a special XML file which specifies all the important properties of the markers, such as their labels (names) and colors. The specification of the properties is important in order for the markers to be correctly interpreted and visualized by the clients. It is vital that this XML file is transmitted before the start of the data stream, so that it can be parsed and saved in memory; otherwise, the data with the marker coordinates sent to the client will be inadequate. The administrator can then control the stream (start and stop) and also set the transmission rate (in samples per second); the default value for the transmission rate is 60. Starting or stopping the stream in the administrator view actually starts and stops the data stream between the QTM and the server.

The main functionality of the server application is to translate the coordinate data from the raw buffer stream to the JSON format, a typical format for representing data in JavaScript. One 3D frame becomes an array whose size corresponds to the number of nodes. Each item is an object with the x, y, z coordinates expressed in millimeters and the residual keys. Some additional parameters, such as the frame number and type, are added to the JSON format and may be used in further development. The data array is then broadcast to all connected clients via WebSocket. Figure 4 represents an example of a frame in JSON format, which is broadcast through the WebSocket protocol.

Figure 4. The representation of marker data in JSON format.
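A frame along these lines might look as sketched below; the exact key set is the one shown in Figure 4, so the field names used here are merely illustrative assumptions:

// An illustrative JSON frame (field names are assumptions, cf. Figure 4).
{
  "frame": 1024,
  "type": "3d",
  "markers": [
    { "label": "HeadTop", "x": 123.4, "y": -56.7, "z": 1450.2, "residual": 0.8 },
    { "label": "LElbow",  "x": 310.9, "y": 220.1, "z": 1102.5, "residual": 1.1 }
  ]
}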
Client side

The client-side application consists of two parts, a WebSocket module and a WebGL module. The WebSocket module is very simple. After loading the application and all its external dependencies (JavaScript libraries), the module connects to the Node.JS server. The connection stays open for the lifetime of the application. When the data in JSON format is received from the server, it is saved to a local variable. Our goal was to keep the WebSocket module as lean as possible, so that it does not block the WebGL module.

The WebGL module is responsible for constantly rendering the 3D scene and exposing it in a canvas HTML5 element. Continuous rendering is necessary because of the constant updates of the scene. The updates can come from two events. The first update comes from the WebSocket module, which updates the local variable with new coordinates. This change is easy to detect; however, it is much simpler to render the scene continually because of the second type of event. The latter is user interaction: the user can yaw, pitch, or zoom the scene in or out. This interaction could also be detected and the scene rendered on demand, but we get a much smoother interaction if we use the requestAnimationFrame method and render the scene whenever possible. We try to achieve a refresh frequency of 60 frames per second. When the user interacts with the scene, his or her changes are saved in a special view matrix. The view matrix is a transformation matrix that transforms the vertices from scene-space to view-space. The WebGL module is also responsible for applying the vertex and fragment shaders.

7. CONCLUSION

The tool described in this paper enables a remote and interactive visualization of the tracked objects from the Qualisys optical tracking system. The 3D tracking scene can be presented on multiple computers simultaneously and synchronized with the QTM software. It is based solely on web technologies and runs in all modern browsers supporting the WebGL API. The current version of the tool enables the visualization of all labeled markers in the scene, as well as a list of their current coordinates. The colors of the individual markers can also be applied in accordance with their original colors in the QTM software.

Optical motion capture systems are often used as teaching tools in lab exercises with a high number of students. With the aid of our tool, a 3D tracking scene can be streamed to multiple computers simultaneously, while also enabling custom interaction, field of view and zoom for each client. In this way, each student can observe the experiment more actively and customize his or her own perception. The tool could easily be integrated in an e-learning framework, enabling remote participation in such experiments in real time.

Currently, no information about the existing AIM models is streamed through the RT protocol component, and the visualization of the bones between the markers has to be done manually (hardcoded) on the client. Our main goal for future development is to enable the capability of rebuilding the AIM models, or building new ones, on the client. The user should be able to select individual markers and connect them with bones. These new models should then be saved on the server, or locally on the client itself, for future use.

Figure 5. The visualization of an AIM model in a Firefox browser (the same model as shown in Figure 2).
REFERENCES

[1] A. G. Kirk, J. F. O’Brien, D. A. Forsyth, “Skeletal parameter estimation from optical motion capture data,” IEEE CVPR 2005, vol. 2, pp. 782-788, 2005.
[2] M. Gleicher, N. Ferrier, “Evaluating video-based motion capture,” Proceedings of Computer Animation 2002, pp. 75-80, 2002.
[3] S. Hofverberg, “Theories for 3D reconstruction,” Qualisys QTECH1004 v1.0, 2007.
[4] Qualisys – Motion Capture Systems, http://www.qualisys.com/, 12/2013.
[5] QTM – Qualisys Track Manager, User Manual, 2011.
[6] L. Nilsson, “QTM Real-time Server Protocol Documentation, V1.9,” 2011.
[7] The WebSocket API, http://www.w3.org/TR/websockets/, 12/2013.
[8] WebGL – OpenGL ES 2.0 for the Web, http://www.khronos.org/webgl/, 12/2013.
Usability of Smartphone Inertial Sensors for Confined Area Motion Tracking

Anton Umek, Anton Kos

University of Ljubljana, Faculty of Electrical Engineering, Ljubljana, Slovenia
E-mail: [email protected], [email protected]
Abstract - Modern smart phone devices are equipped with several space positioning sensors. Most of them are inaccurate low-cost silicon devices, not designed for motion tracking. The paper presents the results of several constrained motion tracking experiments using iPhone 4 sensors. The experiments confirm that the best choice for motion tracking is sensor fusion - the simultaneous use of accelerometer and gyroscope data. While accelerometer data are less accurate than gyroscope data, they are still good enough for a number of various motion-connected applications.

1 Introduction

The inertial sensors in the iPhone 4 are embedded in two IC devices manufactured by STMicroelectronics: the 3D accelerometer LIS331DLH [2] and the 3D gyroscope L3G4200D [3]. The devices are not labeled with the original part numbers, but were identified by Chipworks [1]. Both devices are designed more for movement detection, gaming and virtual reality input devices and less for navigation applications. The major sensor parameters are listed in table 1.

Parameter         | LIS331DLH Accelerometer | L3G4200D Gyroscope
Measurement range | ±2 g                    | ±2000 deg/s
Sensitivity       | 1 ± 0.1 mg/dig          | 70 mdeg/s/dig
Bias error        | ±20 mg                  | ±75 deg/s

Table 1. iPhone 4 inertial sensor parameter values.

Both biases induce deviations in the derived spatial and angular position. The relative space position is calculated by integrating the acceleration vector twice over time. For a simplified one-dimensional motion, the position error is equal to the path length error ∆s. The acceleration bias ∆a makes a linear drift in velocity and a squared drift in position:

∆s = (1/2) ∆a t²    (1)

The rotation angle is calculated by integrating the angular velocity over time. The gyroscope bias makes a linear drift: ∆α = ∆ω t. The position accuracy is more sensitive to the acceleration bias, but over longer time periods both drifts blur the actual position. Therefore, both sensor biases should be compensated.
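As a rough illustration (our own back-of-the-envelope calculation, using the bias magnitude reported in section 4): an uncompensated acceleration bias of ∆a = 6.2·10⁻³ g ≈ 0.06 m/s² produces, by (1), a position error of ∆s ≈ 0.5 · 0.06 m/s² · (15 s)² ≈ 6.8 m over a 15 s experiment, far beyond the dimensions of the test tracks used here.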
2 Motion constraints and tracking

Our experiments include two important constraints, which simplify the motion trajectory calculation:

• the motion of the device is in a two-dimensional, well-balanced horizontal plane, perpendicular to the gravitational vector,
• the motion orientation of the device is always in the sensor's principal direction 1y.

Assuming there are no gravity projections in the plane dimensions (x, y), only the accelerometer bias requires compensation in these directions: ∆ax, ∆ay. Assuming there is no sideways sliding, the absolute value of the smart phone velocity vector depends only on the acceleration vector component ay. If all of the above conditions are fulfilled, two simple tracking algorithms can be used.

Algorithm 1

The algorithm uses the accelerometer data ay and the gyroscope data ωz. The acceleration bias ∆ay should be compensated, otherwise it has the same influence on the path length error ∆s as in the one-dimensional motion (1). The gyroscope bias ∆ωz makes a linear drift in the device orientation α[n] and should be compensated.

The starting velocity v[0] and orientation α[0] of the device should be defined. The velocity vector is calculated by sensor fusion: the velocity vd[n] is calculated from the acceleration vector component ay, and the orientation α[n] is obtained from the gyroscope data ωz:

vd[n] = vd[n − 1] + Ts (ay[n] − ∆ay)    (2)
α[n] = α[n − 1] + Ts (ωz[n] − ∆ωz)    (3)

where Ts is the sensor data sampling time.

Algorithm 2

The algorithm uses only the accelerometer data (ax, ay). Both accelerometer bias values should be compensated. The starting velocity vector should be defined, vxy[0] = (vx[0], vy[0]). The orientation of the device α[n] is obtained from the current velocity direction angle. The velocity difference vector dvxy[n] is calculated from the device acceleration vector, obtained from the accelerometer data axy = (ax, ay):

dvxy[n] = Ts (axy[n] − ∆axy) Rot(α[n] − π/2)    (4)
vxy[n] = vxy[n − 1] + dvxy[n]    (5)

where Rot(angle) assigns the 2D rotation of the device to the absolute coordinate system. The starting device orientation angle α[0] is measured relative to the principal sensor direction 1y. The relative device position and the path length are measured from the starting position r[0] = (x[0], y[0]), d[0] = 0:

r[n] = r[n − 1] + Ts vxy[n]    (6)
d[n] = d[n − 1] + Ts |vxy[n]|    (7)

Instead of using two-dimensional matrix algebra, complex numbers can simplify the numeric calculations.
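As an illustration of how the updates (2), (3) and (6) combine in practice, the following sketch (our own, not the authors' code) integrates bias-compensated samples into a 2D path; for simplicity, the heading here is measured from the x axis rather than from the principal direction 1y:

// Sensor-fusion dead reckoning per eqs. (2), (3) and (6).
// ay, wz: raw sample arrays; day, dwz: estimated biases; Ts: sampling time [s].
function trackAlgorithm1(ay, wz, day, dwz, Ts) {
  var v = 0, alpha = 0, x = 0, y = 0, path = [];
  for (var n = 0; n < ay.length; n++) {
    v += Ts * (ay[n] - day);        // eq. (2): speed along the heading
    alpha += Ts * (wz[n] - dwz);    // eq. (3): heading from the gyroscope
    x += Ts * v * Math.cos(alpha);  // eq. (6): integrate the velocity vector
    y += Ts * v * Math.sin(alpha);
    path.push([x, y]);
  }
  return path;
}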
Both 2D tracking algorithms are used in our experiments. The first algorithm gives much better motion tracing results than the second algorithm, where gyroscope data is not used. The main reason for such results is that the orientation obtained from two noisy and biased accelerometer components is very inaccurate, especially when the accelerometer readouts are low. Gyroscope data has been recognized as very accurate; in comparison to the accelerometer bias, the relatively small gyroscope bias value ∆ωz does not have a significant effect on the device motion tracking results.

3 Experiments

All experiments were done under the constraints specified in the previous section: the smart phone is horizontally balanced in the (x, y) plane, which is perpendicular to the earth's gravity vector, and the motion of the smart phone is always in the sensor's principal axis 1y. Among several experiments we have chosen the two most representative examples. Their settings are shown in figures 1 and 2. We did not use any dedicated mechanical laboratory equipment.

Several experiments were done using a simple kids' toy, a LEGO City train set, which is flexible and precise enough to build different tracks. Two testing track configurations are illustrated in figure 1. The results that follow in the next section correspond to the inner track. The train composition itself is not shown, but it is easy to precisely install a smart phone onto it in a way that ensures its motion sensors are in the middle of the track. The train is driven by a simple start/stop remote control.

Figure 1. A LEGO City train track. The outer track is in the shape of a rounded rectangle and the inner track is in the shape of a ”babuška”. More interesting is the inner track, which changes the course of the train several times in both directions (left/right); hence its accelerometer and gyroscope sensor readings are more diverse and interesting for analysis (see figure 3 for details). The smart phone is mounted onto the train in the position that ensures its sensors are in the middle of the track.

Some experiments were done by mounting the smart phone onto a bicycle front wheel, as illustrated in figure 2. The wheel was accelerated by hand from the resting position for approximately 90 degrees and, after ten free-drive rotations, smoothly braked and stopped at the starting position angle.

Figure 2. A horizontally balanced bicycle wheel. The smart phone is tied to the wheel. When the wheel is spun, the sensors go round the wheel's axis in a circle with the radius of r = 22 cm.

Sensor data is recorded by the iPhone application Sensor Monitor Pro and later processed on a PC.

4 Results

Recorded sensor data was transferred to the PC and processed by both algorithms from section 2, using different pairs of sensor signals (ay, ωz) and (ax, ay). Figure 3 shows the signals of the first experiment.

If the accelerometer and gyroscope are used for motion tracking, then both offset values for ay and ωz should be tuned well in order to fit the motion trajectory close to the known trace pattern of the experiment. The accelerometer and gyroscope biases have independent influence on different velocity parameters (2), (4).
Figure 3. Smart phone sensor readings for the inner track from figure 1: (a) accelerometer readings in the X axis (accX) and the Y axis (accY), both normalized to the earth's gravity g = 9.81 m/s²; for the purpose of the presentation, the accX readings are raised and the accY readings are lowered by 1 g to prevent overlapping; (b) gyroscope readings around the Z axis (gyroZ). Since all experiments were conducted with the phone balanced in the (x, y) plane, all other readings from the accelerometer and gyroscope are insignificant for our results: accZ shows the earth's gravity, while gyroX and gyroY are zero, as the phone can only rotate around the Z axis.

Offset values can be fitted separately. The accelerometer and gyroscope offsets should be adjusted to prevent the velocity (magnitude) and angular drift. Many details of the motion of the device in the experiments are known and can be used in the sensor data post-processing.

The gyroscope and accelerometer bias values are first measured by averaging the signals in the resting time interval before the acceleration. The measured gyroscope bias, averaged over 100 samples, is 5·10⁻³ rad/s. The predicted angular drift in a 15 s long experiment is less than 5 degrees and below the actual trajectory measurement tolerance.

Unfortunately, the averaged measured accelerometer bias is not accurate enough to compensate the velocity drift. The adjusted zero-drift offset is 6.2·10⁻³ g0 and differs from the averaged measured bias by more than 30%. The effect of the accelerometer bias compensation is illustrated in figure 4. If only accelerometer signals are used for the 2D motion tracking, then both biases influence the velocity angular error. Both trajectory best-fit offset values were set very close (±5%) to the averaged measured biases.

The comparison of both methods, using different inertial sensor signals from the first experiment, is illustrated and explained for the velocity, orientation and path profiles in figures 5, 6 and 7. Some minor differences are visible between the cumulative path lengths and the velocity magnitudes, while the angular differences are much higher. Both motion trajectories are compared in figure 8.

Better accuracy in motion tracking was found in the second experiment, shown in figure 2. In figure 9 we present only the trajectory calculated with the more accurate algorithm 1, where both sensor data are used. The trace path makes almost perfect circles with a little offset in the central point.

The accelerometer offsets vary with the experiments, sometimes even when measured in the resting time before or after the acceleration. The measured accelerometer offset values are inside the specified tolerance from table 1. The measured angle drifts are very low in comparison with the gyroscope sensor specification in table 1. The sensor monitoring application obviously uses iOS-filtered gyroscope data, where a sensor fusion algorithm already compensates a large amount of the gyroscope bias. The latter also explains why the measured gyroscope offsets vary in different experiments. Similar results are reported in related work by several authors [4], [5], [6].

Figure 4. Smart phone velocities in the direction of the Y axis in three different cases. The curves show vy when the ay bias is: compensated (v1), not compensated (v2), and with a double offset (v3).
Figure 5. Smart phone velocity vy in the direction of the Y axis. The velocity vy is calculated by: (a) algorithm 1 (v1), which uses the accY data, giving the acceleration in the direction of movement, and the gyroZ data, giving the information about the smart phone orientation in the (x, y) plane; (b) algorithm 2 (v2), which uses the accX and accY data to calculate the velocity vy directly.

Figure 6. Rotation around the Z axis. The smart phone is balanced in the (x, y) plane and the angle of the rotation is given in degrees (modulo 360). The blue curve corresponds to algorithm 1 and the red curve to algorithm 2.

Figure 7. Path lengths calculated by both algorithms. We see that the path of algorithm 2 (s2) is a little longer, which was expected, as its velocities v2 in figure 5 are a bit above the velocities v1 of algorithm 1.

Figure 8. Calculated trajectory of the smart phone movement on the inner track from figure 1. The solid curve shows the trajectory calculated by algorithm 1 and the dashed curve the trajectory calculated by algorithm 2. We notice that the results obtained by fusing the accelerometer and gyroscope data (algorithm 1) give a much more faithful trajectory.

Figure 9. Calculated trajectory of the smart phone movement on the bicycle wheel from figure 2. The trajectory is calculated by algorithm 1. We see that the iPhone makes circles with the radius of about r = 22 cm, which faithfully represents the conditions of the test.

5 Conclusion

By setting the correct offset values we can keep the tracking under control over longer periods of time. Our experiments confirmed that accelerometer data are generally less accurate than gyroscope data, but still good enough for various motion-detection applications.

6 References

[1] Motion sensing in the iPhone 4: MEMS accelerometer, http://www.memsjournal.com/2010/12/motion-sensing-in-the-iphone-4-mems-accelerometer.html
[2] STMicroelectronics, LIS331DLH MEMS digital output motion sensor: ultra low-power high performance 3-axes “nano” accelerometer, http://www.st.com/web/catalog/sense_power/FM89/SC444/PF218132
[3] STMicroelectronics, L3G4200D MEMS motion sensor: ultra-stable three-axis digital output gyroscope, http://www.st.com/web/catalog/sense_power/FM89/SC1288/PF250373
[4] C. Barthold, P. Subbu, R. Dantu, “Evaluation of Gyroscope-embedded Mobile Phones,” Systems, Man, and Cybernetics (SMC), 2011 IEEE International Conference on, pp. 1532-1638.
[5] H. Graf, K. Jung, “The Smartphone as a 3D Input Device,” 2012 IEEE Second International Conference on Consumer Electronics, ICCE-Berlin, pp. 254-257.
[6] Xiaoji Niu et al., “Using Inertial Sensors of iPhone 4 for Car Navigation,” Position Location and Navigation Symposium (PLANS), 2012 IEEE/ION, pp. 555-561.
Enhanced Gaussian Selection in Medium Vocabulary Continuous Speech Recognition

Branislav Popović*, Dragiša Mišković*, Darko Pekar**, Stevan Ostrogonac*, Vlado Delić*
* University of Novi Sad, Faculty of Technical Sciences, Novi Sad, Serbia
** AlfaNum – Speech Technologies, Novi Sad, Serbia
[email protected], [email protected], [email protected], [email protected], [email protected]
Abstract—Eigenvalues Driven Gaussian Selection (EDGS) is used in this paper in order to reduce the computational complexity of the acoustic processing module of a medium vocabulary continuous speech recognition system for the Serbian language, based on Hidden Markov Models (HMMs) with diagonal covariance matrices. The optimal values of five different parameters are discussed: the overlap threshold and overlap percentage used for clustering, the pruning threshold and pruning percentage used for decoding, as well as a newly introduced discard threshold. A significant reduction of the computational complexity is obtained, without a noticeable degradation in the error rate.

I. INTRODUCTION

The Gaussian Selection (GS) procedure is used in order to increase the speed of a Continuous Speech Recognition (CSR) system, with an acceptable degradation of the system performance that consequently occurs as a trade-off. It was originally proposed in [1], stating that the likelihood of an HMM state could be efficiently approximated by using only a small number of highly dominant Gaussian components, without a significant degradation of the recognition accuracy. The method was later refined and efficiently applied to a CSR system using HMMs with diagonal covariance matrices [2]. A novel method, addressing the problem of GS in the case of a larger overlap between the baseline Gaussian components, was proposed in [3]. It incorporates a grouping algorithm, implemented as an initial step before the actual GS clustering procedure. It was further enhanced in [4], by using iterative split and merge algorithms.

The calculation of the acoustic state likelihoods contributes considerably to the total computational load of HMM-based recognition systems [2]. HMM states are represented by multiple-mixture Gaussian state emitting distributions. Each Gaussian component has to be evaluated separately in order to determine the likelihood of a single state. The idea behind GS is to generate a set of clusters during the training phase, i.e., to form hyper-Gaussians by clustering the baseline Gaussian components [5]. The Gaussians that are close to each other in terms of the appropriate clustering divergence are clustered into a single group, resulting in a division of the acoustic space into a set of vector quantized regions. The regions are represented by the parameters of their hyper-Gaussians. Each Gaussian component can be assigned to one or more regions, i.e., attached to one or more hyper-Gaussians. In the decoding phase, the Gaussian components associated with the clusters whose corresponding hyper-densities, evaluated on the particular input speech frame, are above the predefined threshold or percentage, are calculated exactly. The aim is to find the most significant components for calculating the overall state likelihood, based on a given input vector, and at the same time to assign as few nonessential components as possible [2].

The paper is organized as follows. In Section II, EDGS is described in more detail. In Section III, we describe the CSR system and the parameters used for the training and testing purposes. The results are also given, confirming the considerations from the previous sections. The paper concludes with Section IV, providing the conclusions.

II. EIGENVALUES DRIVEN GAUSSIAN SELECTION

EDGS represents a variant of the GS procedure, driven by the eigenvalues of the covariance matrices of the baseline Gaussian components [3]. It was proposed in order to deal with the situation when there is a significant overlapping between the baseline Gaussian components. Prior to the execution of the appropriate clustering algorithm, a Gaussian is assigned to a group from a predefined set of groups. The assignment is based on a value aggregated from the eigenvalues of the covariance matrix of the particular Gaussian, using slightly modified Ordered Weighted Averaging (OWA) operators, represented in the form

OWAω(λ1, …, λp) = Σj=1..p ωj λσ(j)    (1)

where 0 ≤ λσ(1) ≤ … ≤ λσ(p) and the coefficients ω ∈ Rp satisfy the constraints

0 ≤ ωj ≤ 1,  Σj=1..p ωj = 1    (2)

EDGS combines the most significant eigenvalues of the baseline Gaussian components. The particular Gaussian with the eigenvalues λ = (λ1, …, λp) is assigned to the g-th group, g ∈ {1, …, G}, iff OWAω(λ) is in the corresponding predefined interval [τmin(g), τmax(g)), where τmax(g) = τmin(g+1). We set the borders of the intervals to τ(i+1) = c·τ(i), where c is a predefined constant.
clustered into a single group, resulting in a division of the predefined constant.
acoustic space into a set of vector quantized regions. The second step is the GS clustering. It is an iterative
Regions are represented by the parameters of their hyper- procedure. At each iteration, the particular Gaussian
Gaussians. Each Gaussian component could be assigned component is assigned to a specified cluster, assuming
to one or more regions, i.e., attached to one or more that the "distance" between the Gaussian and the hyper-
hyper-Gaussians. In the decoding phase, the Gaussian Gaussian that corresponds to that cluster is minimal. The
components associated to clusters with the corresponding parameters of hyper-Gaussians are obtained as maximum
hyper-densities, whose distance to the particular input likelihood estimates, given in the closed form as functions

Page 164 of 478


ICIST 2014 - Vol. 1 Regular papers

of the parameters of the belonging Gaussian components. using the information from pronunciation dictionary and
In our previous research [6], we obtained optimal results language model. Phonetic transcriptions of words are used
by using the one-sided KL divergence as our clustering for lexical tree creation. Afterwards, they become
measure, and the Mahalanobis distance between the obsolete. If the full covariance matrices are used, the
observation and the hyper-Gaussian in the decoding stage. calculation of acoustic scores (CAS) is the critical part in
The expression for the one-sided KL divergence for any terms of the computational complexity. Even in the case
two d-dimensional Gaussians exists in the closed form of the diagonal covariance matrices, the CAS produce a
significant portion of the total computational load. The
Σ2 state emitting probability is calculated only for the states
a = log that correspond to the active tokens.
Σ1
−1
Medium-sized vocabulary is used for the purpose of our
b = Tr (Σ 2 Σ1 ) experiments, with the approximately 1250 words. The
−1 (3) system operates on a set of 4792 states, and 30286
c = ( µ1 − µ 2 )T Σ 2 ( µ1 − µ 2 )
Gaussians, represented by the diagonal covariance
matrices. The database is windowed using 30 ms
KL (h1 || h2 ) = (a + b + c − d ) / 2 Hamming windows, with 20 ms overlap between the
adjacent frames. The system uses 32-dimensional feature
vectors, containing 15 Mel-frequency cepstral coefficients
(μ1, Σ1) and (μ2, Σ2) are the parameters of the and normalized energy, in combination with their first
corresponding Gaussians h1 and h2. The performance of order derivatives. Significant improvements are obtained
the EDGS method is assessed in terms of the trade-off in terms of the trade-off between the speed and the
between the recognition performance and the reduction in accuracy, by applying the GS procedure, as shown by the
the number of exactly evaluated hyper-Gaussians. experiments.
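For diagonal covariance matrices, as used by the system in this paper, the determinant, trace and quadratic term in (3) reduce to sums over the variance vectors; a minimal sketch of ours under that assumption:

// One-sided KL divergence (3) for diagonal Gaussians.
// mu1, mu2: mean vectors; var1, var2: vectors of diagonal variances.
function klDiagonal(mu1, var1, mu2, var2) {
  var a = 0, b = 0, c = 0, d = mu1.length;
  for (var i = 0; i < d; i++) {
    a += Math.log(var2[i] / var1[i]); // log(|S2|/|S1|) for diagonal matrices
    b += var1[i] / var2[i];           // Tr(S2^-1 S1)
    var diff = mu1[i] - mu2[i];
    c += diff * diff / var2[i];       // (mu1-mu2)^T S2^-1 (mu1-mu2)
  }
  return (a + b + c - d) / 2;
}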
III. EXPERIMENTAL RESULTS

A. System Setup

The CSR system developed for the Serbian language is used in this paper for the purpose of our experiments [7]. The acoustic and linguistic models, together with a decoding module, constitute the decoder. The decoding module is independent of the acoustic model implementation. It allows the use of different scoring and optimization procedures, without modifications to the other parts of the system. The system is HMM-based, using Gaussian mixture models for representing the HMM states.

The decoding module uses a sequence of input feature vectors in conjunction with the search space in order to generate the recognition output. The decoder is based on a variant of the Viterbi algorithm, known as the token-passing algorithm [8]. The information about the path and the score is stored at the word level. Two types of pruning are supported: the beam search, where all the tokens whose score is lower than the current maximum, decreased by a predefined threshold, are discarded, as well as pruning by limiting the number of tokens with the highest scores. The search space is created by the linguistic model, using the information from the pronunciation dictionary and the language model. The phonetic transcriptions of words are used for the lexical tree creation; afterwards, they become obsolete. If full covariance matrices are used, the calculation of the acoustic scores (CAS) is the critical part in terms of the computational complexity. Even in the case of diagonal covariance matrices, the CAS produces a significant portion of the total computational load. The state emitting probability is calculated only for the states that correspond to the active tokens.

A medium-sized vocabulary of approximately 1250 words is used for the purpose of our experiments. The system operates on a set of 4792 states and 30286 Gaussians, represented by diagonal covariance matrices. The database is windowed using 30 ms Hamming windows, with a 20 ms overlap between the adjacent frames. The system uses 32-dimensional feature vectors, containing 15 Mel-frequency cepstral coefficients and the normalized energy, in combination with their first order derivatives. Significant improvements in terms of the trade-off between the speed and the accuracy are obtained by applying the GS procedure, as shown by the experiments.

The values of five different parameters are examined in the paper. In the case of disjoint clustering, the overlap percentage determines the relative number of the "nearest" hyper-Gaussians to which a Gaussian component will be attached during the training GS phase, but only if the "distance" between the component and a hyper-Gaussian is below the minimum "distance" value for the given component over all hyper-Gaussians, increased by the specified overlap threshold. The percentage of the baseline Gaussians shared between 1, 2, 3 or more hyper-Gaussians is illustrated in Fig. 1. For a 10% overlap between the clusters and the overlap threshold set to 0.5, 81.48% of the Gaussians are attached to only one hyper-Gaussian, 15.49% of the Gaussians are shared between 2 hyper-Gaussians, less than 3% of the Gaussians are shared between 3 hyper-Gaussians, and about 0.5% of the Gaussians are shared between 4 or more hyper-Gaussians.

The pruning percentage represents the percentage of the hyper-Gaussians with the highest likelihoods for a given input speech frame. The Gaussian components attached to the hyper-Gaussians within the specified range have to be evaluated exactly, assuming that the likelihood of the hyper-Gaussian to which they are attached is above the predefined value, determined as the difference between the maximum likelihood value over all hyper-Gaussians for a given input speech frame and the pruning threshold. The percentage of the hyper-Gaussians pruned by the threshold or by the percentage, for the pruning percentage set to 10 and the values of the pruning threshold set to {4, 8, 12}, is presented in Fig. 2.

Figure 1. Mixture sharing percentage per number of clusters, for a 10% overlap and the overlap threshold set to 0.5.
Figure 2. Pruning by threshold or percentage, with the pruning percentage set to 10.
TABLE I. DISJOINT CLUSTERING, DISCARD THRESHOLD DEACTIVATED

No. | Pruning percentage | Pruning threshold | Discard threshold | Overlap percentage | Overlap threshold | CSR time gain [%] | CAS time gain [%] | WER   | PER
 -  |  -  |  -  | - |  -   |  -  |  -   |  -    | 13.60 | 3.50
 1  | 20  |  -  | - |  -   |  -  | 1.97 | 19.79 | 13.70 | 3.60
 2  | 10  |  -  | - |  -   |  -  | 3.32 | 33.33 | 14.00 | 3.80
 3  |  5  |  -  | - |  -   |  -  | 3.73 | 37.50 | 14.30 | 4.00
 4  | 20  |  4  | - |  -   |  -  | 3.01 | 30.21 | 15.10 | 4.10
 5  | 10  |  4  | - |  -   |  -  | 4.25 | 42.71 | 15.60 | 4.10
 6  |  5  |  4  | - |  -   |  -  | 3.83 | 38.54 | 15.60 | 4.30
 7  | 20  |  8  | - |  -   |  -  | 3.01 | 30.21 | 13.90 | 3.70
 8  | 10  |  8  | - |  -   |  -  | 3.21 | 32.29 | 14.10 | 3.80
 9  |  5  |  8  | - |  -   |  -  | 3.73 | 37.50 | 14.50 | 4.00
 10 | 20  | 12  | - |  -   |  -  | 2.49 | 25.00 | 13.80 | 3.70
 11 | 10  | 12  | - |  -   |  -  | 3.01 | 30.21 | 13.90 | 3.80
 12 |  5  | 12  | - |  -   |  -  | 3.83 | 38.54 | 14.30 | 4.00

TABLE II. SMALL OVERLAPPING, DISCARD THRESHOLD DEACTIVATED

No. | Pruning percentage | Pruning threshold | Discard threshold | Overlap percentage | Overlap threshold | CSR time gain [%] | CAS time gain [%] | WER   | PER
 -  |  -  |  -  | - |  -   |  -  |  -   |  -    | 13.60 | 3.50
 1  | 20  |  -  | - | 10.0 | 0.5 | 1.35 | 13.54 | 13.50 | 3.60
 2  | 10  |  -  | - | 10.0 | 0.5 | 2.69 | 27.08 | 13.70 | 3.60
 3  |  5  |  -  | - | 10.0 | 0.5 | 3.11 | 31.25 | 14.10 | 3.90
 4  | 20  |  4  | - | 10.0 | 0.5 | 2.38 | 23.96 | 14.40 | 3.90
 5  | 10  |  4  | - | 10.0 | 0.5 | 2.80 | 28.13 | 14.60 | 4.00
 6  |  5  |  4  | - | 10.0 | 0.5 | 3.11 | 31.25 | 14.40 | 4.00
 7  | 20  |  8  | - | 10.0 | 0.5 | 1.76 | 17.71 | 13.90 | 3.60
 8  | 10  |  8  | - | 10.0 | 0.5 | 2.07 | 20.83 | 13.80 | 3.70
 9  |  5  |  8  | - | 10.0 | 0.5 | 2.69 | 27.08 | 14.20 | 3.90
 10 | 20  | 12  | - | 10.0 | 0.5 | 1.87 | 18.75 | 13.60 | 3.60
 11 | 10  | 12  | - | 10.0 | 0.5 | 2.59 | 26.04 | 13.70 | 3.60
 12 |  5  | 12  | - | 10.0 | 0.5 | 2.49 | 25.00 | 14.10 | 3.90
The values for all the other Gaussian components, attached to hyper-Gaussians outside the specified range, have to be approximated in order to reduce the computational complexity. In case the likelihood of their hyper-Gaussian is above the difference between the maximum likelihood value over all hyper-Gaussians for a given input speech frame and the discard threshold, their likelihood values will be floored with the value determined for the hyper-Gaussian to which they are attached. Otherwise, the Gaussians will be "discarded", and their likelihood will be specified as the likelihood of the first hyper-Gaussian whose likelihood value is above the specified difference.

In order to capture the order of magnitude of the particular eigenvalue, which has a multiplicative nature [3], the vector of thresholds was set to τvec = [0.7 1.4 2.8]. We set the borders of the intervals to τ(i+1) = c·τ(i), where c is a constant (c = 2 in our case). The approximate number of Gaussians per cluster was set to 200. Larger hyper-Gaussians further reduce the computational complexity, but they also have a more significant impact on the error rate.

B. Parameter Values

In Table I, the results are presented for the system using the disjoint EDGS clustering vs. the baseline system (the system working directly with the baseline Gaussians, i.e., without the Gaussian selection procedure).
TABLE III. DISJOINT CLUSTERING, DISCARD THRESHOLD ACTIVATED

No. | Pruning percentage | Pruning threshold | Discard threshold | Overlap percentage | Overlap threshold | CSR time gain [%] | CAS time gain [%] | WER   | PER
 -  |  -  |  -  |  -  |  -   |  -  |  -   |  -    | 13.60 | 3.50
 1  | 20  |  -  | 16  |  -   |  -  | 3.73 | 37.50 | 13.50 | 3.50
 2  | 10  |  -  | 16  |  -   |  -  | 5.08 | 51.04 | 13.70 | 3.70
 3  |  5  |  -  | 16  |  -   |  -  | 5.60 | 56.25 | 14.20 | 3.90
 4  | 20  |  4  | 16  |  -   |  -  | 5.39 | 54.17 | 15.10 | 4.10
 5  | 10  |  4  | 16  |  -   |  -  | 5.80 | 58.33 | 15.20 | 4.10
 6  |  5  |  4  | 16  |  -   |  -  | 5.49 | 55.21 | 15.40 | 4.30
 7  | 20  |  8  | 16  |  -   |  -  | 4.77 | 47.92 | 13.90 | 3.60
 8  | 10  |  8  | 16  |  -   |  -  | 5.08 | 51.04 | 13.80 | 3.70
 9  |  5  |  8  | 16  |  -   |  -  | 5.49 | 55.21 | 14.30 | 3.90
 10 | 20  | 12  | 16  |  -   |  -  | 3.73 | 37.50 | 13.60 | 3.50
 11 | 10  | 12  | 16  |  -   |  -  | 4.77 | 47.92 | 13.80 | 3.70
 12 |  5  | 12  | 16  |  -   |  -  | 5.18 | 52.08 | 14.20 | 3.90

TABLE IV. SMALL OVERLAPPING, DISCARD THRESHOLD ACTIVATED

No. | Pruning percentage | Pruning threshold | Discard threshold | Overlap percentage | Overlap threshold | CSR time gain [%] | CAS time gain [%] | WER   | PER
 -  |  -  |  -  |  -  |  -   |  -  |  -   |  -    | 13.60 | 3.50
 1  | 20  |  -  | 16  | 10.0 | 0.5 | 2.69 | 27.08 | 13.50 | 3.50
 2  | 10  |  -  | 16  | 10.0 | 0.5 | 4.56 | 45.83 | 13.60 | 3.60
 3  |  5  |  -  | 16  | 10.0 | 0.5 | 4.46 | 44.79 | 14.10 | 3.80
 4  | 20  |  4  | 16  | 10.0 | 0.5 | 4.15 | 41.67 | 14.40 | 3.90
 5  | 10  |  4  | 16  | 10.0 | 0.5 | 4.77 | 47.92 | 14.60 | 4.00
 6  |  5  |  4  | 16  | 10.0 | 0.5 | 4.35 | 43.75 | 14.40 | 4.00
 7  | 20  |  8  | 16  | 10.0 | 0.5 | 3.94 | 39.58 | 13.80 | 3.60
 8  | 10  |  8  | 16  | 10.0 | 0.5 | 4.46 | 44.79 | 13.80 | 3.60
 9  |  5  |  8  | 16  | 10.0 | 0.5 | 4.15 | 41.67 | 14.20 | 3.90
 10 | 20  | 12  | 16  | 10.0 | 0.5 | 3.52 | 35.42 | 13.50 | 3.50
 11 | 10  | 12  | 16  | 10.0 | 0.5 | 4.46 | 44.79 | 13.60 | 3.60
 12 |  5  | 12  | 16  | 10.0 | 0.5 | 4.66 | 46.88 | 14.00 | 3.80

The results are given for the combinations of values of the pruning percentage set to {20, 10, 5} and the pruning threshold set to {4, 8, 12}, or without the pruning threshold, i.e., when only the pruning percentage is used. For the pruning percentage set to 10 and the pruning threshold set to 8, about 60% of the Gaussians are pruned by the percentage and another 40% are pruned by the threshold. The computational time needed to calculate the acoustic score was decreased by 30%, without a significant degradation of the word (WER) and phoneme (PER) error rates (less than 0.5 in both cases).

In Table II, the results are given using the same pruning settings, with only a small overlapping between the clusters. Our previous research showed that a small overlapping between the clusters provides favorable results in comparison to the disjoint clustering or the larger overlapping case [6]. Therefore, we used a 10% overlapping and the overlap threshold set to 0.5. The values were determined intuitively, in order to get about 80% of the Gaussians attached to only one hyper-Gaussian and another 10 to 20% shared between the 2 "nearest" clusters, as shown in Fig. 1. A similar computational gain of about 30% is obtained by using the lower pruning percentage and threshold values. However, we also obtained better accuracy.
In Table III, we used the settings previously described for Table I, but in this case we also used the discard threshold. For the given value of the discard threshold, more than 50% of the Gaussian components that were selected to be floored will be "discarded". The values of the "far away" hyper-Gaussians will be floored by using the greater likelihood value determined for the first hyper-Gaussian whose likelihood is above the specified difference, instead of the smaller likelihood value determined for the hyper-Gaussian to which they are attached. Therefore, they have more chance to be selected in the decoding phase for a given input speech frame. We obtained a larger speed gain of about 55% and slightly better accuracy.

In Table IV, we used the settings given for Table II, i.e., a small overlapping between the clusters, but we also used the discard threshold. Better results are obtained in terms of the accuracy in comparison to the results given in Table III, i.e., the disjoint case. In terms of both the speed and the recognition performance, better results are obtained in comparison to the results given in Table II. We obtain a computational gain of about 45%, followed by an increase of WER and PER by no more than 0.2, as a trade-off between speed and accuracy.

IV. CONCLUSION

A significant reduction in the computational complexity of the acoustic score calculations is obtained by using the appropriate values of the five different parameters examined in this paper. In terms of the trade-off between speed and accuracy, the optimal results were obtained by calculating no more than 10% of the hyper-Gaussians exactly, and for the values of the pruning threshold that provide a pruning border close, but not equal, to the one obtained by using the above mentioned pruning percentage. Additional improvements in terms of the accuracy were obtained by introducing a small overlapping between the clusters, in combination with the appropriate overlap threshold. A discard threshold was also introduced, providing optimal results.

ACKNOWLEDGMENT

The work described in this paper was supported in part by the Ministry of Education, Science and Technological Development of the Republic of Serbia, within the project TR32035: “Development of Dialogue Systems for Serbian and Other South Slavic Languages”.

REFERENCES

[1] E. Bocchieri, “Vector quantization for efficient computation of continuous density likelihoods,” in Proc. ICASSP, vol. 2, pp. II-692-II-695, Minneapolis, MN, 1993.
[2] K. M. Knill, M. J. F. Gales, S. J. Young, “Use of Gaussian selection in large vocabulary continuous speech recognition using HMMs,” in Proc. ICSLP, vol. 1, pp. 470-473, 1996.
[3] M. Janev, D. Pekar, N. Jakovljević, V. Delić, “Eigenvalues driven Gaussian selection in continuous speech recognition using HMMs with full covariance matrices,” Appl. Intel., vol. 33, no. 2, pp. 107-116, 2010.
[4] B. Popović, M. Janev, D. Pekar, N. Jakovljević, M. Gnjatović, M. Sečujski, V. Delić, “A novel split-and-merge algorithm for hierarchical clustering of Gaussian mixture models,” Appl. Intel., vol. 37, no. 3, pp. 377-389, 2012.
[5] B. Popović, M. Janev, V. Delić, “Gaussian Selection Algorithm in Continuous Speech Recognition,” in Proc. TELFOR 2012, pp. 705-712, Belgrade, Serbia, 2012.
[6] D. Pekar, M. Janev, N. Jakovljević, B. Popović, V. Delić, “Improving the performance of Gaussian selection algorithm,” in Proc. SPECOM 2011, pp. 89-95, Kazan, Russia, 2011.
[7] N. Jakovljević, D. Mišković, M. Janev, D. Pekar, “A Decoder for Large Vocabulary Speech Recognition,” in Proc. IWSSIP 2011, pp. 1-4, Sarajevo, Bosnia and Herzegovina, 2011.
[8] S. J. Young, N. H. Russell, J. H. S. Thornton, “Token passing: a simple conceptual model for connected speech recognition,” Cambridge University Engineering Department, Cambridge, UK, Tech. Rep. CUED/F-INFENG/TR38, July 1989.
Building a virtual professional community: the case of Bulgarian Optometry and Eye Optics

Mila Dragomirova*, Boyan Salutski**, Elissaveta Gourova***
* Sofia University, Department of Software Engineering, Sofia, Bulgaria
** Technical University of Sofia, Sofia, Bulgaria
*** Sofia University, Department of Software Engineering, Sofia, Bulgaria
[email protected], [email protected], [email protected]
Abstract—Knowledge management (KM) can support small and medium enterprises and individual professionals in accessing information and knowledge on recent developments and innovations in their field, and in sharing the best practices available. It is particularly important in the healthcare professional field, where many changes appear and professionals need to be aware of them in order to better help people. This paper presents a concept and a prototype of a web-based KM platform in support of the Bulgarian community of professionals in the field of Optometry and Eye Optics.

I. INTRODUCTION

In the knowledge-based economy, knowledge has become a key resource and an essential factor for ensuring high quality, efficiency and competitive advantages. The rapid development of science and technology has faced organizations with many challenges and the need to continuously monitor their environment, in particular their clients’ needs, competitors’ behaviour and regulatory amendments. Many small and medium enterprises (SMEs) find it difficult to develop new knowledge and innovation on their own, as well as to actively monitor the trends in their external environment [1]. Therefore, they are often supported by non-governmental organizations (NGOs), industrial associations, clusters, etc. for the exchange of knowledge and best practices, thus building a collective intelligence. As noted in [2], the generation of innovation in SMEs is related to multilateral relationships – they collaborate with other SMEs, scientific communities and other organizations, from which they obtain up-to-date knowledge and expertise, and share and discuss ideas. The development of information and communication technologies (ICT) has provided SMEs with enormous opportunities to access information and knowledge and to enter various virtual communities – communities of practice (CoP), communities of interest (CoI), communities of creation, learning communities, etc. [1]. The embedding of SMEs into virtual networks or communities gives them opportunities to take advantage of open collaborative learning processes and, thus, to acquire new knowledge and competencies [2].

The problems faced by SMEs in the industry of Optometry and Eye Optics in Bulgaria in accessing knowledge are a reason for developing a collaborative platform to assist the interested individuals and organizations in sharing knowledge and expertise. The analysis of the needs of the community [3] suggests that a knowledge management system (KMS) could facilitate the knowledge sharing and the communication among all organizations in this industry, individual professionals and other interested stakeholders. On the other hand, having a virtual space for the exchange of ideas and for building a common memory could facilitate the process of learning of the individual participants and of the community as a whole.

The goal of this paper is to present a concept of a virtual platform for the purposes of the community in optics and optometry in Bulgaria. The paper initially presents the problems of the community, and later focuses on the concept and its implementation.

II. PROBLEMS IDENTIFIED

The professional community in the field of Optometry and Eye Optics in Bulgaria comprises individuals working as optometrists or opticians, SMEs (the majority micro enterprises) operating in the field of ophthalmic optics, optical shop owners, manufacturers, importers of optical or contact lenses and spectacle frames, and suppliers of equipment and processing tools for glasses. Other essential stakeholders are the educational institutions in the field, researchers and lecturers at universities and vocational secondary schools, as well as professional organizations and associations of optometrists and opticians.

Eye Optics is a professional field closely linked to health care, and therefore the level of knowledge of the practitioners is of vital importance, not only for the professionals of this community, but also for the whole society. The professional performance of this industry strongly depends on the involvement of highly-qualified staff, able to learn on-the-job and to adapt to the trends in the sector and the dynamic development of technologies. Its primary sources of knowledge include:

 sources of theoretical knowledge (know-what) – training programs at universities and vocational secondary schools, textbooks, educational materials, and research papers;
 sources of practical knowledge (know-how) – seminars and workshops organized by representatives of manufacturers, news articles, company guidelines and standards, cases and conference materials.

The challenges of life-long learning and of sharing the best professional practices are presently facilitated by many international organizations in the field of Optometry and Eye Optics; however, the Bulgarian community is still not taking full advantage of these opportunities. There is also a need for a constant dialogue with educational institutions

for providing training according to the changing skills B. Goals and services
demands of the industry. The analysis of the status and the needs of the
The analysis of the present state of the knowledge community in optics and optometry were a reason to
transfer in the industry shows serious obstacles [3]. On the consider developing a KMS. The main goal is to facilitate
one hand, no literature is available in Bulgarian, and on the knowledge sharing and the communication between
the other, a common archive of documents and materials
needed by the community, including on Wikipedia, does organizations and professionals in the industry and other
not exist. The trainings are chaotic, organized mainly stakeholders, to enhance their cooperation and to support
according to the interests of big companies in the field and them to find knowledge and expertise. This would
not consistent with the professional qualifications and the establish conditions for building a strong CoP as an
real needs of the community. The access to information informal network of people with shared values and
and knowledge from international events and conferences beliefs in optics and optometry in Bulgaria.
is quite difficult, mainly due to the lack of funding for As main users of the system were considered:
participation. Much better is the opportunity to follow the  opticians, optometrists and their organizations;
innovations in the field by attending trade fairs.  teachers and students in the field of optometry
In the virtual space certain gaps also exist. For example, and optics;
the National Association of optometrists and opticians
 members of partner organizations (European,
(NABOO) has made attempts to provide online fora for its
global, industry and professional organizations in other
members developing three websites, which unfortunately,
countries and in Bulgaria);
were not professionally designed and contain a limited
information about different periods. None of them  representatives of public bodies related to the
provides to the community a platform for sharing of industry;
knowledge and information. The use of social network  end-users of products and services.
sites (SNS) gives temporary results. In a spontaneously According to the literature [1], [6], the virtual
created closed group on Facebook "Opticians and community could be supported by various technologies
Optometrists” with presently over 500 members quite for knowledge sharing and collaboration (Fig. 1).
active are around 30 people, who share mainly news, and
a small and fragmented data base that could be used as a On bases of the overall goal of the KMS, as specific
basis for a future specialized platform. Generally, the objectives were identified:
knowledge sharing occurs with changing degree of active (a) Building collective memory based on internal and
participation and is quite chaotic, unorganized and external community knowledge (creation of
ineffective, despite the enthusiasm and efforts of some shared specialized knowledge base);
NABOO members. (b) Strengthening communication and collaboration.
One of the main problems in the industry is the lack of The implementation of the first objective focuses on:
information 'who is who'. Statistics on the number of  providing information about legislation,
certified practitioners and opticians and official lists of regulation and performance requirements of the
organizations working in the field do not exist, as well as industry (online library, news);
contact information of optics. While such information
 facilitating access to libraries with professional
could facilitate the networking of the community, it could literature and educational materials;
also be of importance for its visualization in the society.
This is a reason to consider establishing a directory of  providing access to existing good practices and
'yellow pages' which will support SMEs in the industry to innovations in the industry;
find relevant expertise, and will facilitate the contacts with  supporting expertise allocation and connection
other stakeholders and end-users, in particular. with experts in the field ("yellow pages");
 providing information on related organizations,
III. CONCEPT OF A COLLABORATIVE PLATFORM educational and training programs, as well as on foreign
experience to solve problems in the field.
A. Methodology
The methodology followed by the platform
development team takes into account the life cycle of Chatroom Central

software systems development, and the problem solving


Administrator

life-cycle [4]. Initially, the main functionalities were Yellow Pages

determined on bases of a careful analysis of the state-of-


the-art in the community, and the main needs in place. As Moderation Document Search
a second step, the research literature was studied for Tools Management Function

identifying technologies supporting collaboration and


knowledge sharing, as well as taking advantage of existing
practice of virtual communities [1]. On this base were Discussion Workflow

defined the main requirements of the platform, its services


Forum Management

and technologies to be used. The pilot platform was


Mailing System

designed and tested within a Bachelor Thesis [5],


however, a lot remains for its validation within the
community and its final exploitation. Figure 1. Components of the CoP platform [6]
In order to strengthen the communication and collaboration in the field, the following is foreseen:
- establishing an environment for assistance and advice to community members on important regulatory, technical, financial, management and organizational aspects (help-desk system);
- facilitating discussions on common problems of the industry to develop common views, as well as for taking important joint decisions (closed discussion forum, online conferencing tools, etc.);
- providing various online communication channels with the external environment and for obtaining users' feedback (forum, groupware).

C. Technology
The goals and objectives set should be implemented taking into account that the potential users have moderate ICT skills; thus, the web platform should provide an easy-to-use (in terms of access, content management, presentation, search and navigation) and intuitive interface, integrating, when possible, widely-used web applications and tools. This is essential in order to facilitate the platform uptake and not to discourage users. Other requirements include platform independent presentation of digital content (text, images, audio and video), both on mobile devices and desktop computers. It was, therefore, taken into account that present Web 2.0 technologies provide excellent tools for a KMS:
- For exchange of expertise and experience in the industry, a wiki can be used that makes it easy to create, organize, and search for knowledge coming from large groups of users. In practice, wiki pages can become an online branch encyclopedia.
- Social networks (e.g. Facebook, LinkedIn) can be used as a base for communication between members of the association and for integration of 'who is who' information, including professional contacts and expertise.
- A discussion forum can be used to exchange information with external organizations and individuals, as well as to discuss problems of the optical industry and of the individual SMEs.
- A dynamic web database can be used to present the different organizations in the field, maintained by individual members, and to provide greater visibility of SMEs, which are a dominating part of the industry.

While initially the main focus is on facilitating communication, collaboration and sharing of knowledge and information, at a second stage it could be considered how to extend the KMS with e-learning functionalities in order to better support trainings organized for the community and the dissemination of educational materials, and to provide opportunities for on-line consultation with other experts, e.g. researchers and teachers.

The program realization of the platform was decided to be based on CodeIgniter, a fast evolving open source framework supporting web page design with PHP. The availability of rich libraries facilitates project design in this framework. In addition, CodeIgniter is based on the MVC (Model-View-Controller) model, described below. For fast deployment of the CodeIgniter framework, an open source FTP client will be used - FileZilla, version 3.7.3. The whole prototype structure will be based on PHP, HTML and JavaScript, and phpMyAdmin 4.0.5 will be used for the administration of the MySQL database.

IV. WEB PLATFORM DESIGN
The platform is implemented based on the three-tier Model-View-Controller pattern (Fig. 2), separating the business logic, the graphical interface and the product database into three separate modules. This ensures easy readability, maintenance, extensibility and reusability of the code. In addition to all these advantages of a modular design, this architectural template allows a change of technology and the regeneration of each of the three components without affecting the remaining ones. The only prerequisite is to comply with the approved interfaces between the modules [5].

Figure 2. Model-View-Controller [5]

According to the requirements and needs of the opticians and optometrists from the virtual community in Bulgaria, the platform prototype provides two types of user registration: non-professional users (subscribers) and professionals (authors).

The pilot implementation of the KMS has taken into account the expected user roles and has incorporated special rules for the levels of access to different functionalities. In order to ensure system security and to protect the content from unauthorized access and malicious acts, different access rules are implemented and different user groups defined (Table I). All users should be registered on the platform; however, some fill in only a short registration form with minimum data. A long registration form is foreseen for professionals (Fig. 3), who should prove their status; after their accounts are approved by the moderator or administrator of the system, they obtain rights to write and, in certain cases, to edit the content (Authors). The moderator is supported by a group of 'Editors' who have almost all administrative rights on the platform. The only difference from the administrator is that the editors cannot promote other registered professionals to become editors - this is done only by the administrator.

TABLE I. USERS' ACCESS TO SERVICES ON THE PLATFORM
(Abbreviations: R - read; C - comment; W - write; W* - write (publication and access to hidden forum); E - edit; E* - edit (change only own publications); D - delete)

Service        Non-professional   Professional   Moderator    Administrator
News           R                  R/W            R/W/E/D      R/W/E/D
Articles       R/C                R/C/W/E*       R/C/W/E/D    R/C/W/E/D
Wiki           R                  R/W            R/W/E/D      R/W/E/D
Library        -                  R              R/W/E        R/W/E
Yellow pages   R                  R/W/E*         R/W/E/D      R/W/E/D
Forum          R/W                R/W*           R/W*/E/D     R/W*/E/D
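To make the access model concrete, the following minimal sketch (our own illustration, not part of the platform's code base; all identifiers are hypothetical) shows how the rules of Table I could be encoded and checked per request:

```python
# Hypothetical encoding of Table I (identifiers ours, not the platform's).
# Permission strings are checked per (user group, service, action) request.

ACCESS = {
    "non-professional": {"news": "R", "articles": "R/C", "wiki": "R",
                         "library": "", "yellow pages": "R", "forum": "R/W"},
    "professional":     {"news": "R/W", "articles": "R/C/W/E*", "wiki": "R/W",
                         "library": "R", "yellow pages": "R/W/E*", "forum": "R/W*"},
    "moderator":        {"news": "R/W/E/D", "articles": "R/C/W/E/D", "wiki": "R/W/E/D",
                         "library": "R/W/E", "yellow pages": "R/W/E/D", "forum": "R/W*/E/D"},
    "administrator":    {"news": "R/W/E/D", "articles": "R/C/W/E/D", "wiki": "R/W/E/D",
                         "library": "R/W/E", "yellow pages": "R/W/E/D", "forum": "R/W*/E/D"},
}

def allowed(group, service, action):
    """True if the action code (e.g. 'W') is granted to the group."""
    return action in ACCESS.get(group, {}).get(service, "").split("/")

assert allowed("professional", "news", "W")
assert not allowed("non-professional", "library", "R")
```

Keeping the rules in a single table mirrors Table I directly, so a change in policy requires no change to the checking logic.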


Figure 3. Registration of professionals [5]

Figure 4. Model of the relational database [5]

The concept of the database design is linked to the opportunity to serve present and future needs of the community, maintaining fast changing structured and non-structured data, as well as easily integrating them into new modules to be added to the KMS according to the needs. The present view of the database is depicted in Fig. 4.

The collection of various unstructured data on the platform faces its developers with the challenge to collect, categorize and organize them in order to help users easily find and retrieve the information and knowledge they are looking for. Thus, metadata will be added (e.g. name of authors, type of publication, date, key words, etc.). A controlled vocabulary will help with data classification and better organization. It will be specially developed for the aims of the community, providing a common terminology and a single taxonomy. Subsequently, the vocabulary will provide all area-specific key words necessary for describing knowledge resources by the authors or for searching in the database. A search engine will be integrated on the platform with opportunities for multi-criteria search.
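As a rough sketch of how such multi-criteria search over metadata could work (the record fields and function names below are our illustration, not the platform's actual schema):

```python
# Illustrative metadata records with controlled-vocabulary keywords.
RESOURCES = [
    {"author": "I. Petrova", "type": "article", "year": 2013,
     "keywords": {"contact lenses", "fitting"}},
    {"author": "NABOO", "type": "regulation", "year": 2012,
     "keywords": {"legislation", "retail"}},
]

def search(resources, keywords=None, **fields):
    """Multi-criteria search: exact match on fields, subset match on keywords."""
    hits = []
    for r in resources:
        if keywords and not set(keywords) <= r["keywords"]:
            continue
        if any(r.get(f) != v for f, v in fields.items()):
            continue
        hits.append(r)
    return hits

print(search(RESOURCES, keywords={"legislation"}, type="regulation"))
```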
The present functionalities of the platform include:
- Homepage / News - Aimed at providing users with opportunities to get up-to-date information on upcoming events, meetings, changes in legislation, technology developments, etc. RSS feeds will be integrated at a later stage.
- Articles - The objective is to create a web blog where users can publish their own articles and interesting information, and search, comment on and discuss them. This functionality is available to all registered professionals, as well as those having Facebook profiles. It is considered at a later stage to also integrate LinkedIn and Twitter users, as well as to ensure indexing, categorization and labeling of the articles.
- Wiki - It is integrated in the platform with the aim of creating a common space for easy creation, organisation and search of information and knowledge. The difference from articles is that no comments can be made, and the quality of information is ensured by the Editors. The objective is for the Wiki to become an online encyclopedia for the professionals in the field of Optometry and Eye Optics in Bulgaria.
- Library - In comparison with Articles and Wiki, in the Library only moderators will be able to store files, adding meta-data to them for search facilitation.
- Yellow pages - Presently, only organisations can be added to the yellow pages, alphabetically indexed. All Authors can add content. It is considered to also add individuals, initially by integrating their SNS profiles. In order to facilitate finding expertise in the community, some search options will be integrated as well.
- Forum - Its objective is to serve as a platform for discussions on different topics of interest to all users, as well as for obtaining feedback from end-users. A hidden forum will facilitate professionals' collaboration, exchange of ideas and discussions on joint statements and collective decision taking.
- Entry / Exit, Registration;
- Verification of users;
- Contacts.

Presently, the platform has passed initial tests by the designer for the software realization, and by two professionals for the available functionalities. It was considered that the prototype meets the initial requirements. It is foreseen to widely discuss the concept and the prototype within the professional community for Optometry and Eye Optics in Bulgaria before finalising it.

V. CONCLUSION
The presented collaborative platform concept could serve the SMEs in Optometry and Eye Optics in Bulgaria well, as it ensures sufficient functionalities to meet their present needs. However, as the literature on knowledge management suggests [1], the introduction of new products and services is not easily accepted by end-users. On the one side, there is a need for awareness raising and ensuring the necessary users' skills. On the other, knowledge sharing and collaboration require trust among users, which would be difficult in a community of competitors. The key here would be that every single user finds added value in the new tools offered. Therefore, the concept envisages various functionalities in order to serve different interests: either for new knowledge and information, or for higher visibility. It is considered that the professionals from the community will benefit from
the opportunities for higher visibility (of themselves or their organization), access to new knowledge, exchange of ideas, collaborative problem solving, as well as obtaining end-users' feedback. At the same time, as with every new project, some champions should be identified who could actively promote and support the development of content and the wide usage of the virtual collaboration platform. In addition, various media channels will be used for widely disseminating information about the platform.

In order to diminish the entry barriers, it is envisaged to organize a seminar for presentation of the platform and for training of its potential users among NABOO members. A survey among professionals will gather information on their satisfaction and the new features which they would like to be added in order to better meet their needs.

A more ambitious future goal is for the platform to facilitate industry-academia collaborations, and especially debates on new research topics, innovation and joint future opportunities. The already established collaboration with some stakeholders on professionals' education and training could serve as a sound base for deepening collaboration for mutual benefits.

Another possible feature would be to add a job search facility, which could provide information on employment opportunities (vacancies published by organizations) and the available expertise (CVs published by job seekers). This would essentially support SMEs in finding the skills and knowledge they are looking for.

ACKNOWLEDGMENT
The authors gratefully acknowledge the support provided under the European Social Fund Operational Programme Human Resources Development, grant agreement BG051PO001-3.3.06.-0052.

REFERENCES
[1] E. Gourova, A. Antonova, R. Nikolov (eds.), Knowledge Management, Bulvest 2000, Sofia, 2012 (in Bulgarian).
[2] J. Hafkesbrink, J. Evers, "Innovation 3.0: Embedding into community knowledge - The relevance of trust as enabling factor for collaborative organizational learning", in J. Hafkesbrink et al. (eds.), Competence Management for Open Innovation, Josef EUL Verlag, Koln, Germany, 2010.
[3] M. Dragomirova, "Knowledge Management for Bulgarian Branch Organisations in Optometry and Optic", in 1st Doctoral Conference in Mathematics, Informatics and Education, Sofia, Bulgaria, 19-22 September 2013.
[4] M. Paukert, C. Niederée, M. Hemmje, "Knowledge in Innovation Processes", in D. G. Schwartz (ed.), Encyclopedia of Knowledge Management, Idea Group Reference, Hershey, USA, 2006.
[5] B. Salutski, Knowledge Management in Virtual Communities, BSc Thesis, Technical University of Sofia, Sofia, Bulgaria, 2013.
[6] TRAINMOR KNOWMORE consortium, Handbook on Organizational Knowledge Management in European Organizations, Sofia, Bulgaria, 2008.


Fuzzy Influence Diagrams in Power Systems Diagnostics

Zoran Marković*, Aleksandar Janjić**, Miomir Stanković***, Lazar Velimirović****
*,**** Mathematical Institute of the Serbian Academy of Sciences and Arts, Belgrade, Serbia
** University of Niš, Faculty of Electronic Engineering, Niš, Serbia
*** University of Niš, Faculty of Occupational Safety, Niš, Serbia
[email protected], [email protected], [email protected],
[email protected]

Abstract — In this paper, an influence diagram with fuzzy probability values is proposed as a graphical tool for diagnostic reasoning in power systems. Instead of Bayesian networks that use conditional probability tables, which are often difficult or impossible to obtain, a verbal expression of probabilistic uncertainty, represented by fuzzy sets, is used. The proposed methodology enables both types of inference support, bottom-up and top-down, including decision nodes in the analysis. This inference engine is illustrated on the case of the detection of the cause of excessive tripping of a transformer breaker in a substation.

I. INTRODUCTION
Problems of diagnostics in power systems, including the detection of failures or equipment condition assessment, are faced with many uncertainties related to past, present and future operational conditions. Bayesian networks and influence diagrams are graphical tools that aid reasoning and decision-making under uncertainty, modeling the system with a network of states with known probability distributions [7, 11, 13]. These tools are used for medical diagnosis, map learning, heuristic search and, very recently, in power systems, including both predictive and diagnostic support.

In power systems, predictive support is used mostly for the prediction of circuit breaker or transformer failures [4, 17] based on condition monitoring data. The second approach - fault diagnostics - is applied for relay protection selectivity and transformer fault diagnostics [3, 18].

Using probabilistic methods for linking symptoms to failures is possible only in the presence of the necessary failure probabilities, obtained from operating data or through the solicitation of subjective probabilities from experts. However, this is not always possible, and depends on the quality and quantity of the available data.

The objective of this work is to propose an integrated method for both types of inference support, bottom-up and top-down, in an uncertain environment, including decision nodes in the analysis. The rest of the paper is organized as follows. Section II discusses the basic concept of ID modeling. Section III gives details of the fuzzy influence diagram model, while Section IV elaborates the case study: the determination of the cause of excessive tripping of a transformer circuit breaker.

II. INFLUENCE DIAGRAMS
Influence diagrams were proposed by Howard and Matheson [7] as a tool to simplify the modeling and analysis of decision trees. They are a graphical aid to decision making under uncertainty, which depicts what is known or unknown at the time of making a choice, and the degree of dependence or independence (influence) of each variable on other variables and choices. An influence diagram represents the cause-and-effect (causal) relationships of a phenomenon or situation in a non-ambiguous manner, and helps in a shared understanding of the key issues. An influence diagram is built using several graphical elements. A circle depicts an external influence (an exogenous variable) and a rectangle depicts a decision. A chance node (oval) represents a random variable whose value is dictated by some probability distribution, and a value node is presented as a diamond (objective variable) - a quantitative criterion that is the subject of optimization.

The diagram can be used as a basis for creating computer-based models that describe a system, or as descriptions of the mental models managers use to assess the impact of their actions. An influence diagram represents a pair N = {(V, E), P} where V and E are the nodes and the edges of a directed acyclic graph, respectively, and P is a probability distribution over V. Discrete random variables V = {X1, X2, ..., Xn} are assigned to the nodes, while the edges E represent the causal probabilistic relationships among the nodes. Each node in the network is annotated with a Conditional Probability Table (CPT) that represents the conditional probability of the variable given the values of its parents in the graph. However, the use of probability tables with many elements is very difficult, because of the combinatorial explosion arising from the requirement that the solution must be extracted by the cross product of all probability tables.
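As an illustration of the data structure involved (our own sketch, not the authors' implementation), a chance node with a CPT can be represented as a mapping from parent-state combinations to distributions over the node's own states:

```python
# Toy chance node with a CPT keyed by parent-state combinations; the
# probabilities below are placeholders, not values from the paper.
from itertools import product

class ChanceNode:
    def __init__(self, name, states, parents, cpt):
        self.name, self.states, self.parents = name, states, parents
        self.cpt = cpt  # {(parent_state, ...): {state: probability}}

cpt = {combo: {"I": 0.6, "II": 0.3, "III": 0.1}
       for combo in product(("AI", "AII"), ("CI", "CII"))}
node = ChanceNode("D", ("I", "II", "III"), ("A", "C"), cpt)
print(node.cpt[("AI", "CII")]["I"])   # P(D = I | A = AI, C = CII)
```

The combinatorial explosion mentioned above is visible here: the CPT grows with the product of the parents' state counts.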
Solving an ID can be effectuated using fuzzy reasoning [1, 8, 9, 12], where each node in the diagram can be represented by appropriate fuzzy sets describing the uncertain nature of a given value. The combination of the predecessor nodes' fuzzy sets gives the value of the resulting
node. A commonly used technique for combining fuzzy sets is Mamdani's fuzzy inference method. However, the main limitation of fuzzy reasoning approaches is the lack of ability to conduct inference inversely. Feed-forward-like approximate reasoning approaches are strictly one-way; that is, when a model is given a set of inputs it can predict the output, but not vice versa. Furthermore, utilization of a probability measure to assess uncertainty requires too much precise information in the form of prior and conditional probability tables, and such information is often difficult or impossible to obtain. In certain circumstances, a verbal expression or an interval value of probabilistic uncertainty may be more appropriate than numerical values. The fuzzy influence diagram with the fuzzified probabilities of states is presented in the next section.

III. FUZZY INFLUENCE DIAGRAMS
The fuzzification of influence diagrams used in this approach is performed both by the fuzzification of random variables, as in [15, 16], and by the introduction of fuzzy probabilities [5, 6, 10]. Based on previous works on linguistic probability [5, 6], it is possible to define a similar probability measure for fuzzy probabilities.

Definition 1. Given an event algebra ε defined over a set of outcomes Ω, a function FP: ε → E is termed a fuzzy probability measure if and only if, for all A ∈ ε:

0 ≤ FP(A) ≤ 1   (1)

FP(Ω) ≅ 1 and FP(∅) ≅ 0   (2)

If A1, A2, ... are disjoint, then FP(∪i Ai) ≅ Σi FP(Ai)   (3)

where FP is a fuzzy probability measure on (Ω, ε); the tuple (Ω, ε, FP) is termed a fuzzy probability space. Embedded real numbers are denoted by the χ subscript.

Based on the previous definition, fuzzy probabilities, grouped in several fuzzy sets, are introduced and denoted with linguistic terms (extremely low, very low, low, medium low, medium, medium high, high, very high and extremely high), as presented in Figure 1.

Figure 1. Fuzzy probabilities (triangular membership functions μ(p) of the linguistic terms EL, VL, L, ML, M, MH, H, VH and EH over the probability axis p = 0.1, ..., 0.9)

From the previous definition, two fuzzy Bayes rules analogous to the classical crisp number relations (4) and (5) are formulated. The operator "≅" stands for the "=" operator.

FP(Y = yj, X = xi) ≅ FP(X = xi) · FP(Y = yj | X = xi)   (4)

FP(X = xi | Y = yj) ≅ FP(X = xi) · FP(Y = yj | X = xi) / FP(Y = yj)   (5)

Based on the law of total probability, another rule for the fuzzy marginalization can be added, represented by expression (6):

FP(Y = yj) ≅ Σi FP(X = xi) · FP(Y = yj | X = xi)   (6)

Using the above equations, fuzzy Bayes inference can be conducted, with operations on fuzzy numbers defined in terms of arithmetic operations on their α-cuts (arithmetic operations on closed intervals).
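A minimal sketch of this interval arithmetic (our own illustration; the triangular shapes and helper names are assumptions, not the paper's code) applies equation (4) α-cut by α-cut:

```python
# Fuzzy probabilities as triangular fuzzy numbers; operations are carried
# out on their alpha-cuts, i.e. on closed intervals (toy values below).

def tri_alpha_cut(a, b, c, alpha):
    """Alpha-cut [lo, hi] of the triangular fuzzy number (a, b, c)."""
    return (a + alpha * (b - a), c - alpha * (c - b))

def interval_mul(x, y):
    """Product of two closed intervals of non-negative numbers."""
    return (x[0] * y[0], x[1] * y[1])

def fuzzy_joint(prior, conditional, alphas=(0.0, 0.5, 1.0)):
    """Equation (4): FP(Y=y, X=x) = FP(X=x) * FP(Y=y | X=x), per alpha-cut."""
    return {a: interval_mul(tri_alpha_cut(*prior, a),
                            tri_alpha_cut(*conditional, a))
            for a in alphas}

# a 'low' prior and a 'high' conditional, shaped as in Figure 1
print(fuzzy_joint((0.1, 0.2, 0.3), (0.7, 0.8, 0.9)))
```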
The main advantage of the proposed method is very flexible operation with uncertain data, which are now presented in verbal form, through fuzzy inference rules. Prior and conditional probabilities for individual nodes are presented in tables, and they follow fuzzy inference rules derived from an expert knowledge base. The elements of the matrix M = [Mij], where m is the number of discrete states of the parent node Xi and n is the number of discrete states of the child node Yj, represent possible states of nature, denoted with the name of the node (A, B, C, ...) and the number of the node state (I, II, III). Mij represents the conditional probability FP(Y = yj, X = xi).

This methodology will be illustrated on the case of power transformer diagnostics.

IV. CASE STUDIES
The methodology for both predictive and diagnostic support is illustrated on the case of a power transformer in one transformer substation, planned for replacement because of its age and unsatisfying diagnostic test results. Transformer deterioration is modeled with three deterioration stages, with parameters represented in Table I.

TABLE I. DECISION NODE A
Decision   Description
I          Replace
II         Do nothing


The influence diagram for the transformer risk assessment is illustrated in Figure 2.

Figure 2. Transformer condition assessment (influence diagram):
A - decision node
B - transformer condition
C - weather conditions
D - transformer loading
E - failure probability
F - penalty policy
G - value of risk node

Node A is the decision node, and the decision is bound to only two alternatives: whether to replace (AI) or keep the existing power transformer in use (AII).

A. Predictive support
An increased number of transformer outages is expected, but that number, together with the consequences that these outages will produce, can vary depending on uncertain parameters in the future, including weather conditions, loading of the transformer, and the level of penalties imposed by the regulator. Therefore, one has to investigate the possibility of keeping it in service one more year, and to check whether this decision greatly increases the risk of surpassing the required values for system reliability, imposed by the regulator.

Node B describes the condition of the transformer itself. It can be described by three deterioration states: bad, medium and good. This node represents a chance node, because the condition of the transformer is of a stochastic nature and cannot be fully determined by transformer diagnostics. On the other hand, this node has the parent node A, because the state of the transformer's health is directly influenced by the replacement decision. The conditional probabilities are presented in Table II.

TABLE II. CHANCE NODE B - TRANSFORMER CONDITION
State  Description  Failure rate λ (int./year)  A = AI  A = AII
I      Good         0.02                        H       EL
II     Medium       0.05                        VL      L
III    Bad          2                           EL      MH

Node C is an independent node, describing future weather conditions of a purely stochastic nature (Table III).

TABLE III. CHANCE NODE C - AMBIENT CONDITIONS
State  Description                               Probability
I      Moderate temperature, not below -30       H
II     Severe conditions, temperature below -30  L

Node D is the chance node describing the loading of the transformer. This node has two parent nodes: in the case of a cold winter, the loading will increase. The decision of keeping the existing transformer will also affect the loading, because in the case of replacement, the dispatcher would more likely decide to put more load onto the new transformer from surrounding feeders (Table IV).

TABLE IV. CHANCE NODE D - TRANSFORMER LOADING
State  Description     AI/CI  AI/CII  AII/CI  AII/CII
I      Below maximum   H      H       EL
II     Around maximum  VL     L       VL      L
III    Above maximum   EL     H       EL      MH

Node E is a chance node describing reliability parameters depending on the related transformer station (Table V). The calculation of the expected failure probability, and consequently the SAIFI parameter, is based on the Poisson law, with λ denoting the failure rate from Table II and k representing the number of failures:

f(k) = λ^k e^(-λ) / k!   (7)
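Equation (7) can be evaluated directly; the short sketch below (ours) computes, for the crisp failure rates of Table II, the probability of exceeding 2 interruptions in a year:

```python
# Poisson law of equation (7) for the failure rates of Table II.
from math import exp, factorial

def poisson_pmf(lam, k):
    return lam ** k * exp(-lam) / factorial(k)

for lam in (0.02, 0.05, 2.0):          # states I, II, III of node B
    p_gt_2 = 1 - sum(poisson_pmf(lam, k) for k in range(3))
    print(f"lambda = {lam}: P(more than 2 failures) = {p_gt_2:.4f}")
```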
The usual metric of system reliability is the System Average Interruption Frequency Index (SAIFI). Hypothetically, a new law, which will drastically increase the penalties in the case of surpassing the value of 2 interruptions per customer and year, is expected, but some uncertainty
about the date of its adoption still exists. The verbally modeled probabilities are presented in Table VI.

TABLE V. CHANCE NODE E - CONDITIONAL PROBABILITIES OF SAIFI
(columns correspond to the combinations of the states of nodes B and D)
State  SAIFI  BI/DI  BI/DII  BI/DIII  BII/DI  BII/DII  BII/DIII  BIII/DI  BIII/DII  BIII/DIII
I      <1     EH     VH      H        MH      EL       VL        VL
II     1-2    L      L       ML       L       M        L         ML
III    >2     EL     VL      EL       MH      MH       ML        MH       M

TABLE VI. CHANCE NODE F - PENALTIES
State  Description                                              Probability
I      New energy law adopted, severe penalties - [0.8 0.9 1]   H
II     New law not yet adopted, mild penalties [0.2 0.3 0.4]    L

The value node G is the risk node, defined as the product of probability and consequence (financial penalty).

In this simplified model of the power transformer, its condition can be assessed by two independent variables: the age of the transformer (Age) and the furan content (FC). Both Age and FC can be represented by triangular fuzzy sets, with the following presumed membership functions: Age (Young [0 0 15], Medium [5 25 40], Old [25 40 40]); Furan content (Low [0 0 2000], Medium [0 2000 4000], High [2000 4000 4000]), with the variables expressed in years and ppm, respectively. The condition of the transformer will be represented by three states: Good, Medium and Bad.

To calculate the probability of the transformer being in one of the deterioration states, the results of diagnostic tests of the furan content FC, which is directly influenced by the loading history of the transformer, are used (Figure 3). Conditional probabilities of the deterioration state, depending on the decision (or the age of the transformer), are presented in Table III and are expressed by appropriate fuzzy sets.

Figure 3. Transformer condition assessment (Bayesian network: L → FC; Age and FC → Cond)

Chance node L is the parent node of chance node FC, and node Cond is the child node of both the Age and FC nodes. If both Age and L are represented by discrete nodes, the rules for probability calculations of the child and parent nodes are given in equations (8)-(10).

P(Condi, Age, FC) = Σj Σk P(Condi | Agej, FCk) · P(Agej) · P(FCk)   (8)

P(FCk) = Σj P(FCk | Lj) · P(Lj)   (9)

The probability that, given the evidence that the condition is in state i, the hypothesis of the loading being in state j holds is:

P(Lj | Condi) = P(Lj) · P(Condi | Lj) / P(Condi, Age, FC)   (10)
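With crisp numbers, equations (8)-(10) reduce to ordinary sums and products; the sketch below (toy probabilities, ours) illustrates (9) and (10), and the same structure applies with interval operations in the fuzzy case:

```python
# Equations (9) and (10) with illustrative crisp numbers (ours).
p_L = {"low": 0.6, "high": 0.4}                       # P(L_j)
p_FC_given_L = {"low":  {"low": 0.8, "high": 0.2},    # P(FC_k | L_j)
                "high": {"low": 0.3, "high": 0.7}}

# (9): P(FC_k) = sum_j P(FC_k | L_j) P(L_j)
p_FC = {k: sum(p_FC_given_L[j][k] * p_L[j] for j in p_L)
        for k in ("low", "high")}

# (10) for one condition state: posterior over the loading L
p_cond_given_L = {"low": 0.9, "high": 0.5}            # P(Cond_i | L_j)
p_cond = sum(p_cond_given_L[j] * p_L[j] for j in p_L)
posterior_L = {j: p_L[j] * p_cond_given_L[j] / p_cond for j in p_L}
print(p_FC, posterior_L)
```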

For a transformer that is, for example, 25 years old, with a furan content of 2200 ppm, we obtain the following membership degrees: Age = (0 young, 0.4 medium, 0.4 old) and FC = (0 low, 0.95 medium, 0.05 high). For the sake of practical representation of the data in the BN, a simple mapping of each random variable to an appropriate fuzzy probability, Xi → FP(Xi), has to be performed, by the selection of an appropriate fuzzy probability set. For the proposed example, the mapping is presented in the following table.

TABLE VII. MAPPING OF FUZZY VARIABLES TO FUZZY PROBABILITY MEASURES
Variable: Age  Fuzzy probability   Variable: Furan content  Fuzzy probability
Young          EL                  Low                      EH
Medium         M                   Medium                   M
Old            M                   High                     EL
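The membership degrees come from evaluating the triangular sets at the measured values; a small sketch follows (helper names ours; the degrees quoted in the text may additionally reflect a normalization step not shown here):

```python
# Triangular membership evaluation for the Age and FC sets defined above.
def tri(x, a, b, c):
    """Membership of x in the triangular set (a, b, c), shoulders included."""
    if a == b == x or b == c == x:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

age_sets = {"young": (0, 0, 15), "medium": (5, 25, 40), "old": (25, 40, 40)}
fc_sets = {"low": (0, 0, 2000), "medium": (0, 2000, 4000),
           "high": (2000, 4000, 4000)}

print({name: tri(25, *p) for name, p in age_sets.items()})    # Age = 25 years
print({name: tri(2200, *p) for name, p in fc_sets.items()})   # FC = 2200 ppm
```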
The value node G represents the risk associated with a particular event E. FP(E) is the fuzzy probability calculated for the node E, and the final value of risk is the expected value of risk for all combinations of event E over the N possible outcomes of event F. PENi denotes the penalties in the case of the i-th outcome of event F. Penalties are also represented as fuzzy numbers, and they are given in per unit values, relative to the maximal possible penalty:

Risk = FP(E) · Σ(i=1..N) FP(Fi) · PENi   (11)

Using the expressions for the fuzzy joint probability and the Bayes rule, we calculate the value of node G (Figure 4). Different methods of fuzzy number ordering can be used, and the final results show that by
replacing the transformer we reduce the risk more than two times [9].

Figure 4. Fuzzy values of risk node G for two alternative decisions: a) replacing and b) keeping the existing transformer

B. Diagnostic support
In the case of diagnostic support, we presume that the results from the previous year show a level of SAIFI surpassing 2 interruptions per customer. The cause of the interruptions is unknown (an internal fault in the transformer followed by Buchholz relay tripping, the contact thermometer, or overcurrent relay tripping caused by overloading). The transformer in the supplying transformer station has not been replaced (AII), but its condition is unknown. Weather conditions were severe (CII). To calculate the probability that the condition of the transformer is good (BI) in spite of the achieved level of reliability, expression (6) is used:

FP(B = BI | E = EIII) ≅ FP(B = BI) · FP(E = EIII | B = BI) / FP(E = EIII)   (12)

The obtained fuzzy numbers for the probability of the transformer being in bad, medium or good condition are shown in Figure 5.

Figure 5. Fuzzy values of probability of transformer being in a) bad, b) medium and c) good condition.
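The diagnostic (top-down) step of expression (12) can again be carried out on α-cut intervals; a toy sketch follows (the interval values are ours, chosen only for illustration):

```python
# Interval form of the fuzzy Bayes inversion (toy alpha-cut values, ours).
def interval_mul(x, y):
    return (x[0] * y[0], x[1] * y[1])

def interval_div(x, y):
    """Quotient of positive intervals: [lo1/hi2, hi1/lo2]."""
    return (x[0] / y[1], x[1] / y[0])

prior_BI = (0.55, 0.75)        # an alpha-cut of FP(B = BI)
lik = (0.05, 0.12)             # an alpha-cut of FP(E = EIII | B = BI)
evidence = (0.10, 0.30)        # an alpha-cut of FP(E = EIII)

print(interval_div(interval_mul(prior_BI, lik), evidence))
```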
ACKNOWLEDGMENT
The work presented here was supported by the Serbian Ministry of Education and Science (project III44006).

V. CONCLUSION
The utilization of a probability measure for uncertainty modeling requires too much precise information in the form of prior and conditional probability tables, and such information is often difficult or impossible to obtain. In this paper, a verbal expression or an interval value of probabilistic uncertainty proved to be more appropriate than numerical values. The fuzzy influence diagram with the fuzzified probabilities of states is presented in the paper. The calculation of these probabilities is performed with interval-based fuzzy arithmetic. The results presented in the case studies proved that this new form of description - the fuzzy influence diagram, which is both a formal description of the problem that can be treated by computers and a simple, easily understood representation of the problem - can be successfully implemented for various classes of risk analysis problems in power systems.


REFERENCES
[1] An N, Liu J, Bai Z, “Fuzzy Influence Diagrams: an approach to
customer satisfaction measurement,” Fourth International
Conference on Fuzzy Systems and Knowledge Discovery, 2007.
[2] Begovic M, Djuric P, “On predicting the times to failure of power
equipment” 43rd Hawaii International Conference on system
science (HICSS), 2010 , Page(s): 1 – 6
[3] Elkalashy N.I,et al. “Bayesian selectivity technique for earth fault
protection in medium-voltage networks” IEEE Transactions on
Power Delivery, Volume: 25 , Issue: 4 2010 , Page(s): 2234 –
2245
[4] Gulachenski E.M, Besuner P.M, “Transformer failure prediction
using Bayesian analysis“ IEEE Transactions on Power Systems,
Volume: 5 , Issue: 4, 1990 , Page(s): 1355 – 1363
[5] Halliwell J, Keppens J, Shen Q, Linguistic Bayesian networks for
reasoning with subjective probabilities in forensic statistics,
Proceedings of the 5th International Conference of AI and Law,
Edinburgh, Scotland, June 24-28, 2003, pp 42-50
[6] Halliwell J, Shen Q, Towards a linguistic probability theory,
Proceedings of the 11th International Conference on Fuzzy Sets
and Systems (FUZZ-IEEE 02) Honolulu, May 16-17, 2002, 596-
601
[7] Howard R, Matheson J, “Influence Diagrams”, Decision Analysis,
Vol. 2, No.3, September 2005
[8] Hui L, Yan Ling X, “The traffic flow study based on Fuzzy
Influence Diagram theory”, Second International Conference on
Intelligent Computation Technology and Automation, 2009.
[9] Janjic A, Stajic Z, Radovic I,”A practical inference engine for risk
assessment of power systems based on hybrid fuzzy influence
diagrams”, Latest Advances in Information Science, Circuit and
Systems, Proceedings of the 13th WSEAS International conference
on Fuzzy Systems, FS12
[10] Jenkinson R, Wang J, Xu D.L, Yang J.B, “An offshore risk
analysis method using fuzzy Bayesian network”, J. Offshore
Mech. Arct. Eng. 131, 041101 (2009) (12 pages)
[11] Jensen F, Nielsen T, “Bayesian networks and decision graphs”,
Springer Science, 2007
[12] Mateou N.H, Hadjiprokopis A.P, Andreou A.S, “Fuzzy Influence
Diagrams: an alternative approach to decision making under
uncertainty”, Proceedings of the 2005 International Conference
on Computational Intelligence for Modelling, Control and
Automation, (CIMCA-IAWTIC05)
[13] Pearl J, “Probabilistic reasoning in intelligent systems: Networks of Plausible Inference”, Morgan Kaufmann, Palo Alto, USA, 1988
[14] Rolim G, Maiola P.C, Baggenstoss H.R., da Paulo A.R.G,
“Bayesian Networks Application to Power Transformer
Diagnosis” Power Tech, 2007 IEEE Lausanne 2007 , Page(s):
999 – 1004
[15] Yang. C.C , K. Cheung,“Fuzzy Bayesian analysis with continuous
valued evidence” IEEE International Conference on Systems, Man
and Cybernetics, 1995. Intelligent Systems for the 21st Century.,
Page(s): 441 - 446 vol.1
[16] Yang C.C “Fuzzy Bayesian inference”, 1997 IEEE International
Conference on Systems, Man, and Cybernetics, 1997.
Computational Cybernetics and Simulation.Volume: 3 ,1997 ,
Page(s): 2707 - 2712
[17] Zhang Z, Jiang Y, McCalley J, “Condition based failure rate
estimation for power transformers”, Proc. of 35th North American
Power Symposium, October 2003, Rolla, USA
[18] Zhao W.; Zhang Y; Zhu Y; “Diagnosis for transformer faults
based on combinatorial Bayes Network”, 2nd International
Congress on Image and Signal Processing, 2009. CISP '09.
Page(s): 1 - 3


Linear Fuzzy Space Based Scoliosis Screening


Marko Jocić*, Dejan Dimitrijević*, Milan Pantović**, Dejan Madić**, Zora Konjović*
* University of Novi Sad, Faculty of Technical Sciences, Novi Sad, Serbia
** University of Novi Sad, Faculty of Sport and Physical Education, Novi Sad, Serbia

{m.jocic, dimitrijevic}@uns.ac.rs, [email protected], [email protected], [email protected]

Abstract — In this paper we propose a method for scoliosis screening based on a mathematical model of linear fuzzy space and image processing using self-organizing maps. Taking into account that the number of school age children with some sort of a spine deformity in Serbia exceeds 27%, this paper's research came out of a need to develop and implement novel, effective and primarily economical methods for automated diagnostics of spine disorders. The ultimate goal, however, is to produce a suite of mobile applications capable of automated diagnostics of some spine disorders, which could be used by non-medically educated school personnel for the purpose of early diagnosis, i.e. screening for those spine disorders in adolescents (when early physical therapy and scoliotic bracing prove to be most effective, and thus least monetarily demanding compared to more invasive means of clinical therapy).

I. INTRODUCTION
The main subject of the research presented in this paper is an evaluation of novel noninvasive spine disorder diagnostic methods and implementations, realized under price and precision constraints. Automated diagnostic solutions for spine disorders currently come in various shapes and sizes, using various diagnostic methods. Some of the methods, i.e. techniques, used today for diagnosing spine disorders are based on manual deformity testing, topographic visualizations, other sensory inputs (such as laser, infrared, ultrasound scanners, etc.), magnetic resonance imaging (MRI) and/or radiographic imaging, i.e. ionizing radiation. Besides deformity tests, which can be conducted, with or without additional aids (scoliometers), only by sufficiently medically trained personnel [1], the second most used method for diagnosing spine deformity disorders is radiographic imaging. However, since the recommended age for scoliosis testing is between ages 10 and 14, and even twice within the same period for female adolescents [2], with a positive diagnosis rate of about 5%, it is of no surprise that radiographic imaging is avoided. Also, since the number of recommended scoliosis tests does not include the number of post-diagnostic follow-ups for positively diagnosed adolescents, and thus the potential number of additional radiographic images taken, it is even more obvious that the development of alternative noninvasive methods and techniques came from a de facto need to reduce the negative cumulative effects of ionizing radiation on adolescents [3].

The first noninvasive methods for diagnosing scoliosis without the use of ionizing radiation or deformity tests came about after 1970, with the so-called Moiré topography [4]. Moiré topography represents a method of morphometrics in which three dimensional contour maps are produced using the interference of coherent light, as the observed object gets flooded with parallel light projected from two or more light sources (waves). Depending on the light wave amplitudes, phase differences and frequencies, the interference can cause the light to either grow or dim, which in turn produces darker or lighter lit zones. During the 70s and early 80s of the last century, new methods were developed for diagnostics and follow-ups of some spine deformity disorders using the aforementioned Moiré topography contour maps [5-7]. That also triggered a rise in research on modelling the human spine and its potential deformities [8-10], predominantly based on visual data. However, since the 80s and early 90s of the previous century, along with the advances in development and greater accessibility of other sensory technologies and more precise devices, such as laser scanners, the interference images, silk fabric blinds and models produced through regular optical means were mostly substituted by more precise three dimensional models of human back surfaces and spines produced using more advanced measuring methods [11].

The "golden" standard in scoliosis diagnostics is known as the Cobb's angle [12] and is the best method for measuring scoliotic curvatures used today. It equals the angle formed by the intersection of the two lines drawn perpendicularly to the end plates of the superior and inferior spinal vertebrae which form the scoliotic curves. However, since that requires the spinal anterior-posterior projection first (not easily attainable without radiography), a number of other measuring methods have been formed as an alternative which do not require radiography, among them a couple named Posterior Trunk Symmetry Index (POTSI) and Deformity in the Axial Plane Index (DAPI), both of which will be further reviewed later on.

This paper is laid out in the following chapters: the first chapter gives an introduction to the research we are conducting, as well as the personal and historical motives driving such and similar research. Chapter II gives an overview of related work done previously that has been identified by us, both in academic papers and in practical, both commercial and academic, solutions. Chapter III gives preliminaries, providing basic facts about self-organizing maps and linear fuzzy space. Chapter IV presents the proposed algorithm, and Chapter V shows the results of the proposed algorithm, concluding with directions of future research.

II. RELATED WORK
As noted, this section deals with existing automated spine disorder diagnostics related work. Subsection A lists some of the methods deduced in the academic literature, and subsection B surveys some of the solutions currently available for automated scoliosis spine disorder diagnostics.


A. Current academic literature and given guidelines
Even though the Moiré topography contour maps could provide immediate visual feedback on potential spinal deformities, such feedback was only crudely quantifiable without such advances as POTSI or DAPI, explained now.

The Posterior Trunk Symmetry Index (POTSI) is a parameter for the assessment of the surface trunk deformity in scoliosis, first described by Suzuki et al. back in 1999 [13], and it is a key parameter to assess deformity in the coronal plane. Eight specific points on the surface of the patient's back are required, and those are: the natal cleft, the C7 vertebra, the most indented point of the trunk side line on both sides, the axilla fold on both sides, and the shoulder points, which are cross points of the shoulder level lines (Figure 1 left) and the lines drawn vertically from each axilla fold (Figure 1 right). The center line is drawn from the natal cleft. POTSI is relatively simple to measure, even on a regular photograph of the back. Ideal POTSI is zero, meaning full symmetry of the back surface. Normal values were reported to be below 27 [13, 14]. POTSI is very sensitive in revealing any frontal plane asymmetry. To measure it, Frontal Asymmetry Index (FAI) values for the axilla, trunk and C7 spinal vertebra must be determined from a and b, the distances measured from the center line to the waist points, c and d, the lengths from the center line to the axilla fold points, and i, the center line distance to the C7 vertebra position. Height Difference Index (HDI) values of the shoulders, underarm (axilla) and trunk must also be determined, where e equals the height offset from the natal cleft to the C7 vertebra points, and f, g and h equal the height offsets of the left and right most indented waist, axilla fold and shoulder level points. The total sum of all the listed index values represents the Posterior Trunk Symmetry Index (POTSI).

Figure 1 - Schematic of POTSI FAI (left) and HDI (right) parameter points and measurement lines

As for the other parameter of spinal deformity mentioned, named the Deformity in the Axial Plane Index (DAPI), it represents a topographic variable which quantifies the spinal deformity in another plane, different from POTSI. Its value is measured mostly based on the depth distances of the most and least prominent points on the human back surface, making it a complementary value for a combined, more accurate diagnostic criterion [15]: subjects with normal DAPI and POTSI values (DAPI ≤ 3.9% and POTSI ≤ 27.5%) are classified as non-pathological, but subjects with a high DAPI or POTSI are diagnosed as pathological.

B. Current spine disorder diagnostic solutions
Because the number of school age children in Serbia [16] with some sort of a spinal disorder rose above 27%, of which more than 19% accounts for scoliosis, there is a significant motivation for seeking out improved solutions used in early screenings for various spinal disorders. However, the economic cost of screening an individual child for scoliosis also seems to increase with greater solution complexity, compared to the use of simple aids such as scoliometers [17]. Thus, our systematic review of publicly available PhD theses produced within the last decade identified two works dealing with solutions for automated diagnostics of scoliosis [18, 19]. But both solutions were considerably more complex and expensive, requiring highly computationally and measurably capable resources (i.e. laser scanners) for analysis and diagnostics.

III. PRELIMINARIES
The approach proposed in this paper consists of using self-organizing maps and fuzzy set theory to determine scoliosis by digital image analysis. Our previous work [22-25] on mathematical models for describing imprecise data in image and medical analysis [26, 27] has shown that this approach, based on fuzzy points and fuzzy lines in linear fuzzy space, is simple, yet can be very effective. Another motivation for this alternative approach are the inherent properties of digital images, vagueness and imprecision, which are caused by image resolution, bad contrast, noise, etc.

A. Self-organizing map
A self-organizing map, also called a Kohonen map, is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. Self-organizing maps are different from other artificial neural networks in the sense that they use a neighborhood function to preserve the topological properties of the input space. The unsupervised learning algorithm is based on competitive learning, in which the output neurons compete amongst themselves to be activated, with the result that only one is activated at any one time. This activated neuron is called the winning neuron. Such competition can be induced/implemented by having lateral inhibition connections (negative feedback paths) between neurons. The result is that neurons are forced to organize themselves.
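A compact competitive-learning sketch of the mechanism just described (pure NumPy; this toy implementation is ours, not the authors' code):

```python
import numpy as np

def train_som(data, grid=(2, 2), iters=2000, lr0=0.5, sigma0=1.0, seed=0):
    """Train a tiny SOM; returns the neuron weight vectors (the 'map')."""
    rng = np.random.default_rng(seed)
    w = rng.random((grid[0] * grid[1], data.shape[1]))
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])])
    for t in range(iters):
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((w - x) ** 2).sum(axis=1))     # winning neuron
        lr = lr0 * (1 - t / iters)                      # decaying learning rate
        sigma = max(sigma0 * (1 - t / iters), 1e-3)     # shrinking neighborhood
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2 * sigma ** 2))              # neighborhood function
        w += lr * h[:, None] * (x - w)                  # pull neurons toward x
    return w

# e.g. RGB pixels scaled to [0, 1]; a 2x2 grid quantizes them to 4 colors
pixels = np.random.default_rng(1).random((5000, 3))
print(train_som(pixels))
```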
B. Linear fuzzy space
Definition. A fuzzy point A ∈ R², denoted by Ã, is defined by its membership function μ_Ã ∈ F(R²), where the set F(R²) contains all membership functions satisfying the following conditions:
i) the function maps R² into [0, 1];
ii) the function reaches the value 1 at exactly one point, the core;
iii) the function is upper semi-continuous;
iv) every α-cut of the function is convex.

The point from R² with the given membership function will be denoted by Ã (A is the core of the fuzzy point Ã), and the membership function of the point à will be denoted by μ_Ã. By [Ã]_α we denote the α-cut of the fuzzy point à (this is a set from R²).

Definition. A linear fuzzy space is the set of all functions which, in addition to the properties given in Definition 2.1, are:
i) symmetric against the core, i.e. the membership value depends only on the distance from the core, where d is the distance in R²;
ii) inverse-linear decreasing w.r.t. the points' distance from the core, according to:

μ_Ã(X) = 1 - d(A, X)/r for d(A, X) ≤ r, and μ_Ã(X) = 0 otherwise,

where d(A, X) is the distance between the point X and the core A, and r is a constant. Elements of that space are represented as ordered pairs (A, r), where A is the core of à and r is the distance from the core at which the function value becomes 0; in the sequel, the parameter r will be denoted as the fuzzy support radius.

Definition. Let S be a linear fuzzy space. Then a function which assigns to a pair of fuzzy points and a scalar their weighted sum is called a linear combination of the given fuzzy points, where the operator · is a scalar multiplication of a fuzzy point.

Definition 3.3. Let two fuzzy points from S be given. Then a point is called their internal homothetic center if it divides the segment between their cores in the ratio of their fuzzy support radii.

Definition 3.5. Let p̃ be a fuzzy line defined on the linear fuzzy space S, and let T̃ ∈ S. Then a fuzzy point T̃' is called the fuzzy image of the point T̃ on the fuzzy line p̃, and a real number is called the eigenvalue of the fuzzy image on the fuzzy line p̃, if the corresponding conditions hold.

C. Spatial relations in linear fuzzy space
Spatial relations (predicates) are functions that are used to establish mutual relations between fuzzy geometric objects. The basic spatial relations are coincide, between and collinear. In this section we give their definitions and basic properties.

The fuzzy relation coincidence expresses the degree of truth that two fuzzy points are at the same place.

Definition 4.1. Let Λ be the Lebesgue measure on the set R², and let S be a linear fuzzy space. The fuzzy relation coin: S × S → [0, 1] is the fuzzy coincidence, represented by a membership function defined through the measure of the overlap of the two fuzzy points.

Remark. Since the lowest membership value is always 0, the membership function of the fuzzy coincidence admits a simplified equivalent form.

Proposition. "Fuzzy point à is coincident with fuzzy point B̃" is partially true with the truth degree coin(Ã, B̃); in Theorem 4.1 we present a method for its calculation.

Theorem 4.1. Let the fuzzy relation coin be a fuzzy coincidence. Then the membership function of the fuzzy relation fuzzy coincidence is determined in closed form from the cores and the fuzzy support radii of the two fuzzy points.

The fuzzy relation contain (or between) is a measure that a fuzzy point belongs to a fuzzy line, or that a fuzzy line contains a fuzzy point.

Definition 4.2. Let Λ be the Lebesgue measure on the set R², S a linear fuzzy space, and L the set of all fuzzy lines defined on S. Then the fuzzy relation contain: L × S → [0, 1] is the fuzzy contain, represented by a membership function defined analogously to the fuzzy coincidence.

Remark. Its membership function can also be represented through the fuzzy image of the point on the fuzzy line.

Proposition. "Fuzzy line p̃ contains fuzzy point T̃" is partially true with the truth degree contain(p̃, T̃); in Theorem 4.2 we present a method for its efficient calculation.

Theorem 4.2. Let fuzzy points be defined on a linear fuzzy space, and let T̃' be the fuzzy image of the point T̃ on the fuzzy line determined by the fuzzy points P̃ and Q̃. The points H₁ and H₂ are the internal homothetic center fuzzy points for the fuzzy points T̃ and P̃, and T̃ and Q̃, respectively. Then the membership function of the fuzzy relation fuzzy contain is determined by a closed-form formula in which the point T' is the projection of the core of T̃ onto the line passing through the cores of P̃ and Q̃.

Definition 4.3. Let Ã, B̃ and C̃ be fuzzy points defined on a linear fuzzy space, and let Λ be the Lebesgue measure on the set R². The fuzzy relation coll: S³ → [0, 1] is the fuzzy collinearity between three fuzzy points, represented by a membership function built from the fuzzy contain relation.

Proposition. "Fuzzy points Ã, B̃ and C̃ are collinear" is partially true with the truth degree coll(Ã, B̃, C̃); in Theorem 4.3 we present a method for its calculation.

Theorem 4.3. Let the fuzzy relation contain be the fuzzy contain. Then the membership function of the fuzzy relation fuzzy collinearity is determined as the degree to which one of the fuzzy points is contained in the fuzzy line through the remaining two.

This definition of fuzzy collinearity for three points is easily extended to an arbitrary number of fuzzy points.

IV. PROPOSED ALGORITHM
The first step in our algorithm is image segmentation using a self-organizing map (SOM). The SOM is used to reduce the number of colors of the digital image that is analyzed. More precisely, after providing the SOM with training data (the colors in the image), 16 million possible colors are quantized to only 4 colors, and the trained SOM is then applied to the original image to obtain a much simpler image. This allows easier extraction of edge points. These edge points are extracted in a simple way: every horizontal line of the image is scanned in order to find transitions between colors, and each of these color transitions is considered an edge point. After that, for each horizontal line, a mean point, or a center of symmetry for the edge points found on that horizontal line, is calculated. At this point, we calculate the angle that these centers form, and this is done by using linear regression. This angle can be another useful indicator for determining scoliosis.

Because the image processed with the SOM is simplified, a certain amount of imprecision is inherently introduced into the image. So, we decided to model this imprecision by using fuzzy points for the previously calculated points that represent centers of symmetry. The amount of uncertainty for these fuzzy points is inversely proportional to the size of the SOM that was used to segment the image: the smaller the SOM used, the more imprecise these points are. Following this, a measure of fuzzy collinearity for these fuzzy points is calculated, and this value can be used as an input for some machine learning algorithms to infer whether scoliosis is present in the analyzed image or not. The complete algorithm is shown as a flowchart in Figure 2.

Figure 2 - The proposed algorithm flowchart
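The steps above translate almost directly into code; the following sketch (our helper names, assuming a trained SOM palette as in Section III.A) extracts edge points, symmetry centers and the regression angle:

```python
import numpy as np

def quantize(img, palette):
    """Label each pixel with its nearest palette color (img: H x W x 3)."""
    d = ((img[..., None, :] - palette) ** 2).sum(axis=-1)
    return np.argmin(d, axis=-1)                    # H x W label image

def symmetry_centers(labels):
    """Per image row: mean x-coordinate of the color transitions found."""
    centers = []
    for y, row in enumerate(labels):
        edges = np.flatnonzero(row[1:] != row[:-1])
        if edges.size:
            centers.append((edges.mean(), y))
    return np.array(centers)

def centerline_angle(centers):
    """Angle (degrees from vertical) of the fitted line x = a*y + b."""
    a, _ = np.polyfit(centers[:, 1], centers[:, 0], 1)
    return float(np.degrees(np.arctan(a)))
```

The fuzzified centers are then obtained by attaching a support radius inversely proportional to the SOM size, and their fuzzy collinearity is evaluated with the relations of Section III.B.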


V. RESULTS
In this chapter we show the results of applying the proposed algorithm to the image shown in Figure 3. On the left side of this figure, a healthy spine is shown, while the right side shows a scoliotic spine. As previously described, a SOM with dimensions 2x2 is used to reduce the number of colors from 16 million to only 4. The results of the image segmentation are shown in Figure 4.

Figure 3 - Image of spine without (left) and with scoliosis (right)

Following the image segmentation, edge points are found, and after that their corresponding centers of symmetry. With the available centers of symmetry, the angles that these points form are calculated by using linear regression (blue lines). The fuzzified centers of symmetry, along with the calculated angles and extracted edge points, are shown in Figure 5. The calculated angles that the centers of symmetry form are 4° in the left image, and 17° in the right image. Also, the calculated measure of fuzzy collinearity between the fuzzy points is 0.8 in the left image, and 0 in the right image.

Figure 4 - Image segmentation with self-organizing map

Figure 5 - Result image

VI. CONCLUSION
In this paper we proposed a novel method for scoliosis screening based on a mathematical model of linear fuzzy space and image processing using self-organizing maps. The proposed algorithm results in two values: the angle formed by the calculated centers of symmetry, and the fuzzy collinearity. Future research would provide the proposed algorithm with many real-world images of patients with and without scoliosis, and a certain number of these images could be taken as a training set for a supervised machine learning algorithm which would later be used to infer the presence of scoliosis. As inputs for this training set, the calculated measures of fuzzy collinearity and the calculated angles which the centers of symmetry form could be used, along with an indicator of whether scoliosis is present.

ACKNOWLEDGMENT
Research presented in this paper is partly funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia, Grant No. III 47003.

REFERENCES
[1] T. W. Grossman, J. M. Mazur, and R. J. Cummings, "An evaluation of the Adams forward bend test and the scoliometer in a scoliosis school screening setting," J. Pediatr. Orthop., vol. 15, no. 4, pp. 535-538, Aug. 1995.
[2] D. L. Skaggs, "Referrals from scoliosis screenings," Am. Fam. Physician, vol. 64, no. 1, pp. 32, 34-35, Jul. 2001.
[3] C. L. Nash Jr, E. C. Gregg, R. H. Brown, and K. Pillai, "Risks of exposure to X-rays in patients undergoing long-term treatment for scoliosis," J. Bone Joint Surg. Am., vol. 61, no. 3, pp. 371-374, Apr. 1979.
[4] H. Takasaki, "Moire Topography," Jpn J Appl Phys Suppl, pp. 14-1, 1975.
[5] I. V. Adair, M. C. Van Wijk, and G. W. Armstrong, "Moiré topography in scoliosis screening," Clin. Orthop., no. 129, pp. 165-171, Dec. 1977.
[6] S. Willner, "Moiré topography for the diagnosis and documentation of scoliosis," Acta Orthop., vol. 50, no. 3, pp. 295-302, 1979.
[7] T. Laulund, J. O. Søjbjerg, and E. Hørlyck, "Moire topography in school screening for structural scoliosis," Acta Orthop., vol. 53, no. 5, pp. 765-768, 1982.
[8] W. Frobin and E. Hierholzer, "Analysis of human back shape using surface curvatures," J. Biomech., vol. 15, no. 5, pp. 379-390, 1982.
[9] B. Drerup and E. Hierholzer, "Objective determination of anatomical landmarks on the body surface: Measurement of the vertebra prominens from surface curvature," J. Biomech., vol. 18, no. 6, pp. 467-474, 1985.
[10] A. R. Turner-Smith, J. D. Harris, G. R. Houghton, and R. J. Jefferson, "A method for analysis of back shape in scoliosis," J. Biomech., vol. 21, no. 6, pp. 497-509, 1988.
[11] Y. Santiesteban, J. M. Sanchiz, and J. M. Sotoca, "A Method for Detection and Modeling of the Human Spine Based on Principal Curvatures," in Proceedings of the 11th Iberoamerican Conference on Progress in Pattern Recognition, Image Analysis and Applications, Berlin, Heidelberg, 2006, pp. 168-177.
[12] "Cobb's angle." [Online]. Available: http://www.e-radiography.net/radpath/c/cobbs-angle.htm. [Accessed: 05-Jan-2014].
[13] N. Suzuki, K. Inami, T. Ono, K. Kohno, and M. A. Asher, "Analysis of posterior trunk symmetry index (POTSI) in Scoliosis. Part 1," Stud. Health Technol. Inform., pp. 81-84, 1999.
Page 184 of 478


ICIST 2014 - Vol. 1 Regular papers

[14] K. Inami, N. Suzuki, T. Ono, Y. Yamashita, K. Kohno, and H. Morisue, "Analysis of posterior trunk symmetry index (POTSI) in Scoliosis. Part 2," Stud. Health Technol. Inform., pp. 85-88, 1999.
[15] M. F. Minguez, M. Buendia, R. M. Cibrian, R. Salvador, M. Laguia, A. Martin, and F. Gomar, "Quantifier variables of the back surface deformity obtained with a noninvasive structured light method: evaluation of their usefulness in idiopathic scoliosis diagnosis," Eur. Spine J., vol. 16, no. 1, pp. 73-82, Jan. 2007.
[16] D. Madić, "Relacije motoričkog i posturalnog statusa dece u Vojvodini (Relations of motor and postural status of children in Vojvodina)," Proceedings of Anthropological Status and Physical Activity of Children and Youth, 2006, vol. 40, pp. 185-191.
[17] J. Chowanska, T. Kotwicki, K. Rosadzinski, and Z. Sliwinski, "School screening for scoliosis: can surface topography replace examination with scoliometer?," Scoliosis, vol. 7, p. 9, Apr. 2012.
[18] J. L. Jaremko, "Estimation of Scoliosis Severity from the Torso Surface by Neural Networks." [Online]. Available: http://dspace.ucalgary.ca/bitstream/1880/42538/1/Jarmeko_20428.pdf. [Accessed: 05-Jan-2014].
[19] T. M. L. Shannon, "Dynamic Surface Topography And Its Application To The Evaluation of Adolescent Idiopathic Scoliosis." [Online]. Available: http://cms.brookes.ac.uk/staff/PhilipTorr/Theses/Thesis_TML_SHANNON_26_OCT10_2.pdf. [Accessed: 10-Jan-2014].
[20] K.-R. Ko, J. W. Lee, S.-H. Chae, and S. B. Pan, "Study on Determining Scoliosis Using Depth Image."
[21] "CONTEMPLAS: Motion analysis software, gait analysis, treadmill." [Online]. Available: http://www.contemplas.com/. [Accessed: 12-Jan-2014].
[22] Đ. Obradović, Z. Konjović, E. Pap, and N. M. Ralević, "The maximal distance between imprecise point objects," Fuzzy Sets and Systems, vol. 170, no. 1, pp. 76-94, May 2011.
[23] Đ. Obradovic, Z. Konjovic, E. Pap, and I. J. Rudas, "Modeling and PostGIS implementation of the basic planar imprecise geometrical objects and relations," in Intelligent Systems and Informatics (SISY), 2011 IEEE 9th International Symposium on Intelligent Systems and Informatics, 2011, pp. 157-162.
[24] Đ. Obradovic, Z. Konjović, and M. Segedinac, "Extensible Software Simulation System for Imprecise Geospatial Process," presented at the ICIST, Kopaonik, 2011, pp. 1-6.
[25] Đ. Obradovic, Z. Konjović, E. Pap, and I. J. Rudas, "Linear Fuzzy Space Based Road Lane Model and Detection," Knowledge-Based Systems, vol. 38, pp. 37-47, 2013.
[26] Đ. Obradovic, Z. Konjovic, E. Pap, and M. Jocic, "Linear fuzzy space polygon based image segmentation and feature extraction," in Intelligent Systems and Informatics (SISY), 2012 IEEE 10th Jubilee International Symposium on, 2012, pp. 543-548.
[27] M. Jocic, Đ. Obradovic, Z. Konjovic, and E. Pap, "2D fuzzy spatial relations and their applications to DICOM medical images," in Intelligent Systems and Informatics (SISY), 2013 IEEE 11th International Symposium on, 2013, pp. 39-44.
Context Modeling based on Feature Models Expressed as Views on Ontologies

Siniša Nešković*, Rade Matić**
* University of Belgrade/Faculty of Organizational Sciences, Belgrade, Serbia
** Belgrade Business School, Belgrade, Serbia
[email protected], [email protected]
Abstract—This paper presents an approach for context modeling in complex self-adapted systems consisting of many independent context-aware applications. Contextual information used for adaptation of all system applications is described by an ontology treated as a global context model. A local context model tailored to the specific needs of a particular application is defined as a view over the global context in the form of a feature model. Feature models and their configurations derived from the global context state are then used by a specific dynamic software product line in order to adapt applications at runtime. The main focus of the paper is on the realization of mappings between global and local contexts. The paper describes an overall model architecture and provides corresponding metamodels as well as rules for mapping between feature models and ontologies.

I. INTRODUCTION

Context-aware self-adapted systems (CASAS) are characterized by so called "smart applications" which react to users and their surroundings without the user's explicit commands [1]. Such systems require an adaptation mechanism which timely adapts applications at runtime according to changes in the context. The development of such a mechanism is usually based on a single context model and rules that specify which configurations of the applications should run in every possible instance of the context [2, 3, 4].

However, in the case of very complex systems consisting of many context-aware applications, it is very difficult to use a single context due to its complexity. Such a single global context must include all information needed by all applications, i.e. data about a large number of different situations and different users with different interests and views. Thus, adaptation of a single working application must deal with the entire context, including a large amount of context data that is mostly irrelevant. On the other hand, a possible solution to this problem could be to use separate local contexts tailored for each particular application. However, such a solution also entails many difficulties due to synchronization and potential inconsistencies among different local contexts possessing overlapping contextual information.

This paper presents an approach to the problem of self-adaptation in such complex CASAS which is based on the usage of both global and local contexts. The global context is treated as an ontology describing contextual information required by all applications, whereas local contexts are derived as views over the global context tailored to the needs of particular applications. Views are defined through mappings (correspondences) between modeling elements of global and local contexts.

Additionally, local context models in our approach are expressed in the form of feature models (FM) [5], which are commonly used in software product line engineering (SPLE) to enable generation of application variants customized to specific needs of users. In our approach, derived feature models are used to instantiate variants of context-aware applications corresponding to a specific context state. Thereby, our approach relies on so called dynamic software product lines (DSPL) [6] as the main adaptation mechanism in CASAS.

The main focus of this paper is on describing the realization of mappings between global and local contexts. The rest of the paper is structured as follows. The next section gives an overview of work related to our research. In Section III our approach is described by an overall model architecture, a brief description of the adaptation process, and detailed descriptions of the models used to realize global and local contexts. An example to illustrate our approach is also given. The final Section IV ends the paper with conclusions.

II. RELATED WORK

A lot of research has recently been dedicated to context modeling and development of context-aware systems. Several techniques are proposed for the representation of context [2, 7, 8].

Ontologies are the most expressive and most used technique for modeling contextual information [9, 10, 11, 12]. The application of ontologies for context modeling provides a unique way to specify key concepts as well as a number of additional concepts and instances, and thus allows reuse and sharing of information or knowledge about the context in distributed systems. However, the limitations of this technique are various [7, 13, 14, 15]. Most of the suggested ontologies do not provide a clear description of contextual information. Different ontologies have been proposed to model domain-specific context information or generic models reusable in many domains. All of them have certain drawbacks in generality and/or dynamicity [4]. A general problem is that the suggested models fail to provide a usable generic context ontology, because they may contain ontologies that are useless in specific applications, which limits their usage and extensibility. For example, a location ontology is worthless in a context-aware system operating in a local area.

Approaches defined in [4, 16, 17, 18] use feature modeling as a technique for context modeling and development of context-aware applications in order to
improve reusability and new configurability. Feature models offer different degrees of formalism and expressiveness. Looking at the practical applicability and usability, which are not discussed here in detail, it can be stated that the more extensions for feature models are used, the more practical and usable for context modeling they are. A limitation of feature models is that concepts or relations are defined in an unsuitable way. Feature models do not have a clear view of the relation between contexts and features. It is thus difficult to determine if the features in a feature model are arranged and structured consistently with domain knowledge and whether they are accurately expressed, organized and represented.

Hybrid models represent a combination of two or more different modeling techniques for different usage, either general or domain-specific. In order to obtain more beneficial, flexible and general applications, many researchers try to integrate or expand various context modeling techniques [19, 20].

Surveys of self-adaptive software are presented in [21, 22], but they do not analyze SPL as a software adaptation approach. Other approaches that apply the SPLE paradigm to develop adaptive systems use a feature model for variability [23, 24, 25, 26, 27, 28]. UbiFEX was presented by Fernandes et al. [16]. Their aim is to provide a modeling notation that extends feature models with context feature models. In [16, 18] context-aware adaptation is discussed. Desmet et al. [29] propose Context-Oriented Domain Analysis (CODA), while Hartmann et al. [18] present high-level context information using context variability models (CVM). Using reconfiguration patterns which are based on UML collaboration and state diagrams, Gomaa et al. [30] propose a solution for dynamic reconfiguration of SPL. The paper presented in Ref. [4] is very interesting and similar to our research. Despite the similarities, our paper differs in the main idea of mapping an ontology to a FM and making views on the ontology. Regarding adaptation mechanisms, several research projects, such as MADAM [31], MUSIC [32], DiVA [33] and Trinidad et al. [26], address component-based architecture to provide development of self-adaptive systems. To control adaptation, MADAM uses architectural models and SPL techniques at runtime. MUSIC continued the work of the MADAM project, developing a methodology and tools for adaptation of mobile applications. Compared to DiVA, the main difference is in the implementations for each component type of the architecture. Ref. [26] assumes that features represent components, enabling a DSPL to dynamically include or exclude its components at runtime.

III. OUR APPROACH

This section describes our approach to context modeling. We first give an overall model architecture, which identifies all models (including their metamodels) and their relationships required in our approach. We also briefly describe the adaptation process and how ontology models can be mapped to feature models.

A. Model architecture

The overall model architecture is shown as an UML package diagram in Fig. 1.

[Figure 1. Model architecture: UML package diagram organized in three swim lanes, namely method definition time (ER metamodel, Mapping metamodel, FM metamodel), design time (global context model as an ER schema, mapping model, and local context model as a FM, each conforming to its metamodel), and runtime (global context state, local context state as a FM configuration derived from it, and working applications which update the global context state)]

UML packages in the diagram represent models, whereas various relationships among models are represented by stereotyped dependency associations between corresponding packages. The diagram also classifies models in three different categories according to the time when they are created (represented as swim lanes in the diagram):

- Runtime category encompasses models which are created during runtime of CASAS components and user applications.
- Design time category encompasses models which are created during the design of CASAS components and user applications.
- Method definition time category encompasses models which are defined by our approach, i.e. metamodels which are introduced in the next subsection of the paper.

In our approach the Entity-Relationship (ER) data model is used as an ontology definition language [34, 35, 36, 37]. Hence, a global context model is defined using an ER schema, which must conform to the ER metamodel. On the other hand, feature models are used to represent local context models, which must conform to the FM metamodel.

Feature models are defined as views on ontologies, namely, as projections of the ontologies from different viewpoints [38]. The views definition is given by a mapping model which maps concepts of an ER schema to concepts of a FM. The defined mappings have to follow rules and constraints, which are defined by the Mapping metamodel.

At the runtime level, a CASAS maintains a global context state, which keeps contextual information at the particular moment. The global context state is usually realized as some form of a database structured according to its ER schema defined at design time. The database is updated, i.e. the global context state is maintained, by context-aware applications and other CASAS runtime components.
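To illustrate the runtime part of the architecture, the following minimal sketch shows how a local context state (a FM configuration) could be derived from the global context state through a mapping model. This is our illustration, not the authors' implementation; all names (derive_configuration, the Store/Driver attributes, which echo the example in Section III.C) are hypothetical.

    # Minimal sketch (hypothetical names): a global context state kept as
    # ER-schema-structured records, projected onto a local FM configuration.
    global_state = {
        "Store":  [{"id": 1, "AutoAssign": True, "DeliveryConfirmation": False}],
        "Driver": [{"id": 7, "GPS": True}],
    }

    # Mapping model for the store application: feature -> (entity, attribute).
    store_app_mapping = {
        "Auto-Assign":           ("Store", "AutoAssign"),
        "Delivery-Confirmation": ("Store", "DeliveryConfirmation"),
    }

    def derive_configuration(state, mapping, instance_id):
        # A feature is selected in the local context state exactly when its
        # mapped ER attribute holds for the given instance.
        config = {}
        for feature, (entity, attribute) in mapping.items():
            instance = next(i for i in state[entity] if i["id"] == instance_id)
            config[feature] = bool(instance[attribute])
        return config

    print(derive_configuration(global_state, store_app_mapping, 1))
    # -> {'Auto-Assign': True, 'Delivery-Confirmation': False}

A DSPL would then instantiate the application variant corresponding to such a configuration, as described next.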
When significant changes of a context are detected, CASAS runtime components will trigger an adaptation process which will instantiate (generate) the affected running applications. The adaptation process consists of two main steps:

- Derivation of local contexts. Local context states for affected running applications are derived from the current global context state using the corresponding mapping models defined at design time. According to SPL engineering principles, the local context state is represented as a FM configuration.
- Instantiation of running applications. Using a DSPL, affected working applications are instantiated based on the corresponding FM configurations. Thus, a new instance (version) of the application is adapted to the current local context state.

The adaptation process is performed by a part of CASAS called the Adaptation Manager. Due to space limitations, a detailed description of the adaptation process and the Adaptation Manager implementation is not included here.

B. Metamodels and mapping rules

ER and FM metamodels as well as the Mapping model are shown in Fig. 2. The ER metamodel is based on the ER model defined in [39]. ERConcept represents the most abstract concept in the ER data model. It is specialized into more concrete ER concepts:

- Entity represents types of objects in a system. It is further specialized into Kernel, Subtype, Aggregation, and Weak entity types.
- Relationship between two entity types.
- Mapping, which represents relationship roles as well as special relationships between specific entity types. Min and Max attributes specify the lower and upper bound of its cardinality. Mapping is further specialized into more concrete subtypes: OrdinaryMapping (i.e. relationship role), WeakMapping, AggregationMapping, and SpecializationMapping.
- Attribute describes an entity type, and Domain specifies the type value for an attribute.

The FM metamodel shown in Fig. 2 is an original version developed by the authors independently of other FM metamodels available in the literature. Its most abstract concept is FMConcept, which is further specialized into more concrete concepts:

- Feature can be either a solitary feature, a grouped feature or a feature group. A feature attribute is also defined as a specialized type of feature.
- Relationship, which represents an association between two features. It can be the usual hierarchical feature/subfeature association, but also an association between a feature group and grouped features. FDReference represents a reference to another FM, enabling a division of large feature models into smaller ones.

The Mapping metamodel defines the allowed correspondences between ER concepts and FM concepts. Allowed correspondences are determined by the following rules (a small code sketch applying some of them follows the list):

- An ER schema maps to a Feature model.
- Each Entity type (Kernel, Subtype, Aggregation, and Weak) maps to a Feature.
- Each Attribute and EnumerationLiteral maps to a Feature.
- Each SpecializationMapping maps to a GroupedRelationship, where the Supertype in such a mapping maps into a Feature whose attribute "IsFeatureGroup" is set to true. Its subtypes become features whose attribute IsGroupedFeature is set to true.
- Each WeakMapping, AggregationMapping and OrdinaryMapping maps to a Single Relationship.
- Each Domain maps to a TypeAttribute.
- The cardinality of any mapping becomes the cardinality of the relationships of the corresponding features.
- Whether a feature is mandatory depends on the cardinality of its mapping.
- When two entities have two or more relationships, or if three or more entities form a cycle, an FMReference is created.
[Figure 2. Mapping metamodel: UML diagram relating the ER metamodel (ERConcept; Entity with Kernel, Subtype, Weak and Aggregation; Relationship; Mapping with Min/Max, specialized into OrdinaryMapping, WeakMapping, AggregationMapping and SpecializationMapping; Attribute; Domain with Enumeration, EnumerationLiteral and PrimitiveType), the mapping model (ER2FM; Mapping Element with Min/Max; correspondences Entity-Feature, Kernel-Feature, Subtype-Feature, Weak-Feature, Agg-Feature, Atr-Feat, Spec-Group, Mapp-Single, Ord-Single, Weak-Single, Agg-Single, Domain-TypeAttr, EnumLiteral-Feature) and the FM metamodel (FMConcept; FeatureModel; Feature with IsRoot, IsGroupedFeature and IsFeatureGroup; Relationship with Min/Max, specialized into Single and GroupedRelationship; FDReference; Attribut; TypeAttr)]
C. An example

In this section we illustrate our approach by an example of a CASAS aimed to support a consortium of flower stores in a big city. The consortium has made an agreement with local taxi drivers to deliver flowers from the stores to their customers. When a store gets a flower delivery order from a customer, it creates a request which is sent to drivers in order to select a driver assigned for the actual delivery. Drivers compete for the delivery by sending their current location. Depending on the preferences of stores (e.g. automatic or manual delivery assignments to drivers, delivery confirmation required or not, etc.) and the equipment options of drivers (e.g. whether a driver is equipped with a GPS device), there can be many different variants of applications supporting stores and drivers. Here we give three use cases (UC) with contextual variants:

1. Use case: Select driver for delivery
Actor: Florist
- The system sends an offer for delivery to all drivers
- The system registers the positive responses and the location of the driver
- The system ranks all driver responses based on current driver distances from the store
- Variant 1: Automatic assignment
  - The system selects the driver with the best rank.
- Variant 2: Manual assignment
  - The florist manually selects the driver from the driver ranking list.

2. Use case: Send bid for delivery
Actor: Driver
- The driver receives a request for a delivery
- The driver accepts the offer
- Variant 1: driver with GPS device
  - The system gets the current location from the driver's GPS device
- Variant 2: driver without GPS device
  - The driver enters the current location
- Send confirmation response and current location

3. Use case: Confirming delivery
Actor: Driver and Customer
- Variant 1: Store requires delivery confirmation
  - The driver asks the client to enter the confirmation code given to him by the store
  - The confirmation code is sent to the store
- Variant 2: Store doesn't require confirmation
  - This use case does not apply to the driver

In order to perform the required adaptations, our CASAS must keep contextual information about stores and drivers. Based on the given use case variants, the (simplified) global context model is given in Fig. 3.

[Figure 3. An example of global context model: an ER schema relating Driver's profile (GPS: bool) and Store's profile (Auto Assign work: bool, Delivery Confirmation: bool)]

Since stores and drivers have two independent applications for realizing the appropriate use cases, each application must have its own local context model. The two appropriate local context models expressed as feature models are given in Fig. 4.

On the left side of Fig. 4, a FM for the application supporting the store (UC 1) is shown. On the right is a FM supporting the taxi driver use cases (UC 2 and UC 3). Both models include only relevant contextual information projected from the global context. For example, the FM on the left doesn't have information about drivers, because the adaptation of the store application does not depend on such contextual information.

[Figure 4. An example of two local context models: left, the store application FM (Store's profile with Auto Assign (bool) and Delivery Confirmation (bool)); right, the driver application FM (Driver's profile with GPS (bool), and Store's profile with Delivery Confirmation (bool))]

Due to space limitations, the corresponding mapping models between the global context model and the two local context models are not given here. For the same reason the corresponding FM configurations are also not given.

IV. CONCLUSIONS

The main advantage of our approach stems from the utilization of both global and local contexts modeled by two different modeling techniques. Ontologies are superior for context modeling and realization of the global context state, but not so suitable for adaptation purposes in a DSPL. On the other hand, feature models are suitable for adaptation purposes, but inadequate for global context modeling. Thus, our approach takes the best of both ontologies and feature models by exploiting their synergy.

Compared to other existing approaches, the key benefit of our approach is in the adaptation process. It can be much more efficient due to smaller, less complex and better tailored local context models. This efficiency is achieved without sacrificing the advantages of ontologies as a superior knowledge representation technique for context modeling.

REFERENCES

[1] A. Schmidt, "Implicit human-computer interaction through context," 2nd Workshop on Human Computer Interaction with Mobile Devices, 1999.
[2] M. Baldauf, S. Dustdar, and F. Rosenberg, "A survey on context-aware systems," Int. Journal of Ad Hoc and Ubiquitous Computing, vol. 2, no. 4, pp. 263–277, June 2007.
[3] C. Bolchini, C. A. Curino, E. Quintarelli, F. A. Schreiber, and L. Tanca, "A data-oriented survey of context models," ACM SIGMOD Record, vol. 36, no. 4, December 2007.
[4] Z. Jaroucheh, X. Liu, and S. Smith, "CANDEL: Product Line Based Dynamic Context Management for Pervasive Applications," International Conference on Complex, Intelligent and Software Intensive Systems (ARES/CISIS 2010), IEEE CS, 2010, pp. 209–216.
[5] K. Kang, S. Cohen, J. Hess, W. Novak, and S. Peterson, "Feature-Oriented Domain Analysis (FODA) Feasibility Study," Software Engineering Institute, Carnegie Mellon University, Tech. Rep. CMU/SEI-90-TR-21, Nov. 1990.
[6] S. Hallsteinsen, M. Hinchey, S. Park, and K. Schmid, "Dynamic Software Product Lines," Computer, vol. 41, no. 4, pp. 93–95, 2008.
[7] C. Bettini, O. Brdiczka, K. Henricksen, J. Indulska, D. Nicklas, A. Ranganathan, and D. Riboni, "A survey of context modeling and reasoning techniques," Pervasive and Mobile Computing, vol. 6, no. 2, pp. 161–180, April 2010.
[8] T. Strang and C. Linnhoff-Popien, "A context modeling survey," in 1st Int. Workshop on Advanced Context Modelling, Reasoning and Management, 2004.
[9] H. Chen, T. Finin, and A. Joshi, "An Ontology for Context-Aware Pervasive Computing Environments," The Knowledge Engineering Review, vol. 18, pp. 197–207, 2004.
[10] X. H. Wang, T. Gu, D. Q. Zhang, and H. K. Pung, "Ontology Based Context Modeling and Reasoning using OWL," presented at Proceedings of the Second IEEE Annual Conference on Pervasive Computing and Communications Workshops, 2004.
[11] T. Strang, C. Linnhoff-Popien, and K. Frank, "CoOL: A Context Ontology Language to enable Contextual Interoperability," Lecture Notes in Computer Science, vol. 2893, pp. 236–247, 2003.
[12] F. Fuchs, I. Hochstatter, M. Krause, and M. Berger, "A Metamodel Approach to Context Information," presented at Third IEEE International Conference on Pervasive Computing and Communications Workshops, 2005.
[13] J. Zakwan, L. Xiaodong, and S. Sally, "Mapping features to context information: supporting context variability for context-aware pervasive applications," International Joint Conference on Web Intelligence and Intelligent Agent Technologies (WI-IAT 2010), IEEE Computer Society, Toronto, Canada, pp. 611–614, August 2010.
[14] B. Vanathi and V. R. Uthariaraj, "Hybrid hierarchical context representation in a context aware system," in Proc. of the 2nd
International Conference on IT and Business Intelligence (ITBI'10), IEEE and IEEE Computational Intelligence Society, Nagpur, 2010.
[15] B. Vanathi and V. R. Uthariaraj, "Collaborative Context Management and Selection in Context Aware Computing," in Communications in Computer and Information Science, vol. 133, Advanced Computing, Part 4, Springer-Verlag Berlin Heidelberg, pp. 348–357.
[16] P. Fernandes, C. Werner, and E. Teixeira, "An Approach for Feature Modeling of Context-Aware Software Product Line," Journal of Universal Computer Science, Special Issue on Software Components, Architectures and Reuse, vol. 17, no. 5, pp. 807–829, 2010.
[17] M. Acher, P. Collet, F. Fleurey, P. Lahire, S. Moisan, and J. P. Rigault, "Modeling Context and Dynamic Adaptations with Feature Models," in Int'l Workshop Models@run.time at Models 2009 (MRT'09), October 2009.
[18] H. Hartmann and T. Trew, "Using Feature Diagrams with Context Variability to Model Multiple Product Lines for Software Supply Chains," in 12th International Software Product Line Conference, IEEE, pp. 12–21, Ireland, September 2008.
[19] K. Henricksen, S. Livingstone, and J. Indulska, "Towards a hybrid approach to context modelling, reasoning and interoperation," Proceedings of the First International Workshop on Advanced Context Modelling, Reasoning and Management, 2004.
[20] I. Roussaki, M. Strimpakou, N. Kalatzis, M. Anagnostou, and C. Pils, "Hybrid context modeling: A location-based scheme using ontologies," PerCom Workshops, IEEE Computer Society, 2006.
[21] M. Salehie and L. Tahvildari, "Self-adaptive software: landscape and research challenges," ACM Transactions on Autonomous and Adaptive Systems, vol. 4, 2009.
[22] K. Kakousis, N. Paspallis, and G. A. Papadopoulos, "A survey of software adaptation in mobile and ubiquitous computing," Enterprise Information Systems (UK), vol. 4, pp. 355–389, 2010.
[23] C. Cetina, P. Giner, J. Fons, and V. Pelechano, "Using Feature Models for Developing Self-Configuring Smart Homes," in Proc. of Int'l. Conf. Autonomic and Autonomous Systems (ICAS), pp. 179–188, IEEE CS, 2009.
[24] S. Hallsteinsen, E. Stav, A. Solberg, and J. Floch, "Using Product Line Techniques to Build Adaptive Systems," in Proc. Int'l. Software Product Line Conf. (SPLC), pp. 141–150, IEEE CS, 2006.
[25] J. Lee and K. C. Kang, "A Feature-Oriented Approach to Developing Dynamically Reconfigurable Products in Product Line Engineering," in Proc. Int'l. Software Product Line Conf. (SPLC), pp. 131–140, IEEE CS, 2006.
[26] P. Trinidad, A. Ruiz-Cortés, and J. Peña, "Mapping Feature Models onto Component Models to Build Dynamic Software Product Lines," in Int'l. Workshop on Dynamic Software Product Lines (DSPL), pp. 51–56, Kindai Kagaku Sha Co. Ltd., 2007.
[27] M. Rosenmüller, N. Siegmund, M. Pukall, and S. Apel, "Dynamic software reconfiguration in software product families," in Software Product-Family Engineering, Lecture Notes in Computer Science, 2004.
[28] B. Cheng, R. de Lemos, H. Giese, P. Inverardi, and J. Magee, Eds., "Software Engineering for Self-Adaptive Systems," vol. 08031 of Dagstuhl Seminar Proceedings, Internationales Begegnungs- und Forschungszentrum fuer Informatik (IBFI), Schloss Dagstuhl, Germany, 2008.
[29] B. Desmet, J. Vallejos, P. Costanza, W. De Meuter, and T. D'Hondt, "Context-Oriented Domain Analysis," in 6th International and Interdisciplinary Conference on Modeling and Using Context (CONTEXT 2007), Lecture Notes in Artificial Intelligence, Springer-Verlag, August 2007.
[30] H. Gomaa and M. Hussein, "Dynamic software reconfiguration in software product families," in Software Product-Family Engineering, Lecture Notes in Computer Science, 2004.
[31] K. Geihs et al., "Software engineering for self-adaptive systems," Springer-Verlag, Berlin, Heidelberg, chapter Modeling of Context-Aware Self-Adaptive Applications in Ubiquitous and Service-Oriented Environments, 2009.
[32] S. Hallsteinsen, K. Geihs, N. Paspallis, F. Eliassen, G. Horn, J. Lorenzo, A. Mamelli, and G. A. Papadopoulos, "A development framework and methodology for self-adapting applications in ubiquitous computing environments," Journal of Systems and Software, vol. 85, no. 12, pp. 2840–2859, December 2012.
[33] B. Morin, O. Barais, J.-M. Jézéquel, F. Fleurey, and A. Solberg, "Models@run.time to support dynamic adaptation," Computer, vol. 42, no. 10, pp. 44–51, 2009.
[34] V. Devedžić, "Understanding ontological engineering," Communications of the ACM, vol. 45, no. 4, April 2002.
[35] D. M. Sanchez, J. M. Cavero, and E. M. Martinez, "The road toward ontologies," in Ontologies: A Handbook of Principles, Concepts and Applications in Information Systems, R. Sharman, R. Kishore, and R. Ramesh, Eds., London: Springer, 2006, pp. 3–20.
[36] M. Jarrar, J. Demey, and R. Meersman, "On Using Conceptual Data Modeling for Ontology Engineering," in S. Spaccapietra, S. March, and K. Aberer, Eds., Journal on Data Semantics (Special issue on best papers from the ER, ODBASE, and COOPIS 2002 Conferences), LNCS, vol. 2800, Springer, pp. 185–207, October 2003.
[37] R. Kishore, H. Zhang, and R. Ramesh, "A Helix-Spindle model for ontological engineering," Communications of the ACM, vol. 47, no. 2, pp. 69–75, February 2004.
[38] K. Czarnecki, C. H. P. Kim, and K. T. Kalleberg, "Feature Models are Views on Ontologies," Proceedings of the 10th International Software Product Line Conference, pp. 41–51, August 21–24, 2006.
[39] B. Lazarević, Z. Marjanović, N. Aničić, and S. Babarogić, Baze podataka (Databases), Beograd, 2006.
Approach in realization of analogy-based reasoning in semantic network

Milan Trifunovic*, Milos Stojkovic*, Miroslav Trajanovic*, Dragan Misic*, Miodrag Manic*
* University of Nis, Faculty of Mechanical Engineering in Nis, Nis, Serbia
[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract – In this paper an approach to the realization of analogy-based reasoning in a semantic network is presented. A new semantic model, called Active Semantic Model (ASM), is used. The core of the process is performed by ASM's association (semantic relation) plexus upgrading procedure, which is based on recognizing and determining the similarity of association plexuses. Determining the similarity of association plexuses is performed by the recognition of topological analogy between association plexuses. ASM responds to unpredicted input by upgrading the new association plexus modeled on the remainder of the context whose subset is recognized as a topologically analogous association plexus.

I. INTRODUCTION

Semantic interpretation of data represents one of the biggest challenges faced by modern information technologies. In fact, this problem is closely related to the ability of computer applications to attach certain meaning to the data being processed. The motive for the solution of this problem lies in the ever increasing need to enable software applications to provide meaningful answers when it is not possible to predict the input, and consequently the code by which a meaningful response is programmed.

A. Analogies

Autonomy, flexibility and analyticity of semantic interpretation are yet to be fully reached by modern web semantic and functional knowledge representation models (ontologies) [1]. At the same time, achieving these functionalities is considered a major current goal of all artificial intelligence methods and models, including ontologies [2]. In pursuit of a solution, interest in approaches where semantic interpretation of data is based on analogies has reappeared [3]. Research in cognitive psychology often indicates that the use of analogies represents the core of the cognitive process, and may be considered the primary process of cognition and communication [4]. Traditional logic distinguishes three forms of reasoning: deductive, inductive, and analogy-based reasoning (ABR). Examples of heuristics most commonly used in solving problems are the determination of partial goals and reliance on analogies [5]. In the latter case, a known procedure, which proved successful in solving previous related (similar) problems, is used to solve a new problem. A precondition for the success of this strategy is the recognition of analogy between the two problems and recalling the solution applied earlier. One of the reasons why it is sometimes difficult to recognize analogy between two problems is the fact that their elements are different, although the relations are the same. In order to recognize analogy between two problems, it is necessary to have insight into the common elements of a solution which can be applied to the new problem. Insight into the common elements of a solution is actually contained in similarity and/or sameness of relations between these elements [6]. Realization of this claim is the main objective of the Active Semantic Model (ASM): to embed knowledge in semantic relations and their plexuses (not in nodes of the semantic network), and also to try to recognize analogies by determining the similarity of semantic relations and their plexuses in order to interpret the meaning and draw conclusions.

II. ACTIVE SEMANTIC MODEL

ASM is a new semantic model which has been developed in-house. Its primary aim was to capture and interpret the semantics of design features related to manufacturability issues [7]. ASM intends to use an alternative approach to knowledge representation in comparison with the existing semantic models by moving the focus of data structuring from concepts to semantic relations, or associations (the term used in ASM). This idea of structuring the meaning in associations is chosen to support the thesis stating that the knowledge that people have about things (visual representations, objects, situations, etc.) is contained in associations between concepts that abstractly represent those things [8]. Furthermore, ASM has proved itself more flexible and productive in capturing and interpreting the semantics of data compared to the existing semantic models [1]. Here, we will explain ASM in brief.

A. Structure

The ASM structure consists of:
- Associations,
- Concepts,
- Concept bodies, and
- Contexts.

The structure of an ASM association is characterized by eleven parameters [1].

The names (cpti, cptj) are two parameters that define the junction points of each association in the semantic network. These two parameters are used to designate the two concepts or contexts that are associated by the association. A concept in ASM is defined just by its name and is used to designate an object, activity, or abstract concept (such as a feature, attribute, number, value, emotion, adverb, etc). There can be only one concept with a given name, but there
can be many associations belonging to different contexts associating it with other concepts.

Besides the two names, an association in ASM is defined by two additional sets of parameters:

Topological parameters: roles (ri, rj) of the concepts (e.g. type, subtype), type (t) of associating (e.g. classifying), direction (d) of associating (←, ↔, →) and character (c) of associating (+, -);

Weight parameters: accuracy (h) of associating for the given context (0; 0.25; 0.5; 0.75; 1) and significance (s) of associating for the given context (0; 0.25; 0.5; 0.75; 1); and affiliation parameters: the context id to which the association belongs, and the user id identifying who has created the association (Fig. 1).

[Figure 1. ASM association structure: several associations with specified parameters belonging to a context (Implant-Design context):
{cpti=Implant, ri=concept, t=attributive, d=→, c=+, h=0.75, s=0.75, rj=attribute, cptj=Free-Form}
{cpti=Implant, ri=sub-type, t=classifying, d=→, c=+, h=1, s=1, rj=type, cptj=CAD-Model}
{cpti=Implant, ri=product, t=product-activity, d=↔, c=+, h=1, s=1, rj=activity, cptj=Implant-Design-Procedure}
{cpti=Implant, ri=assembly, t=affiliation, d=→, c=+, h=1, s=1, rj=part, cptj=Implant-Extension}]

A specific type of ASM data structure is the context (CTX), which is represented by a set of associations, i.e. a segment of the semantic network. Each context serves to describe the semantics of a complex concept, a situation or an event. Each context is defined by its name and its creator (user). The general context is defined and built in the ASM structure independently of the user, while other, particular contexts are created by the user. All the associations from particular contexts are assigned to the general one, but usually with different parameters. An association plexus (PLX) in ASM is, in general, a context subset (a mathematical structure) and can be considered without specific abstract meaning. The main reason why the association plexus is treated as a separate entity in ASM is because it enables and facilitates identification of similarity or analogy of topology between different segments of the semantic network.

The ASM structure is not domain-specific and can be used for knowledge representation in diverse fields. The knowledge from a specific domain should be represented through context(s), while associations as semantic relations between contexts allow knowledge from one context to be applicable to others.

A concept body is a specific realization, i.e. an instance or occurrence of a concept, which is commonly used to encapsulate an instance of formalized knowledge about some concept. For example, the concept Blue-Color can be embodied by one or many specific values of color codes and a procedure to generate this color on the computer screen in accordance with its code. Thus, one concept can have several concept bodies, i.e. its real representations. The concept and its bodies are connected by a specific type of association in which the concept plays the role of a concept and the body plays the role of a concept body. By these associations ASM builds the difference between the concepts and their bodies and connects them at the same time.

III. TOPOLOGICALLY ANALOGOUS ASSOCIATION PLEXUSES

The most common and probably most significant case of semantic content similarity between different association plexuses or contexts is called topological analogy (similarity) (Fig. 2). Topologically analogous association plexuses or contexts have the same type of topology (combination of appropriate values of topological parameters of associations) and the same structure. Associations belonging to two different association plexuses or contexts, which have similar values of weight parameters and the same values of topological parameters, are called topologically correspondent associations (TCA) (associations represented by the same color of line in Fig. 2). Concepts belonging to TCA-s of two different association plexuses or contexts, which have the same role in these TCA-s, are called topologically correspondent concepts (TCC). Two types of topologically analogous association plexuses or contexts are distinguished: semantically distant (the association plexuses or contexts do not share concepts, nor are their concepts similar, synonyms or connected over a series of up to four associations) and semantically close (the association plexuses or contexts share one or more concepts, or have concepts which are similar, synonyms or connected over a series of up to four associations).

[Figure 2. Association plexuses PLXX and PLXN are topologically analogous: two plexus diagrams within contexts CTXX and CTXN, with topologically correspondent associations drawn in matching colors]
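As an illustration of the association structure and the TCA test just described, here is a minimal sketch in Python. It is our illustration, not part of ASM; the field names, the weight tolerance of 0.25 (borrowed from the similarity classes used later in Section IV), and all concept names other than those in Fig. 1 are assumptions.

    # Minimal sketch (hypothetical field names) of an ASM association record
    # and the topological-correspondence (TCA) test.
    from dataclasses import dataclass

    @dataclass
    class Association:
        cpt_i: str          # name of the first associated concept
        cpt_j: str          # name of the second associated concept
        r_i: str            # role of the first concept (e.g. "sub-type")
        r_j: str            # role of the second concept (e.g. "type")
        t: str              # type of associating (e.g. "classifying")
        d: str              # direction of associating: "<-", "<->" or "->"
        c: str              # character of associating: "+" or "-"
        h: float            # accuracy for the given context (0..1)
        s: float            # significance for the given context (0..1)
        context_id: str = "general"
        user_id: str = ""

    def topologically_correspondent(a, b, weight_tol=0.25):
        # TCA test: equal topological parameters, similar weight parameters.
        return ((a.r_i, a.r_j, a.t, a.d, a.c) == (b.r_i, b.r_j, b.t, b.d, b.c)
                and abs(a.h - b.h) <= weight_tol
                and abs(a.s - b.s) <= weight_tol)

    # One association taken from Fig. 1, one hypothetical counterpart.
    a1 = Association("Implant", "CAD-Model", "sub-type", "type",
                     "classifying", "->", "+", 1.0, 1.0)
    a2 = Association("Femur-Model", "Mesh-Model", "sub-type", "type",
                     "classifying", "->", "+", 0.75, 1.0)
    print(topologically_correspondent(a1, a2))   # -> True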
IV. ASSOCIATION PLEXUS UPGRADING PROCEDURE

Every association plexus can be observed as a part of the semantic network which is connected with other parts of the semantic network by associations involving other concepts. In general, it is very difficult to distinguish where one association plexus "ends" and where others "begin". The user introduces a new association plexus (which represents a new or unknown situation) to ASM, usually by creating associations between concepts of which some or all are known to ASM, i.e. were added to the ASM semantic network earlier (Fig. 3).

[Figure 3. Introducing new association plexus PLXX to ASM. Concept CPT2 is known to ASM: the diagram shows the new plexus PLXX (concepts CPT1, CPT2, CPT3 with associations AX1,2 and AX1,3) within the ASM space, with associations between the known concept CPT2 and other concepts from the ASM semantic network]

ASM responds to the input by recognizing topological analogy between the new and known association plexuses (from the narrowed semantic network space) and upgrading the new association plexus modeled on the remainder of the context (whose subset is recognized as a topologically analogous association plexus). The response is formulated through creating new associations between concepts from the new association plexus and known concepts in the network.

The association plexus upgrading procedure is based on similarity between the new and known association plexuses. New association plexus concepts will be connected modeled on their TCC-s in similar association plexuses.

In the case when the new association plexus PLX_X is topologically analogous to a certain known association plexus PLX_N, regardless of whether they are semantically close or semantically distant, ASM will use the logic of topologically analogous association plexus upgrading (the symbol ≍ denotes topological correspondence (for associations and concepts) or topological analogy (for contexts and association plexuses)):

If

∀A^{PLX_X}_{i,j} ∃A^{PLX_N}_{k,l} : (A^{PLX_X}_{i,j} ≍ A^{PLX_N}_{k,l}) ∧ (CPT^{PLX_X}_i ≍ CPT^{PLX_N}_k)   (1)

where A^{PLX_X}_{i,j} = A^{PLX_X}_{CPT_i-CPT_j}, A^{PLX_N}_{k,l} = A^{PLX_N}_{CPT_k-CPT_l}, PLX_N ⊆ CTX_N,

then it is possible that there exists a context CTX_X, whose subset is the new association plexus PLX_X, which is topologically analogous to the known context CTX_N:

∃CTX_X : (CTX_X ⊇ PLX_X) ∧ (CTX_X ≍ CTX_N)   (2)

Therefore, the new association plexus PLX_X should be upgraded to the context CTX_X, modeled on the remainder of the known context CTX_N (Fig. 4).

[Figure 4. Logic of topologically analogous association plexus upgrading: PLXX ≍ PLXN; the remainder of CTXN (concepts CPT13–CPT16 with associations AN11,12 through AN13,16) serves as the model for upgrading PLXX (concepts CPT1–CPT3 with associations AX1,2 and AX1,3) to CTXX]

The logic of topologically analogous association plexus upgrading is carried out in three attempts (sub-procedures). The first and second attempts are carried out in several iterations.

Each iteration of every attempt is followed by an iteration of the process of determining semantic similarity of concepts, which can also result in the creation of association(s) between concepts. This procedure is presented in detail in [9].

A. First attempt

The first attempt is carried out in several iterations. The procedure for each iteration is identical. The first attempt ends in the situation when ASM is not able to add a new association to the known association plexus.

The same example will be used independently to illustrate the first attempt procedure for semantically close and semantically distant TCC-s.

1) Semantically close TCC-s

ASM first recognizes TCC-s of the new and known association plexus (or the contexts of which they are a subset) which are identical (denoted by ≡), or are synonyms (fifth class of similarity, denoted by ≈₅: the absolute value of the difference of accuracy and significance for all association pairs connecting these concepts has to be less than 0.25; all association pairs connecting these concepts (through the same connectional concepts) have to have the same type of associating (and the same corresponding concept roles) and the same characters and directions of associating), or similar (fourth class of similarity, denoted by ≈₄: the absolute value of the difference of accuracy and significance for all association pairs connecting these concepts has to be less than 0.5; all association pairs connecting these concepts have to have the same type of associating (and the same corresponding concept roles) and the same characters and directions of associating) in the general context (semantically close TCC-s):

∃CPT_i (CPT_i ∈ PLX_X) ∧ ∃CPT_j (CPT_j ∈ PLX_N)   (3)

such that:

1. CPT_i ≍ CPT_j
2. CPT_i ≡ CPT_j ∨ CPT_i ≈₅ CPT_j ∨ CPT_i ≈₄ CPT_j in the general context (the symbol A_{CPT_i-CPT_j} denotes an association between the concepts CPT_i and CPT_j)

3. ∃A^{CTX_N}_{CPT_j-CPT_{j+1}} (A^{CTX_N}_{CPT_j-CPT_{j+1}} ∈ CTX_N) ∧ A^{CTX_N}_{CPT_j-CPT_{j+1}} ∉ PLX_N, PLX_N ⊆ CTX_N

If such concepts are found, ASM adds the associations of the known association plexus involving the found TCC-s, except that the concept from the known association plexus will be replaced by its TCC in the new association plexus (Fig. 5):

∃A^{CTX_X}_{CPT_i-CPT_{j+1}} : t(A^{CTX_X}_{i,j+1}) = t(A^{CTX_N}_{j,j+1}), c(A^{CTX_X}_{i,j+1}) = c(A^{CTX_N}_{j,j+1}), d(A^{CTX_X}_{i,j+1}) = d(A^{CTX_N}_{j,j+1}), h(A^{CTX_X}_{i,j+1}) = h(A^{CTX_N}_{j,j+1}), s(A^{CTX_X}_{i,j+1}) = s(A^{CTX_N}_{j,j+1})   (4)

where A^{CTX_X}_{i,j+1} = A^{CTX_X}_{CPT_i-CPT_{j+1}}, A^{CTX_N}_{j,j+1} = A^{CTX_N}_{CPT_j-CPT_{j+1}}.

[Figure 5. Association plexus upgrading in first attempt (semantically close TCC-s). TCC-s (concept CPT12) of new and known association plexus are identical.]

2) Semantically distant TCC-s

Recognition of semantically close TCC-s is followed by the recognition of semantically distant TCC-s of the new and known association plexus:

∃CPT_i (CPT_i ∈ PLX_X) ∧ ∃CPT_j (CPT_j ∈ PLX_N)   (5)

such that:

1. CPT_i ≍ CPT_j

2. CPT_i ≢ CPT_j, and ∄A^{CTX_0}_{CPT_i-CPT_j} ∨ (t(A^{CTX_0}_{CPT_i-CPT_j}) ≠ similarity ∧ t(A^{CTX_0}_{CPT_i-CPT_j}) ≠ synonymous), where CTX_0 denotes the general context

3. ∃A^{CTX_N}_{CPT_j-CPT_{j+1}} (A^{CTX_N}_{CPT_j-CPT_{j+1}} ∈ CTX_N) ∧ A^{CTX_N}_{CPT_j-CPT_{j+1}} ∉ PLX_N, PLX_N ⊆ CTX_N

If such concepts are found, ASM searches for all associations in the semantic network involving concepts from the new association plexus which are topologically correspondent to the associations from the known association plexus involving their TCC-s, and adds these associations if the TCC-s have the same roles in them (Fig. 6):

∃A^{CTX_M}_{CPT_i-CPT_k} (A^{CTX_M}_{CPT_i-CPT_k} ∈ CTX_M) ∧ A^{CTX_M}_{CPT_i-CPT_k} ≍ A^{CTX_N}_{CPT_j-CPT_{j+1}} ∧ r(CPT_i) = r(CPT_j) ⟹ ∃A^{CTX_X}_{CPT_i-CPT_k} : t(A^{CTX_X}_{i,k}) = t(A^{CTX_M}_{i,k}), c(A^{CTX_X}_{i,k}) = c(A^{CTX_M}_{i,k}), d(A^{CTX_X}_{i,k}) = d(A^{CTX_M}_{i,k}), h(A^{CTX_X}_{i,k}) = h(A^{CTX_M}_{i,k}), s(A^{CTX_X}_{i,k}) = s(A^{CTX_M}_{i,k})   (6)

where A^{CTX_X}_{i,k} = A^{CTX_X}_{CPT_i-CPT_k}, A^{CTX_M}_{i,k} = A^{CTX_M}_{CPT_i-CPT_k}.

[Figure 6. Association plexus upgrading in first attempt (semantically distant TCC-s). TCC-s (CPT1, CPT11) and (CPT2, CPT12) are semantically distant.]

B. Second attempt

The second attempt is carried out in several iterations. The procedure for each iteration is identical. The second attempt ends in the situation when ASM is not able to add a new association to the known association plexus. A complete first attempt is carried out between second attempt iterations. The second attempt will continue from the situation illustrated in Fig. 6 (association plexus upgrading in the first attempt for semantically distant TCC-s).

ASM searches for concepts in the semantic network which are similar to concepts from the new association plexus in a specific context, and which are involved in associations that are topologically correspondent to associations from the known association plexus. It is necessary to find concepts in the semantic network which are similar to concepts from the new association plexus in at least the third class of similarity (the absolute value of the difference of accuracy and significance for all association pairs connecting these concepts has to be less than 0.5; all association pairs connecting these concepts have to have the same type of associating (and the same corresponding concept roles) and the same characters of associating):

∃CPT_i (CPT_i ∈ CTX_X) ∧ ∃CPT_j (CPT_j ∈ CTX_K)   (7)

such that:

1. CPT_i ≈₃ CPT_j

2. ∃A^{CTX_K}_{CPT_j-CPT_{j+1}} (A^{CTX_K}_{CPT_j-CPT_{j+1}} ∈ CTX_K) ∧ A^{CTX_K}_{CPT_j-CPT_{j+1}} ≍ A^{CTX_N}_{CPT_k-CPT_{k+1}}

If such concepts are found, ASM adds associations involving them, which are topologically correspondent to the associations from the known association plexus, except that
the found concept will be replaced by its similar concept in the new association plexus (Fig. 7):

∃A^{CTX_X}_{CPT_i-CPT_{j+1}} : t(A^{CTX_X}_{i,j+1}) = t(A^{CTX_K}_{j,j+1}), c(A^{CTX_X}_{i,j+1}) = c(A^{CTX_K}_{j,j+1}), d(A^{CTX_X}_{i,j+1}) = d(A^{CTX_K}_{j,j+1}), h(A^{CTX_X}_{i,j+1}) = h(A^{CTX_K}_{j,j+1}), s(A^{CTX_X}_{i,j+1}) = s(A^{CTX_K}_{j,j+1})   (8)

where A^{CTX_X}_{i,j+1} = A^{CTX_X}_{CPT_i-CPT_{j+1}}, A^{CTX_K}_{j,j+1} = A^{CTX_K}_{CPT_j-CPT_{j+1}}.

[Figure 7. Association plexus upgrading in second attempt. Concepts CPT1 and CPT51 are similar in context CTXK, while concepts CPT4 and CPT63 are similar in context CTXL.]

C. Third attempt

The third attempt does not have iterations. After the third attempt is carried out, the user, depending on whether he is satisfied with the results, decides whether to complete the upgrading procedure or to carry it out from the beginning (from the first attempt, with the same new association plexus).

The goal of the third attempt is to find candidate concepts in the semantic network which should be connected with the remaining concepts (concept CPT3) from the new association plexus. Candidate concepts and their corresponding concepts (concept CPT16) from the known association plexus are usually semantically distant. The focus of the third attempt is the similarity between associations involving candidate concepts and associations involving their corresponding concepts from the known association plexus.

In the third attempt ASM recognizes concepts (concept CPT16) involved in associations from the context whose subset is the known association plexus, which do not have TCC-s in the context whose subset is the new association plexus. After that, ASM identifies all association plexuses with associations involving the recognized concepts, as well as their topologically analogous association plexuses. In the last step ASM identifies TCC-s of the recognized concepts which are involved in the same or a similar set of TCA-s in the recognized topologically analogous association plexuses (Fig. 8).

If such TCC-s are found, ASM adds associations between these concepts (concept CPT6) and the corresponding concepts (concept CPT3) from the new association plexus, which will have the same parameters as the associations from the known association plexus recognized at the beginning of the attempt (the association between concepts CPT13 and CPT16) (Fig. 9).

[Figure 8. Recognizing candidate concept(s) in the semantic network which should be connected with the concept CPT3. Concepts CPT6 and CPT16 are TCC-s in most cases of identified topologically analogous association plexuses.]
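To give a flavor of this candidate-selection step, here is a minimal sketch in Python. It is our loose abstraction of the idea illustrated in Fig. 8 (counting how often two concepts co-occur as TCC-s across identified topologically analogous plexus pairs), not the authors' algorithm; all names and the min_support threshold are assumptions.

    # Minimal sketch (hypothetical structures): a concept is proposed as a
    # candidate if it co-occurs as a TCC of the target concept in enough of
    # the identified topologically analogous plexus pairs.
    from collections import Counter

    def candidate_concepts(target, analogous_plexus_pairs, min_support=2):
        # analogous_plexus_pairs: iterable of dicts mapping concepts of one
        # plexus to their TCC-s in a topologically analogous plexus.
        votes = Counter()
        for tcc_map in analogous_plexus_pairs:
            for concept, counterpart in tcc_map.items():
                if concept == target or counterpart == target:
                    votes[counterpart if concept == target else concept] += 1
        return [c for c, n in votes.most_common() if n >= min_support]

    # In Fig. 8, CPT6 and CPT16 are TCC-s in most identified pairs, so CPT6
    # would be returned as the candidate to connect with CPT3.
    pairs = [{"CPT6": "CPT16"}, {"CPT16": "CPT6"}, {"CPT79": "CPT125"},
             {"CPT6": "CPT16"}]
    print(candidate_concepts("CPT16", pairs))   # -> ['CPT6']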
[Figure 9. Association plexus upgrading in third attempt]

V. CONCLUSION

As has been shown, ASM brings an original approach to the realization of ABR in a semantic network. The core of the ABR process and semantic interpretation of data is performed by ASM's association plexus upgrading procedure, which is based on recognizing and determining the similarity of association plexuses. Determining the similarity of association plexuses is performed by the recognition of topological analogy between association plexuses. Relying on this approach, ASM responds to an unpredicted input, which is defined through input association plexuses, by upgrading that association plexus modeled on the remainder of the context whose subset is recognized as a topologically analogous association plexus. An ABR process designed in this way enables autonomous, flexible and analytic semantic interpretation of data described in the semantic network.

The ability to recognize analogy between semantically very distant situations is considered one of the essential characteristics of creativity. Creative conclusions usually start by recognizing similarity between apparently semantically disconnected elements and arise by creating new semantic relations between these elements or ideas. According to another view, creative conclusions arise by creating new context-suitable semantic relations between elements or ideas which are already connected by some "old" semantic relations that are not applicable for the actual context.

ACKNOWLEDGMENT

The paper represents a summary of a part of the research conducted within the project "Virtual human osteoarticular system and its application in preclinical and clinical practice" (project id III 41017), which is funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia for the period 2011-2014.

REFERENCES

[1] M. Stojković, M. Manić, M. Trifunović, and D. Mišić, "Semantic categorization of data by determining the similarities of associations of the semantic network," E-Society Journal Research and Applications, vol. 2, no. 1, pp. 3-13, July 2011.
[2] M. C. Daconta, L. J. Obrst, and K. T. Smith, The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management. Indianapolis, IN: Wiley Publishing, Inc., 2003.
[3] D. Gentner, K. J. Holyoak, and B. N. Kokinov, The Analogical Mind: Perspectives from Cognitive Science. Cambridge, MA: The MIT Press, 2001.
[4] K. D. Forbus, "Exploring Analogy in the Large," in The Analogical Mind: Perspectives from Cognitive Science, D. Gentner, K. J. Holyoak, and B. N. Kokinov, Eds. Cambridge, MA: The MIT Press, 2001, pp. 23-58.
[5] A. Newell and H. A. Simon, Human Problem Solving. NJ: Prentice Hall, 1972.
[6] A. Kostić, Kognitivna psihologija (Cognitive Psychology). Beograd: Zavod za udžbenike i nastavna sredstva, 2006.
[7] M. Stojković, Analysis of the Manufacturability Parameters Based on Semantic Structures of the Digital Product Model, PhD Thesis, University of Niš, Faculty of Mechanical Engineering in Niš, Niš, Serbia, 2011.
[8] J. R. Anderson and G. H. Bower, Human Associative Memory. Washington, DC: Winston, 1973.
[9] M. Trifunović, M. Stojković, M. Trajanović, D. Mišić, and M. Manić, "Interpreting the meaning of geometric features based on the similarities between associations of semantic network," Facta Universitatis, Series: Mechanical Engineering, vol. 11, no. 2, pp. 181-192, December 2013.
Mapping ebXML standards to ontology


Branko Arsić1, Marija Đokić1, Nenad Stefanović1
1
Faculty of Science, University of Kragujevac

Abstract - Finding the best business partner can be a real challenge, because it is often necessary to exchange large amounts of data and documents. For efficient and flexible B2B cooperation in a modern enterprise, ebXML standards can be applied. The Collaboration Protocol Profile (CPP) and Collaboration Protocol Agreement (CPA), as parts of ebXML, are used to make ad hoc agreements on electronic document communication between two companies. In this paper, an ontologically-enhanced ebXML is presented, with the effort invested in translating CPPs to ontologies. The presented step paves the way for future SPARQL-based reasoning over these OWL documents, aiming to create the CPA document. Our mapping approach can be used for any ebXML standard and exploits all the benefits that an ontology offers, such as reasoning, defining rules and computing additional property values.

1. INTRODUCTION

In B2B e-commerce, there are many standards which provide means to exchange data between applications [1]. However, this alone does not guarantee interoperability. On the syntactic level, interoperability requires an agreement on an e-business vocabulary and, even more importantly, on the semantic level business partners must share a common understanding that unambiguously constrains the generic document types [2]. As for the benefits of using these standards, it is known that processing and communication between companies become quicker and less error-prone, because less human intervention and less paperwork reduce the possibility of errors, leading to better accuracy, stability, efficiency and dependability of company processes. We begin by mentioning some of the e-business standards.

Electronic Data Interchange (EDI) [3] is the process by which companies and institutions exchange trade-related documents in electronic form. EDI is based on the concept of transactions that comprise messages (business documents) in predefined formats. However, conducting ad hoc business between companies without any prior agreement was not possible.

One of the most popular standards, the Extensible Markup Language (XML) [4], has become the first choice for defining data interchange formats in business collaboration. XML provides a more open and flexible way of conducting business transactions than EDI.

RosettaNet [5] defines common inter-company public processes and their associated business documents, expressed in DTD or XML Schema, in order to reduce costs and provide extensibility benefits. However, the use of these documents does not resolve interoperability issues in B2B integrations, because they do not have the expressive power to capture all necessary constraints.

However, a globally accepted standard that would support cooperation between companies is still missing. Such a standard should enable companies to find the most suitable business partner. Agreements between companies should be established using this standard in a short time and, if possible, automatically. The accepted standard should enable enterprises to collaborate even if they use different business applications.

One of the standards that meets the above conditions is ebXML (Electronic Business using eXtensible Markup Language) [6]. ebXML is a suite of specifications that enables enterprises to conduct business over the Internet. The specifications cover the analysis of business processes and business documents, the documentation of a company's capabilities, and the document transfer needed to conduct e-business. Many large companies are involved in developing and evaluating this standard, which makes it very attractive for others to embrace. The Collaboration-Protocol Profile (CPP) and Collaboration-Protocol Agreement (CPA) [7], as parts of the ebXML standard, allow finding the most suitable business partner and making ad hoc agreements on electronic document communication. A CPP defines a company's capabilities to engage in electronic business with other companies. A CPA is a document that represents a formal agreement between collaborating companies and defines the specific terms of message exchange agreed between the two companies.

Many approaches have been proposed for the standardization of ebXML. However, they lack semantic representation. The proposed ontologically-enhanced ebXML solution shows how existing ebXML challenges can be solved by using semantic technologies, and the effort has been invested in translating CPPs to ontologies [8]. For the CPP XML Schema we created a respective OWL model [9]. Further, we generate one ontology out of each CPP instance document. An approach for an improved mapping scheme that presents ebXML constructs in terms of OWL concepts is described. This approach provides new capabilities for efficient semantic characterization of documents.

The paper is structured as follows: the second section gives a critical overview of the literature. The third section discusses the motivation for this work and the benefits we can obtain. The fourth section presents parts of ebXML - the CPP and CPA documents. The fifth section focuses on the process of mapping CPP documents into an ontology. The last section is reserved for conclusions and future work.

2. RELATED WORK

The main goal of developing a standard for e-business is to create a basis for an automated mapping system that would be able to convert the concepts of various standards in an independent manner. This entails that the meaning of terms, relationships, restrictions and rules in the standards should be clearly defined in the early stages of standard development.

The Semantic Web [10] and ontologies [11], as the core component of the Semantic Web, have the potential to deal with these problems. The idea of the Semantic Web and its related technologies is to express information not only in natural language or in predominantly syntactic formats, but in a way that can be read and used by software agents.

The real question that arises is the aim of ontologizing. What kind of system do we need to obtain so that it can be extensible? The solution should be flexible with respect to adding new partners and new collaborative processes, and should support coping with the inevitable variations that come when working with multiple partners. Companies have invested considerable amounts of money and resources to implement current B2B integrations based on existing B2B standards, and they have the supporting infrastructure largely in place. The papers [11, 12, 13] describe several mapping processes from e-business standards to ontologies, trying to resolve heterogeneities not structurally and semantically covered by the specifications of these standards.

The mapping step is our focus here, and it is only one step of our unified semantic B2B collaboration model, which is now being developed. The model comprises the complete B2B process automation cycle, starting from the creation of CPPs all the way to the semi-automatic generation of a CPA.

Several strategies for mapping XML technologies to OWL have been proposed [14, 15, 16, 17]. Some papers focus more on a general mapping between XML and RDF, while others aim at mapping XML Schema to OWL without considering XML instance data.

Our work improves on the previous mapping approaches by creating individuals for every CPP document element. In previous works, individuals also accompany Object Properties, but without a naming convention and without resolving conflicts between same-type elements and same-level individuals. We pay special attention to this part of the mapping process, because it is crucial in our model. Our present research is based on SPARQL reasoning [18] over individuals, with the main goal of obtaining a CPA document as the final product. Further, we use the names of attributes for detecting semantics in CPP documents, so that we are able to represent them in the ontology. Other approaches did not deal with discovering semantics inside and outside of the document structure, so this is another improvement over previous works.

Our approach proposes a method for translating CPP documents to an ontology. The representation of standards as an ontology allows the automation of the mutual mapping process by using different computational algorithms, and makes it possible to avoid the disadvantages of the XML representation. This paper is also an application of some existing and novel approaches, but in a new domain (ebXML CPP), because there is no comprehensive paper on a similar topic. Previous studies have mainly focused on improving the ontological ebXML registry and repository [19].

3. MOTIVATION

It is a known fact that messages have been designed only syntactically. With the ambiguity of natural languages come real problems concerning the precise meaning of words. As a consequence, the set-up time is quite involved, as it often takes developers a long time to agree on the precise meaning of the message content. A frequently mentioned example in the literature is the shipping date. Does the shipping date refer to the date the supplier transfers the goods to the shipper, or the date when the transport vehicle departs the premises?

Ontologies represent a core pillar of the Semantic Web idea, as they define a set of concepts within a domain and the relationships between those concepts. More formally, an ontology defines the vocabulary of a problem domain and a set of constraints (axioms or rules) on how terms can be combined to model specific domains. An ontology is typically structured as a set of definitions of concepts and relations between these concepts. Ontologies are machine-processable, and they also provide semantic context by adding semantic information to models, thereby enabling natural language processing, reasoning capabilities, domain enrichment, domain validation, etc.

Current EDI, RosettaNet and ebXML specifications are not represented in languages which support reasoning about the objects within their domain. Ontology Engineering (OE) is sometimes seen as the next level in knowledge modeling, aiming at avoiding conceptual ambiguities, advocating reuse and standardization, and serving as a building block for more complex automated-reasoning systems [20]. In addition, it is possible to define property domains, cardinality ranges, and reasoning rules. Reasoning engines such as Pellet or Jess can be used to infer additional facts about the knowledge explicitly included in OWL ontologies. Reasoning in OWL can be performed at the class, property or instance level. For example, it is possible to define rules for checking class equivalence, for classifying individuals, or for computing additional property values using transitiveness.

The analysis of Appendix E of the CPPA specification showed how CPPA elements and attributes are inter-dependent. For example, if non-repudiation is required, then the necessary child elements must be present. XML Schema cannot express all possible rules, so an additional

rule definition in the form of an ontology could be of value and a natural choice.

4. EBXML STANDARDS - CPP AND CPA

Enterprises must collaborate with each other extensively if they want to complete their cooperation. Exchanging data and different types of documents between them can be an extremely difficult process. Many large companies and enterprises are involved in developing and evaluating the ebXML standard in order to support these processes.

The main parts of the ebXML standard are the Collaboration Protocol Profile (CPP) and the Collaboration Protocol Agreement (CPA). The major purpose of these documents is to help find the most relevant business partner and make ad hoc agreements on electronic document communication, as specified in [7].

The CPP defines a party's capabilities to engage in electronic business with other parties. These capabilities include the business processes that the party supports, as well as particular technical details of the supported means of message exchange. A CPP is an XML document containing elements that describe the processing of some business unit, such as PartyInfo and Packaging. The first element describes the supported business collaborations, the role in the business collaboration and the technical details of message exchange (Fig.1) [7]. The second element provides specific information about how the Message Header and payload constituent(s) are packaged for transmittal over the transport.

The ProcessSpecification layer defines the interaction between partners, i.e. the role that each company plays. The DeliveryChannels layer represents the characteristics of message receiving; a CPP can have several delivery channels. The DocExchange layer accepts the business document of one company from the ProcessSpecification layer, encrypts it and/or adds a digital signature (based on the specification), and sends it to the transport layer for transmission to the other partner. The Transport layer is responsible for message delivery using the selected transport protocol. The Packaging element provides information about message header packaging for transfer over the transport layer.

Figure 1. Layered structure of CPP

The CPA is a document that specifies the terms of message exchange agreed between the two parties. It represents a formal agreement between the collaborating parties. A CPA is actually the result of the intersection of two CPPs: the document contains the common or compatible elements of both partners. A CPA document consists of general information about the document itself, two PartyInfo elements - one for each business partner - and one Packaging element. The PartyInfo elements have the same structure as the corresponding PartyInfo element in a CPP. If negotiation is involved, the values of some PartyInfo elements must be changed. The Packaging element must be the same as the appropriate element in the CPP.

The example (Fig.2) shows how one company can engage in ebXML, how an already registered ebXML company searches for a new trading partner, and how both then engage in electronic business.

Figure 2. ebXML Overview (adapted from the ebXML Technical Architecture specification)

1. Company A browses the ebXML Registry to see which collaborative business processes and business document schemas are already available.
2. Company A needs a local ebXML system to communicate with trading partners.
3. Company A has to create a CPP which describes the supported collaborative business process capabilities. After that, Company A registers its CPP at the ebXML Registry.
4. Company B is already registered at the ebXML Registry and is looking for new trading partners. Company B queries the ebXML Registry and receives Company A's CPP. Company B then has two CPPs: Company A's CPP and its own. The two companies have to come to an agreement on how to do business (CPA).
5. The CPA template has to be accepted by both parties. A CPA negotiation finalizes the CPA template into a final CPA if there is anything left to be negotiated.
6. The companies then use the underlying ebXML system to exchange business documents conforming to the CPA.
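To make the idea of a CPA as the intersection of two CPPs more concrete, the following minimal Python sketch intersects the capability sets declared in two hypothetical, already parsed CPPs. The dictionaries and the pick_common helper are illustrative assumptions, not part of the ebXML specification.

# A minimal sketch: common CPA terms as the intersection of two
# (hypothetical, pre-parsed) CPP capability descriptions.
cpp_a = {"transport": {"HTTP", "SMTP"}, "security": {"SSL"}}
cpp_b = {"transport": {"HTTP", "FTP"}, "security": {"SSL", "S/MIME"}}

def pick_common(cpp1, cpp2):
    """Return, per capability, the values both parties support."""
    common = {}
    for key in cpp1.keys() & cpp2.keys():
        shared = cpp1[key] & cpp2[key]
        if shared:  # a CPA term exists only where both sides agree
            common[key] = shared
    return common

print(pick_common(cpp_a, cpp_b))  # {'transport': {'HTTP'}, 'security': {'SSL'}}

In the real protocol the intersection is computed over full CPP documents and any non-overlapping items go through CPA negotiation (step 5 above); the set version here only illustrates the principle.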

5. MAPPING OF EBXML STANDARDS TO ONTOLOGY

Ontologies have the potential to improve the quality of standards and to support inferring additional facts about the knowledge. In our research we decided to perform the mapping of the CPP electronic business standard by developing individual ontologies. The focus of our research is on documents based on the Collaboration Protocol Profile and Agreement specification, version 2.0. A CPP contains several Certificate parts, several Transport and Document Exchange options, and a few Security parts which a company offers in its business.

ebXML standards cover the area of electronic procurement, and a mapping between them is a real and practical need, because OWL gives us certain advantages in comparison with XML. Below we present the steps of our algorithm for mapping CPP documents to OWL. The same steps can be used for any ebXML standard.

FIRST STEP: All XML elements of the CPP are mapped to OWL Classes.

The name of a class is an Internationalized Resource Identifier (IRI). Each IRI is defined by its name and namespace, taken from the original CPP document. We use the same namespaces as defined by the CPP XML Schema [7].

SECOND STEP: The hierarchy between CPP elements is mapped using the rdfs:subClassOf property (Fig.6).

THIRD STEP: Individuals are created for every element using a new naming convention:
a) If the mapped element has its own unique ID, we take it as the name;
b) Otherwise, we use the parent name + capitalize_first_letter (acronym of the element name).

For example, from the element

<tp:Transport tp:transportId="transportA1">
  <tp:TransportSender>...</tp:TransportSender>
  <tp:TransportReceiver>...</tp:TransportReceiver>
</tp:Transport>

Listing 1. Transport layer in a CPP document

we get the following individuals: "transportA1" for the Transport class, "transportA1Ts" for the TransportSender class and "transportA1Tr" for the TransportReceiver class. In this way we can create individuals with distinct names and concrete values, making them SPARQL [21] aware for comparisons.

An exception is the PartyInfo class, whose individual takes the name of the PartyName attribute, for simplicity, instead of the PartyId text node.

A very important fact is that all same-type individuals have a fixed name suffix. For every same-type element, in any CPP document, we create the individual using the parent name as the basename and the abbreviation of the element name as the suffix, according to rule b). The only difference is the basename. Conflicts between same-type elements at the same level, within the same CPP document, are resolved by using the elements' different ID attributes, according to rule a) and the CPP XML Schema. At any time we know the classes to which individuals belong, so we have a rule for future individual comparisons. For example, transportA1 → transportA1Ts, transportA1TsTcs, etc.; transportA2 → transportA2Ts, transportA2TsTcs, etc. Same-class individuals form the pairs (transportA1, transportA2), (transportA1Ts, transportA2Ts), (transportA1TsTcs, transportA2TsTcs), etc.

FOURTH STEP: Besides OWL classes and individuals, every child element is mapped into OWL syntax as an OWL Object Property. The name of the object property is formed as an IRI, where the name has the prefix "has".

The OWL Class to which a particular property belongs is mapped using rdfs:domain. The OWL subclass for the Object Property is mapped as rdfs:range. There is no other way to connect individuals at different levels.

FIFTH STEP: If one element in the CPP has several ID attributes, we use only the relevant ID for naming, and the other IDs are used as OWL Object Properties for linkage among elements.

<tp:DeliveryChannel tp:channelId="syncChannelA1"
  tp:transportId="transportA1"
  tp:docExchangeId="docExchangeA1">
  ...
</tp:DeliveryChannel>

Listing 2. DeliveryChannel layer in a CPP document

Figure 3. Object properties for asyncChannelA1

SIXTH STEP: Element attributes are mapped as OWL Datatype Properties.

The attribute's OWL Class is mapped using rdfs:domain. rdfs:range is used for defining primitive data types, like the built-in data types of XML Schema.

SEVENTH STEP: If an element has a text node, we map it as a "hasValue" OWL Datatype Property.

<tp:TransportProtocol tp:version="1.1">
  HTTP
</tp:TransportProtocol>

Here, besides creating the class and the individual mentioned earlier, we also create two Datatype Properties, and with the TransportProtocol individual we associate the values hasValue="HTTP" and version="1.1".
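As an illustration of steps one through seven, the following hedged Python sketch (using the rdflib library) builds the class, individual, object-property and datatype-property triples for the Transport fragment of Listing 1. The tp namespace IRI and the derive_name helper are our own illustrative assumptions, not taken from the CPPA schema.

# A minimal sketch of steps 1-7 with rdflib; the namespace IRI and
# helper names are illustrative assumptions.
from rdflib import Graph, Literal, Namespace, RDF, RDFS, OWL

TP = Namespace("http://example.org/cppa#")  # assumed namespace for brevity
g = Graph()

# STEPS 1-2: element names become OWL classes, nesting becomes subClassOf.
for cls in ("Transport", "TransportSender", "TransportProtocol"):
    g.add((TP[cls], RDF.type, OWL.Class))
g.add((TP.TransportSender, RDFS.subClassOf, TP.Transport))

def derive_name(parent, element, element_id=None):
    """STEP 3: rule a) use the element's own ID; rule b) parent name plus
    the capitalized acronym of the element name (TransportSender -> Ts)."""
    if element_id:
        return element_id
    acronym = "".join(c for c in element if c.isupper())[:2]
    return parent + acronym.capitalize()

transport = TP[derive_name(None, "Transport", element_id="transportA1")]
sender = TP[derive_name("transportA1", "TransportSender")]  # transportA1Ts
g.add((transport, RDF.type, TP.Transport))
g.add((sender, RDF.type, TP.TransportSender))

# STEP 4: child elements become "has..." object properties.
g.add((TP.hasTransportSender, RDF.type, OWL.ObjectProperty))
g.add((transport, TP.hasTransportSender, sender))

# STEPS 6-7: attributes and text nodes become datatype properties.
g.add((TP.version, RDF.type, OWL.DatatypeProperty))
g.add((TP.hasValue, RDF.type, OWL.DatatypeProperty))
proto = TP[derive_name("transportA1Ts", "TransportProtocol")]  # ...TsTp
g.add((proto, TP.hasValue, Literal("HTTP")))
g.add((proto, TP.version, Literal("1.1")))

print(g.serialize(format="turtle"))

Running the sketch prints Turtle triples analogous to the individuals and properties described above.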

According to the CPP XML Schema, restrictions on attribute values are expressed via the rdfs:range of the OWL Datatype Property. The Datatype Property hasValue has the range xsd:string, uri has the range xsd:anyURI, etc.

EIGHTH STEP: There are also attributes whose semantically important value sets are fixed in advance, indicated by "type=name.type" in the schema. We map such an attribute to a particular Datatype Property named after the "name" part of the alias. The element's OWL class is the rdfs:domain and the restricted value set is the rdfs:range.

<tp:DeliveryChannel
  tp:channelId="asyncChannelA1"
  tp:transportId="transportA2"
  tp:docExchangeId="docExchangeA1">
  <tp:MessagingCharacteristics
    tp:syncReplyMode="none" tp:ackRequested="always"
    tp:ackSignatureRequested="always"
    tp:duplicateElimination="always"/>
</tp:DeliveryChannel>

Listing 3. DeliveryChannel layer in a CPP document

For the syncReplyMode attribute we obtain the following:

Figure 4. Datatype Property SyncReplyMode

Datatype properties with the same value set are mapped as equivalent datatype properties. In Listing 3, the attributes ackRequested, ackSignatureRequested and duplicateElimination are equivalent to the perMessageCharacteristics Datatype Property.
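A hedged rdflib continuation of the earlier sketch, showing how such same-value-set attributes might be declared equivalent; the property IRIs reuse our assumed tp namespace.

# A sketch of the EIGHTH STEP equivalence mapping (assumed tp namespace):
# attributes sharing a restricted value set are declared equivalent to
# the perMessageCharacteristics datatype property.
from rdflib import Graph, Namespace, RDF, OWL

TP = Namespace("http://example.org/cppa#")
g = Graph()
for prop in ("ackRequested", "ackSignatureRequested", "duplicateElimination"):
    g.add((TP[prop], RDF.type, OWL.DatatypeProperty))
    g.add((TP[prop], OWL.equivalentProperty, TP.perMessageCharacteristics))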
There are also other exceptions which we have to deal with. If an element has a restricted value set for the hasValue property (described in the seventh step), we create it as a subproperty. For naming the subproperty we use the attribute name (with the first letter written in lowercase) and an rdfs:range with the possible values.

Figure 5. AccessAuthentication attribute with two possible values, basic and digest

The proposed mapping process is shown on an example of mapping from CPP to OWL. For simplicity of presentation, the definition of the CPP presented below does not include all the elements from the original definition. The PartyInfo element points to a certain DeliveryChannel element; further, the DeliveryChannel element points to certain TransportProtocol and DocExchange elements.

Figure 6. OWL Classes hierarchy

The CPP XML Schema is mapped to OWL only once. Then we automatically generate one ontology out of each CPP instance document with individuals, following the obtained ontology. Listing 4 presents the ontology obtained from one CPP document by using the mapping process.

<tp:Transport tp:transportId="transportA1">
  <tp:TransportSender>
    <tp:TransportProtocol tp:version="1.1">
      HTTP
    </tp:TransportProtocol>
    <tp:AccessAuthentication>
      basic
    </tp:AccessAuthentication>
    <tp:AccessAuthentication>
      digest
    </tp:AccessAuthentication>
    <tp:TransportClientSecurity>
      <tp:TransportSecurityProtocol tp:version="3.0">
        SSL
      </tp:TransportSecurityProtocol>
      <tp:ClientCertificateRef
        tp:certId="CompanyA_ClientCert"/>
      <tp:ServerSecurityDetailsRef
        tp:securityId="CompanyA_TransportSecurity"/>
    </tp:TransportClientSecurity>
  </tp:TransportSender>
  <tp:TransportReceiver>
    <tp:TransportProtocol tp:version="1.1">
      HTTP
    </tp:TransportProtocol>
    <tp:AccessAuthentication>
      basic
    </tp:AccessAuthentication>
    <tp:AccessAuthentication>
      digest
    </tp:AccessAuthentication>

    <tp:Endpoint tp:uri="https://www.CompanyA.com/servlets/ebxmlhandler/sync"
      tp:type="allPurpose"/>
    <tp:TransportServerSecurity>
      <tp:TransportSecurityProtocol tp:version="3.0">
        SSL
      </tp:TransportSecurityProtocol>
      <tp:ServerCertificateRef tp:certId="CompanyA_ServerCert"/>
      <tp:ClientSecurityDetailsRef tp:securityId="CompanyA_TransportSecurity"/>
    </tp:TransportServerSecurity>
  </tp:TransportReceiver>
</tp:Transport>

Listing 4. Generated ontology from a CPP document

In this way we obtain a mechanism to connect related individuals for SPARQL queries.
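To hint at the intended SPARQL-based comparison, the following hedged Python sketch queries two such generated graphs for their transport protocols and intersects the results; the graph file names and the tp namespace IRI are illustrative assumptions.

# A sketch of comparing two generated CPP ontologies with SPARQL (rdflib);
# file names and the namespace IRI are illustrative assumptions.
from rdflib import Graph

QUERY = """
PREFIX tp: <http://example.org/cppa#>
SELECT ?individual ?protocol ?version WHERE {
    ?individual a tp:TransportProtocol ;
                tp:hasValue ?protocol ;
                tp:version  ?version .
}
"""

g_a, g_b = Graph(), Graph()
g_a.parse("companyA_cpp.ttl", format="turtle")
g_b.parse("companyB_cpp.ttl", format="turtle")

# Same-class individuals (fixed name suffixes, see THIRD STEP) can be
# paired across graphs; matching rows are candidates for CPA terms.
protocols_a = {(str(p), str(v)) for _, p, v in g_a.query(QUERY)}
protocols_b = {(str(p), str(v)) for _, p, v in g_b.query(QUERY)}
print(protocols_a & protocols_b)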
6. CONCLUSION

This paper presented a mapping of ebXML standards, based on the CPPA 2.0 specification, into an ontology using OWL. Our work was based on the CPP and CPA standard, but the same mapping rules can be applied to all ebXML standards.

Our primary goal in the future will be the automation of negotiation between different companies' CPPs, and an important step towards achieving this is the mapping of e-business standards to ontologies. Using our new approach, we will try to match the parts of the ontologies that are most relevant in negotiations using SPARQL queries, and to form a document with the best matches and a document with conflicts. In this way we have made these standards OWL-aware, and we are able to create reasoning rules, because there exist inter-relationships among elements which are not represented in the CPP format.

In parallel, we are building a system for the automation of the CPA formation process, including CPA composition in a bound environment. In effect, we will try to expand and elaborate the previously mentioned step 5 in Fig. 2.

7. REFERENCES

[1] Liegl, P., et al., "State-of-the-Art in Business Document Standards", Industrial Informatics (INDIN), 2010 8th IEEE International Conference on, IEEE, 2010.
[2] Hofreiter, B., and Huemer, C., "B2B Integration - Aligning ebXML and Ontology Approaches", EurAsia-ICT: Information and Communication Technology, Springer Berlin Heidelberg, pp. 339-349, 2002.
[3] Becker, M., "Electronic Data Interchange (EDI) (Interoperability Case Study)", Berkman Center Research Publication (2012-5), 2012.
[4] Boone, K. W., "Extensible Markup Language", The CDA TM Book, Springer London, pp. 23-34, 2011.
[5] Damodaran, S., "B2B Integration over the Internet with XML: RosettaNet Successes and Challenges", Proceedings of the Thirteenth World Wide Web Conference, pp. 188-195, 2004.
[6] UN/CEFACT, OASIS, ebXML Website, http://www.ebxml.org/
[7] OASIS ebXML Collaboration Protocol Profile and Agreement Technical Committee, Collaboration-Protocol Profile and Agreement Specification Version 2.0, OASIS and UN/CEFACT, 2002.
[8] Oberle, D., Guarino, N., and Staab, S., "What Is an Ontology?", in: Handbook on Ontologies, Springer, 2nd Edition, 2009.
[9] "OWL 2 Web Ontology Language Document Overview", W3C, 2009-10-27.
[10] Berners-Lee, T., Hendler, J., and Lassila, O., "The Semantic Web", Scientific American, Retrieved March 26, 2008.
[11] Anicic, N., Ivezic, N., and Jones, A., "An Architecture for Semantic Enterprise Application Integration Standards", in: Interoperability of Enterprise Software and Applications, pp. 25-34, Springer London, 2006.
[12] Foxvog, D., and Bussler, C., "Ontologizing EDI Semantics", in: Proceedings of the Workshop on Ontologising Industrial Standards, pp. 301-311, Springer, Tucson, AZ, USA, 2006.
[13] Haller, A., Gontarczyk, J., and Kotinurmi, P., "Towards a Complete SCM Ontology: The Case of Ontologising RosettaNet", in: Proceedings of the 2008 ACM Symposium on Applied Computing, pp. 1467-1473, ACM, 2008.
[14] Ferdinand, M., Zirpins, C., and Trastour, D., "Lifting XML Schema to OWL", in: Web Engineering, pp. 354-358, Springer Berlin Heidelberg, 2004.
[15] Bohring, H., and Auer, S., "Mapping XML to OWL Ontologies", Leipziger Informatik-Tage 72, pp. 147-156, 2005.
[16] Anicic, N., Ivezic, N., and Marjanovic, Z., "Mapping XML Schema to OWL", Enterprise Interoperability, Springer London, pp. 243-252, 2007.
[17] Bedini, I., et al., "Transforming XML Schema to OWL Using Patterns", Semantic Computing (ICSC), 2011 Fifth IEEE International Conference on, IEEE, 2011.
[18] Coppens, S., Vander Sande, M., Verborgh, R., Mannens, E., and Van de Walle, R., "Reasoning over SPARQL", in: Proceedings of the 6th Workshop on Linked Data on the Web, 2013.
[19] Dogac, A., et al., "Enhancing ebXML Registries to Make Them OWL Aware", Distributed and Parallel Databases 18(1), pp. 9-36, 2005.
[20] Eiter, T., Ianni, G., Polleres, A., Schindlauer, R., and Tompits, H., "Reasoning with Rules and Ontologies", in: Reasoning Web, Springer Berlin Heidelberg, pp. 93-127, 2006.
[21] "Eleven SPARQL 1.1 Specifications Are W3C Recommendations", W3.org, 2013-03-21, Retrieved 2013-04-25.

Addressing the cold-start new-user Problem for Recommendation with Co-training

Jelena Slivka*, Aleksandar Kovačević*, Zora Konjović*
* University of Novi Sad, Faculty of Technical Sciences, Computing and Control Department, Novi Sad, Serbia
[email protected], [email protected], [email protected]

Abstract—Many online domains rely on recommender systems for personalization of the content offered to their customers. In order to personalize the content for the user, recommender systems rely on the user's rating history. When a new user joins the system, the lack of rating history is a serious problem for building a personalized model. In this paper we strive to alleviate this new-user problem by reducing the number of user ratings needed for quality recommendation. We pose the recommendation problem as a classification problem: for each user a separate classification model is trained. Given an item unrated by that user, this model predicts whether the user will like or dislike the item. In order to alleviate the new-user problem, we employ the semi-supervised co-training algorithm. The co-training algorithm assumes a multi-view setting (i.e. a defined split of features in the dataset). In this paper we propose to use features based on users' ratings as the first view and the item description (content features) as the second view. We perform our experiments on the popular MovieLens dataset and show that, by using co-training, for users that have rated very few items we can achieve the performance a supervised system would have given a huge rating history for that user.

I. INTRODUCTION

Users today are faced with an excessive number of choices, whether they are shopping or looking for restaurants, movies or even education. Thus, many online domains rely on recommender systems to personalize the desired content for each user [1]. Amazon (http://www.amazon.com/) uses recommender algorithms to personalize the shopping experience for customers [2]. MovieLens (http://www.movielens.org/) gives personalized recommendations and rating predictions for the movies a user has not rated [3]. TripAdvisor (www.tripadvisor.com) is a travel search engine that assists consumers in searching for travel information [4].

There are three major approaches for building recommender systems: content-based filtering, collaborative filtering, and hybrid filtering, which combines the two former approaches. Collaborative filtering algorithms are based on user similarity - the assumption is that users with similar tastes will rate items similarly. On the other hand, content-based filtering approaches offer recommendations based on the rating history of a user and item content relevance. Both approaches depend on the user's rating history, and the lack of the needed ratings when a new user joins the system can seriously hurt the performance of the recommender system. This is known as the new-user cold-start problem [5].

In this paper we strive to alleviate the new-user problem by reducing the number of user ratings needed for accurate recommendation. In other words, for a given new user that has rated very few items, our goal is to achieve the performance the recommender system would have if it were provided a substantial amount of the user's ratings.

We have posed the recommendation problem as a classification problem: for each given user-item pair we predict whether the user will "like" the item (i.e. we should recommend it) or "dislike" the item (i.e. we should not recommend it). For building the classification model we use the items previously rated by the user as training data. For new users we have very few training examples, but, on the other hand, we also have a huge amount of unlabeled data (items that the user has not rated). This is an ideal setting for semi-supervised learning techniques, which are, under certain conditions, capable of producing a high-quality classifier from only a small amount of labeled data and a sufficiently large amount of unlabeled data. We have chosen to use a broadly used semi-supervised technique - the co-training [6] algorithm. Co-training implies a multi-view problem setting (i.e. the feature set of each example can be naturally partitioned into two distinct feature sets called views). This also suits our recommendation problem very well, as we can easily define different views that describe the items. For example, information about users' ratings can be treated as the first view and the item description can be treated as the second view.

In this paper we propose a novel multi-view, hybrid recommender system based on co-training that considers ratings given by other users in the system as the first view (collaborative filtering predictor), and the item description as the second view (content-based predictor).

In the initial experiments with our algorithm, we consider the movie recommendation problem, using a subset of the popular MovieLens corpus (http://files.grouplens.org/papers/ml-10m.zip). We use information about movie genre and movie plot description found on IMDB (http://www.imdb.com/) as the content description for each movie.

In the paper we test several co-training settings: standard co-training [6] run using a "natural" feature split, co-training run with a random feature split, a Majority Vote of several different co-training classifiers run with random feature splits, and, finally, the Random Split Statistic algorithm (RSSalg) we developed earlier in [7] with

the goal of boosting the performance of co-training. We define several "natural" feature splits based on different view combinations. As the first view we propose to use the users' ratings, and for the second view we experiment with using just the genre information, just the plot description, and the combination of plot and genre features. We also perform an additional experiment with a purely content-based recommender: we employ genre features as the first view and plot features as the second view. We show that, by employing co-training, for each feature split we are able to boost the performance of the initial classifier and even achieve the performance the supervised classifier would have given a huge rating history. The best performing settings in our experiments were both the co-training and RSS algorithms when applied with users' ratings as the first view and the combination of genre and plot features as the second view, or users' ratings as the first view and just the genre features as the second view. We also show that using a purely content-based predictor constructed with plot and genre features can be very useful in combination with co-training if there are no available ratings from other users.

This paper is organized as follows. Section 2 presents the related work. Section 3 describes our methodology. Section 4 presents the experiments conducted in this paper and the achieved results. Finally, Section 5 concludes the paper and gives directions for future work.

II. RELATED WORK

As the first view in our multi-view setting we use a collaborative filtering predictor. The authors in [8] treat CF as a classification problem and discretize ratings into a small number of classes. They define their CF method as a machine learning framework and build a separate model for each user in the database. We have adopted this approach for our CF predictor.

By applying semi-supervised learning techniques we are able to induce high-quality classifiers from only a small amount of labeled examples, thus greatly reducing the manual work needed for labeling training sets. One major semi-supervised learning technique is co-training [6], which is based on multi-view learning. The goal of multi-view learning is to combine predictors derived from each of the views separately, in such a way that the resulting combined predictor outperforms predictors trained on each of the views separately, or predictors trained on trivial combinations of the views. Although both multi-view and semi-supervised learning seem to fit perfectly as a solution for the cold-start problem in recommendation settings, there are very few papers that utilize them for the recommendation problem [9].

In [9] the authors develop a content-based movie recommendation system that integrates content from three different data sources associated with movies (image, text and audio). Each data source, i.e. media type, is considered to be a different view of the data in the designed multi-view framework. The authors employ co-training in order to enrich the user profile in the case where there are only a few rating histories for a given user. Similar to [9], we also employ co-training with the same goal of alleviating the recommendation process for new users; however, there are several important differences between our work and the work presented in [9]:

• The authors in [9] propose a purely content-based recommender system, while we propose a hybrid recommendation system that utilizes not only the item description, but also the rating histories of other users in the system.
• In [9], single-view recommendations are based on finding a user's k nearest neighbors and assigning the user's rating based on the most frequent rating from the obtained nearest-neighbor set. This rating is also assigned a prediction score based on the combination of distance measures between the given movie and its nearest neighbors and the number of times the given rating was assigned to the nearest-neighbor movies. In this paper, we treat both the item description and users' ratings as features in the classification problem and employ a machine learning algorithm in order to predict the rating. Our proposed approach can be used with any classifier that can handle missing data.
• After the single-view score assignment and recommendation in [9], multi-view profile enrichment is applied. In that co-training process, only the movies rated with the same score by all of the views are added to the training set. In our setting, standard co-training [6] is applied for user enrichment.

In [10], the idea of creating a hybrid recommender system by combining content and social information with co-training, in order to induce a more accurate model of human preferences, is proposed. However, a concrete methodology and experimental results are not provided. The authors in [11] develop a hybrid recommender system. They analyze product descriptions and user behavior in order to automatically extract semantic attributes. The process of semantic attribute extraction is facilitated by semi-supervised learning. After extraction, a Naive Bayes classifier is used in order to implement a content-based recommender system. In contrast to [11], we employ semi-supervised learning directly in the context of recommender system training.

III. METHODOLOGY

In this section we describe the hybrid multi-view recommendation system proposed in this paper. As proposed in [10], the two views used in our framework to describe the data are:

• social information (i.e. users' ratings), which is used to construct a collaborative filtering predictor, and
• content information, which is used to construct a content-based predictor.

In order to apply classic co-training [6] we have defined item recommendation as a classification problem. As in [8], we try to induce a model for each user separately that will allow the classification of items unseen by that user into two classes - Like and Dislike. We discretize the rating value into these two classes by defining a rating value threshold t and treating the ratings that exceed this threshold as the label Like, and the rest of the ratings as the label Dislike (a minimal sketch of this discretization is given below).

In sections A and B we describe the way the two single-view predictors that will be used in our co-training setting are created. In section C we describe our co-training settings.
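The following minimal Python sketch illustrates the Like/Dislike discretization; the default threshold is an assumption for illustration, chosen to match the concrete mapping used in Section IV ({1,2,3} → Dislike, {4,5} → Like, i.e. t = 3 on the 1-5 MovieLens scale).

# A minimal sketch of the rating discretization: ratings above the
# threshold t become "Like", the rest "Dislike" (Section IV uses t = 3).
def discretize(rating, t=3):
    return "Like" if rating > t else "Dislike"

assert discretize(5) == "Like" and discretize(3) == "Dislike"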

A. First view: A collaborative filtering predictor

Users' rating data can be represented as a sparse matrix where rows (training examples) correspond to items and columns (features) correspond to users' ratings for the given items [8]. We refer to the matrix as sparse because most of the values are missing (users typically rate only a small subset of all possible items). The value of attribute u for training example i corresponds to the rating given by user U to item I (instead of the actual rating value, we use our derived classes, i.e. Like and Dislike, because this approach yielded slightly better performance in our experiments). We will refer to the feature set constructed this way as the User view.

The prediction task can be seen as filling in the missing values of the matrix. We build a separate model for each user by treating the corresponding user's rating feature as the label. For each user, we use the items that the user has rated as training data for our model. The rest of the items (those the user has not rated) are used as examples for which we need to induce the label (the user's rating for the item).

After the data is represented in the described way, we apply a machine learning algorithm that can tolerate missing values in order to train the model for each user (the authors in [8] also propose a way to transform this representation in order to apply machine learning algorithms that cannot handle missing values). However, for new users that have rated just a few items, the resulting model would be very weak due to the small number of training examples. A minimal sketch of this per-user representation is given below.
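The following minimal Python sketch illustrates the User view and the per-user split into labeled and unlabeled items; the tiny ratings dictionary and all names in it are illustrative assumptions.

# A sketch of the User view: rows are items, columns are other users'
# ratings, None marks a missing value. The tiny matrix and all names are
# illustrative assumptions; "target" is the user we build the model for.
ratings = {
    "item1": {"user1": "Like",    "user2": "Like",    "target": "Like"},
    "item2": {"user1": "Dislike", "user2": None,      "target": "Dislike"},
    "item3": {"user1": "Like",    "user2": "Dislike", "target": None},
}

def features(row):
    # Other users' ratings are the features; missing values stay missing.
    return {u: v for u, v in row.items() if u != "target"}

# Items rated by the target user form the (small) labeled training set;
# the remaining items are the unlabeled examples to be classified.
labeled   = [(features(r), r["target"])
             for r in ratings.values() if r["target"] is not None]
unlabeled = [features(r)
             for r in ratings.values() if r["target"] is None]

With a real learner (e.g. the Naive Bayes classifier used in Section IV), these feature dictionaries would be vectorized while preserving the missing entries as missing values.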
B. Second view: A content-based predictor

The second view of the data consists of features extracted from the item description. In the experiments conducted in this paper, as the item description in the context of movies we use text data (the movie plot description, which we refer to as the Plot view), the list of movie genres (which we refer to as the Genre view) and the combination of these two feature sets, constructed by putting all of these features together (referred to as the Genre&Plot view). Each genre that appears in the dataset is represented as a binomial feature that can have the value true (the movie belongs to this genre) or false (the movie does not belong to this genre).

C. The applied co-training settings

In its original form, co-training is applicable to datasets that have a natural partitioning of the features into two disjoint subsets (views), where each view is sufficient for learning and conditionally independent of the other view given the class label [6]. Co-training exploits the two views in order to train two classifiers using the available training examples. Then, iteratively, each classifier selects and labels some unlabeled examples in order to improve the accuracy of the other classifier by providing it with unknown information. This is an ideal setting for movie recommendation, as we can easily obtain information about a movie from several different sources.

In this paper we experiment with several different "natural" feature splits:

• User_Plot: the User view treated as the first view and the Plot view treated as the second view.
• User_Genre: the User view treated as the first view and the Genre view treated as the second view.
• User_Genre&Plot: the User view treated as the first view and the Genre&Plot view treated as the second view.

We apply several different co-training settings (a sketch of the basic co-training loop is given after this list):

• Natural: standard co-training, as proposed in [6], applied with the "natural" feature splits defined above.
• Random: standard co-training applied with a random feature split (obtained by randomly splitting all available features from both views into two feature sets).
• Majority Vote (MV): an algorithm that constructs an ensemble of diverse co-training classifiers by creating a number of different random feature splits and using them to train different co-training classifiers. MV combines the predictions of the obtained ensemble in a simple majority-vote fashion [7].
• Random Split Statistic Algorithm (RSS): an algorithm we developed earlier in order to boost the performance of co-training and enable its application to single-view datasets [7]. In the same way as MV, RSS trains diverse co-training classifiers. The training set produced by each co-training process is different, and it consists of the initially labeled examples and the examples labeled in the co-training process. All of these co-training results are processed by selecting the examples that appear in most of the resulting training sets and for which most of the resulting co-training classifiers agree on the label. The final training set is formed from these selected results, and it is used for learning a model with much higher classification performance than the initial model trained solely on labeled data. Finally, we denote RSS optimized on the test data (the upper bound performance for RSS) [7] as RSS_best.
order to improve the accuracy of the other classifier by In order to evaluate co-training performance we have
providing it with unknown information. This is an ideal used the stratified 10-fold-cross validation described in
setting for movie recommendation as we can easily [7]. In the standard 10-fold-cross validation procedure the
obtain information about the movie from several different experimental data is divided in 10 folds, and in each of
sources of information. the 10 rounds, a different fold (10% of the data) is used
In this paper we experiment with several different for testing, while the remaining 9 folds (90% of the data)
“natural” feature splits: are used for training. Typically, co-training uses only a
small amount of both labeled and unlabeled data, and
 User_Plot: User view treated as the first view and
applying the standard 10-fold-cross validation procedure
Plot view treated as the second view.
on co-training results with many examples being omitted
 User_Genre: User view treated as the first view and from both testing and training data. In order to better
Genre view treated as the second view. utilize the available data, the size of the test set is
6
increased in order to improve the evaluation without
Instead of actual rating value, we use our derived classes, i.e. Like and
Dislike, because this approach yielded with a slightly better
significantly reducing the quality of the obtained
performance in our experiments classifier. Thus, we divide the data in 10 stratified folds.
7
Authors in [8] also propose a way to transform this representation in In each round of 10-fold-cross validation process, a
order to apply machine learning algorithms that cannot handle missing different fold is selected for random selection of required
values

number of labeled training examples. The remaining data from that fold, as well as 5 adjacent folds, are used as unlabeled training examples, and finally, the remaining 4 folds are used as testing examples. In this way, in each round, 60% of the data is used for training and the remaining 40% of the data is used for testing. Each fold is used exactly once for the selection of labeled data, five times it is included as unlabeled data and four times it is used as part of the testing set.
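A small Python sketch of this fold-rotation scheme, assuming folds are numbered 0-9; the function name is illustrative.

# A sketch of the modified 10-fold scheme: in round i, fold i supplies
# the labeled seed, the next 5 folds (cyclically) are unlabeled, and the
# remaining 4 folds form the test set (60% train / 40% test per round).
def fold_roles(round_i, n_folds=10):
    labeled_src = round_i % n_folds
    unlabeled = [(round_i + k) % n_folds for k in range(1, 6)]
    test = [f for f in range(n_folds)
            if f != labeled_src and f not in unlabeled]
    return labeled_src, unlabeled, test

for i in range(10):
    print(i, fold_roles(i))
# Over the 10 rounds, each fold appears once as the labeled source, five
# times as unlabeled data and four times in the test set, as stated above.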
The base classifier used in the co-training algorithm is Naive Bayes (NB). This classifier was chosen both for its speed (an important factor due to the complexity of RSSalg) and for its ability to handle missing data.

The problem of movie rating prediction can be highly imbalanced - before watching a movie, users usually refer to the plot description, producer, cast and other factors they find important in order to see whether the movie appeals to them [9]. The consequence is that users generally watch and rate movies that they like. Thus, as measures of performance we use both accuracy and micro and macro F-measure [13].

In our experimental setting we assume the scenario where we already have a number of users in the system and a new user is joining the group. New users have a very small number of ratings. In our experiments we make them rate only 3 movies that they like and 3 movies that they dislike, i.e. the small initial training set L for co-training consists of 3 positive and 3 negative examples. The numbers of examples labeled by the co-training inner classifiers in each iteration are chosen proportionally to the class distribution in the dataset, as suggested in [6]. The size of the unlabeled pool is 50. We run the co-training algorithm until all unlabeled data is labeled. The number of different random splits used in RSSalg is 100. All these parameters were chosen empirically.

To form the User view for each user, we use the top 50 users in terms of the number of rated movies. Ideally, we would want to utilize all users in the system and perhaps apply some dimensionality reduction technique, or choose the subset of users most strongly correlated (positively or negatively) with the new user. However, due to the small number of ratings we have for the new user, we are unable to obtain a reliable similarity measure in order to determine those users. Thus, for now, in these initial experiments we simply use the top 50 users, because their high number of ratings makes the User view less sparse.

For our experiment we have chosen 4 random users that we treat as new users. The first two users (User1 and User2) belong to the group of top 50 users. The other two users (User3 and User4) do not belong to this group. As mentioned before, for each user we leave only 3 positive and 3 negative ratings (randomly chosen) and treat all other ratings as unlabeled/test data.

The reason we chose to take users from two different groups is to ensure that we are not getting good results only because the chosen users are highly correlated with the used set of top 50 users. By utilizing all ratings, for each "new" user we have calculated the 50 most similar users. It turned out that most of the top similar users (more than 25) of User1 and User2 actually belong to the top 50 users, while User3 and User4 each have only one user that belongs to both the top-rated and the top-similar group.

Accuracy and the macro and micro F-measure for User1-User4 are presented in Figures 1-4. Also, in Table 1 we give details about the accuracy achieved for User1 by the different co-training settings. Due to space limitations we omit the details about the F-measure, as well as the other users, as they all display the same behavior.

The different settings we use (horizontal axes) are:

• L: an NB classifier trained on the labeled portion of the dataset (6 examples);
• All: an NB classifier trained on both the labeled examples and the unlabeled examples assigned their correct labels. This is the goal performance we would like to achieve with co-training;
• Natural, Random, MV, RSS and RSS_best are the co-training settings introduced in Section III.C.

The listed algorithms are tested using the different views introduced in Section III.C.

Figure 1. User1: Accuracy, macro and micro F-measure, respectively. The baseline accuracy is 74.0%, the baseline macro F-measure is 42.7% and the baseline micro F-measure is 74.6%. The number of annotated examples used in All is 541, and in L and the co-training settings it is 6.

Figure 2. User2: Accuracy, macro and micro F-measure, respectively. The baseline accuracy is 74.6%, the baseline macro F-measure is 42.7% and the baseline micro F-measure is 74.4%. The number of annotated examples used in All is 576, and in L and the co-training settings it is 6.

Figure 3. User3: Accuracy, macro and micro F-measure, respectively. The baseline accuracy is 67.4%, the baseline macro F-measure is 40.2% and the baseline micro F-measure is 67.2%. The number of annotated examples used in All is 256, and in L and the co-training settings it is 6.

Figure 4. User4: Accuracy, macro and micro F-measure, respectively. The baseline accuracy is 75.9%, the baseline macro F-measure is 43.1% and the baseline micro F-measure is 75.9%. The number of annotated examples used in All is 249, and in L and the co-training settings it is 6.

TABLE I. USER1: ACCURACY AND STANDARD DEVIATION FOR DIFFERENT ALGORITHMS/FEATURE SPLITS. THE SIZE OF THE ANNOTATED SET USED IN ALL IS 541, AND THE SIZE OF THE ANNOTATED SET USED FOR L AND THE CO-TRAINING SETTINGS IS 6.

|          | User     | Genre    | Plot      | User_Genre | User_Plot | User_Genre&Plot | Genre_Plot |
|----------|----------|----------|-----------|------------|-----------|-----------------|------------|
| L        | 74.7±8.2 | 48.3±8.4 | 53.9±11.5 | 74.5±8.1   | 62.2±15.7 | 62.1±15.7       | 54.1±11.6  |
| All      | 79.9±3.2 | 73.9±1.3 | 66.0±2.4  | 79.7±3.2   | 79.9±2.8  | 79.9±3.0        | 67.1±2.2   |
| Natural  |          |          |           | 80.7±2.9   | 72.3±16.4 | 80.4±3.2        | 64.7±2.2   |
| Random   |          |          |           | 80.2±3.7   | 71.3±16.7 | 71.3±16.9       | 62.7±2.0   |
| MV       |          |          |           | 80.4±3.1   | 76.4±11.2 | 75.8±12.7       | 65.1±3.0   |
| RSS      |          |          |           | 80.3±3.3   | 80.0±4.2  | 79.8±3.8        | 70.0±4.8   |
| RSS_best |          |          |           | 80.9±3.2   | 81.0±3.9  | 81.0±3.8        | 74.3±0.5   |

(The co-training settings apply only to the multi-view splits, hence the empty cells for the single views.)

Figures 1-4 and Table 1 show that the behavior of our algorithms is similar for all users. We can draw the following conclusions:

• Of all the single views, in terms of the accuracy and the macro and micro F-measure achieved by the different settings using that view, the User view is the strongest. This is not surprising, as CF algorithms have been shown to generally outperform content-based algorithms, but it is also due to the fact that we use very basic features for the content classifier, which we intend to improve in the future.
• The weakest single view is the Plot view. We constructed the Plot view based on IMDB plot descriptions and, as the authors in [9] note, the storyline in IMDB is generally short and the resulting dataset is very sparse, which is probably the reason for the poor performance of the Plot view.
• For NB single-view performance (L and All), combining the User view with other views (i.e. User_Genre, User_Plot and User_Genre&Plot) only slightly (if at all) improves the performance of the User view.
• The combination of the Plot and Genre views (i.e. Genre_Plot) is the weakest one, in some cases even weaker than the Genre and Plot views alone (for both L and All). Even if we had labels for all training data (All), the Genre_Plot combination is worse than User trained on the labeled data alone (L). This is not surprising, as the authors in [9] state that there is a high correlation between the movie genre and the words in the plot description, and these highly correlated attributes are probably the reason for the poor performance.

However, although the features from Genre and Plot seem useless when compared to the User view in a single-view setting, the results show that they are very useful in a multi-view setting:

• For all view combinations, Natural was able to greatly improve the performance of the weak initial classifier (L). Its performance is substantially better than that of both the combined version of the views and the single views (e.g. Natural applied to the User_Genre split is better than L on User_Genre, L on Genre alone and L on User alone). Also, for all multi-view settings except Genre_Plot, the performance of Natural is in the range of the performance of the All setting, meaning that by applying co-training on very few labeled examples we have succeeded in achieving the performance we could achieve if we had a large number of labeled examples. For Genre_Plot, the performance of Natural is slightly worse than the All setting for the same view, but it is still able to greatly improve the performance of the weak initial classifier (the L setting on the Genre_Plot view).
• As expected, for all settings Random achieves worse performance than Natural. MV is better than Random and Natural and even slightly surpasses the performance of All (it achieves a slightly better accuracy, but also better micro and macro F-measure). RSS is slightly worse than MV, but it is in the range of the All setting. Finally, the

RSS_best setting outperforms all settings, but as it is optimized on the test set, it can only be seen as the upper bound for the performance of the RSS algorithm [7].

• As for the views, the best combinations are User_Genre and User_Genre&Plot (whose performances are approximately the same).
• Not surprisingly, the Genre_Plot multi-view setting has the worst performance of the multi-view settings. RSS and RSS_best are able to boost the performance of the weak initial classifier beyond the performance of the All setting for the Genre_Plot view combination. This is consistent with the findings in [7] - RSSalg performs best when the features are highly redundant, as in the Genre_Plot view, where we have correlated features [7].
• The performance of RSS_best in this case is in the range of the All setting for the single Genre view. However, in some cases (e.g. User2), this performance can still be worse than the performance the L setting has on the User view. Nevertheless, it should be noted that this combination can still be very useful. Consider the situation where the new user has only rated movies that none of the other users have rated. In this situation, applying co-training or, better, the RSS algorithm with this view combination can significantly boost the performance of the initial classifier. Finally, we should note that these are only initial experiments. In the future we plan to enhance the content classifier, e.g. by including semantics [9].

V. CONCLUSION

In this paper we address the new-user problem in a recommender system, i.e. the situation where, due to the lack of a user's rating history, the recommender system is unable to give quality personalized predictions. We have posed the recommendation problem as a classification problem. For each user, we build a separate model which, given an item unrated by the user, predicts whether the user will like or dislike the item. In order to alleviate the new-user problem, we propose a multi-view hybrid recommendation system that uses other users' ratings as the first view and item description data as the second view. We apply our algorithm to the problem of movie recommendation and use the popular MovieLens dataset. As the item description we use the movie genre and plot description. We have tested several co-training settings, using different co-training algorithms (classic co-training [6] and RSSalg [7]) and different view combinations. In all settings, by applying co-training we were able to improve the weak initial classifier (a supervised algorithm trained on the labeled version of the data) and even achieve the performance a supervised classifier would have if we had labels for all training data (both labeled and unlabeled). The best performing settings in our experiments were both co-training and RSSalg applied with users' ratings as the first view and the combination of genre and plot features as the second view, or users' ratings as the first view and just the genre features as the second view. We also show that, using a purely content-based predictor constructed with plot and genre features, it is possible to approach the performance a supervised system would have given a huge rating history. The results presented in this paper are just preliminary experiments performed in order to get a general idea of whether the hybrid multi-view system we propose could address the new-user problem.

There are many ways to extend this work in the future. First, we plan to enrich the content-based predictor used as the second view in our co-training based framework with other kinds of information, such as those used in [9]. In the experiments presented here, we have used only a subset of the top 50 users in terms of the number of given ratings. We intend to experiment with utilizing all available user ratings. A task for the future is also to see whether our framework can be applied to address the new-item problem as well.

ACKNOWLEDGMENT

The results presented in this paper are part of the research conducted within Grant No. III-47003, financed by the Ministry of Education and Science of the Republic of Serbia.

REFERENCES

[1] P. Resnick and H. Varian, "Recommender Systems," Communications of the ACM 40(3), 56-58, 1997.
[2] G. Linden, B. Smith, and J. York, "Amazon.com Recommendations: Item-to-Item Collaborative Filtering," IEEE Internet Computing 7(1), 76-80, 2003.
[3] Y. Chen, M. Harper, J. Konstan, and X. Li, "Social Comparisons and Contributions to Online Communities: A Field Experiment on MovieLens," American Economic Review 100(4), 2010.
[4] Y. Wang, S.C. Chan, and G. Ngai, "Applicability of Demographic Recommender System to Tourist Attractions: A Case Study on TripAdvisor," Web Intelligence/IAT Workshops, pp. 97-101, 2012.
[5] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions," IEEE Trans. on Knowledge and Data Engineering, vol. 17, pp. 734-749, June 2005.
[6] A. Blum and T. Mitchell, "Combining labeled and unlabeled data with co-training," Proc. Eleventh Annual Conference on Computational Learning Theory COLT'98, ACM, pp. 92-100, 1998.
[7] J. Slivka, A. Kovačević, and Z. Konjović, "Combining co-training with ensemble learning for application on single-view natural language datasets," Acta Polytechnica Hungarica, Vol. 10, No. 2, pp. 133-152, 2012.
[8] D. Billsus and M. Pazzani, "Learning collaborative information filters," Int'l Conference on Machine Learning, Morgan Kaufmann Publishers, 1998.
[9] W. Qu, K-S. Song, Y-F. Zhang, S. Feng, D-L. Wang, and G. Yu, "A Novel Approach Based on Multi-View Content Analysis and Semi-Supervised Enrichment for Movie Recommendation," Journal of Computer Science and Technology 28(5): 776-787, September 2013.
[10] J. Delgado and N. Ishii, "Formal Models for Learning of User Preferences, a Preliminary Report," Proc. Int'l Joint Conf. on Artificial Intelligence (IJCAI-99), Stockholm, Sweden, July 1999.
[11] R. Ghani and A. Fano, "Building recommender systems using a knowledge base of product semantics," Proc. Workshop on Recommendation and Personalization in E-Commerce, at the 2nd Int'l Conf. on Adaptive Hypermedia and Adaptive Web Based Systems, Malaga, Spain, May 2002.
[12] D. Jannach, L. Lerche, F. Gedikli, and G. Bonnin, "What
features can be very useful in combination with co- recommenders recommend - An analysis of accuracy, popularity,
training if there are no available ratings from other users. and sales diversity effects," 21st Int’l Conf. User Modeling,
Adaptation and Personalization (UMAP 2013), Rome, Italy, 2013.
In the results presented here, we showed that by [13] M. Sokolova, and G. Lapalme, “A systematic analysis of
starting from just 6 rated items, by employing co-training, performance measures for classification tasks,” Inf. Process.
we can achieve the performance a supervised system Manage, 45(4): 427-437, 2009.


An Approach to Consolidation of Database Check Constraints
Nikola Obrenović*, Ivan Luković**
* Schneider Electric DMS NS Llc., Novi Sad, Serbia
** Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
[email protected], [email protected]

Abstract—Independent modeling of parts of an information system, and consequently database subschemas, may result in conflicts between the integrated database schema and the modeled subschemas. In our previous work, we have presented criteria and algorithms for resolving such conflicts and for consolidating a database subschema with the database schema with respect to various database concepts, e.g. domains, relation schemes, primary key constraints, etc. In this paper we present an approach and new algorithms for identification of conflicts and subschema consolidation against check constraints.

I. INTRODUCTION

Modeling of relational database schemas can be performed in two ways: 1) directly, by having the entire database schema modeled at once, and 2) part-by-part, by independently modeling parts of the database schema, i.e. subschemas. When the schema is modeled part-by-part, the created subschemas need to be integrated into a unified database schema.

Our previous work advocates that an Information System (IS), including a relational database schema as its underlying foundation, should not be designed directly. Designing the whole IS at once can easily overcome the designer's capabilities and result in a model of poor quality ([1, 2]). Therefore, we have developed a methodology for a gradual development of a database schema ([1, 3]), followed by a tool supporting the methodology, named Integrated Information Systems CASE, or IIS*Case for short. In IIS*Case, designers specify models of isolated parts of an IS, i.e. information subsystems, in an independent way, by using a platform-independent model (PIM) of form types ([1, 4]). A form type concept is an abstraction of screen forms or documents that users utilize to communicate with the IS. By specifying form types of an information subsystem, designers also specify a database subschema with its constraints, as it is presented in [1, 4].

By applying a number of algorithms, IIS*Case transforms a set of form types into a relational database schema. The formal description of the transformations is out of the scope of this paper and can be found in [4, 5, 1]. Thereby, a database subschema is obtained from the set of form types specified at the level of an information subsystem. Similarly, the database schema of the whole IS is derived from the union of form types of all information subsystems.

In [2], it is shown that each database subschema must be formally consolidated with the integrated schema, in order to obtain a valid specification and implementation of an IS. A subschema is formally consolidated with its schema if each of its concepts, such as a domain, attribute, relation scheme or constraint, is consolidated with the appropriate concept of the schema. Also in [2], the author presents algorithms for checking consolidation of subschemas with the schema with respect to domains, relation schemes, primary keys, uniqueness constraints, referential integrity and inverse referential integrity constraints. The proposed algorithms are also implemented in IIS*Case.

This paper extends our previous work with an algorithm for consolidation of subschemas with respect to check constraints. So as to provide the consolidation test, we need to find a solution to the implication problem for two check constraints, i.e. how to detect if one check constraint is a logical consequence of the other one. However, the nature of check constraints is different from other types of database constraints, since they represent complex logical expressions. Consequently, a test of logical consequence of check constraints requires different methods than those used for other types of database constraints, such as functional dependencies and keys. Therefore, another goal of this work is to formulate an appropriate method for a test of logical consequence of check constraints.

Beside the Introduction and Conclusion, this paper consists of four sections. The related work is presented in Section 2. In Section 3, we present the algorithm for subschema consolidation with respect to check constraints only. The method for testing implication of two check constraints is presented in Section 4. In Section 5, some details of the algorithm implementation are discussed.

II. RELATED WORK

In [6], the authors have presented an overview of methodologies and techniques for integration of independently modeled relational database subschemas and detection of conflicts between the subschemas. By that, at least one of the presented methodologies addresses conflicts at the level of attribute naming, domains, cardinalities, primary keys or entity usage. However, none of the methodologies considers conflicts between check constraints. Furthermore, to the best of our knowledge, such a methodology has not been defined yet.

Also in [6], the authors concluded that most of the surveyed methodologies propose general guidelines for subschema integration, but lack an algorithmic specification of the integration steps. On the other hand, we propose a subschema integration which is formally defined and implemented in IIS*Case.


III. CONSOLIDATION OF CHECK CONSTRAINTS

In IIS*Case, check constraints can be modeled at the level of a domain, attribute or component type, which is a logical part of a form type ([7]). When check constraints are transformed from the model of form types into the relational model, they become check constraints at the level of a domain, attribute or a set of relation schemes, respectively ([8]).

The concepts of domain and attribute are modeled in the scope of the entire IS, i.e. a domain or attribute inherits the same definition in each information subsystem as it is defined in the scope of the IS. Consequently, a check constraint at the level of a domain or attribute in a database subschema is identical to the check constraint at the level of the same domain or attribute in the database schema.

On the other hand, a component type check constraint is modeled in the scope of an information subsystem. A database subschema is a result of transformation of the form types that represent one information subsystem. Likewise, a relational database schema is obtained by transforming the union of all information subsystem specifications into the relational data model. Therefore, after the transformations, a component type check constraint exists both in the database schema and in the appropriate database subschema.

Two or more information subsystems can contain a model of the same data, each of them from its own point of view. Consequently, two information subsystems can impose different constraints over the same set of data. In other words, two check constraints from different information subsystems may refer to overlapping sets of relation schemes of the database schema, which is illustrated in Example 1.

Example 1. Let us consider two form types, UNIVERSITY ORGANIZATION (Figure 1) and FACULTY ORGANIZATION (Figure 2), which belong to different information subsystems of a university IS. The form type UNIVERSITY ORGANIZATION is used for manipulation of information about faculties and their respective departments at the level of the whole university. On the other hand, the form type FACULTY ORGANIZATION is used at the faculty level for manipulating data about faculty departments.

Figure 1. UNIVERSITY ORGANIZATION form type:
UNIVERSITY (r): UniId, UniName, UniShortName
FACULTY (r,i,u,d): FacId, FacName, FacShortName, Dean, FacBudget
DEPARTMENT (r,i,u,d): DepId, DepName, DepBudget, DepResearchBudget, DepConfBudget

Figure 2. FACULTY ORGANIZATION form type:
FACULTY (r): FacId, FacName
DEPARTMENT (r,i,u,d): DepId, DepName, DepBudget, DepResearchBudget, DepConfBudget

Form type UNIVERSITY ORGANIZATION consists of three component types: UNIVERSITY, FACULTY and DEPARTMENT, used for viewing and manipulating data about the university, faculties and the belonging departments, respectively. In order to control budget levels of the faculties and departments, the following check constraint is modeled in the DEPARTMENT component type:

DepBudget > DepResearchBudget AND DepResearchBudget > DepConfBudget.

Form type FACULTY ORGANIZATION consists of the same two component types, FACULTY and DEPARTMENT. However, component type FACULTY is used only for viewing existing faculties, while DEPARTMENT is also used for insert, update and delete operations. This information subsystem might be designed by another designer, whose point of interest differs from the first one's. Therefore, he or she could model a different check constraint over the DEPARTMENT component type:

DepBudget > DepConfBudget.

Form types UNIVERSITY ORGANIZATION and FACULTY ORGANIZATION are transformed into the following set of relation schemes, given in the form N(R,K), where N is the name of the relation scheme, R is the set of attributes and K is the set of keys:
• University({UniId, UniName, UniShortName}, {UniId});
• Faculty({FacId, FacName, FacShortName, Dean, FacBudget, UniId}, {FacId}); and
• Department({FacId, DepId, DepName, DepBudget, DepResearchBudget, DepConfBudget, UniId}, {FacId+DepId}).

Thereby, relation scheme Department inherits check constraints from both component types.

With respect to all constraints, a subschema is consolidated with its schema iff for each schema constraint of interest there is an equally strong or stronger constraint in the subschema ([2]). Thereby, a schema constraint is of interest if it affects data modeled through the observed subschema.

Hence, for each check constraint in the schema, the consolidation algorithm first determines the information subsystems, i.e. their database subschemas, for which the observed check constraint is of interest. In the following text, we denote these subschemas as the corresponding subschemas. Further, in each of the corresponding subschemas, the algorithm checks if there is a check constraint equally strong or stronger than the schema check constraint. If this condition is satisfied for each schema check constraint, the database subschemas are consolidated with the schema with respect to check constraints. The pseudo code of the algorithm is given in Figure 3.


PROCESS CheckCheckConstraints(I(S, ICC, SISUB),
                              O(Ind, Report),
                              IO( ))
  SET Report ← ∅
  SET Ind ← True
  DO CheckEachCheckConstraint (∀iS ∈ ICC)
    DO CheckEachSubschema (∀(Si, Ii) ∈ SISUB)
      IF Attr(iS) ∩ Attr(Si) ≠ ∅ THEN
        IF Attr(iS) ⊆ Attr(Si) THEN
          SET Found ← False
          DO CheckSubschemaConstraints (∀iSS ∈ Ii)
            IF iSS ⇒ iS THEN
              SET Found ← True
              BREAK
            END IF
          END DO
          IF Found = False THEN
            SET Ind ← False
            SET Report ← Report ∪ (Si, iS)
          END IF
        ELSE
          SET Ind ← False
          SET Report ← Report ∪ (Si, iS)
        END IF
      END IF
    END DO
  END DO
END PROCESS

Figure 3. Algorithm for subschema consolidation with respect to check constraints

In the pseudo-code, the following notions are used:
• S – the set of relation schemes of the database schema;
• ICC – the set of check constraints of the database schema;
• SISUB – a set of pairs (Si, Ii), where Si denotes the set of relation schemes of subschema i, while Ii denotes the set of check constraints of subschema i;
• Attr – a function that returns the set of attributes referenced by its argument, e.g. a subschema or a check constraint;
• Report – a set of pairs (Si, iS), where iS is a schema check constraint which makes subschema Si unconsolidated with the database schema; and
• Ind – a Boolean indicator stating whether all subschemas are consolidated with the database schema with respect to check constraints.

Proving the implication between check constraints is the essential part of the consolidation algorithm, and it is presented in the following section.

IV. IMPLICATION PROBLEM OF CHECK CONSTRAINTS

As presented in the consolidation algorithm, in order to check consolidation between a database subschema and a database schema, we need to be able to determine whether a subschema check constraint implies the corresponding schema check constraint, i.e. we evaluate the validity of the formula:

(1) iSS ⇒ iS,

where iSS is a subschema check constraint and iS is the corresponding schema check constraint. In the further text, we denote formula (1) also as the check constraint implication formula.

The body of a check constraint is a logical expression. Its interpretation, i.e. evaluation, is a three-state Boolean function which evaluates to true, false or unknown. Its result determines whether a tuple satisfies (true), violates (false) or neither satisfies nor violates the constraint (unknown). For the sake of simplicity, the term check constraint is further also used to denote the logical expression of the constraint.

In the further text, it is assumed that all check constraints are given in the conjunctive normal form (CNF):

(2) ∧_{i=1}^{m} (∨ l_i),

where each l_i represents an atomic logical expression, denoted as a literal. The transformation of a logical formula into its CNF is described thoroughly in [9].

The literals of check constraints usually are not just Boolean variables or predicates. Instead, they are often expressions of various types: from integer and real, linear or non-linear arithmetic, or over date, string or set types. The formal definition of a check constraint logical expression, which determines all possible literals, may be found in [7].

Example 2. The following expressions may represent check constraint literals:
• A > 0;
• 0.3*A + B > 15;
• DOB > ToDate('1900-01-01');
• SURNAME LIKE "JOHN%"; or
• X IN [1,2,3,5,7,11],
where A, B, DOB, SURNAME and X are database schema attributes defined over some domains, i.e. data types.

Since logical expressions of check constraints normally comprise sub-expressions of various types, they are more complex with regard to the implication problem test than database constraints of many other types. Let us observe the following examples. A key is a typical database constraint, formalized just with a single Boolean predicate, Key(N, X), where N is the name of the relation scheme and X is a set of attributes, while a functional dependency is a single Boolean predicate of the form X→Y, where X and Y are attribute sets ([2]). Consequently, testing the implication of functional dependencies is a deterministic problem, for which we have an appropriate polynomial algorithm that does not consider domains of attributes in any way. On the contrary, testing the implication of check constraints is, in the general case, a more complex problem, since any algorithm for this purpose needs to consider the properties, relations and operations over the domains associated to all attributes included in the constraint.
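As an aside for implementers, the interplay between the consolidation procedure of Figure 3 and the implication test (1) can be summarized in a few lines of Python. The sketch below is our illustration only, not IIS*Case code; attrs stands in for Attr(.), and implies stands in for the implication test whose realization is discussed in the remainder of this section.

# Illustrative Python sketch of the algorithm in Figure 3 (not IIS*Case code).
# attrs() and implies() stand in for Attr(.) and the implication test (1),
# which is reduced to an SMT satisfiability check later in the paper.
def check_check_constraints(schema_constraints, subschemas, attrs, implies):
    """subschemas: list of (S_i, I_i) pairs; returns (Ind, Report)."""
    report = []
    ind = True
    for i_s in schema_constraints:                # every schema check constraint
        for s_i, i_i in subschemas:               # every subschema
            if not (attrs(i_s) & attrs(s_i)):     # constraint is not of interest
                continue
            if attrs(i_s) <= attrs(s_i):
                # look for an equally strong or stronger subschema constraint
                if not any(implies(i_ss, i_s) for i_ss in i_i):
                    ind = False
                    report.append((s_i, i_s))
            else:                                 # iS spans attributes outside S_i
                ind = False
                report.append((s_i, i_s))
    return ind, report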


Proving the validity of logical formula (1), where each literal is a proposition, i.e. a Boolean variable or a Boolean predicate, is a kind of Boolean satisfiability problem (SAT problem, [10]). This class of problems belongs to the automated theorem proving problems, and there is a vast number of algorithms and tools, named SAT solvers, intended for its solving ([11]).

However, an application of SAT solving techniques for proving (1) would imply that each check constraint literal is treated without taking its actual meaning into account. Also, the relations between different literals, which can be derived from that meaning, would be disregarded, as illustrated by the following example.

Example 3. In Example 1, two check constraints are introduced:

i1: DepBudget > DepResearchBudget AND DepResearchBudget > DepConfBudget

and

i2: DepBudget > DepConfBudget.

These check constraints contain the following literals:
• l1: DepBudget > DepResearchBudget;
• l2: DepResearchBudget > DepConfBudget; and
• l3: DepBudget > DepConfBudget.

Let us further assume that i1 is a subschema check constraint and i2 is its corresponding schema check constraint. By taking into account the transitivity property of the operator greater than over integer or real variables, one can infer the following relation between the abovementioned literals:

l1 ∧ l2 ⇒ l3.

On the other hand, a SAT solver would treat the operator greater than only as an uninterpreted two-argument Boolean predicate and could not infer any relation between the literals. Consequently, a SAT solver could not infer that i1 implies i2, i.e. that the subschema is consolidated with the schema with respect to check constraints i1 and i2.

Therefore, in order to prove the validity of (1), we also need to interpret the semantics of check constraint literals, which a pure SAT solver is not capable of. This disadvantage of SAT solvers initialized the development of another research field, named Satisfiability Modulo Theories (SMT, [11, 12]). SMT algorithms represent extensions of SAT algorithms with the knowledge and capability to reason over additional theories of interest, such as: linear arithmetic over integer or real numbers, non-linear arithmetic over real numbers, the theory of uninterpreted functions, the theory of arrays, bit-vector theory, etc. In the SMT terminology, such a theory is referred to as the background theory, while the reasoning methods deployed inside a theory are named the decision procedures. In analogy to SAT solvers, software tools implementing SMT algorithms are named SMT solvers.

All SMT solvers provide checking the satisfiability of a logical formula and have an explicit command for this purpose. On the other hand, most available SMT solvers do not provide an explicit command for proving the validity of a logical formula. However, the validity proof of a logical formula is a dual problem to proving its satisfiability ([13]). That is, we can prove that a logical formula is valid by proving that the formula's negation cannot be satisfied. By using this approach, we prove the validity of (1) and consequently prove the logical implication of check constraints.
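The reasoning from Example 3 is easy to reproduce with an off-the-shelf SMT solver. The approach does not depend on a particular solver; the following sketch uses the Python bindings of the Z3 solver purely as an illustration, proving i1 ⇒ i2 by showing that i1 ∧ ¬i2 is unsatisfiable.

# Proving i1 => i2 from Example 3 with an SMT solver (Z3's Python bindings,
# chosen here only for illustration). Validity of the implication is shown
# by proving that its negation, i1 AND NOT i2, cannot be satisfied.
from z3 import Real, Solver, Not, unsat

dep_budget = Real('DepBudget')
dep_research = Real('DepResearchBudget')
dep_conf = Real('DepConfBudget')

s = Solver()
s.add(dep_budget > dep_research)     # i1, first conjunct (literal l1)
s.add(dep_research > dep_conf)       # i1, second conjunct (literal l2)
s.add(Not(dep_budget > dep_conf))    # negation of i2 (literal l3)

# unsat means i1 AND NOT i2 has no model, i.e. i1 implies i2, so the
# subschema constraint is at least as strong as the schema constraint.
print(s.check() == unsat)            # True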


V. INTEGRATION OF IIS*CASE AND SMT SOLVERS

In order to test subschema consolidation, an SMT solver is integrated into IIS*Case in the following manner.

The specification of the negation of (1) is first transformed into the form and language required by the SMT solver and written into an input file for the SMT solver. With the input file, IIS*Case executes the SMT solver as an external process, which tries to prove the satisfiability of the input formula.

Further, the SMT solver creates an output file with the result of the satisfiability check, which is parsed by IIS*Case. If the satisfiability check fails, the check constraint implication formula is valid.

The creation of the input file for the SMT solver consists of the following steps:
1. Transformation of the negation of (1) into CNF;
2. Preprocessing of the negation of (1) in order to remove expressions not supported by the SMT solver; and
3. Transformation of the negation of (1) into the language understandable by the SMT solver.

All of these steps are further described in the subsequent subsections.

A. Transformation of the Check Constraint Implication Formula's Negation into CNF

To the best of our knowledge, all SMT solvers require processed formulas to be represented as a set of clauses, where each clause represents a conjunct of the formula's CNF.

Therefore, since we need to prove the unsatisfiability of the negation of (1), which is:

(3) ¬(iSS ⇒ iS),

we need first to transform it into its CNF:

(4) iSS ∧ ¬iS.

Further, formulas iSS and ¬iS are replaced with their CNF forms, respectively, in order to obtain the CNF of the whole (3), i.e. the set of input clauses for an SMT solver.

B. Preprocessing of the Check Constraint Implication Formula's Negation

The state-of-the-art SMT solvers support a large number of background theories ([13]). However, to the best of our knowledge, none of the currently available SMT solvers supports operations over date, string or set variables, which are allowed in the definition of a check constraint.

Therefore, in order to use an SMT solver for proving the implication of check constraints, the negation of the check constraint implication formula needs to be transformed into a logical formula that can be interpreted by the SMT solver, i.e. a formula that does not contain date, string or set operations. By that, the resulting formula's satisfiability must imply the satisfiability of the original formula. Additionally, the transformations need to preserve as much knowledge as possible about the original literals and the relations between them. This approach of preprocessing a logical formula before proving its satisfiability is known as the eager strategy for solving SMT problems ([13]). In this work, we propose the following transformations of literals that contain date, string or set operations.

1) Transformations of Literals Containing Date Variables

Literals that contain date variables and operations over dates retain the same operator. On the other hand, date variables are declared as integer variables and date constants are converted into the number of milliseconds from January 1st 1970. By this, expressions over date variables are transformed into expressions from linear arithmetic over integer numbers.

Example 4. The literal

DOB > ToDate('1969-01-01')

is transformed into

DOB > -31536000000.

2) Transformations of Literals Containing String Variables

Literals containing strings are transformed into Boolean propositions through the following subsequently executed steps:
1. Each pair of different literals li and lj is transformed into propositions pi and pj, respectively, and the formula (4) is extended with the conjunct

pi ⇒ pj, i.e., ¬pi ∨ pj,

iff both li and lj contain the operator LIKE and lj can be inferred from li according to the following condition. Literal lj can be inferred from li iff the following relation applies between the right operand ROi of li and the right operand ROj of lj. Let si be the array of strings created by splitting ROi by the character '%', and let sik be the k-th member of that array. Analogously, let us define sj and sjk for ROj. If the arrays si and sj are of the same length and each sjk is a substring of sik, literal lj can be inferred from li.
2. For each pair of identical literals containing strings, li and lj:
2.1. if neither of them was processed in step 1, they are transformed into the same proposition pi; or
2.2. if one of them was transformed into a proposition pk in step 1, the other literal is transformed into the same proposition.
3. Each literal li containing a string variable and not processed through steps 1 and 2 becomes a proposition pi.

Example 5. Let us define the following two check constraints, each of them containing only one literal:

i1 = l1: NAME LIKE 'J% DOE' and
i2 = l2: NAME LIKE 'JO% DOE'.

Let us further assume that i2 is a subschema check constraint while i1 is the corresponding schema check constraint, and that we need to prove the validity of

(5) i2 ⇒ i1,

i.e. to prove the unsatisfiability of

(6) l2 ∧ ¬l1.

If we split the right-hand operands of each literal lk, k∈{1,2}, over the character '%', we obtain the following arrays:

s1 = {'J', ' DOE'} and s2 = {'JO', ' DOE'}.

According to the first abovementioned step, since each member of s1 is a substring of the member of s2 at the same position, it is concluded that l2 implies l1. Hence, each lk, k∈{1,2}, is replaced with a proposition pk and (6) is extended to the following formula:

(7) p2 ∧ ¬p1 ∧ (p2 ⇒ p1).

Since (7) is an unsatisfiable formula, it is concluded that (5) is valid.

3) Transformations of Literals Containing IN Operators

Literals containing IN operators are also transformed into Boolean propositions through the following three steps, executed in the given order:
1. Each pair of literals li and lj is transformed into propositions pi and pj, respectively, and the formula (4) is extended with the conjunct

pi ⇒ pj, i.e., ¬pi ∨ pj,

iff the right operand of li is a subset of the right operand of lj.
2. For each pair of identical literals li and lj:
2.1. if neither of them was processed in step 1, they are transformed into the same proposition pi; or
2.2. if one of them was transformed into a proposition pk in step 1, the other literal is transformed into the same proposition.
3. Each literal li containing a set variable and not processed through steps 1 and 2 becomes a proposition pi.

Example 6. Let us observe the following two check constraint literals, belonging to the same check constraint implication formula:

l1: X IN [1,3,5,7,9] and l2: X IN [1,5,9].

Since [1,5,9] is a subset of [1,3,5,7,9], the first transformation step is applied to the two literals, where each lk, k∈{1,2}, is replaced with a proposition pk, and the check constraint implication formula is extended with the conjunct

p2 ⇒ p1.
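To summarize the three groups of transformations, the following Python sketch re-states them executably. It is our illustration, not IIS*Case code: date constants become epoch milliseconds (Example 4), the LIKE rule applies the split-by-'%' substring test (Example 5), and the IN rule applies the subset test (Example 6).

# Illustrative Python versions of the preprocessing rules (not IIS*Case code).
from datetime import datetime, timezone

def to_millis(date_string):
    """Date constant -> milliseconds since January 1st 1970 (Example 4)."""
    dt = datetime.strptime(date_string, '%Y-%m-%d').replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

def like_infers(implied_pattern, implying_pattern):
    """LIKE rule: split both right operands by '%'; the implication holds if the
    arrays have the same length and each piece of the implied pattern is a
    substring of the corresponding piece of the implying pattern (Example 5)."""
    s_implied = implied_pattern.split('%')
    s_implying = implying_pattern.split('%')
    return (len(s_implied) == len(s_implying) and
            all(a in b for a, b in zip(s_implied, s_implying)))

def in_infers(implied_values, implying_values):
    """IN rule: the implication holds iff the implying literal's value list is a
    subset of the implied literal's value list (Example 6)."""
    return set(implying_values) <= set(implied_values)

print(to_millis('1969-01-01'))                # -31536000000
print(like_infers('J% DOE', 'JO% DOE'))       # True: 'JO% DOE' implies 'J% DOE'
print(in_infers([1, 3, 5, 7, 9], [1, 5, 9]))  # True: X IN [1,5,9] implies X IN [1,3,5,7,9]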


C. Transformation of the Check Constraint Implication Formula's Negation into an SMT Language

Each SMT solver provides an input language for specifying an SMT problem and interacting with the solver. Also, a large number of the modern SMT solvers support the standardized SMT-LIB language ([14]).

As none of the existing SMT solvers can solve all problems, it is useful to check the satisfiability of a logical formula with more than one solver. Therefore, we transform check constraint specifications into the SMT-LIB language.

An input SMT-LIB file consists of three sections:
1. declarations of the attributes and functions used in the set of clauses,
2. the set of clauses derived from the negation of the implication formula, and
3. the command that starts the satisfiability check.

Example 7. Let us observe the two check constraints from Example 1 and test if

DepBudget > DepResearchBudget AND DepResearchBudget > DepConfBudget

implies

DepBudget > DepConfBudget.

For this purpose, an SMT-LIB file is created, with the specification of the check constraint implication formula's negation, as presented in Figure 4.

;declarations section
(declare-fun DepBudget () Real)
(declare-fun DepResearchBudget () Real)
(declare-fun DepConfBudget () Real)

;clauses section
(assert (> DepBudget DepResearchBudget))
(assert (> DepResearchBudget DepConfBudget))
(assert (not (> DepBudget DepConfBudget)))

;command for starting the satisfiability test
(check-sat)

Figure 4. SMT-LIB specification of the negation of the check constraint implication formula from Example 7

The first file section contains declarations of the attributes referenced in the check constraints, given in the SMT-LIB syntax.

The second file section contains the clauses that correspond to the negation of the implication formula of the two check constraints:
• DepBudget > DepResearchBudget;
• DepResearchBudget > DepConfBudget; and
• ¬(DepBudget > DepConfBudget).

In the SMT-LIB language, binary operators are given in prefix notation.

The last section contains the command "check-sat" that starts the SMT algorithm over the clauses given in the previous file section.

A detailed description of the SMT-LIB syntax may be found in [15].

VI. CONCLUSION

In order to maintain consistency and provide correct manipulation of data through information subsystems, the database subschemas have to be consolidated with the integrated schema. From the aspect of check constraints, consolidation means that each schema constraint that spans subschema data must have a corresponding subschema constraint which is equally strong or stronger. In this way, a subschema check constraint must imply the corresponding check constraint of the integrated database schema. We implemented an algorithm for testing check constraint consolidation and embedded it into the IIS*Case tool.

We further concluded that the check constraint implication problem represents an SMT problem and, consequently, should be solved by utilizing SMT solvers. We also defined and implemented transformations of check constraint PIM specifications into the form and language understandable by SMT solvers. Each SMT solver can solve a subset of all possible SMT problems, but none of them can solve all of them. Therefore, by using the standardized SMT-LIB language, it is possible to utilize multiple SMT solvers to check the satisfiability of a logical formula.

As a part of our future work, we will provide transformations of check constraint specifications into non-standard SMT languages, e.g. CVC ([16]), in order to extend the list of SMT solvers which can be integrated with IIS*Case. Also, we intend to extend one of the existing SMT solvers with the rules for handling operations with date, string and set variables, as described in Section V.B.

REFERENCES
[1] I. Luković, P. Mogin, J. Pavićević and S. Ristić, "An Approach to Developing Complex Database Schemas Using Form Types", Software: Practice and Experience, vol. 37, no. 15, pp. 1621-1656, 2007.
[2] S. Ristić, "Problem Research of Database Subschemas Consolidation" (PhD thesis, in Serbian), University of Novi Sad, Faculty of Economics, Subotica, Serbia, 2003.
[3] I. Luković, S. Ristić, P. Mogin and J. Pavićević, "Database Schema Integration Process – A Methodology and Aspects of Its Applying", Novi Sad Journal of Mathematics, vol. 36, no. 1, pp. 115-150, 2006.
[4] I. Luković, "Automated Generation of Relational Database Subschemas Using the Form Types" (MSc thesis, in Serbian), University of Belgrade, Faculty of Electrical Engineering, Belgrade, Serbia, 1993.
[5] J. Pavićević, "Development of a CASE Tool for Automated Design and Integration of Database Schemas" (MSc thesis, in Serbian), University of Montenegro, Faculty of Science, Podgorica, Montenegro, 2005.
[6] C. Batini, M. Lenzerini and S. B. Navathe, "A Comparative Analysis of Methodologies for Database Schema Integration", ACM Computing Surveys (CSUR), vol. 18, no. 4, pp. 323-364, 1986.
[7] I. Luković, A. Popović, J. Mostić and S. Ristić, "A Tool for Modeling Form Type Check Constraints and Complex Functionalities of Business Applications", Computer Science and Information Systems (ComSIS), vol. 7, no. 2, pp. 359-385, April 2010.
[8] N. Obrenović, S. Aleksić, A. Popović and I. Luković, "Transformations of Check Constraint PIM Specifications", Computing and Informatics, vol. 31, no. 5, pp. 1045-1079, December 2012.
[9] E. Mendelson, Introduction to Mathematical Logic, 4th Edition, Chapman & Hall, London, United Kingdom, 1997.
[10] S. A. Cook, "The complexity of theorem-proving procedures", STOC '71 Proceedings of the third annual ACM symposium on Theory of computing, pp. 151–158, New York, USA, 1971.
[11] F. Marić, "Formalization and Implementation of Modern SAT Solvers", Journal of Automated Reasoning, vol. 43, no. 1, pp. 81-119, June 2009.
[12] L. de Moura and N. Bjørner, "Satisfiability Modulo Theories: An Appetizer", in Formal Methods: Foundations and Applications, pp. 23-36, Springer-Verlag, Berlin, Heidelberg, Germany, 2009.
[13] C. Barrett, R. Sebastiani, S. A. Seshia and C. Tinelli, "Satisfiability Modulo Theories" (book chapter), in A. Biere, M. Heule, H. van Maaren and T. Walsh, "Handbook of Satisfiability", IOS Press, USA, February 2009.
[14] D. R. Cok, "The SMT-LIB v2 Language and Tools: A Tutorial", available online: http://www.grammatech.com/resource/smt/SMTLIBTutorial.pdf, December 2013.
[15] C. Barrett, A. Stump and C. Tinelli, "The SMT-LIB Standard Version 2.0", available online: http://smtlib.cs.uiowa.edu/papers/smt-lib-reference-v2.0-r12.09.09.pdf, December 2013.
[16] "CVC4 User Manual", available online: http://cvc4.cs.nyu.edu/wiki/User_Manual, December 2013.


EDITOR FOR AGENT-ORIENTED PROGRAMMING LANGUAGE ALAS

Dušan Okanović¹, Milan Vidaković¹, Željko Vuković¹, Dejan Mitrović², Mirjana Ivanović²
¹ Faculty of Technical Sciences, Novi Sad
{oki, minja, zeljkov}@uns.ac.rs
² Faculty of Sciences, Novi Sad
{dejan, mira}@dmi.uns.ac.rs

Abstract – The ALAS language provides constructs which hide the complexity of the agent development process. It is also a tool for writing programming agents regardless of the programming language in which the underlying MAS is implemented. We show how an Eclipse editor for the ALAS language can be implemented using the Xtext framework. The ALAS grammar, previously written in EBNF notation, is translated to the Xtext notation. Xtext generates the required language artifacts and the full editor infrastructure.

Keywords: agents, language editor, Xtext

1. INTRODUCTION

Software agents are executable software entities. They are characterized by autonomous behavior, social interaction with other agents, reactivity to environmental changes, and the ability to take the initiative and express goal-directed behavior. This definition can be extended to include mental categories, such as beliefs, desires, and intentions (the so-called BDI agents). Agents are usually situated inside an environment - a multiagent system (MAS) - that controls the agent life-cycle and provides an infrastructure and service subsystem. This subsystem supports agents by allowing them to access resources, execute complex algorithms, etc. A special kind of agent - a mobile agent - is able to physically leave its current MAS and continue pursuing its goals on another machine in a network.

A MAS described in [3], the EXtensible Java EE-based Agent Framework (XJAF), is designed as a modular architecture, comprised of a set of managers. Managers are relatively independent modules. Each one is in charge of handling a distinct part of the overall agent-management process. Such a design has several benefits, the main one being the possibility of extending the functionality by adding new managers. In addition, the behavior of managers can be easily changed, because they are accessed through their interfaces.

The problem with early versions of XJAF was the coupling with the Java programming language, which was the result of XJAF's Java-based implementation. This meant that only Java-based clients were able to use the system and interact with agents.

In order to allow clients implemented in other languages, XJAF has been redesigned as a service-oriented architecture [4], becoming a service-oriented architecture based MAS - SOM. The modular design that uses managers has been retained. With this redesign, even a web browser can be used as a client, since the interaction goes through a standard communication protocol - SOAP [5].

While this new design allows for the implementation of agents that are able to run on different platforms, another issue is raised: how to implement an agent that can run in a truly heterogeneous environment.

Agent LAnguage for SOM (ALAS) has been proposed in [3]. Its programming constructs hide the overall complexity of the agent development process. Also, it serves as a tool for writing agents that can execute their tasks regardless of the programming language the underlying MAS has been implemented in. ALAS was originally designed to tackle the issue of agent regeneration [4].

The goal of this paper is to show the implementation of an editor for the ALAS language using the Xtext framework in Eclipse. In order to help developers, this editor has to provide syntax coloring, content assist, as well as other features common to modern IDEs. The editor is implemented for the Eclipse IDE, which already provides support for various programming languages. Thanks to its modular architecture and wide support, the Eclipse IDE can be easily extended, and new features, such as editors for new languages, are easily incorporated.

The rest of the paper is organized as follows. The next section provides a short overview of the ALAS language. Section 3 presents Xtext - a framework for the development of programming languages and domain specific languages. In Section 4 the implementation of the ALAS editor is shown. An example of how an agent can be written using this editor is shown in Section 5. In Section 6 conclusions are drawn and future work is outlined.

2. ALAS

ALAS is an agent-oriented language [6]. Agent-oriented languages focus on the concept of software agents, rather than the concept of objects. Agents have interfaces and messaging capabilities.

Hot compilation is one of the main characteristics of the ALAS platform. Say that we have a SOM implemented in one programming language. When an agent arrives at this SOM, its ALAS source code is transformed on the fly into source code written in the SOM's implementation language. The generated source code is then forwarded to the native compiler, if any, to produce the executable code for the target platform. This whole process provides the basis for the implementation of heterogeneous agent mobility. The entire process is shown in Fig. 1.


Fig. 1. Transformation of ALAS code into executable code

Agent code is parsed, fed into the VM selector and associated with the standard library. Libraries provide common functions, such as network communication, file management and string processing. In the next step, the MAS replaces standard library calls with the native calls, i.e. calls in its implementation language. In the case of languages like Python, the resulting code can execute immediately. For Java-based MASs, execution starts after the compiler is invoked. The security of the agent code and the MAS can be enforced using certificates [7], which are checked before the code is parsed.

As shown in Listing 1, an agent definition is comprised of an agent name and body. An agent body, in turn, is comprised of services, functions and states. A service definition starts with the keyword service, followed by the return type, service name (which is unique), list of formal parameters, and a body. Services can be grouped under a single services block.

1 AgentDefinition = "agent" Identifier
      "{" { AgentBodyDef } "}"
2 AgentBodyDef = (LookAhead(3) AgentState |
      "service" Function |
      "services" "{" {Function} "}" |
      LookAhead(3) Function)
3 Function = ResultType Identifier
      ParamList Block
4 ParamList = "(" Param { "," Param } ")"
5 Param = Type Name

Listing 1. A part of the syntax for defining an agent and its services

The syntax of ALAS code is based on the C and Java programming languages, and it supports the usual general concepts, such as if-then-else and switch statements, and various loops: for, while, do-while. It supports simple types that match those from Java, including void. String is the only complex type supported, while other complex types will be supported in the future.

The syntax for representing agent state is shown in Listing 2.

1 AgentState = ("state"
      "{" {LocalVar ";"} "}" |
      LocalVar ";");
2 LocalVar = Type Var { "," Var }
3 Var = Identifier [ "=" Expression ]

Listing 2. Syntax for defining agent runtime state

Every property defined within the state block is considered persistent, in contrast to temporary properties. Persistent properties will be saved and, when an agent moves to another MAS, their values will automatically be restored. Values of temporary properties are not transferred with the agent.

Agent mobility is supported in ALAS using the syntax shown in Listing 3.

1 MoveStatement = ( "copy" | "move" )
      "(" Expression [ "," Expression
      { "," MoveArg } ] ")" ";" ;
2 MoveArg = StringLiteral "=" Expression ;

Listing 3. Syntax for defining instructions for agent mobility

An agent can be copied or moved to a target MAS. The target MAS is defined using the first Expression. The second Expression is the name of the agent's service on the target MAS. This service will be automatically invoked when the agent reaches the target MAS.


3. XTEXT

Since most of our development has been performed using Eclipse, we needed an Eclipse editor for the ALAS language. The idea was to develop this editor as an Eclipse plug-in. There are several frameworks for defining domain specific languages.

When developing a DSL in the Eclipse framework, developers usually choose between Xtext and EMFText. EMFText is based on EMF and the Ecore model. The choice between these two frameworks comes down to whether the user is familiar with EMF/Ecore or prefers to use Xtext's syntax, which is similar to EBNF [7]. In addition, EMF/Ecore requires developers to create separate files for defining the context of the language and for the language itself. Because we are more familiar with EBNF, we chose Xtext. This, however, retains full compatibility with EMF, since Xtext can both import and use Ecore models and generate Ecore models based on the language grammar.

Since 2008, Xtext has been developed as an Eclipse subproject by Itemis. It provides support for syntax coloring, code completion, an outline view, source-code navigation, code folding, static analysis, refactoring, and other common features.

3.1 Syntax definition

Xtext is installed into Eclipse as a plug-in. First, the developer has to create a new Xtext project. Based on this project, the complete language infrastructure will be created. In our case the name of the project is rs.ac.uns.alas (Xtext can use namespaces to organize imports). The file extension the ALAS editor will use is going to be alas. Additional projects are automatically generated and contain unit tests (the .tests project), Eclipse editor and workbench functionality (the .ui project) and the project containing configuration files pointing to the Eclipse plug-ins (the .sdk project). These projects are shown in Fig. 2.

Figure 2. Initial contents of projects

The main file is Alas.xtext. In this file, the ALAS language grammar is defined. A grammar is defined as a set of rules. Each rule has a name, followed by a colon, after which comes the rule definition. Rules end with a semicolon.

The contents of the ALAS grammar file are shown in Listing 4.

1 AgentDefinition:
2   'agent' name=ValidID '{'
      (agentDefs+=AgentBodyDef)* '}'
3 ;
4
5 AgentBodyDef:
      agentState = AgentState
      | 'service' function=Function
      | 'services' '{'
        functions+=Function '}'
      | function = Function
10 ;
11
12 Function:
13   resultType = JvmTypeReference
      name=ValidID params = ParamList
      block=Block
14 ;
15
16 Block:
      statements+=Statement
      | {Block}
      '{'(statements+=Statement )* '}'
19 ;
20
21 Statement:
22   IfStatement
      | MoveStatement
      | ReturnStatement
      | VarStatement
      | FunctionCallStatement
      | LocalVarStatement
23 ;
24
25 FunctionCallStatement:
      name=ValidID
      params=ConcreteParamList ';'
26 ;

Listing 4. A part of the ALAS language grammar written in Xtext

The grammar starts with the definition of an agent (line 1). The name of the agent (line 2) is defined using ValidID. ValidID is a predefined Xtext type that can be found in the super-grammar - the Xtype.xtext file (distributed with the Xtext plug-in). It is defined as an array of characters that can start with any letter.

The next step is the definition of the agent's body (line 5). It can contain an unspecified number of state definitions, services, service blocks and/or functions (lines 2 through 10). The definition of a function (line 12) contains a JvmTypeReference (line 13). JvmTypeReference defines the syntax for full Java-like type names. This includes simple names, fully qualified names, fully-fledged generics, wildcards, lower bounds and upper bounds. Within the function, there is a block (line 16) which contains statements. There are various types of statements (line 22). An example of a function call statement is shown here (line 25).

As previously stated, ALAS provides the usual language constructs, such as if-then-else. Listing 5 shows the rule for this construct.

1 IfStatement:
2   'if' condition=XExpression
      then=Block
4   (=>'else' else=Block)?;

Listing 5. The if-then-else rule for ALAS


The XExpression rule is defined in Xbase.xtext. Xbase is a grammar library containing definitions of general language constructs, which enables the integration of Java-like expressions in a language.

The move statement is defined using the rules shown in Listing 6.

1 MoveStatement:
2   ('copy' | 'move')
      '(' exps+=XExpression
      (',' exps+=XExpression
      (','movArgs+=MoveArg)* )?
      ')'';'
3 ;
4
5 MoveArg:
6   string=STRING '='
      exp=XExpression
7 ;

Listing 6. Rule for the move and copy syntax

Copy and move statements consist of the copy or move keyword, one or more expressions and move arguments. Other statements (return-, var-, and local-var-) are defined similarly, and are omitted due to space constraints. ALAS also supports package declarations, which are omitted for the same reason.

3.2 Generating Language Artifacts

GenerateAlas.mwe2 is a file that holds information on how the language and additional artifacts are generated. It uses the MWE2 language. After it is started, the language infrastructure is generated. Running GenerateAlas.mwe2 (Run As -> MWE2 workflow) triggers the language generator. In this step, the parser, the serializer and some additional infrastructure code are generated. After this step, in order to test the editor's IDE integration, we can run a new Eclipse instance using the preconfigured shortcut in Eclipse's Run -> Run Configurations.

4. WRITING AGENTS USING THE NEW EDITOR

In this new Eclipse instance, the plug-in for the ALAS language is included. We can create a new project, and in this project we create a new file with the .alas extension. Eclipse will recognize the extension and we can test our language functionality. The editor is shown in Fig. 3.

Figure 3. New .alas file being edited with the ALAS editor, showing syntax coloring and code suggestion

We can see that the editor performs syntax coloring, code suggestion and code completion. Because we used JvmTypeReference from Xbase, the byte and String types are the same ones as in Java, as seen in Fig. 4.

Figure 4. String in the ALAS editor corresponds to the java.lang.String class

Another feature is the outline view of the file provided by the editor. It is shown in Fig. 5. This automatically generated outline can be further customized by adding icons and labels that are better representatives of the ALAS language concepts.


Figure 5. Outline view for the ALAS editor

5. CONCLUSION

This paper presented the use of the Xtext framework to define a grammar for the ALAS language. We show how the EBNF notation for ALAS, provided in our previous work, is translated into the notation Xtext uses. After the grammar is defined, Xtext was used to generate language artifacts and implement the language editor infrastructure and functions. The editor was then successfully tested as a plug-in for the Eclipse environment. We have successfully written a PingAgent using this editor.

Further work will focus on defining rules for the generation of native code for MASs implemented in Java, Python and .NET. For Java, we can use the AlasJvmModelInferer.xtend file, which defines the mapping of ALAS concepts to Java concepts, but for other languages we will have to define templates for each ALAS construct. Custom validation rules also need to be specified.

6. REFERENCES
[1] Wooldridge, M., Jennings, N.: Agent theories, architectures, and languages: A survey. In: Wooldridge, M., Jennings, N. (eds.) Intelligent Agents, Lecture Notes in Computer Science, vol. 890, pp. 1–39. Springer Berlin / Heidelberg (1995)
[2] Wooldridge, M., Jennings, N.: Intelligent agents: Theory and practice. Knowledge Engineering Review 10, 115–152 (1995)
[3] Mitrović, D., Ivanović, M., Budimac, Z., Vidaković, M.: Supporting heterogeneous agent mobility with ALAS. Computer Science and Information Systems, Vol. 9, No. 3, 1203-1230 (2012)
[4] Mitrović, D., Ivanović, M., Vidaković, M.: Introducing ALAS: a novel agent-oriented programming language. In: Simos, T.E. (ed.) Proceedings of the Symposium on Computer Languages, Implementations, and Tools (SCLIT 2011), held within the International Conference on Numerical Analysis and Applied Mathematics (ICNAAM 2011), pp. 861–864. AIP Conf. Proc. 1389 (September 2011), ISBN 978-0-7354-0956-9
[5] World Wide Web Consortium (W3C): SOAP version 1.2. http://www.w3.org/TR/soap/, retrieved on December 7, 2011
[6] Shoham, Y.: Agent-Oriented Programming. Technical Report STAN-CS-90-1335. Stanford University, Computer Science Department (1990)
[7] Guntli, C.: Create a DSL in Eclipse. Technical report. HSR - University of Applied Science, Rapperswil (2010)

ACKNOWLEDGMENTS

Results presented in this paper are part of the research conducted within the Grant No. III-44010, Ministry of Science and Technological Development of the Republic of Serbia.


Grader: An LTI app for automatic, secure program validation using the Docker sandbox

Petrović Gajo, Nikolić Aleksandar, Segedinac Milan, Kovačević Aleksandar, Konjović Zora
University of Novi Sad, Faculty of Technical Sciences, Novi Sad, Serbia
{gajop, anikolic, milansegedinac, kocha78, ftn_zora}@uns.ac.rs

Abstract— In this paper we present a software framework for automatic validation of code submitted for programming assignments. The framework consists of: 1) a website interface which can be used by teachers to add new assignments and by students to submit solutions; 2) a REST API interface used to submit solutions programmatically; 3) a test environment which invokes assignment-specific tests for the supported programming languages; 4) a sandbox environment, created using Docker containers, which allows for secure execution of unsafe code; and 5) LMS integration by implementing the LTI specification. We have added initial support for writing simple tests in Matlab, Python and Java, as well as the ability to extend the framework via plugins to other programming languages.

I. INTRODUCTION

In recent years we are witnessing an explosion of e-learning systems and online courses. More and more students are taking online courses, which can be seen from the rise of sites offering vast varieties of courses (examples include Coursera, Udacity, Edx and similar [1-3]). The term MOOC [4] (Massive Open Online Course) has also been coined recently to denote the ever increasing presence of online courses that have a large number of enrolled students. In these courses, with student numbers sometimes reaching tens and hundreds of thousands, it is unfeasible to manually inspect submitted work, and automatic grading is therefore necessary. For online programming courses it would therefore be useful to have a way to create and publish code assignments which can be automatically validated. Traditional university courses also often have an online component, usually in the form of a site where students can obtain course information, lecture notes, assignments and similar. As the components of these sites tend to be rather similar even in different areas of science, many universities have adopted an LMS (Learning Management System) [5] designed to provide common functionalities. As the usage of these LMSs has grown, so has the demand for additional features, and in recent years we have seen a push towards the creation of LTI apps [6] (Learning Tools Interoperability applications, i.e. content providers) that can be added to specific courses (content consumers). In order to reduce teacher workload when grading exams, as well as to create better learning material, we propose a framework for automatic validation. Our framework consists of an extendable validation module that tests submitted solutions, a Docker-based sandboxing framework and a web application that implements the LTI interface, allowing it to be integrated within an LMS. The web application has been created using the Django framework [7]. The implemented system is available on our BitBucket repository¹. The system modules are presented in Figure 1, which also shows the control flow within the system; each module is described in detail in the corresponding section of the paper.

Figure 1. System modules

This paper is organized as follows: Section 1 is the introduction, Section 2 presents the website GUI and API interfaces, Section 3 describes the validation module, Section 4 introduces the Docker sandboxing environment, Section 5 describes the LTI specification and LMS integration and, lastly, Section 6 is the conclusion, in which we summarize what was done.

II. WEBSITE INTERFACE

The course instructor submits tasks on the web site by uploading files and selecting their type; the possible options include:
• Implementation - a file containing a correct solution, supplied by the instructor; optional
• Test - a file containing a collection of tests used when validating code correctness; at least one is required
• Unspecified - any additional file not belonging to the first two categories (e.g. PDF files that give detailed assignment instructions, or input files needed to be read by the program).

Once submitted, the tasks will become available to students, who can then attempt to solve them by submitting their solutions to the site. This is done either by using the web interface or programmatically, using the REST API. To use the REST API, the client must do the following: 1) create a zip archive from all solution files, 2) create a base64 encoded string from the created archive and 3) do a POST request with the new string as one of the parameters. The server will then decode the string, unzip the files and then proceed normally, as if the files were submitted using the web site interface.

¹ Repository of the implemented system: https://bitbucket.org/gajop/automatic-grading-ftn.

Page 221 of 478


ICIST 2014 - Vol. 1 Regular papers

The submitted code is copied to a testing folder and add extra constraints that increase the security and
then, using a new Docker container, validation is stability of untrusted code execution. In this case, we
performed for each test, after which test results are limited maximum RAM usage by supplying the
returned and then displayed to the user. option ?m=MAX_RAM (in megabytes) on each execution.
This prevents the tested program to exhaust the system
III. VALIDATION MODULE memory and therefore crash various other OS programs
The validation module provides an extendable platform when they try to allocate memory for themselves.
for executing various user-defined tests. These tests, often Docker containers are created by making a
created by teachers along with the correct solution, are configuration file which is done by taking a base image
used to verify program correctness. We have implemented (such as stock Ubuntu), and then listing commands that
support for the creation of simple tests in Matlab, Python initialize it to a desired state. This usually involves making
and Java programming languages. Validation is performed some system-wide settings such as obtaining software
by invoking functions from the code being tested and packages or changing configuration files, as well as
verifying if the results match an expected outcome. copying some custom files one would need.
However, the key feature in the validation module is In our case, we created two Docker images. The first
that it can be easily extended with new test types, by image, grading-base, starts from the official Ubuntu base,
writing modules that comply with a simple interface. updates the system packages and installs a few additional
These modules (from hereon called plugins) are added ones (python-pip and octave). Detailed information on
to the application by modifying the main Django how to setup a configuration script is available2, and we
configuration file. They can be added by assigning a list of have also released our grading-base script at the GitHub
viable testers for each programming language, so new repository 3 . The second image, grading, uses the
plugins could just be appended to the list. Support for new previously created grading-base as its foundation and adds
programming languages can also be added, which implies a few additional files used to run tests (most notable being
that test writing for arbitrary programming language could the validation plugins) 4 . The reason we opted to create
be supported by creating plugins. two containers instead of one was because this way we
have one image (grading-base) that takes a while to build,
As mentioned before, plugins are required to implement but will remain mostly unchanged through time. This
a certain interface. In this case, this includes creating a allows us to add and modify plugins fast, as the other
function which takes submitted code, tests and additional container can be rebuilt quickly, having to only copy a
data (usually correct code if any) as input and provides few of the plugin files, a process far faster than doing
test results as output. Test results include a Boolean value system-wide updates.
denoting each test’s successfulness, along with an optional
message, and another Boolean value denoting the An alternative to Docker containers (more specifically
successfulness of the entire test suite (this will most often LXC) would be to use full blown VMs such as VirtualBox
be true only if all tests are successful, and otherwise false). or VMware that partially emulate the hardware to increase
the level of virtualization. While these alternatives tend to
As Docker provides a way to sandbox code execution, be more secure than Docker as it’s usually harder to break
the plugins don’t need to be written by taking malicious through the hypervisor that’s used by VMs, they tend to
code into consideration. That said, they still need to use have poor performances [10], seeing how it can take
whatever constructs of the language are available to check minutes for VMs to start cleanly and often 100s of MBs of
for certain types of erroneous code such as the case of: RAM to start a new OS instance. In comparison to VMs,
infinite loops, exception throwing, incorrect function we have found that Docker container startup measures in a
interfaces (e.g. wrong type or number of function few seconds, which makes it fast enough.
arguments), and so on. This requires the plugins to be
non-trivial, but the hope is that they will usually be written Multipliers can be especially confusing. Write
once per programming language, and that the most users “Magnetization (kA/m)” or "Magnetization (103 A/m).”
will be writing simple tests for specific tasks. Figure labels should be legible, about 10-point type.

IV. DOCKER V. LTI – LEARNING TOOLS INTEROPERABILITY


Docker [8] is an open source project created by Docker LTI apps are educational tools which implement the
Inc. (formerly dotCloud) capable of running programs LTI specification as described by IMS Global. These tools
inside LXC (Linux Containers) [9]. LXC allow users to provide integration with a tool consumer, usually an LMS.
execute code in an environment isolated from the host Creating separate LTI apps instead of bundling their
operating system, and Docker wraps around LXC and functionality within an LMS allows tool reuse in multiple
provides a higher level API that allows easier creation and LMS systems (e.g. Canvas, Moodle, Blackboard [12-14],
management of such containers. etc.) without any modifications. This also allows for the
creation of LMS systems with just the core functionality
This method of isolating executed code from the host that is a necessity for most users, while the specific
operating system, also called sandboxing, is often used requirements of a learning institution (e.g. high school or
1) to ease portability ? as software only needs to work university) or course are met by adding custom LTI apps.
within a predefined container, and that container can then
be distributed, and 2) to safely run code ? by separating
the host operating system from the container, the 2
Docker configuration script instructions:
programs within the container are unable to easily damage http://docs.docker.io/en/latest/use/builder/
the system (they would need to break containment, which 3
Grading-base configuration script https://github.com/gajop/grading-
while sometimes possible with kernel exploits, usually base
isn’t trivial). In the case of Docker, it’s also possible to 4
Grading configuration scirpt: http://bit.ly/MhOAok

Page 222 of 478


ICIST 2014 - Vol. 1 Regular papers

LTI apps are web applications running on arbitrary VI. CONCLUSION


servers that implement a simple interface by handling a In this paper we presented a framework for creating and
certain POST request, named app launch [15]. In app executing automatic tests that validate user submitted
launch, which happens automatically when the user first programs. Programs can be submitted through the website
tries to use the tool, the LMS sends the POST request either by uploading files manually using the GUI, or
comprised of useful information, largely to identify the programmatically using a REST API.
user, course and usage context, for which the most
important parameters are: The system is capable of executing untrusted code
securely by sandboxing it within a Docker container.
 user_id: Unique id of the user. Along with the added security, this also frees resources
 roles: Roles that the user has in this context (most that would otherwise need to go into considering the
common ones being Learner and Instructor) security aspects when building new validation plugins.
 lis_person_name_full: Full name of the user. This We have also implemented the LTI specification and
won’t be sent if the app is configured to launch thus our web application can be used as an LTI app, and
users in anonymous mode. can be embedded in various LMS. This can speed up the
 context_id: Context (course) unique id. deployment time greatly, and it gives us an ability to
create tools that focus on doing one job really well,
 context_title: Name of the context (course). without the need to add common functionality that can be
After the initial request, the tool is started and will be found in an LMS (e.g. no need to create interfaces for
rendered within an LMS. The rendering will usually be authorization, account creation and similar).
done in an iframe as displayed in Figure 2. If implemented As future work we would like to implement the ability
correctly, by rendering it within an iframe it can seem as if to store and display code analysis other than correctness
the LTI app is a part of a single LMS, which users tend to validation, such as code complexity analysis or plagiarism
prefer. If applicable, the tool may provide configuration detection.
files for a specific LMS which defines how the tool can be
accessed within the LMS, usually by putting links in the ACKNOWLEDGMENT
LMS to certain parts of the tool to better integrate each
functionality. Example Canvas configuration files are also Research presented in this paper is partly funded by the
provided5. Ministry of Education, Science and Technological
Development of the Republic of Serbia, Grant No. III
47003.
REFERENCES
[1] “Coursera online course platform” [Online], Available:
https://www.coursera.org/, [Accessed 30 1 2014].
[2] “Udacity online course platform” [Online],
https://www.udacity.com/ [Accessed 30 1 2014]
[3] “Edx online course platform” [Online], https://www.edx.org/
[Accessed 30 1 2014]
[4] S. Kolukuluri, “Massive Open Online Courses, Enhancement to
edX-platform”, master thesis, Indian Institute of Technology,
Bombay Mumbai, 10.2013
[5] W. R. Watson, S. L. Watson, “An Argument for Clarity: What are
Learning Management Systems, What are They Not, and What
Figure 2. LTI integration within the Canvas LMS (red outline denotes the Should They Become? “, pp. 28-34, TechTrends, 2007
LTI components.) [6] S. Booth, S. Peacock, S. P. Vickers, “Plug and play learning
application integration using IMS Learning Tools Interoperability”,
As the LTI app receives parameters defining the active ascilite, pp. 143-147, 2011
user and their permissions in the request, there needs to be [7] “Django framework” [Online], https://djangoproject.com,
a way that the authenticity of the request sender (LMS) [Accessed 30 1 2014]
can be verified. To do this, the LTI specification defines [8] “Docker container manager” [Online], http://www.docker.io/
the use of the OAuth protocol, which requires a key and a [Accessed 30 1 2014]
shared secret [16]. The key is transmitted with each [9] “LXC – Linux Containers” [Online], http://linuxcontainers.org/
[Accessed 30 1 2014]
message along with an OAuth signature generated based
on the key. The consumer key is used by both parties to [10] M. G. Xavier, M. V. Neves, F. D. Rossi, T. C. Ferreto, T. Lange,
C. A. F. De Rose, “Performance Evaluation of Container-based
identify who they are talking to, while the secret is used to Virtualization for High Performance Computing Environments“,
digitally sign messages in both directions. Both the tool Parallel, Distributed and Network-Based Processing, pp. 233-240,
and the consumer need to agree on a key and secret, and 2013
after those are set, a higher level of security will be [11] “IMS Global Learning Consortium” [Online],
achieved. http://www.imsglobal.org/ [Accessed 30 1 2014]
[12] “Canvas LMS by Instructure”, [Online]
http://www.instructure.com/ [Accessed 30 1 2014]
[13] “Moodle LMS” [Online] https://moodle.org/ [Accessed 30 1 2014]
[14] “Blackboard LMS” [Online] http://www.blackboard.com/
[Accessed 30 1 2014]
5 [15] “Writing LTI Stuff” [Online] https://lti-
Canvas configuration file can be found at the repository URL:
examples.heroku.com/code.html [Accessed 30 1 2014]
http://bit.ly/LqlNOq

Page 223 of 478


ICIST 2014 - Vol. 1 Regular papers

[16] “IMS GLC Learning Tools Interoperability Basic LTI [Accessed 30 1 2014]
Implementation Guide” http://bit.ly/1fgmOmn [Online]

Page 224 of 478


ICIST 2014 - Vol. 1 Regular papers

Tulipko interactive software for visualization of Monte


Carlo simulation results
Tara Petrić*, Predrag Rakić *, Petar Mali**, Lazar Stričević*, Slobodan Radošević **
* Faculty of Technical Sciences, University of Novi Sad, Serbia
**
Department of Physics, Faculty of Science, University of Novi Sad, Serbia
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

Abstract - Observing physical characteristics of the lattice To make research easier with this simulation method,
Markov Chain Monte Carlo simulation method is difficult specialized visualization software -Tulipko was developed
without proper graphical visualization. Tulipko is an
using the Python [4] programming language and its
interactive visualization and graph plotting software
application developed for this very purpose - to convert libraries [5,6,7,8].
simulation numerical data into 2D graphical visualization Although Tulipko was originally developed for observing
images. It is written in Python in a modular fashion, which the characteristics of the Classical Heisenberg Model
makes it easily extensible and being able to support [2,9], it can also be used for other classical models such as
numerous output graphical file formats. Tulipko is
distributed as open-source software. Ising [10] and Potts [11].

Keywords: Monte Carlo simulations, Python, visualization Tulipko does various calculations of physical quantities
tool, statistical analysis, physics software needed for understanding the nature of the physical
system being simulated. To do this, the software uses lazy
initialization, and does the calculation only when the user
I. INTRODUCTION needs to see the results. This gives the user control over
which data will be included in the calculation, and later
Markov Chain Monte Carlo(MCMC) simulations of spin possibilities to easily change their choice.
lattice models, is a way to find minima of energy in a Tulipko works in a non destructive manner (without
certain temperature region for the purpose of calculating changing any of the simulation data) and independently
physical properties such as magnetization, heat capacity of the method used to produce the results, provided that
and susceptibility, which is important in studies of the output text file is in the right format. The application
magnetic materials [1]. The output of the simulation are expects each simulation sample in the text file to be
numerical results, usually arranged in plain textual files. separated by a newline character. Each sample needs to
One example of MCMC simulation method is described be composed out of two or more columns, divided by at
in [2]. least one whitespace. The content of the first columns
The visualization of the obtained numerical results is depends on the used method, the second one represents
necessary for picking the relevant ones. Those results will the energy of the simulated system, and the others - spin
then be used when calculating the quantities of interest for components.
observing the characteristics of the simulation. Later,
those calculated values can be visualized as well, in, for An important feature of graphs produced by Tulipko is
example, the function of temperature. that their appearance is completely customizable. This
includes everything from line width, line color, line style,
Without having the specialized software, MCMC lattice marker style and marker fill, to axis label size, and axis
simulation results had to be processed by many complex, tick size. Also it has full support for LaTeX style
unintuitive and non-interactive scripts. The process had to formatting [12] within axes labels and titles, thus
be repeated every time additional results were produced. providing visual uniformity when used in documents
This would be unnecessarily timely and very limited in written in LaTeX. Each graph in Tulipko is provided with
lacking any flexibility later on. The results would then be the standard Matplotlib toolbar [12] that allows the user to
plotted by a plotting tool such as Gnuplot [3]. Since some alter the width and height of the axes, to zoom in certain
simulations produce a lot of data, which is very hard to parts of the graph by lasso selecting the area, and to save
manipulate in this way, additional tools are needed. the graph in one of the library supported graphical file
formats [12].

Page 225 of 478


ICIST 2014 - Vol. 1 Regular papers

parameters such as the Lattice Size, Temperature, number


Tulipko has three main functions that are visually divided of Lattice Sweeps and the number of samples taken into
in two tabs - one for over-viewing the properties of the account for each plot. However, while Lattice Size,
simulation program itself, and determining optimal results Temperature and number of Lattice Sweeps are all
- ThermPanel , and another one for plotting other determined at the time of running the simulation, the
quantities of interest – Aggregate panel. number of samples used can be changed to an extent
inside Tulipko for plotting purposes. Namely, the user can
specify the number of samples he wants to plot the graph
I . THERMPANEL for, with the restriction that it be less than or equal to the
number of samples available for the chosen parameters.

As shown on the plots in Figure 1, Tulipko has the ability


to plot results from different simulations on the same axis.
This can also be useful for any experimental parameter or
change added to the simulation code, since it can be
compared to the already tested results.

This panel contains the standard Matplotlib toolbar (Fig.


1, 1), for adjusting the axes, and two sliders controlling
the size of axis ticks and labels (Fig. 1, 2).
Figure 1: ThermPanel (1 – Matplotlib tool-bar, Another convenience is the legend that can be adjusted to
2-Appearance customization widgets, 3-Selection
show all or any of the parameters of the data used for
widgets, 4-Plotting buttons, 5-Sample selection
plotting the graph. Users can choose which parameters are
widgets)
shown in the legend, by checking or unchecking the
check-box widgets (Fig. 1, 2).
First tab – ThermPanel (see Figure 1) is especially The selection widgets (Fig. 1, 3) allow the user to choose
important for observing the characteristics of the respectively, the Simulation Folder, the Lattice Size, the
simulations, and later treatment of the results. It is Temperature, the desired number of samples and the
important to optimize the simulation by reducing, where physical quantity he wants to plot.
possible, the number of iterations (Lattice Sweeps) the If the user wants to draw another plot on the same axis
algorithm goes through. The user can use Tulipko to after drawing the first one, he only needs to select the
determine, by observing the precision and accuracy of the parameters for the other one, and press the 'Draw' button
simulation results in plots in Figure 1, when the system (Fig. 1, 4). Otherwise he should first clear the axes by
has reached thermal equilibrium. Furthermore, since the pressing the 'Clear' button.
optimal simulation parameters strongly depend on the
Temperature (T) and Lattice Size (L), an automatized By entering the desired number in a text-box widget (Fig.
algorithm for choosing the optimal parameters is not 1, 5) the user can choose for which number of samples he
straightforward and has not been devised yet. That is why wants to draw plots, that is, with which precision. After
it's important to allow the user to do so himself, according entering the number and pressing 'Generate', it will appear
to the observation made in ThermPanel. among the choices in the combo-box widgets(Fig. 1, 3).
This is useful for evaluating the precision on the second
Tulipko has a reload option available through the Reload axis.
menu item so the user doesn't have to exit the application
when the additional results are ready for plotting. After he
reloads it, he can choose the new results, and draw them.

First axes shows how physical quantity such as


magnetization changes with respect to the number of
Lattice Sweeps performed, and the second one how the
number of samples chosen for plotting affect the precision
of the results.
Figure 2: ThermPanel plotting area.
Multiple-choice widgets such as combo boxes (Figure 1,
3) are used, exposing only the relevant information to the
user. This makes it straight-forward to manipulate any

Page 226 of 478


ICIST 2014 - Vol. 1 Regular papers

For example, the plots in Figure 2 show the hand is the possibility to observe and compare, on the
magnetization properties for three different simple cubic same graph, beside the results from different Lattice Sizes
lattices with L3 sites. By observing on the left graph when - results from completely different simulations.
the results have saturated, it becomes easy to determine The horizontal axis represents the Temperature, and
how many number of Lattice Sweeps are needed for any different thermodynamical properties of the system can be
Lattice Size. The results that have reached saturation are plotted against it.
considered representative.
When wanting to quickly check simulation results for a
certain Lattice Size, the 'Best' button (Fig. 3, 5) is useful,
II. AGGREGATE because it allows automatic choosing of simulation results
with the most samples and Lattice Sweeps, to be then
presented as points on the plot. However, more interesting
is the option to choose points by hand, after determining
the optimal ones.

Another useful feature is the display of exact coordinates


of the cursor within the plot (Fig. 3, 7), which makes it
easier to read info from the graph.

Figure 3: Aggregate panel (1-Matplotlib tool-bar,


2-Appearance customization widgets, 3-Selection
widgets, 4- Plotting buttons, 5-Automatic point
selection button, 6- The plot legend, 7-Coordinates of
the cursor, 8- Point choosing widget )

The second tab – Aggregate panel (See Figure 3) allows


the user to, having observed the optimal parameters of
simulation results in the previous tab, make a selection
Figure 4: Aggregate panel color-picker
determining which of them will be included in the
calculation of quantities of interest. This level of
In addition to the other customization options, Aggregate
interactivity is essential for the new and experimental panel also has a color-picker widget that enables the user
approach to Monte Carlo simulations. to hand-pick the color for the most recently plotted line
(Figure 4).
Within the Aggregate panel choosing points is easy and
intuitive using the ListControl widget (Fig. 3, 8).
Everything except the information relevant to the III. SOFTWARE OVERVIEW
simulations themselves is again abstracted. Within a few
mouse clicks any point can be selected, and the Python was originally used because it offers the integrated
application leads the user through the steps by timely environment both for plotting and for easy data analysis.
revealing the subsequent choice possibilities. Sometimes, What makes Python so good for data analysis and
after choosing a point and drawing it, the user might want visualization are third party Python libraries such as
to exclude it. He does that by right clicking on the chosen Matplotlib [6], Numpy [5] and Pandas [8] that make it
temperature, or he can just straight away choose a comparable to professional tools such as R [13] and
different point. Matlab [14]. But unlike Matlab, within Python it is
possible to manipulate with the data in an object oriented
Another useful feature is the possibility to annotate points fashion, giving the developer more control.
in the plot. The annotations show the number of iterations
(Lattice Sweeps-THERM) and number of samples (SP) of Pandas was very useful, since it allows manipulation of
a particular simulation result the user chose to include in large amount of data in a tabular and semantically clear
the plot. This is done by using the check-box 'Annotate' fashion. Clarity was important to help avoid mistakes in
(Fig. 3, 2). It is also possible to hide unwanted annotations calculations, and to allow the scientist with little
by left-clicking on them. What is also very useful for programming knowledge, to check the validity of the
observing the quantities of interest for the simulation at

Page 227 of 478


ICIST 2014 - Vol. 1 Regular papers

formulas. Pandas allows easy matrix, both row and REFERENCES


column oriented, calculations, thus allowing
generalization, such as the arbitrary number of columns in [1] Landau, David P., and Kurt Binder. A guide to Monte Carlo
simulations in statistical physics. Cambridge university press,
simulation result files.
2009.
[2] Rakic, Predrag S., et al. "Multipath Metropolis Simulation of
Python, with its third party libraries, integrates scientific Classical Heisenberg Model." arXiv preprint arXiv:1305.6758
(2013).
and non-scientific code, so the developer doesn't have to
[3] Janert, Philipp K. Gnuplot in action: understanding data with
switch between tools or programming languages to work graphs. Manning Publications Co., 2009.
with both. [4] Van Rossum, Guido. "Python Programming Language." USENIX
Annual Technical Conference. 2007.
[5] Oliphant, Travis E. A Guide to NumPy. Vol. 1. USA: Trelgol
To build a scientific application like this, several styles Publishing, 2006.
and many design patterns of programming are needed. [6] Hunter, John D. "Matplotlib: A 2D graphics environment."
Python has both the procedural and object oriented design Computing in Science & Engineering (2007): 90-95.
[7] Rappin, Noel, and Robin Dunn. wxPython in Action. Manning,
patterns ingrained in the language itself. This combined 2006.
with the before mentioned packages, makes Python very [8] McKinney, Wes. "pandas: a Foundational Python Library for Data
flexible for making graphical user environments with Analysis and Statistics." (2011)
strong scientific underground. [9] Joyce, G. S. "Classical heisenberg model." Physical Review 155.2
(1967): 478.
[10] Wu F. Y. , The Potts model, Rev. Mod. Phys. 54 (1982) 235–268
Tulipko was developed following the Model View [11] Newell G. F. , Montroll E. W. , On the Theory of the Ising Model
Controller – GUI (Graphical User Interface) design of Ferromagnetism, Rev. Mod. Phys. 25 (1953) 353–389.
pattern. This is an extended version of the MVC [15] [12] Overview – Matplotlib 1.3.1 Documentation .
http://matplotlib.org/contents.html (accessed January 15, 2014 )
design pattern. This pattern requires that the GUI part be [13] Team, RDevelopment Core. "R: A language and environment for
separate from the MVC part, and that it contains almost statistical computing." (2005): 3-900051.
[14] Grant, Michael, Stephen Boyd, and Yinyu Ye. "CVX: Matlab
nothing but the code specific to the library used for software for disciplined convex programming." (2008).
developing the user interface - WxPython [7]. It is [15] Krasner, Glenn E., and Stephen T. Pope. "A description of the
model-view-controller user interface paradigm in the smalltalk-80
bare-bones link from the user to the rest of the program, system." Journal of object oriented programming 1.3 (1988):
providing all the display necessities, while containing the 26-49.
[16] The GNU General Public License v3.0
minimum of the logic. http://www.gnu.org/licenses/gpl.html (accessed January 15, 2014)
[17] Tulipko https://bitbucket.org/iTrustedYOu/tulipko (accessed
January 15, 2014)
IV. CONCLUSION

Tulipko is intended to be of use to scientists doing


Markov Chain Monte Carlo simulations.
It’s modular software architecture makes it easily
extensible and it’s functionalities are arranged visually in
tabs, which allows adding completely new parts and
functionalities. Even those who need only a part of it's
functionality can make use of it.

This software is distributed as open source under the GNU


General Public License [16]. The executable binaries for
the current stable version and the source code of the
development version can be downloaded from [17], where
users can also post detected issues and requests for new
features and improvements.

ACKNOWLEDGMENT
This work was supported by the Serbian Ministry of
Education and Science under Contract No. OI-171009.

Page 228 of 478


ICIST 2014 - Vol. 1 Regular papers

PERFORMANCE EVALUATION OF THE ARPEGGIO PARSER

Igor Dejanović, Gordana Milosavljević


Faculty of Technical Sciences, University of Novi Sad

Abstract - PEGs (Parsing Expression Grammar) is a type set of rules used to recognize strings in the given
of formal grammar which is gaining significant traction language. At the same time PEG is a declarative
in the industry and academia. This paper presents a description of the top-down recursive-descent parser. In
performance evaluation of our Arpeggio parser - PEG contrast to context-free grammars (CFG) which are
interpreter in Python programming language. The aim of generative in nature, PEGs are oriented toward string
the Arpeggio parser is to be used in our DSL framework recognition. One of the often cited property of PEG is its
for both compiling (transformation) and code editing unambiguity, i.e. if the input string parses there is exactly
support (e.g. syntax checking and highlighting, code one valid parse tree. This property is a direct consequence
completion/navigation, tree outline, code visualization). of using prioritized choice operator which instructs the
The evaluation is performed by measuring parsing time parser to match alternatives in the strict order – from left
with different grammars and inputs and comparing to right.
parsing times with two other popular python PEG parser
The other distinctive feature of PEGs are syntax
interpreters (pyparsing and pyPEG).
predicates: not (!) and and (&). Syntax predicates
describe rules that must not match (not) or must match
1. INTRODUCTION
(and) at the current position in the string without actually
consuming any input.
The Arpeggio parser [1,2] is a recursive descent parser
with full backtracking and memoization based on PEG Since PEG is a declarative description of the top-down
(Parsing Expression Grammar) grammars. This kind of recursive-descent parser, it is straightforward to make an
parsers is called packrat parser. Arpeggio’s purpose is to interpreter of PEG grammars. Arpeggio, as well as
be used in our DSL framework[13,14]. Although we are pyparsing and pyPEG, are PEG interpreters. This means
mainly concerned with the functionality of the Arpeggio that no code generation takes place. PEGs can be regarded
parser, its performance is of no less importance to us. We as “programs” for parsing strings in the defined
strive to make it reasonably fast to be used during editing languages. PEG interpreters run this programs. Of course,
session for various feedback and features such as syntax this programs can be transformed to other programming
highlighting, code outline, code completion and languages (e.g. Python, Java, Ruby) using parser
navigation. For those purposes parser and semantic generators.
analysis should be executed as background process during
editing session. For a good user experience this PEG parsers use unlimited lookahead. In some cases this
background tasks should introduce as little overhead as could yield exponential parsing time since parser needs to
backtrack on failed matches. To remedy this, Bryan Ford
possible.
has introduced recursive-descent parser with backtracking
In this paper we investigate the performance of the that guarantee linear parse time through the use of
Arpeggio parser by performing various run-time memoization[6]. Memoization is a technique where the
benchmarks and comparison with some popular Python result of each match at each position is remembered and
parsers (pyparsing[3] and pyPEG[4]). Tests has been will be used on subsequent matching of the same rule on
performed using various language grammars and input the same position in the input string. Using this technique,
strings. parse time will be linear at the expense of more memory
consumption. PEG parser that employs memoization is
The purpose of this work is to assess the current state of
called packrat parser.
the Arpeggio parser in terms of performance and to guide
its further development. We also make some notes on the
3. ARPEGGIO
lessons learned along the way regarding PEG parsing
performance.
Arpeggio is a packrat parser written in Python
The paper is structured as follows: In the Section 2 we programming language. Its purpose is to be used in our
give an overview of PEG parsers in general; Sections 3, 4 DSL framework. An early design decision was to use
and 5 give a descriptions of Arpeggio, pyparsing and interpreting approach instead of code generation. We
pyPEG respectively; Section 6 describes test setup, strongly believe that interpreter would give us more
grammars and input files; Section 7 presents results; In flexibility and faster round-tripping from language
Section 8 related work has been presented. In the Section description to functional editor in comparison to parser
9 we conclude the paper. code generation. Furthermore, we argue that for
recursive-descent top-down parser written in Python there
2. PEG would be no significant difference in terms of
performance between the interpreter and the generated
Parsing Expression Grammar (PEG) is a type of formal parser. The only difference would be parser startup time.
grammar introduced by Bryan Ford [5] which specifies a In case of the Arpeggio interpreter, grammar needs to be

Page 229 of 478


ICIST 2014 - Vol. 1 Regular papers

analyzed and so called “parser model” needs to be


created. For generated PEG parsers this initial setup is 4. PYPARSING
non-existing. But for our DSL framework this setup will
be done during framework startup and upon changes in Pyparsing [3] is parser combinator written in Python
the language description, where the generated parser programming language. It is one of the most popular
would also be regenerated. parsers in Python with the long history and strong
community. Grammar is specified by combining simple
Arpeggio has two equivalent ways for grammar
parsers given in the form of Python class instances.
definition: canonical (internal DSL style) and textual PEG
Combination is performed through overloaded operators
notation (external DSL style). The canonical description
(+, |, <<, ~, etc.). This gives pyparsing a great flexibility
uses Python language constructs (functions, lists, tuples,
and composition properties.
objects). This approach is inspired by the previous version
of pyPEG. Support for textual PEG notation has been Figure 3 presents the same grammar for simple arithmetic
built using canonical description of the textual PEG expressions from the previous section implemented with
language1. pyparsing..

Figure 1. Grammar for arithmetic expressions given in the


Arpeggio’s internal DSL form.

Figure 1 shows a grammar for simple arithmetic


expressions with five basic operations (negation, addition, Figure 3. Grammar for arithmetic expressions in pyparsing
subtraction, multiplication and division). Each PEG rule
is given in the form of Python function. Rule functions Pyparsing uses semantic actions to perform semantic
accept no parameters and return Python object describing analysis during parsing. Without using semantic actions,
PEG rule. Prioritized (ordered) choice is given as a parse result is a flat list of tokens. Group class may be
Python list while a sequence is specified using tuple used to construct a data structure similar to the Arpeggio
objects. Rule number is a regular expression match of parse tree. The addition of Group instances in the
unsigned real number. grammar introduces an additional overhead. To examine
how this grammar alteration influences parsing speed we
have performed measurements for both cases (with and
without Group).

5. PYPEG

PyPEG [4] is another interesting python PEG grammar


Figure 2. Grammar for arithmetic expressions given in the interpreter. In version 2.x it uses Python classes for
Arpeggio’s external DSL form. grammar definition. It is oriented towards automatic AST
(Abstract Syntax Tree) construction and pretty-printing
The same grammar, in the form of PEG textual notation, (code generation). Each PEG rule in pyPEG is specified
is given in Figure 2. Arpeggio parses this description as a Python class with grammar class attribute. This class
using canonical parser for PEG language and, using will get instantiated where match succeeds.
semantic actions, instantiates the parser for the specified A part of the grammar for arithmetic expression in pyPEG
grammar. This definition will construct the same parser as is given in Figure 4.
the canonical definition from Figure 12.
From the grammar definition given in either form,
Arpeggio will instantiate the parser which can be
visualized using Graphviz tool3. Figure 5. shows a parser
instance for grammar for simple arithmetic expression.
This can be handy for grammar debugging.

Figure 4. A part of expression grammar in pyPEG.

1 See ParserPEG python class from


https://github.com/igordejanovic/arpeggio/blob/maste
r/arpeggio/peg.py
2 The details can be found in the arpeggio source code.
3 Graphviz - http://www.graphviz.org/

Page 230 of 478


ICIST 2014 - Vol. 1 Regular papers

n>=1} [5]
6. TEST SETUP
The first three languages are widely known and we think
Testing has been performed for 5 different grammars and that they do not require further explanation so we focus
6 different input string sizes for each grammar. For each on the fourth and the fifth.
input and each grammar 1000 parser runs have been In the fourth grammar we have made modification given
performed. Measurement has been performed on the in Figure 6 to induce backtracking on the always failing
conventional laptop computer (Core 2 Duo 2.26GHz, expression2 rule. The input was the same as in the case of
8GB RAM). We did our best to reduce additional grammar 1 so expression2 kept failing while trying to
overhead while running tests (no other applications were match '#' at the end of the expression. Memoizing parser
running, cron/at service was disabled etc.). Benchmarking should not introduce significant overhead because after
is performed using python timeit module. Input loading is failing match on expression2 the parser will try
done in the setup code to avoid I/O overhead. For each expression rule but this time it would use match results
grammar and parser we tried to find the fastest way to from the cache.
perform parsing (e.g. for pyparsing we used regular
expression match instead of Combine class).

Figure 6. The change in the grammar to induce backtracking.

The fifth grammar is given in Figure 7. Syntactic


predicates And, Not and Empty do not consume any input.
And will succeed if the given PEG expression is matched,
Not will fail if given expression is matched and will
succeed otherwise, Empty will always succeed. It is clear
that this grammar is highly recursive so it is interesting to
see how this recursion influences parsing performance
with increasing input size.

7. RESULTS

In this section we present test results and give a brief


discussion.
The results for simple expression language are given in
Table 1.

Table 1. Results for simple arihtmetic expression grammar.

Figure 5. Parser structure for simple arithmetic expressions These results show that Arpeggio and pyparsing have
grammar. similar speed for this language. Arpeggio is a little slower
in comparison to pyparsing with plain expression
Testing has been performed using the following grammar but when pyparsing Group is used it becomes a
grammars: little faster. Those differences are small enough to have
marginal impact in practical use. These results were
1. The simple language for arithmetic expressions
consistent with the increasing input size.
given in Figure 1.
What was surprising to us was the low performance of
2. BibTeX [7] – language for defining bibliography pyPEG. It can be seen that pyPEG is considerably slower
references. in comparison to both Arpeggio and pyparsing. In our
opinion pyPEG in version 2.x is considerably redesigned
3. CSV (Comma-Separated Values) [8] – a simple
and oriented towards language design and code
format for storing tabular data in textual form.
generation so the speed is probably not the current focus
4. The language for arithmetic expressions altered of the project. pyPEG was significantly slower in all tests
to induce backtracking. so we decided to remove its result from graphs in the rest
of the paper.
5. A classical non context-free language {anbncn,

Page 231 of 478


ICIST 2014 - Vol. 1 Regular papers

Table 3. Results for CSV grammar.

Figure 7. Graph results for simple arihtmetic expression


grammar

The graph in Figure 7 shows differences between


Arpeggio and pyparsing for the simple expression
language and various input sizes.
Figure 9. Graph results for CSV grammar.
Table 2 and Figure 8 present results for BibTeX grammar.
This time Arpeggio is slightly faster than pyparsing in The results of the fourth test are given in Table 4 and
both Group and non-Group versions, but again there was Figure 10. This time there is a noticeable difference in
not much difference so it should not matter much in the speed in favor of Arpeggio. For grammars and inputs
real-world use-cases. where backtracking is significant Arpeggio performs
better. What is also evident is that for the same inputs as
in test 1, Arpeggio introduced an overhead of roughly
20% where at the same time pyparsing introduces
overhead of approximately 95%. This difference is due to
packrat parsing memoization. Packrat machinery is
Table 2. Results for BibTeX grammar. disabled in pyparsing by default while Arpeggio uses
packrat parsing always. In the documentation of
pyparsing it is stated that it supports packrat parsing, but
enabling packrat parsing with
ParserElement.enablePackrat() (as suggested) has
made parsing time even worse. It might be a bug in the
pyparsing packrat implementation (we were using the
latest 2.0.1 version at the time of this writing) so further
investigation should be done.

Table 4. Results for altered expression grammar.

Figure 8. Graph results for BibTeX grammar.

For CSV grammar results given in Table 3 and Figure 9


pyparsing is faster in both Group and non-Group version.
The difference is again marginal, especially for the Group
version which is slower for smaller input but gets faster as
the input increases (the slope of the Arpeggio time
increase is bigger).

Page 232 of 478


ICIST 2014 - Vol. 1 Regular papers

There are many open-source PEG-based parsers in


different programming languages. The community
maintained list of parser generators and interpreters can
be found at Wikipedia [11]. At the time of this writing
there were more than 30 parsers for PEG grammars.
For performance optimization there is an interesting
project called parsimonious [12] whose main goal is to
implement the fastest python PEG parser but at the time
of this writing the parsimonious author states in the
documentation that the speed optimization has not been
finished yet. Nevertheless, ideas employed in this project
could be beneficial to the Arpeggio parser.
Another way to increase performance, not only for parser
but for the DSL framework as a whole is to use tools like
Figure 10. Graph results for altered expressions grammar.
PyPy [13] which could bring noticeable speedups without
code changes.
The results of test 5 given in Table 5 and Figure 11 are
also interesting. This grammar is highly recursive so, in 9. CONCLUSION
classical PEG parsing, the maximum recursion depth for
Python is eventually reached. Pyparsing reached this limit In this paper we have investigated the current state of our
for 150<n<200 while Arpeggio hit the same limit for Arpeggio parser in terms of parsing speed. Although
200<n<250. At the same time Arpeggio was faster. For speed is not our main goal, we are well aware of the fact
pyPEG we have not been able to define the same that, in a textual DSL framework, parsing is an operation
grammar as the current version lacks syntactic predicates. that is executed often for features such as syntax check
and highlighting, code navigation, tree outline etc. Thus,
for guiding further development, we have compared the
speed of Arpeggio with the two popular Python PEG
parser interpreters: pyparsing and pyPEG.
The results show that Arpeggio’s performance is
Table 5. Results for anbncn grammar comparable to pyparsing and outperform it in some cases.
Furthermore, Arpeggio’s parsing speed is much better
than the current pyPEG version.
We have seen that packrat parser implemented in
Arpeggio achieves good performance in the event of
significant backtracking. The overhead in test 4 was not
above 20%.
The performance tests presented here will help us develop
new features in Arpeggio while being aware of their
implications on the parsing performance.

BIBLIOGRAPHY

[1] Dejanović, I.; Perišić, B., Milosavljević, G. Arpeggio:


Pakrat parser interpreter, Zbornik radova na CD-ROM-u,
YUInfo 2010, 2010.
Figure 11. Graph results for anbncn grammar. [2] Arpeggio parser,
https://github.com/igordejanovic/arpeggio, online,
accessed January 28, 2014.
8. RELATED WORK
[3] pyparsing parser, http://pyparsing.wikispaces.com/,
PEG, as an alternative to CFG, has gained significant online, accessed January 28, 2014.
traction in literature in the last decade. Bryan Ford in [5,6]
[4] pyPEG parser, http://fdik.org/pyPEG/, online,
have introduced PEGs and packrat parsing based on the
accessed January 28, 2014.
previous research of Alexander Birman [9] and Aho et al.
[10]. PEGs represent schematic description of recursive- [5] Ford, B. Parsing Expression Grammars: A
descent parsers which makes them relatively easy to Recognition-Based Syntactic Foundation, ACM SIGPLAN
comprehend and maintain. The construction of PEG Notices, vol. 39, pp. 111-122, ACM New York, NY, USA,
parser interpreter is straightforward using dynamic 2004.
programming languages such as Python or Ruby.

Page 233 of 478


ICIST 2014 - Vol. 1 Regular papers

[6] Ford, B., Packrat Parsing: Simple, Powerful, Lazy, [12] parsimonious parser,
Linear Time, Proceedings of the seventh ACM SIGPLAN https://github.com/erikrose/parsimonious, online,
international conference on Functional programming, pp. accessed January, 28. 2014.
36-47, 2002.
[13] PyPy, http://pypy.org/, online, accessed January, 28.
[7] BibTeX, http://www.bibtex.org/, online, accessed 2014.
January, 28. 2014.
[13] Dejanović, I.; Perišić, B. & Milosavljević, G. MoRP
[8] Comma-separated values (CSV),
Meta-metamodel: Towards a Foundation of SLEWorks
http://en.wikipedia.org/wiki/Comma-separated_values,
Language Workbench 2nd International Conference on
online, accessed January, 28. 2014.
Information Society Technology (ICIST 2012), pp. 36-40
[9] Alexander Birman. The TMG Recognition Schema. , 2012.
PhD thesis, Princeton University, February 1970.
[14] Dejanović, I.; Milosavljević, G.; Perišić, B.;
[10] Alfred V. Aho and Jeffrey D. Ullman. The Theory of Vasiljević, I. & Filipović, M. Explicit Support For
Parsing, Translation and Compiling - Vol. I: Parsing. Languages and Mograms in the SLEWorks Language
Prentice Hall, Englewood Cliffs, N.J., 1972. Workbench 3rd International Conference on Information
[11] Comparison of parser generators, Society Technology and Management (ICIST 2013),
http://en.wikipedia.org/wiki/Comparison_of_parser_gener 2013.
ators, Wikipedia, The Free Encyclopedia, online, accessed
January, 28. 2014.

Page 234 of 478


ICIST 2014 - Vol. 1 Regular papers

Rough Sets Based Model as Project Success


Support
Makitan Vesna*, Brtka Vladimir*, Brtka Eleonora*, Ivkovic Miodrag*
*University of Novi Sad/Technical Faculty “Mihajlo Pupin”, Zrenjanin, Republic of Serbia
[email protected], [email protected], [email protected]

Abstract—This paper describes rough sets based model project time, cost, duration or any other significant
application during the entire project life cycle, starting from constraint for which available values exist.
its initial phase, through realization and finally to closing.
The main purpose of this model is data analysis support to II. RELATED WORK
successful project realization. Data analysis is based on the
network diagram of particular project and implies
There are a lot of examples of implemented heuristics
calculation of a “path value” according to the selected in some areas of project management given by users,
weight, such as path’s: duration, cost, scope, etc. For a researches and software companies. The most used ones
chosen weight every project activity may has one of the or so called “rules of thumb”, project managers apply
three possible estimates. After selecting the set of estimates, when planning and overseeing project tasks in order to
model’s heuristics finds a value of every path in the project respond to the complex environment [1]. Those heuristics
network diagram and evaluates possible scenario of project are based on experience and should be applied and help in
realization. Model implementation may improve project decision making when certain problem occurs [2, 3, 4].
realization, especially in projects with higher level of The other heuristics implementation is mostly related to
uncertainty, such as research and development ones. scheduling problems [5, 6, 7]. Useful categorization of a
large number of heuristics for the resource-constrained
project scheduling problem, their evaluation in a
I. INTRODUCTION computational study and comparison with the proposed
The meaning of the project management concept is to solution may be found in [8].
help people run successful projects, those which are Other researches deal with application of heuristics in
accomplished in a projected scope, time and cost and/or project duration assessment, such as [9] that showed
their sponsor is satisfied. That is why a lot of project several methods for coping with uncertain estimates of
management standards, methodologies, techniques, tools duration: PERT, fuzzy theory, and probabilistic
and other guidelines are available nowadays. Even more, computations.
the new ones are approaching offering guaranties of Implementation of rough sets in project management
project success. may be found in [10]. This research describes so called
This paper represents one of the new approaches to critical success factors and the analysis of IT project
successful project realization. It is the model that includes according to these criteria. Rough sets are used to improve
heuristics as a data analysis support to the project this analysis and increase IT projects success rate.
realization. In every phase of the project life cycle this There is a textbook about some rough sets methods in
model enables calculation of the “path value” according to IT project management [11]. It describes concepts of
the project duration, cost, scope or some other significant rough set theory that may be applied to the management
constraint. In that way, a project manager has an insight in of information technology implementation projects in
the possible project scenarios caused by the most rigid order to assist project managers to place their emphasis on
constraint at specific moment. the “important” aspects of the project.
Nowadays most of the project realizations are followed In [12] the principle and step of performance evaluation
by high level of uncertainty caused by new technologies of project management based on rough set method are
or the lack of resources, or raising risks, etc. In those cases studied. Rough sets are used for project risk assessment,
activities that were not critical become critical and may too [13, 14, 15].
cause project failure. Good project management should
include tools and techniques that deal with various The rough set theory based model in [16] is proven to
scenarios of project realization. be the most appropriate for data evaluation in many
domains. The model is formed by numerous If-Then
The model described in this paper enables prediction of expressions which are readable and easy to understand.
“good” or “bad” paths in the project network diagram This model is frequently used in domains of: medicine,
suggesting which one may become “critical” according to economy, management, education, etc.
its duration, cost, scope, quality, etc.
Software solutions for project management support
As it will be seen in the next chapter, there are not so have many features and some of them enable evaluation of
many papers related to this topic. Most of them are project duration, costs, resource availability, etc. then
dealing with scheduling problems, heuristics based on creation of “what if” scenarios, PERT analysis, template
experience or template usage. This model enables usage, etc. In that way, they provide heuristics
heuristics implementation in a wider range, including implementation as well.

Page 235 of 478


ICIST 2014 - Vol. 1 Regular papers

III. THE MODEL DESCRIPTION

In order to explain the basic features of the model, a simple example of a project network diagram is chosen and presented in Figure 1. It shows project activities (A, B, C, ..., S), their events (1, 2, 3, ..., 15) and relations. The value assigned to an activity may be related to its duration, cost, scope, quality, risk, etc. This value is one element of a three-element set consisting of an optimistic, a modal and a pessimistic estimate. By choosing one of these values for every activity on the network diagram, a possible scenario of project realization is created. The existing paths of the network diagram are then evaluated by the expert, which means that every one of them is marked as a "good" or a "bad" one for project realization. Bad means that the path has activities that could cause project failure, and project managers should pay special attention to those activities.

For model presentation, duration is chosen as the constraint (weight) for which the path value will be calculated. As said earlier, every activity has a PERT-like duration estimation: its optimistic, modal and pessimistic time. The rough sets based part of the model calculates the path value according to the chosen project scenario. It includes the possible paths in the network diagram and their activities, with one duration time assigned to every activity from the predefined set of values: optimistic, modal and pessimistic. In this case, when duration represents the most rigid constraint (selected weight), critical/subcritical paths should be recognized as "bad" ones.

Figure 1. Network diagram of a sample project

Table I represents the paths in the network diagram of a sample project, with a value assigned to every activity.

TABLE I.
POSSIBLE PATHS IN A SAMPLE PROJECT NETWORK DIAGRAM

Paths   A1  A2  A3  A4  A5  Decision
P1      1   2   3   1   3   1
P2      2   2   3   3   0   1
P3      2   2   3   3   0   2
P4      3   2   1   2   2   2
P5      1   1   2   3   1   1
P6      3   1   3   3   0   2

There are six possible paths in the network diagram. For example, path P1 contains activities A, D, I, N and Q (denoted by A1, A2, ...) with durations 1, 2, 3, 1 and 3, where 1 represents the optimistic, 2 the modal and 3 the pessimistic duration. Path P2 has only four activities (A, E, J and Q), which is why its A5 has a zero value. The decision about the path value is given in the last column, according to the expert judgment. A path may be a "good" one, with value 1, or a "bad" one, with value 2. Those values are assigned according to the path activities, the durations selected for the chosen scenario and other features known to the expert.

Other features of the model are:
• the number of paths is arbitrary;
• the number of activities on a path is arbitrary (the number of columns is equal to the number of activities on the longest path);
• the Decision column may have more than two outputs, and any activity weight could be expressed by a real number.

All of that enables project estimates in a wide range of possible situations and should improve decision making during the entire project life cycle.
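As an illustration of how a scenario is formed, the following minimal Python sketch (ours alone; the edge list is invented, since Figure 1's diagram is not reproduced in the text) enumerates the activity paths of a tiny activity-on-arrow diagram and assigns an estimate of 1 (optimistic), 2 (modal) or 3 (pessimistic) to every activity:

# A minimal sketch of scenario construction over a tiny, invented
# activity-on-arrow diagram (Figure 1's full edge list is not given here).
EDGES = {  # (event_from, event_to): activity label
    (1, 2): "A", (1, 3): "B",
    (2, 4): "D", (3, 4): "E",
    (4, 5): "Q",
}

def paths(start, end, trail=()):
    """Enumerate all activity sequences from the start to the end event."""
    if start == end:
        yield trail
    for (u, v), activity in EDGES.items():
        if u == start:
            yield from paths(v, end, trail + (activity,))

# One possible scenario: a value 1 (optimistic), 2 (modal) or
# 3 (pessimistic) chosen per activity, as in Table I.
scenario = {"A": 1, "B": 3, "D": 2, "E": 2, "Q": 3}

for p in paths(1, 5):
    print(p, [scenario[a] for a in p])
# ('A', 'D', 'Q') [1, 2, 3]
# ('B', 'E', 'Q') [3, 2, 3]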
IV. ROUGH SETS BASED PART OF THE MODEL

The Rough Set Theory (RST) is relatively new; it was introduced by Zdzislaw Pawlak in the early 1980s [17, 18]. The basis of the RST is a relation called the indiscernibility relation. This relation is the mathematical basis of the RST, meaning that some information is associated with every object of the universe of discourse (data, knowledge) [18, 19]. Objects characterized by the same information are indiscernible in view of the available information about them. As the mathematical basis of RST, the indiscernibility relation must be formally defined. First of all, this relation is an equivalence relation: it is reflexive, symmetric and transitive. As in [20], let U be a finite set of objects (the universe), Q = {q1, q2, ..., qm} a finite set of attributes, Vq the domain of attribute q and V = ∪_{q∈Q} Vq. The RST information system is the 4-tuple S = ⟨U, Q, V, f⟩, where f: U × Q → V is a function such that f(x, q) ∈ Vq for each q ∈ Q, x ∈ U, called the information function. To every non-empty subset of attributes P an indiscernibility relation on U is associated, denoted by I_P:

I_P = {(x, y) ∈ U × U : f(x, q) = f(y, q), ∀q ∈ P}   (1)

The family of all equivalence classes of I_P is denoted by U/I_P, and the class containing an element x by I_P(x). The indiscernibility relation induces a partition of the universe into blocks of indiscernible objects called elementary sets. The information about the real world is given in the form of a decision system. A decision system is an RST information system whose attributes are divided into two sets: the set of condition attributes and the set of decision attributes; usually there is one attribute called the decision attribute, while all other attributes are called condition attributes. So, an RST information system with a defined set of condition attributes and a decision attribute is called a decision system. Table I represents the RST decision system for the network diagram of the sample project. There are five condition attributes (A1, ..., A5) and one binary decision attribute D. RST deals with set approximations: let X be a non-empty subset of U, and P ⊆ Q (P is a non-empty subset of condition attributes). Set X is approximated by the P-lower (2) and P-upper (3) approximations of X:
P̲(X) = {x ∈ U : I_P(x) ⊆ X}   (2)

P̄(X) = ∪_{x∈X} I_P(x)   (3)

The P-boundary of X is denoted by Bn(X):

Bn(X) = P̄(X) − P̲(X)   (4)

If some object x belongs to the lower approximation of X, it is certainly an element of X, but if x belongs to the boundary region of X, then there is only a probabilistic measure that x may belong to the set X. In that case, nothing can be said with certainty about its belonging to the set X.

According to Table I, the approximation of the set X = {P1, P2, P5}, i.e. the set X containing those objects (paths) for which the decision is 1 ("good"), is given by: lower approximation {P1, P5}, boundary region {P2, P3}, upper approximation {P1, P2, P3, P5}.

So, the set X is approximated: paths P1 and P5 are certainly "good", but we are not sure about path P2. The exact reason why path P2 is an element of the boundary region and, consequently, of the upper approximation is that there exists a path P3 with the same values of the condition attributes while the value of the decision attribute differs. Therefore, according to the values of the condition attributes of P2 and P3, we are not sure whether this kind of path is "good" or "bad".
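These approximations are straightforward to compute directly. The following Python sketch (our own illustration, independent of the Rosetta toolkit used later) encodes Table I as a decision system and reproduces the approximations of X:

# Table I as a decision system: condition attributes (A1..A5) and decision D.
TABLE_I = {
    "P1": ((1, 2, 3, 1, 3), 1),
    "P2": ((2, 2, 3, 3, 0), 1),
    "P3": ((2, 2, 3, 3, 0), 2),
    "P4": ((3, 2, 1, 2, 2), 2),
    "P5": ((1, 1, 2, 3, 1), 1),
    "P6": ((3, 1, 3, 3, 0), 2),
}

def indiscernibility_class(x, attrs):
    """I_P(x): the objects agreeing with x on every attribute index in attrs."""
    vx = TABLE_I[x][0]
    return {y for y, (vy, _) in TABLE_I.items()
            if all(vy[a] == vx[a] for a in attrs)}

def approximations(X, attrs):
    """P-lower (2) and P-upper (3) approximations of the object set X."""
    lower = {x for x in TABLE_I if indiscernibility_class(x, attrs) <= X}
    upper = set().union(*(indiscernibility_class(x, attrs) for x in X))
    return lower, upper

P = range(5)                                        # all condition attributes
X = {x for x, (_, d) in TABLE_I.items() if d == 1}  # the "good" paths
lower, upper = approximations(X, P)
print(sorted(lower))          # ['P1', 'P5']
print(sorted(upper - lower))  # boundary region: ['P2', 'P3']
print(sorted(upper))          # ['P1', 'P2', 'P3', 'P5']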
Set approximation enables the synthesis of decision rules in the If-Then form Φ → Ψ, where Φ is the antecedent of the rule and Ψ is the consequent. As in [18], each object x that belongs to a decision system determines one decision rule:

∧_{a∈C} a = a(x) → ∨_{d∈D} d = d(x),

where a(x) stands for the value of attribute a of an object x. The expression a = a(x) is called a descriptor. If there is one decision attribute d, we have:

∧_{a∈C} a = a(x) → d = d(x).

Based on the lower approximation of set X, two rules are synthesized:
1. A1=1 ∧ A2=2 ∧ A3=3 ∧ A4=1 ∧ A5=3 ⇒ D=1, supported by {P1}
2. A1=1 ∧ A2=1 ∧ A3=2 ∧ A4=3 ∧ A5=1 ⇒ D=1, supported by {P5}

Based on the boundary region of set X, one rule is synthesized:
3. A1=2 ∧ A2=2 ∧ A3=3 ∧ A4=3 ∧ A5=0 ⇒ D=1 ∨ D=2, supported by {P2, P3}

As can be seen from this rule, the logic of the RST accepts the possibility of both values, 1 and 2, for the same set of activity durations. This feature emulates the logic of real situations.

Rules for which the value of the decision attribute is certainly 2 ("bad") are synthesized according to the "negative region" of set X, which contains the paths for which we are certain that the value of the decision attribute is 2 ("bad"):
4. A1=3 ∧ A2=2 ∧ A3=1 ∧ A4=2 ∧ A5=2 ⇒ D=2, supported by {P4}
5. A1=3 ∧ A2=1 ∧ A3=3 ∧ A4=3 ∧ A5=0 ⇒ D=2, supported by {P6}
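These certain rules can also be generated mechanically from the lower approximations. Continuing the earlier sketch (again our own illustration, not Rosetta's algorithm), the full descriptor conjunction of each object in a lower approximation becomes the IF part of a rule:

def certain_rules(X, decision):
    """Synthesize one certain rule per object of the lower approximation of X."""
    lower, _ = approximations(X, range(5))
    rules = []
    for x in sorted(lower):
        values = TABLE_I[x][0]
        antecedent = " AND ".join(f"A{i + 1}={v}" for i, v in enumerate(values))
        rules.append(f"{antecedent} => D={decision}  (supported by {{{x}}})")
    return rules

good = {x for x, (_, d) in TABLE_I.items() if d == 1}
bad = {x for x, (_, d) in TABLE_I.items() if d == 2}
print(*certain_rules(good, 1), sep="\n")  # rules 1 and 2 above
print(*certain_rules(bad, 2), sep="\n")   # rules 4 and 5 above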
V. MODEL EVALUATION

The data set presented in Table I is used as a training set. After data analysis, the previously mentioned five rules are synthesized. Rule synthesis is done by the software system Rosetta – A Rough Set Toolkit for Analysis of Data. The Rosetta system was developed as a cooperative effort involving the Knowledge Systems Group, Department of Computer and Information Science at NTNU, Norway, and the Logic Group, Institute of Mathematics at Warsaw University, Poland.

After the rule set creation, a test set is prepared in order to evaluate the model. The test set is given in Table II.

TABLE II.
TEST SET FOR MODEL EVALUATION

Paths   A1  A2  A3  A4  A5  Decision
P1      1   2   3   1   3   1
P2      2   2   3   3   0   1
P3      2   2   3   3   0   2
P4      3   2   1   2   2   2
P5      1   1   2   3   1   1
P6      2   3   3   3   0   2
P7      2   2   1   1   3   1
P8      2   3   3   3   0   2
P9      2   2   3   3   3   2
P10     3   2   3   2   2   2
P11     2   1   2   3   1   1
P12     2   2   3   3   3   2
P13     1   2   3   1   1   1
P14     2   1   2   2   1   1
P15     2   2   3   3   1   2
P16     3   2   1   2   2   2
P17     1   2   1   2   1   1
P18     2   3   3   3   3   2
P19     1   2   3   1   1   1
P20     2   2   3   3   2   2
P21     2   2   3   3   0   1
P22     3   2   1   2   1   1
P23     1   1   1   3   1   1
P24     2   3   2   3   2   2

As the test set contains 24 paths, while the training set contains only six, the goal of this evaluation is to verify the functionality of the model. The synthesized rules act as a classifier: the five rules synthesized from the training set classify the test set paths. The performance of the classifier is measured by the confusion matrix C, a |Vd| × |Vd| matrix, where Vd is the set of possible values of the decision attribute ("good" or "bad"). This matrix with integer entries summarizes the performance of the rule set while classifying the set of paths. The entry

C_{i,j} = |{x ∈ U : d(x) = i, d̂(x) = j}|,

where d(x) is the actual decision and d̂(x) is the predicted decision, counts the number of paths that really belong to class i but were classified to class j. It is desirable for the diagonal entries to be as large as possible. The results are shown in Figure 2: four paths that are actually "good" are correctly classified as "good", while two paths that are actually "bad" are correctly classified as "bad". One path that is actually "bad" is classified as "good", while 17 paths were not classified at all.
Figure 2. The confusion matrix after test set classification by five rules without reduct sets computation

This clearly shows two things:
1. The training set is too small, because there are 17 unclassified paths. This was expected, since there are six paths in the training set while there are 24 paths in the test set.
2. The model evaluation is considered successful, since there is only one misclassified path.
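The confusion matrix itself is simple to compute. The sketch below matches the C_{i,j} definition given above, with None marking a path that no rule covers; the per-path outcomes are illustrative, since the paper reports them only in aggregate:

from collections import Counter

def confusion(actual, predicted, classes=(1, 2)):
    """Confusion matrix C[i][j] plus the count of unclassified paths."""
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(i, j)] for j in classes] for i in classes]
    unclassified = sum(1 for p in predicted if p is None)
    return matrix, unclassified

# 12 "good" and 12 "bad" test paths; 4 good and 2 bad hits, one bad
# path misread as good, and 17 paths no rule could classify.
actual = [1, 1, 1, 1, 2, 2, 2] + [1] * 8 + [2] * 9
predicted = [1, 1, 1, 1, 2, 2, 1] + [None] * 17
matrix, skipped = confusion(actual, predicted)
print(matrix)   # [[4, 0], [1, 2]]
print(skipped)  # 17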
The evaluation showed that it is possible to use this kind of model to classify project network diagram paths, although the results have not been good enough so far.

In order to achieve better classification power, it is possible to keep only those condition attributes of the paths that preserve the indiscernibility relation and, consequently, the set approximation. This set of attributes is called a reduct set. The rejected attributes are redundant, since their removal cannot worsen the classification. Let P be a non-empty subset of condition attributes, P ⊆ Q, and let a be some attribute from the set P, a ∈ P. Attribute a is redundant (superfluous) in P if I_P = I_{P−{a}}. In other words, if attribute a is excluded from the set P while the indiscernibility relation I, defined by (1), stays unchanged, then attribute a is redundant and can be omitted. The Rosetta software system is capable of reduct set calculations. These calculations are often very complex, but in many practical applications it is not necessary to calculate all the reducts, only some of them. Since the training set is very small, it is possible to employ the exhaustive reduct sets calculation algorithm, which is implemented in Rosetta. This algorithm calculates all possible reduct sets. The following reduct sets were calculated by the exhaustive algorithm: R1={A1}, R2={A2, A3, A4}, R3={A2, A5}.
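On a table this small, the exhaustive search can be mirrored by a brute-force sketch (ours, not Rosetta's implementation; since Table I is inconsistent, the decision-relative formulation below preserves the generalized decision, i.e. the set of decision values occurring in each indiscernibility class, rather than the raw relation):

from itertools import combinations

def gen_decision(attrs):
    """Map each path to the decision values seen in its I_P class."""
    return {x: frozenset(TABLE_I[y][1] for y in indiscernibility_class(x, attrs))
            for x in TABLE_I}

FULL = gen_decision(range(5))   # generalized decision under all attributes

reducts = []
for size in range(1, 6):        # smallest subsets first, so reducts are minimal
    for attrs in combinations(range(5), size):
        if gen_decision(attrs) == FULL and \
           not any(set(r) <= set(attrs) for r in reducts):
            reducts.append(attrs)

print([tuple(f"A{a + 1}" for a in r) for r in reducts])
# [('A1',), ('A2', 'A5'), ('A2', 'A3', 'A4')]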
Now it is possible to synthesize rules according to the reduct sets R1, R2 and R3. The following 13 rules were synthesized:
1. A1=1 ⇒ D=1
2. A1=2 ⇒ D=1 ∨ D=2
3. A1=3 ⇒ D=2
4. A2=2 ∧ A3=3 ∧ A4=3 ⇒ D=1 ∨ D=2
5. A2=2 ∧ A5=0 ⇒ D=1 ∨ D=2
6. A2=2 ∧ A3=3 ∧ A4=1 ⇒ D=1
7. A2=2 ∧ A3=1 ∧ A4=2 ⇒ D=2
8. A2=1 ∧ A3=2 ∧ A4=3 ⇒ D=1
9. A2=1 ∧ A3=3 ∧ A4=3 ⇒ D=2
10. A2=2 ∧ A5=3 ⇒ D=1
11. A2=2 ∧ A5=2 ⇒ D=2
12. A2=1 ∧ A5=1 ⇒ D=1
13. A2=1 ∧ A5=0 ⇒ D=2

It is obvious that the rules are synthesized according to the three reduct sets: some rules contain attribute A1 in the IF part, some contain attributes A2, A3 and A4, while some contain attributes A2 and A5. Figure 3 shows the classification power of the 13 rules while classifying the test set. The results are shown by the confusion matrix.

Figure 3. The confusion matrix after test set classification by 13 rules with exhaustive reduct sets computation

Now there are 11 paths from the test set that are correctly classified as "good" and four paths that are correctly classified as "bad". However, there are eight paths that are actually "bad" but are incorrectly classified as "good", and one path that is actually "good" but is incorrectly classified as "bad".

The most important fact is that there are no unclassified paths. It is possible to further increase the classification power by using a training set that contains more paths, which would result in a larger number of rules. It is also clear that reduct set computation contributes to the accuracy of the classification.

VI. CONCLUSION AND FURTHER WORK

The main purpose of the model presented in this paper is to improve decision making during the project life cycle and, in that way, to contribute to project success. The model addresses a PERT-like network diagram of a specific project in which every activity gets one of three possible values. Those values are related to the project duration, cost, scope, etc. By choosing activity values, a possible scenario of project realization is created. Then the rough sets based part of the model evaluates every path of the network diagram and indicates which ones are "bad". The "bad" paths are in some way critical, and the project manager should take that into account. Thanks to the fact that the model enables different outputs for the same set of activity values, its usage is similar to real-life decision making. A more profound analysis may be provided by extending the set of Decision values of the model to three or five of them; in that way, paths would be evaluated on a larger scale. For a project manager who wants to be familiar with every possible scenario of his project realization and to reduce any rising risk, this could be very important knowledge. Everything abovementioned implies that the main purpose of the described model is fulfilled. Further work will investigate the possibility of using actual activity weights and discretizing them via a discretization process, which will result in a finite number of classes. One more possible future research direction in this domain is to use real-life projects with some missing values and to apply and test algorithms for missing values completion.
ACKNOWLEDGMENT

This research is financially supported by the Ministry of Education and Science of the Republic of Serbia under the project number TR32044 "The development of software tools for business process analysis and improvement", 2011-2014.

REFERENCES
[1] Purvis, R. L., McCray, G. E. and Roberts, T. L., "The impact of project management heuristics to IS projects", IEEE Computer Society, Proceedings of the 36th Hawaii International Conference on System Sciences (HICSS'03), 0-7695-1874-5/03, 2002.
[2] Agarwal, R., Tannru, M. and Dacruz, M., "Knowledge-based support for combining qualitative and quantitative judgments in resource allocation decisions", Journal of Management Information Systems, 9 (1), pp. 165–184, 1992.
[3] Bukszar, E. and Connolly, T., "Hindsight bias and strategic choice: Some problems in learning from experience", Academy of Management Journal, 31 (3), pp. 628–641, 1998.
[4] Hogarth, R. and Einhorn, H., "Venture theory: A model of decision weights", Management Science, 36 (7), pp. 780–803, July 1990.
[5] Davis, E. W. and Patterson, J. H., "A Comparison of Heuristic and Optimum Solutions in Resource-Constrained Project Scheduling", Management Science, vol. 21, no. 8, pp. 944-955, April 1975.
[6] Boctor, F. F., "Some efficient multi-heuristic procedures for resource-constrained project scheduling", Elsevier, European Journal of Operational Research, vol. 49, no. 1, pp. 3-13, 1990.
[7] Tormos, P. and Lova, A., "A competitive heuristic solution technique for resource-constrained project scheduling", Springer, Annals of Operations Research, no. 102, pp. 65-81, 2001.
[8] Kolisch, R. and Hartmann, S., "Experimental investigation of heuristics for resource-constrained project scheduling: An update", Elsevier, European Journal of Operational Research, vol. 174, no. 1, pp. 23-37, October 2006.
[9] Pons, D., "Does Reduced Uncertainty Mean Greater Certainty? – Project management with uncertain durations", Project Management Institute of New Zealand (PMINZ) 2006 Conference, Christchurch, New Zealand, 4-6 Oct. 2006.
[10] Peters, G. and Gordon Hunter, M., "Disclosing Patterns in IT Project management – a rough set perspective", Springer-Verlag Berlin Heidelberg, PReMI 2009, LNCS 5909, pp. 591-596, 2009.
[11] Peters, G., Lingras, P., Slezak, D. and Yao, Y., "Rough sets: selected methods and applications in management and engineering", Springer-Verlag, London, 2012.
[12] Zhang, Q., "A rough set method for performance evaluation of project management", Second International Conference on Computer Engineering and Technology, pp. 348-350, April 2010.
[13] Zhengyuan, J. and Lihua, G., "The project risk assessment based on rough sets and neural network (RS-RBF)", 4th International Conference on Wireless Communications, Networking and Mobile Computing, pp. 1-4, October 2008.
[14] Ping, Z. and Shi-xiang, Y., "An approach to project risk analysis based on rough sets", Control and Decision Conference (CCDC), pp. 1197-1202, May 2010.
[15] Gang, X., Jinlong, Z., Lai, K.K. and Lean, Y., "Variable precision rough set for group decision-making: An application", Elsevier, International Journal of Approximate Reasoning, vol. 49, no. 2, pp. 331-343, October 2008.
[16] Dobrilovic, D., Brtka, V., Berkovic, I. and Odadzic, B., "Evaluation of the Virtual Network Laboratory Exercises Using a Method Based on the Rough Set Theory", Computer Applications in Engineering Education, vol. 20, no. 1, pp. 29-37, 2012.
[17] Pawlak, Z., Grzymala-Busse, J., Slowinski, R. and Ziarko, W., "Rough sets", Communications of the ACM, vol. 38, no. 11, November 1995.
[18] Pawlak, Z. and Skowron, A., "Rudiments of rough sets", Information Sciences, vol. 177, pp. 3–27, 2007.
[19] Pawlak, Z., "Rough set approach to knowledge-based decision support", European Journal of Operational Research, vol. 99, pp. 48-57, 1997.
[20] Greco, S., Benedetto, M. and Slowinski, R., "New Developments in the Rough Set Approach to Multi-Attribute Decision Analysis", Bulletin of International Rough Set Society, vol. 2, no. 2/3, pp. 57–87, September 1998.
Enabling Interoperability as a Property of Ubiquitous Systems:
Towards the Theory of Interoperability-of-Everything
Milan Zdravković*, Herve Panetto**, Miroslav Trajanović*
* Faculty of Mechanical Engineering in Niš, University of Niš, Niš, Serbia
** Université de Lorraine, Vandœuvre-lès-Nancy Cedex, France, CNRS, CRAN, France
[email protected], [email protected],
[email protected]

Abstract— With the advent of the future Internet-of-Things, and the consequent increasing complexity and diversification of the systems landscape, interoperability becomes a critical requirement for its scalability and sustainable development. Can the current considerations of the interoperability paradigm meet these challenges? In this paper, we define interoperability as a property of ubiquitous systems. In doing so, we use the anthropomorphic perspective to formally define this property's enabling attributes (namely, awareness, perceptivity, intelligence and extroversion), with the objective to take the initial steps towards the Theory of Interoperability-of-Everything. The identified concepts and their interrelations are illustrated by the presented I-o-E ontology.

I. INTRODUCTION

As computer systems become omnipresent, the contemporary paradigm of systems interoperability turns out to be incomplete and insufficient in the attempt to address the complex interrelationships of the diversified technical environment in which we live and work today. The future Internet-of-Things becomes a reality; hence, the mobile devices, sensors, tags and other identifiable resources with communication and processing capability need to be taken into the picture.

In such technically complex circumstances, the perception of interoperability needs to evolve from the consideration of interoperating pairs of systems to the capability of an autonomous system to sense, interpret, understand and act upon arbitrary messages received from a potentially unknown sender, based on the known relevant or non-relevant, intrinsic and extrinsic properties (facts) of the world in its environment. In this sense, interoperability becomes in fact a property of the system.

The Internet of Things (IoT) [1] is defined as a dynamic global network infrastructure with self-configuring capabilities based on standard and interoperable communication protocols [2]. In IoT, the "things" will have identities, physical attributes, and virtual personalities. They will be expected to become active participants in business, information and social processes, where they are enabled to interact and communicate among themselves and with the environment by exchanging information "sensed" from their near environment, while reacting to real world events and even affecting them by triggering actions. Intelligent interfaces will facilitate interactions with these "things" on the Internet, query and change their state and any information associated with them, while also taking into account security and privacy issues.

With the advent of IoT and its implementing technologies (it is forecasted that the number of devices connected to the Internet will grow to 50 billion by 2020 [3]), computing will become ubiquitous – in any device, any location and/or any format. Ubiquitous computing aims to provide more natural interaction of humans with information and services, by embedding these information and services into their environment, as unobtrusively as possible [4]. Sometimes, this interaction is not evident, namely, humans may not be aware of the fact that it occurs in the background. It is carried out in context, namely, the devices that interact with humans (and with themselves) must be aware of this context.

IoT is expected to evolve from the current research on Wireless Sensor Networks (WSN). A WSN usually consists of a set of wireless sensor nodes (from a few tens to a few hundreds, even thousands), which acquire, store, transform and communicate data using wireless technologies [5]. These autonomous nodes are spatially distributed with the aim to monitor physical or environmental conditions, such as temperature, sound, pressure, etc., to cooperatively pass their data through the network to a main location, but also to enable control of a sensor or associated device's activity. Today, WSN are mostly used in military applications, environmental (indoor and outdoor) monitoring, logistics, healthcare applications and robotics [6]. Some of the most cited application domains of the future IoT are energy efficient homes with a self-customizable living environment; smart cities with coexisting industry, retail, residential and green spaces; pervasive healthcare, offering non-intrusive, transparent monitoring of everyday activities; intelligent logistics and transportation, with safety and environmental concerns embedded into the process; and retail with a customizable shopping experience and full product traceability.

One of the greatest challenges for the IoT is making different devices exchange the relevant information and, consequently, making them interoperate. The ISO/IEC 2382 vocabulary for information technology

defines interoperability as "the capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units" [Def1].

In a broader sense, IEEE [7] defines interoperability as "the ability of two or more systems or components to exchange information and to use the information that has been exchanged" [Def2]. In this case, two systems function jointly and give access to their resources in a reciprocal way. The interoperation property in this case is not absolute. Namely, it can be assessed in terms of maturity levels, as proposed by Guédria [8].

Interoperability is sometimes related to the federated approach, which implies that systems must accommodate on the fly in order to interoperate – no pre-determined assets for interoperations are assumed. In fact, this lack of technical pre-conditions is the key argument for distinguishing between integration and interoperability. Interoperability lies in the middle of an "Integration Continuum" between compatibility and full integration [8].

In light of the requirements of the future IoT, we identify two main problems with the current definitions of interoperability. First, they assume the necessary awareness and agreement of both actors about their behaviors for a given interaction. This assumption is derived from the predefined motivation to interoperate. Second, and even more general, they assume awareness of the coexistence of the two systems that interoperate. Neither assumption can hold by default in the future ad-hoc communications and interoperations of the variety of systems in ubiquitous computing. Even though the current collaboration culture assumes sharing and a social context, we consider these as obstacles for interoperability, because they imply previous agreements between the interoperating systems. Removing these agreements would mean that interoperability becomes, in fact, semantic interoperability. To support that, we can refer to the often used definition of interoperability, by an unknown author:

[Def3] "Interoperability is a property of a product or system, whose interfaces are completely understood, to work with other products or systems, present or future, without any restricted access or implementation".

In this paper, we discuss what is needed to develop this property. Specifically, the following research question is asked: What is needed for one system to operate based on the message(s) of arbitrary content, sent by (an)other (unknown) system(s)? In order to answer this question, we first define the key principles for the future considerations of interoperability as a property. Then, we discuss the enabling technologies, based on the identified desirable attributes. Finally, we propose the I-o-E (Interoperability-of-Everything) ontology, which illustrates the explanation of the interoperable "thing" in a formal way.

II. INTEROPERABILITY AS A PROPERTY

The cases for the future IoT are typically based on the pre-agreement of the different devices to exchange information and to act upon this information. However, as the number of connected devices and their technological diversity grows, it would become more and more difficult to work on reaching these pre-agreements. More importantly, the current approach will inevitably lead to application silos, with fragmented architectures, incoherent unifying concepts, and hence, little reuse potential.

Thus, it is highly likely that the "things" of the future IoT will be required to interpret ad-hoc signals and requests from other devices, including the motivation behind these signals, and to act according to the interpreted motivation.

A. Interoperability as a property of the systems

Let us consider a simple future Internet of Things in which there is a surveillance camera that registers an undesirable event and urgently needs to send an SMS to the property owner (see Fig. 1). However, its text sending unit has failed. Now, the camera broadcasts the message (without any knowledge about its receiver, or whether there is a receiver at all). In its environment, there are other devices (systems), e.g. a thermostat. It appears that the thermostat also has a text sending unit (to send information about a rapid temperature drop or rise). The thermostat registers this message, interprets it and acts (sends the SMS about the surveillance camera event).

Figure 1. Example IoT scenario

The problem described in this case can be resolved by the Internet-of-Services. However, the latter implies a functional organization, namely, the thermostat's capability to send SMS messages is defined in advance as a service. Such a service is associated with required input requests, by means of format, protocol to deliver, etc. All these requirements are pre-conditions to interoperate, hence, obstacles. It is important to highlight that, in this case, the communicating entity is not aware of the sole existence of the receiving entity, not to mention the capability of the latter to perform the required task. This is an extension of the [Def1] definition of interoperability, which assumes no "knowledge of the unique characteristics of the interoperating systems".

As explicitly stated in [Def3], with the current consideration of autonomous systems, the perception of interoperability has to be changed to a property of a single system. This property determines the capacity of a system (in a general sense) to adapt, respond, or act internally or externally upon some circumstance. As referred in [Def3], this capability depends on the "understanding of the interfaces".
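The scenario can be rendered as a toy Python sketch: the sender broadcasts without knowing its receivers, and any device that can interpret the intent acts on it. Every name below is invented for illustration, not a proposed API:

# A toy rendering of the camera/thermostat scenario above.
class Thermostat:
    capabilities = {"send_sms"}

    def on_broadcast(self, message):
        """React to an arbitrary broadcast only if its intent is understood."""
        if message.get("intent") in self.capabilities:   # interpret
            print(f"SMS to {message['to']}: {message['text']}")  # act
            return True
        return False                                     # ignore the rest

# The camera broadcasts without any knowledge of who (if anyone) listens.
alarm = {"intent": "send_sms", "to": "owner",
         "text": "surveillance camera: undesirable event registered"}
handled = any(d.on_broadcast(alarm) for d in [Thermostat()])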

However, interoperability can still be considered as a property of a pair (in the traditional sense of interoperability). Then, it must be taken into account that this is only a specialization of the above defined property. The key consequence of this consideration is that interoperability is now seen as a unidirectional property of a pair.

A social context is important for defining interoperability, as it is used to determine the purposeful interoperations. In this context, vaguely defined, interoperability would simply be properly reacting to the utterances of others. What is "properly" may be related to a post-agreement on what a proper reaction is. In other words, the ultimate test of whether one system reacted properly may be to see how its environment reacts to its reaction.

The social context of the interoperation may be pre-determined. Namely, systems sometimes expose their capabilities by using services, and such Service-oriented modeling principles are already endorsed by the IoT community. Spiess et al proposed a vision of web service mash-ups, implemented by the SOCRADES Integration Architecture (SIA), where enterprise business processes interact with and consume data from a wide range of networked devices [10]. This is exactly the case for the paradigm of Internet-of-Services.

III. ENABLING ATTRIBUTES AND FACTORS

When the enabling factors for the above scenarios are considered, we first identify the key attributes of the "things" required for their interoperable behavior. Then, we identify the candidate technologies, methodologies and assets to achieve each of these attributes.

The minimum requirements for an autonomous, intelligent, purposeful, social behavior of a "thing" in an interoperable environment, such as a WSN, are: awareness, perceptivity, intelligence and extroversion. Obviously, this consideration of "things" is anthropomorphic. A short elaboration of the arguments for this choice is given.

We can distinguish between two aspects of awareness: self-awareness and environmental awareness. Self-awareness is related to the capability of a "thing" to sense a phenomenon or an event within itself. For example, WSN nodes need to be aware of the available energy levels. Namely, the data communication policy of a node may differ from the acquisition policy (different frequency), due to energy issues. The decisions of adapting these policies to the current energy constraints could be made autonomously by the nodes, and the nodes' behavior may be adapted in time to optimize their lifetime.

Awareness is related to the capability of a "thing" to sense a phenomenon or an event from its environment. We also extend this consideration by adding the simple capability to receive a message from its environment. The former is a core functionality of a node in a WSN and hence, it will not be elaborated in detail. However, it is important to highlight that the awareness of the current nodes is functional in its nature and thus, restricted. Namely, the sensor is aware only of the environmental features of its (pre-determined) interest. A similar point can be made about the capability of a "thing" to receive a message (of a known format). Hence, we can distinguish between functional and universal environmental awareness.

Perceptivity is a property of a "thing" related to its capability to assign a meaning to an observation from its environment or from within itself. While awareness and self-awareness are properties that have already been achieved by WSN nodes, but only in a restricted, strictly functional scope, perceptivity goes one step further, by facilitating universal awareness. It enables the "things" to observe based on arbitrary stimuli and, consequently, to perceive these observations, namely to transform a physical observation into a meaningful percept. It is important to highlight that these observations are typically multi-modal (e.g. temperature, light, sound, etc.) and diverse in many dimensions (e.g. they are time and location dependent).

Then, based on this percept, a "thing" should be able to decide on the consequent action. This decision is a result of a cognitive process, which consists of identification, analysis and synthesis of the possible actions to perform in response to the "understood" observation, namely a percept. The intelligence, as an attribute of the interoperability property, also encompasses assertion, storing and acquisition of the behavior patterns, based on the post-agreements on the purposefulness of the performed actions.

Finally, the last attribute of the "thing", extroversion, is related to the willingness and capability of the "thing" to articulate the above action. It demonstrates the thing's concern about its physical and social environment.

In the remainder of this section, we provide a more detailed elaboration, including an overview of the existing technologies, methodologies and assets that might be used to enable the above attributes, to facilitate the interoperability property of ubiquitous systems.

A. Enabling awareness

The behavior related to the self-awareness of the nodes can be facilitated by using sensor ontologies. Several ontologies have been developed to represent sensors and their behavior, since 2004 [11]. Some of the most relevant are the MMI ontology of oceanographic devices [12], the CSIRO ontology for description of sensors for use in workflows [13], the SWAMO ontology [14], the A3ME ontology with a classification for self-description and discovery of devices and their capabilities [15] and the O&M-OWL (SemSOS) ontology for reasoning over sensor data to infer "high-level" concepts from "low-level" phenomena [29].

The above ontologies are highlighted based on the extensive review of the W3C Semantic Sensor Network Incubator Group [16]. Exactly this review was made for the purpose of developing the W3C Semantic Sensor Network (SSN) Ontology.

The SSN Ontology [17] is a formal OWL DL ontology for modeling sensor devices (and their capabilities), systems and processes. It extends the DUL (Dolce Ultra Lite) upper ontology. It is universal in the sense that it does not assume a physical implementation of a sensor. Namely, it can be used to describe the process of sensing by WSN nodes, as well as by humans.

SSN unfolds around the central pattern that relates what the sensor observes to what it detects. While the latter is determined on the basis of its capability, namely accuracy, latency, frequency, resolution, etc. and a stimulus, the

former is related to the concepts of features of interest, their properties, observation result, sampling time, etc. The skeleton of the SSN ontology is illustrated in Fig. 2.

Figure 2. Skeleton of the Semantic Sensor Network ontology

Stimuli are detectable changes in the environment that trigger the sensors (or a decision of a sensor to perform observations). They are related to the observable properties and hence, to the features of interest. The same type of stimulus can trigger different kinds of sensors and can be used to reason about different properties.

Sensors perform observations; they transform an incoming stimulus into another representation. They are related to a procedure of sensing – how a sensor should be realized and deployed to measure a certain observable property. Observations are also seen as parts of an observation procedure.

Properties are qualities of the feature of interest (entities of the real world that are the target of sensing) that can be observed via stimuli by the sensors.

Obviously, a sensor ontology is a useful asset for directly facilitating self-awareness. Furthermore, it can be extended to enable processing of the pre-determined, expected observations and making direct conclusions, thus facilitating functional environmental awareness. Some examples are: IoT-enabled business services, collecting and processing sensor data within a rescue environment [18]; smart products [19]; semantic-based sensor network applications for environmental management [20]; and agri-environmental applications [21].
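As a small illustration of the central pattern just described, the following Python/rdflib sketch builds a few SSN-style statements; the ssn: property names follow the SSN-XG vocabulary, while the namespace URI and all instance data are our own assumptions:

# A hedged rdflib sketch of an SSN-style observation; the ssn: namespace
# URI and the example individuals are assumptions made for illustration.
from rdflib import Graph, Namespace, RDF

SSN = Namespace("http://purl.oclc.org/NET/ssnx/ssn#")
EX = Namespace("http://example.org/iot#")

g = Graph()
g.bind("ssn", SSN)
g.add((EX.node42, RDF.type, SSN.Sensor))
g.add((EX.obs1, RDF.type, SSN.Observation))
g.add((EX.obs1, SSN.observedBy, EX.node42))             # the observing sensor
g.add((EX.obs1, SSN.observedProperty, EX.temperature))  # the observed property
g.add((EX.obs1, SSN.featureOfInterest, EX.livingRoom))  # the feature of interest

# Which observations were made by node42?
for obs in g.subjects(SSN.observedBy, EX.node42):
    print(obs)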
B. Enabling perceptivity

Cognitive psychology considers perception as the organization, identification, and interpretation of sensory information, carried out with the objective to represent and understand the environment [22].

Perceptivity is tightly related to the awareness attribute, in the sense that constructing the meaning from the observation data is a pre-condition for understanding the context in which some interoperations or communications occur. In other words, the known meaning of the sensor data or data pattern contributes to its communication context awareness or, specifically, its situational awareness.

When considering the awareness capabilities mentioned above, we can distinguish between the perceptivity related to perceiving the sensor data and the perceptivity related to assigning a meaning to an incoming message. Consequently, we discuss observational and communicative perceptivity. It goes without saying that a "thing" that exhibits both capabilities may process the sensor data and messages in a combined way.

The observational perceptivity is related to computing a percept on the basis of raw sensor data. Here, we refer to the work of Kno.e.sis, USA and the University of Surrey, UK. They developed and implemented a methodology [23] to identify patterns from sensor data by using Symbolic Aggregate Approximation (SAX). These patterns are then translated into abstractions with an abductive logic framework called Parsimonious Covering Theory (PCT) [24], approximated by the authors by using OWL. The abstractions are directly, or by using reasoning mechanisms, related to an event or a phenomenon. PCT uses domain-specific background knowledge to determine the best explanation for a set of observations, namely to link the patterns to semantic descriptions of different relevant thematic, spatial and temporal features.
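The SAX step itself is compact enough to sketch: a series is z-normalized, averaged into segments (PAA), and mapped to symbols through breakpoints that cut the standard normal distribution into equiprobable regions. The parameters and readings below are illustrative only (our sketch, not the cited methodology's code):

from statistics import NormalDist, mean, stdev
from bisect import bisect

def sax(series, segments, alphabet="abcd"):
    """Discretize a numeric series into a SAX word."""
    mu, sigma = mean(series), stdev(series)
    z = [(v - mu) / sigma for v in series]              # z-normalize
    step = len(z) / segments
    paa = [mean(z[int(i * step):int((i + 1) * step)])   # piecewise means
           for i in range(segments)]
    cuts = [NormalDist().inv_cdf(k / len(alphabet))     # equiprobable breakpoints
            for k in range(1, len(alphabet))]
    return "".join(alphabet[bisect(cuts, v)] for v in paa)

readings = [20.1, 20.3, 20.2, 20.4, 23.9, 24.2, 24.0, 23.8]
print(sax(readings, segments=4))   # 'aadd' - a step-change pattern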
In a subsequent effort, with the objective to provide formal semantics of machine perception, Henson et al developed the IntellegO ontology of perception [25]. IntellegO is based on the principles of Neisser's Perception Cycle [26], according to which perception is considered a cyclic process, in which the observation of the environment, followed by the creation of the initial percepts, is often affected by the process in which we direct our attention to further exploration, in order to get more stimuli required for constructing the final percept. In this process, humans generate, validate and consequently reduce the hypotheses that explain their observations.

According to IntellegO, based on the observed qualities of the inherent properties of the observed object, a subject creates a number of percepts as parts of the so-called perceptual-theory. Then, in order to clarify which qualities enable the reduction of the perceptual-theory, the following types of qualities are classified: expected, unknown, extraneous and discriminating. Hence, the specific goal of the perception cycle is to generate a minimum perceptual-theory for a given set of percepts. These percepts may come not only from the features of interest but also from the general environment of a "thing", to which some questions may need to be asked. Hence, perceptivity cannot be addressed independently of extroversion, which is used to articulate these questions.

The trend of service-enablement of "things" pushes us to consider also their capability to perceive interfaces (services), rather than data and/or information. Although this is somewhat out of the scope of the initial research question, it must be taken into account, as the services are credible elements of the "things" environment.

Current work on defining the models in the IoT domain is mostly focused on resource description and management. However, the aspect of accessing and utilizing the information generated in IoT is equally important, as an enabler of the aforementioned descriptions. Exactly this aspect is addressed by Wang et al, who developed a comprehensive ontology for knowledge representation in the IoT [27]. This ontology extends the current work on representation of resources in IoT by introducing service modeling, Quality of Service (QoS) and Quality of Information (QoI) aspects.

Perceiving service interfaces in IoT is tightly related to their discovery. Here, we refer to the work of Guinard et al, who proposed an architecture for dynamically querying, selecting and using services running on physical devices [28]. This architecture can be particularly useful for finding the relevant observation in a specific context. With regard to this, it is important to take into account the work on the specification of the Sensor Observation Service (SOS) web service [29] by the Open Geospatial Consortium (OGC). Finally, Pschorr et al [30] have shown that publishing sensor data as Linked Open Data, complementing the use of a sensor discovery service, can enable discovering and accessing sensors positioned near named locations of interest.

When IoT capabilities are considered, we can distinguish between the following types of core, general services of "things": observational, computational, and actuating services.

C. Enabling intelligence

In a really broad sense, intelligence as an attribute of the "thing" is related to its processing or computational capability. The processing unit (also associated with a small storage unit) is already embedded in the current architecture of nodes in WSNs, and its key objective is to reduce energy consumption. This is especially important in multi-hop WSNs.

A unique feature of the sensor networks is the cooperative work of sensor nodes, where multiple and multi-modal observation data is distributed to a central gateway (or another node), which is in charge of its processing. Instead of continuously sending raw data to the nodes responsible for their interpretation and processing, sensor nodes use their own processing capabilities to locally carry out simple computations and transmit only the required and partially processed data.

In a more specific sense, and in the context of defining interoperability as a property of a "thing", we consider intelligence as the capability to perform any and every step of processing needed for determining the meaningful and purposeful response to the perceived observations. This definition implies that the necessary condition for a cognitive activity is certainly an action. More importantly, it assumes purposefulness, which is determined socially.

It is important to highlight that this capability has a social context. Namely, when processing requires a computation which is not possible within a single node, then this computation is requested from its environment. Thus, as was the case for the awareness attribute, intelligence cannot be considered in isolation from the extroversion attribute. Also, it is tightly related to self-awareness, since a particular computation capability is an internal attribute of a "thing".

When enabling technologies are discussed, a key thing to focus on is a particular kind of logic or logics that could facilitate inference in the context defined by the above attributes. Although the great majority of the current efforts in developing sensor, IoT and WSN ontologies are implemented by using OWL, it is our opinion that this poses a serious constraint to the future developments related to enabling "things" with intelligence. Namely, interoperability as a future property must also consider the possibility to "understand" and combine different formalisms and to make meaningful but unambiguous conclusions by using a variety of engines.

D. Enabling extroversion

Extroversion as a property is considered a capability of a "thing" to commit to articulating and performing an action, based on a decision.

It reflects its commitment to act socially, namely to actively inform, affect or change its environment, where this engagement is also related to endorsing or denouncing other "things'" actions. It also reflects its curiosity, namely the capability to articulate the request for any additional information needed for complete reasoning during the processes of perception and decision.

IV. INTEROPERABILITY-OF-EVERYTHING (I-O-E) ONTOLOGY

In this section, we summarize the discussion above in a formal way, by synthesizing the identified concepts into the I-o-E (Interoperability-of-Everything) ontology. At this point, the I-o-E ontology is only considered as an illustration of the identified principles for interoperability of ubiquitous systems. Also, the I-o-E ontology does not include implementation details; hence, for example, services are not defined.

I-o-E unfolds around two central patterns. The vertical pattern defines the stimulus-observation-perception-decision-action cycle, while the horizontal one encloses the thing-attribute generic relationships.

I-o-E extends the SSN ontology to the stimulus-observation-perception-decision-action cycle (see Fig. 3), in which the value of a stimulus is gradually added with the objective to perform purposefully and socially. These aspects of the action are realized by the possibility of other "things" to endorse the performed action, thus turning the instance of the cycle into a candidate pattern of behavior. Hence, we distinguish between intrinsic and extrinsic intelligence. Intrinsic intelligence is exhibited if this cycle barely exists; e.g. the thing is intrinsically intelligent if it is capable of simply deciding on the action. Extrinsic intelligence is exhibited if these actions receive the endorsement of other things.

Figure 3. UML representation of the central vertical pattern of I-o-E ontology

These concepts are related to the theory of systems intelligence, proposed by Hämäläinen and Saarinen [31]. The systems intelligence is measured by successful interactions with an environment, and a person's ability to

modify their behavior based on feedback from that environment.

In Fig. 3, dashed lines illustrate dependency. They indicate necessary conditions for concepts. Hence, a stimulus exists only if it is sensed – by a thing. However, it may also be created by a thing. A thing has at least one domain of interest; however, it may sense stimuli for which we do not know whether they come from any of its domains of interest, since originateFrom(stimulus, domain-of-interest) is not a necessary condition for a stimulus.

Fig. 4 illustrates the central horizontal pattern of the I-o-E ontology: thing-attribute. All possible attributes are represented as individuals.

Figure 4. UML representation of the central horizontal pattern of I-o-E ontology

In order to make the evaluation of the interoperability property, namely the related attributes, possible, the assertion of things that do not exhibit the above attributes by default is allowed. In other words, the association of a thing to an attribute is not a necessary condition for a thing.

Attribution to the things is asserted by the following rules:

[R1] (thing(t) ∧ stimulus(s) ∧ observation(o) ∧ exhibitsAttribute(t,'awareness')) ⇒
     ∀t(∃s(sensedBy(s,t)) ∧ ∃o(relatedTo(o,s) ∧ observedBy(o,t)))

[R2] (thing(t) ∧ stimulus(s) ∧ observation(o) ∧ exhibitsAttribute(t,'self-awareness')) ⇒
     ∀t(∃s(sensedBy(s,t)) ∧ ∃o(relatedTo(o,s) ∧ createdBy(s,t) ∧ observedBy(o,t)))

[R3] (thing(t) ∧ stimulus(s) ∧ observation(o) ∧ exhibitsAttribute(t,'environmental-awareness')) ⇒
     ∀t(∃s(sensedBy(s,t) ∧ ¬createdBy(s,t)) ∧ ∃o(relatedTo(o,s) ∧ observedBy(o,t)))

[R4] (thing(t) ∧ stimulus(s) ∧ observation(o) ∧ percept(p) ∧ exhibitsAttribute(t,'perceptivity')) ⇒
     ∀t(∃s(sensedBy(s,t)) ∧ ∃o(observedBy(o,t) ∧ relatedTo(o,s)) ∧ ∃p(perceivedBy(p,t) ∧ relatedTo(p,o)))

[R5] (thing(t) ∧ stimulus(s) ∧ observation(o) ∧ percept(p) ∧ decision(d) ∧ action(a) ∧ exhibitsAttribute(t,'intrinsic-intelligence')) ⇒
     ∀t(∃s(sensedBy(s,t)) ∧ ∃o(observedBy(o,t) ∧ relatedTo(o,s)) ∧ ∃p(perceivedBy(p,t) ∧ relatedTo(p,o)) ∧ ∃d(madeBy(d,t) ∧ relatedTo(d,p)) ∧ ∃a(performedBy(a,t) ∧ relatedTo(a,d)))

[R6] (thing(t) ∧ thing(t') ∧ stimulus(s) ∧ observation(o) ∧ percept(p) ∧ decision(d) ∧ action(a) ∧ exhibitsAttribute(t,'extrinsic-intelligence')) ⇒
     ∀t(∃s(sensedBy(s,t)) ∧ ∃o(observedBy(o,t) ∧ relatedTo(o,s)) ∧ ∃p(perceivedBy(p,t) ∧ relatedTo(p,o)) ∧ ∃d(madeBy(d,t) ∧ relatedTo(d,p)) ∧ ∃a(performedBy(a,t) ∧ relatedTo(a,d)) ∧ ∃t'(t≠t' ∧ endorsedBy(a,t')))

Note that relatedTo is a transitive symmetric property; hence it is possible to infer relatedTo(p,s) and relatedTo(a,s) in [R4] and [R5]/[R6], respectively. However, direct assertions of relatedTo(a,s) are also possible in cases when the "thing" needs to make additional observations (and subsequent perceptions) in order to get some missing information from its environment (or from within itself), needed to complete the inference of the decision and, consequently, the formulated action.

Again, we highlight that extrinsic intelligence is an attribute exhibited by a thing t only if an action is performed by this thing, based on the set of stimuli it sensed, and only if there exists at least one thing t', different from t, which endorsed this action.
interoperability property, namely the related attributes,
the assertion of things that do not exhibit above attributes A. Modeling intelligence
by default, is allowed. In other words, association of a
The above rules can be used only to validate if there
thing to an attribute is not a necessary condition for a
exist stimulus-observation-perception-decision-action
thing.
cycles where a thing exhibits one or more of the attributes.
Attribution to the things is asserted by the following They are only formal definitions of these attributes.
rules: However, substantial intelligence of the “thing”, as its
[R1] (thing(t) ∧ stimulus(s) ∧ observation(o) attribute can be confirmed if and only if intelligence is
∧ exhibitsAttribute(t,’awareness’)) ⇒ exhibited for all these cycles.
∀t(∃s(sensedBy(s,t)) ∧ The assumption that the “things” act upon every
∃o(relatedTo(o,s) ∧ observedBy(o,t))) observation they make may sound too optimistic.
[R2] (thing(t) ∧ stimulus(s) ∧ observation(o) However, we should take into account that simple storage
∧ exhibitsAttribute(t,’self-awareness’)) of the sensation-observation-perception triple can be
⇒ considered as an action. These asserted triples can later be
∀t(∃s(sensedBy(s,t)) ∧ used for experience-based reasoning.
∃o(relatedTo(o,s) ∧ createdBy(s,t) ∧ We discuss about the substantial intelligence in context
observedBy(o,t))) of the observation sets. An observation set is a set of
[R3] (thing(t) ∧ stimulus(s) ∧ observation(o) observations all of which are related to an action. This
∧ exhibitsAttribute(t,’environmental- context is anthropomorphic because it involves
awareness’)) ⇒ consciousness; namely, it does not consider all stimuli
∀t(∃s(sensedBy(s,t) ∧ ¬createdBy(s,t)) sensed by the “thing” but only those that are observed (and
∧ ∃o(relatedTo(o,s) ∧ observedBy(o,t))) in fact, acted upon).
Thus, member-of-observation-set class is defined as
[R4] (thing(t) ∧ stimulus(s) ∧ observation(o) equivalent class:
∧ percept(p)) ∧
member-of-observation-set ≡ observation(o) ∧
exhibitsAttribute(t,’perceptivity’)) ⇒
(action(a) ∧ relatedTo(o,a))
∀t(∃s(sensedBy(s,t)) ∧
∃o(observedBy(o,t) ∧ relatedTo(o,s)) ∧ All observations are automatically classified to this
∃p(perceivedBy(p,t) ∧ relatedTo(p,o))) class if the above conditions are met. All observations that
[R5] (thing(t) ∧ stimulus(s) ∧ observation(o) are related to a single specific action are considered as the
∧ percept(p) ∧ decision(d) ∧ action(a)) members of one observation set.
∧ exhibitsAttribute(t,’intrinsic- Also, we discuss about the substantial intelligence in
intelligence’)) ⇒ context of the perceptual sets. Similarly to an observation

Similarly to an observation set, a perceptual set is a set of percepts all of which are related to an action:

member-of-perceptual-set ≡ percept(p) ∧ (action(a) ∧ relatedTo(p,a))

The definitions of the above two equivalent classes are introduced to illustrate that we distinguish meaningful observations and percepts from the non-functional ones. In fact, during the process of deciding on the possible action, the "thing" may look up the relationships between the existing members of these two classes (and the resulting actions), similarly to the human mind's consideration of knowledge and experience.
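Reusing the toy fact store from the previous sketch, membership in these two classes can be computed by closing relatedTo transitively and testing reachability of an action (again purely illustrative):

# Classify the toy observations and percepts from the previous sketch into
# observation/perceptual sets: membership requires being relatedTo an action.
def members_of_set(kind):
    actions = {a for a, _ in FACTS["performedBy"]}
    items = {x for x, _ in FACTS[kind]}
    closure = set(FACTS["relatedTo"])   # build the transitive closure
    while True:
        new = {(x, z) for x, y in closure for y2, z in closure if y == y2}
        if new <= closure:
            break
        closure |= new
    return {x for x in items
            if any((a, x) in closure or (x, a) in closure for a in actions)}

print(members_of_set("observedBy"))   # {'o1'} - an observation set member
print(members_of_set("perceivedBy"))  # {'p1'} - a perceptual set member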
While the "occurrences" of intelligent behavior are formalized by the exhibitsAttribute relationship, the substantial intrinsic [R7] and extrinsic intelligence [R8] of the "thing" are represented by the inferred hasAttribute(thing(t), 'intrinsic-intelligence') and hasAttribute(thing(t), 'extrinsic-intelligence') relationships. These relationships are inferred based on the following rules:

[R7] (thing(t) ∧ stimulus(s) ∧ observation(o) ∧ percept(p) ∧ decision(d) ∧ action(a) ∧ hasAttribute(t,'intrinsic-intelligence')) ⇒
     ∀t(∀s(sensedBy(s,t)) (∀o(observedBy(o,t) ∧ relatedTo(o,s)) (∀p(perceivedBy(p,t) ∧ relatedTo(p,o)) (∀d(madeBy(d,t) ∧ relatedTo(d,p))))) ∃a(performedBy(a,t) ∧ relatedTo(a,d)))

[R8] (thing(t) ∧ stimulus(s) ∧ observation(o) ∧ percept(p) ∧ decision(d) ∧ action(a) ∧ hasAttribute(t,'extrinsic-intelligence')) ⇒
     ∀t(∀s(sensedBy(s,t)) (∀o(observedBy(o,t) ∧ relatedTo(o,s)) (∀p(perceivedBy(p,t) ∧ relatedTo(p,o)) (∀d(madeBy(d,t) ∧ relatedTo(d,p))))) ∃a(performedBy(a,t) ∧ relatedTo(a,d)) ∧ ∃t'(t≠t' ∧ endorsedBy(a,t')))

Note that, according to the proposed definition, substantial extrinsic intelligence is inferred in case of the endorsement of only one thing t', different from t. In simple words, if the performed action is useful for at least one other thing, the behavior is characterized as intelligent, independently of the possible denouncements or indifference of the other things in the environment.

V. CONCLUSIONS

In this paper, we argue that interoperability as a property of ubiquitous systems can be enabled if these systems are empowered with the attributes of awareness, perceptivity, intelligence and extroversion. These attributes then enable the systems to behave and communicate autonomously and openly, without considering the designated features of interest, similarly to humans, in the activities of sensation, perception, cognition and articulation. The anthropomorphic commitment is also kept when the social context of the interoperations is considered. Namely, this context is neither predetermined nor pre-agreed. Rather, it is established post-mortem, in endorsements or denouncements of the actions performed as the outcomes of the interoperations. In a way, this approach pushes forward a concept of social agreement as an alternative to a logical truth. A social agreement provides validation of one pattern of system behavior, transforming it into a template of system behavior. The reliability of such a validation is argued by the fact that it occurs in a multi-faceted framework – a specific action of a system is judged in the different contexts where the different other systems live.

One of the most obvious and direct effects of such an approach in the future is related to addressing the key technological challenge of WSNs: to decrease the energy consumption of "things" and to extend the lifetime of the nodes. First, perceiving raw sensor data in multi-hop WSNs and, consequently, transmitting the meaningful percept (or acting upon this percept) instead of the raw data can significantly reduce the data volume that needs to be communicated from the sensor nodes to the gateways or processing components. Second, introducing a processing capability of the "things" may, in fact, revoke the need for these components, thus having a similar effect on the traffic.

Furthermore, encoding some kind of "intelligence" into individual things contributes significantly to the possibility of one network of things to scale more effectively and efficiently, even across the boundaries of other networks. This future benefit is derived from the foreseen capability of "things" to sense, perceive and act independently of the predetermined features of interest.

The amount of research opportunities in this area is immense, even without considering the technical (hardware) challenges. They are mostly related to the development of top-level theories and strategies which are foreseen neither to replace nor update the current approaches, but to reconcile them, by enabling things' proficiency in different standards, languages, even logics.

Even more complex measures of the things' intelligence can be introduced when referring to the notion of social intelligence, defined as the capacity to effectively negotiate complex social relationships and environments. Here, the role of the thing will extend from simply affecting its social environment (systems intelligence) to navigating through complex social situations and adapting to dynamic conditions.

VI. ACKNOWLEDGEMENT

The work presented in this paper was supported by the program for scientific cooperation between Serbia and France CNRS (project "Interoperabilité sémantique pour la production intégrée"); and the Ministry of Education and Science of the Republic of Serbia (project No. III41017).

The authors wish to thank Paul Oude Luttighuis from Novay, Netherlands, for vibrant discussions and inspiration that led to defining the approach presented in this paper.

REFERENCES
[1] Ashton, K. (2009). That 'Internet of Things' Thing. RFID Journal, 22 July 2009.
[2] Vermesan, O., Friess, P., Guillemin, P., Gusmeroli, S., Sundmaeker, H., Bassi, A., Jubert, I.S., Mazura, M., Harrison, M., Eisenhauer, M., Doody, P. (2009). Internet of Things Strategic Research Roadmap. Internet of Things Initiative.



Interoperability as a Property: Enabling an Agile Disaster Management Approach

Ovidiu Noran*, Milan Zdravković**
* School of ICT, Griffith University, Australia
** Faculty of Mechanical Engineering in Niš, University of Niš, Niš, Serbia
[email protected], [email protected]

Abstract— Catastrophic events triggered or augmented by regional conflicts, climate change and new disease strains appear to be increasing in intensity and frequency. Preparation, response and recovery are essential to survive the new wave of large-scale disasters; unfortunately however, the heterogeneous set of institutions and organisations responsible for delivering emergency response services often fails to rise to the task, with the lack of proper collaboration featuring as a main culprit. Previous research and applications have advocated for and presented a holistic and integrated approach to improve interoperability, seen as an essential component of collaboration. In this position paper, we aim to contribute to advancing that research by providing a novel perspective on interoperability issues that takes into account the advent of Internet technology feeding the emergence of the 'Internet of Things' (IoT), enabling the different artefacts to sense, process, share and act in a ubiquitous, Internet-like environment. Thus, we investigate the potential application of a novel IoT-aware 'interoperability as a property' (IaaP) paradigm in order to provide a sound state-of-the-art platform for efficient preparation by disaster management organisations and agile, adaptive response delivered by synergic task force and rescue teams.

I. INTRODUCTION

The rate and force of natural and man-made disasters, whether triggered or augmented by new strains of drug-resistant diseases, regional conflicts and climate change, appear to be on the rise. In this context, it is nowadays essential to effectively prevent, prepare for, promptly respond to and recover from catastrophic events. Governments worldwide typically tackle this challenge by creating specific policies, departments and 'disaster management' organisations (DMOs). Such organisations operate in a highly varied and complex historic, traditional, geographical, cultural and political environment, which typically results in a high organisational diversity of DMOs.

Coping with large scale catastrophic events typically demands resources and capabilities beyond those of any individual organisation; thus, the effective cooperation of DMOs at all necessary levels and addressing all relevant aspects is essential (Australian Psychological Society, 2013; Kapucu, Arslan, & Demiroz, 2010; Tierney & Quarantelli, 1989; Trakas, 2012; Waugh & Streib, 2006; World Health Organisation, 2011). Therefore, although DMOs' organisational diversity makes proper and effective collaboration more difficult (Whitman & Panetto, 2006), such synergy in disaster management is essential in order to minimize the loss of property and human life.

Interoperability is key to effective cooperation (Trakas, 2012); however, selecting its suitable type and the aspects applicable to disaster management is a non-trivial task. In addition, the advent of Internet technology feeding the emergence of the 'Internet of Things' (IoT), allowing the different artefacts to sense, process, share and act in a ubiquitous, Internet-like environment, brings a novel interoperability paradigm, namely Interoperability as a Property (IaaP).

This position paper sets out to identify disaster management collaboration problems and to investigate and prioritize suitable interoperability aspects. Then, it discusses how IaaP requirements and enabling factors applied to DMOs, in the context of the mainstream interoperability aspects deemed relevant to disaster management, can help improve interoperability and thus the agility and effectiveness of disaster response teams.

II. COLLABORATION IN DISASTER MANAGEMENT

The cooperative operation of emergency services is typically legislated at various state, national and international levels (e.g. Australian Government, 2011; Federal Emergency Management Agency, 2011; Government of South Australia, 2004; United Nations International Strategy for Disaster Reduction Secretariat (UNISDR), 2011). However, merely mandating organisations (of any type) to 'cooperate' has proven insufficient; the lack of true collaboration has brought about increased response times, confusion about the situation on the ground and sometimes even dispute as to who, where and when is in charge. Wilson et al. (2005) reinforce this point by stating that collaboration does not automatically occur but rather must be "constructed, learned […]" and importantly, "[…] once established, protected" (ibid.).

Coordination in crisis situations is also difficult due to incompatibilities in infrastructure and difficulty in filtering and validating the typical flood of information


generated during disaster events. For example, inconsistency in alert notice types and formats may confuse response teams and fuel a situation whereby the population is saturated with ambiguous and/or irrelevant messages (Ellis, Kanowski, & Whelan, 2004; Queensland Floods Commission of Enquiry, 2011; Victorian Bushfires Royal Commission, 2009). This can lead to sub-optimal prevention and response by the intended recipients and to potential property and life loss. Efforts to standardise warning message protocols are still rather localised, with low take-up rates (Moore, 2010; OASIS, 2005).

Various documents, inquiries, reviews and reports ('t Hart, Boin, Stern, & Sundelius, 2005; Brewin, 2011; Igarashi, Kong, Yamamoto, & McCreery, 2011; Queensland Floods Commission of Enquiry, 2011; United Nations International Strategy for Disaster Reduction Secretariat (UNISDR), 2011; Victorian Bushfires Royal Commission, 2009; Wiese, 2006) suggest that the root causes of the current shortcomings could in fact be the inadequate preparedness and the poor information flow and quality between the participants (Prizzia & Helfand, 2001; Wickramasinghe & von Lubitz, 2007), owing mostly to incompatibilities originating in their inherent heterogeneity, in the lack of trust, in organisational confusion and even in misguided competition beliefs. Thus, true collaboration is intricate and multifaceted, involving information, processes, resources and organisational cultures of the participants (Kapucu et al., 2010; Trakas, 2012).

An important part of disaster management is represented by health-related incidents. Despite significant advances such as the wide use of vaccines, the eradication of serious diseases and large reductions in communicable disease epidemics and chronic illnesses (Fielding, 1999; World Health Organization, 1998), nowadays we are still confronted with global health hazards owing to causes such as new strains of diseases (Kilbourne, 2006) and climate change (Donohoe, 2003). Typical psychological effects triggered by disaster events, such as uncertainty, anguish, confusion and panic, are amplified in pandemic-type situations and thus reinforce the general disaster management need for collaboration preparedness of the participant organisations (U.S. Dept of Health and Human Services, 2005; World Health Organisation, 2011).

Various approaches to disaster management have been attempted. To start with, the 'central command'-type approach, sometimes triggered by the urgency and the slow reaction of some participants (Waugh, 1993), has proven to be unsustainable, as successful disaster management relies on a wide range of community economic, social-psychological and political resources. This cooperation brings communities together, gives them a sense of usefulness (ibid.) and thus alleviates the negative psychological effects of disaster events. The adoption of military-type network-enabled capabilities in disaster management (von Lubitz, Beakley, & Patricelli, 2008) has also been found to have limited applicability due to potential over-reliance on failure-prone civilian communication infrastructure. The disaster management federalisation approach, offered as an alternative to the central command and military styles, has also achieved sub-optimal results in the past, as reflected in criticism expressed in the relevant literature ('t Hart et al., 2005; Clark, 2006; Wiese, 2006). However, this approach may be substantially improved by properly achieving cooperation preparedness.

Literature further argues that collaborative disaster management can be enhanced by modelling and participatory design (Kristensen, Kyng, & Palen, 2006) aimed at integrating scientific but also administrative and political aspects into a whole-system approach (Moghadas, Pizzi, Wu, & Yan, 2008; Utah Department of Health, 2007; World Health Organisation, 2011). Thus, poor aspect coverage, the lack of commonly understood integrated models and a missing mature cooperation paradigm appear to be the major obstacles in achieving suitable collaborative preparedness.

III. INTEROPERABILITY

Successful disaster management cooperation involves the will and capability of the participating organisations to work together in an optimal way. The concept of interoperability (and its levels of maturity) is often used as a measure of cooperation capability (DoD Architecture Framework Working Group, 2004; Guédria, Chen, & Naudet, 2009). Importantly, the analysis of interoperability in the disaster management domain must include the interoperability extent, approach and aspects.

Each disaster event is quite unique; thus, there can be no 'one size fits all' DMO interoperability extent. At a minimum, the participating organisations' systems should be compatible, so that at least they don't hinder each other's operations (see Fig. 1).

Fig. 1. Interoperability vs. Independence (Central Command: integrated, low agility; Federated: compatible/independent, high agility)

A high degree of integration is not desirable, as it would imply that the members of the task force created by the DMOs could not fully function independently. In an emergency situation, some response team members may be affected or even cease to function; the other


participants should be able to continue without significant performance loss (see for example the ARPANET 'resilient network' concept espoused by Heart, McKenzie, McQuillian, & Walden (1978)) and compensate for the failed or ailing member/s. Coordination can also be severely hindered by communication infrastructure breakdown (Crawford, 2012; Queensland Floods Commission of Enquiry, 2011); in this situation, the participants should be able to autonomously carry on their duties for a certain amount of time. This requires preparedness acquired in advance, based on commonly-agreed procedures and shared knowledge.

In addition, we propose that, in the context of ubiquitous computing and connectivity, this 'classic' disaster management paradigm can be evolved to develop the capability of the remaining participants to search for other suitable resources, liaise and cooperate with them. This new paradigm relies on the dynamic reconfiguration of agile disaster response members who can search for, recognise and adapt in order to use any new resources available, even if they do not conform to the known, agreed-upon guidelines.

A. INTEROPERABILITY APPROACH

In reviewing the relevant research and body of knowledge, we have found several definitions of the term 'interoperability'. Thus, the ISO/IEC 2382 vocabulary for information technology defines interoperability as "the capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units". In a broader sense, IEEE defines interoperability as "the ability of two or more systems or components to exchange information and to use the information that has been exchanged" (IEEE, 1990).

ISO14258 (2005) establishes several ways to achieve interoperability: integrated (common format for all models), unified (common format at meta level) and federated (participants negotiating an ontology as they go to achieve a shared understanding of models). In the case of DMOs, full integration appears to have never achieved the desired results, mainly due to the organisational heterogeneity of DMOs and the lack of understanding in relation to the degree and aspects of cooperation required.

The unified approach requires the ontology to be negotiated in advance. Unfortunately, notwithstanding significant advances in ontology integration (Farquhar et al., 1995; Pinto, Pérez, & Martins, 1999), currently the only sustainable solution to semantic disaster management interoperability appears to be DMOs 'spending time together' to agree on the meanings associated with the concepts used to exchange knowledge. This 'co-habitation' needs to be recurrent and may be expensive or impractical (e.g. viewed in an interstate or international context).

The federated approach, in principle very attractive for disaster management due to ensuring resilience owing to the independence of the response team members, has also often fallen short, mainly due to the impracticality of negotiating 'on the fly'. However, the new technologies available, such as ubiquitous computing, may have the answer, as further shown in this paper.

B. INTEROPERABILITY ASPECTS

Standards such as ISO14258 (2005) and various interoperability frameworks such as the European Interoperability Framework (EIF) (2004), the IDEAS project (2003), the ATHENA Interoperability Framework (AIF) (2004) and the INTEROP Network of Excellence (NoE) Interoperability Framework (Chen, 2005) provide a plethora of viewpoints to be considered in an interoperability maturity assessment and enhancement. In researching the above-mentioned standards and frameworks, we have found that these frameworks have overlapping and complementary areas; in addition, it is important that combinations of aspects are also considered. Therefore, a combined model has been constructed and applied for identifying the most relevant aspects for healthcare interoperability (Noran, 2013; Noran & Panetto, 2013) (see Fig. 2).

Fig. 2. INTEROP NoE Interoperability Framework (Chen, 2005) enriched with concepts from ISO14258, EIF, IDEAS, ATHENA AIF, (Panetto, 2007) and (Noran & Bernus, 2011)

As illustrated, the data and process aspects on the ATHENA-inspired 'concern' axis have been ranked as most stringent in DMO collaboration. This is because, typically, the ability to extract and exchange data from heterogeneous sources, delivering a large amount of often quite 'noisy' data during disaster events, is paramount to being aware of the conditions on the ground and avoiding potentially life-threatening situations for emergency crews and population. Although prior agreements on data


format and especially on its meaning are very beneficial in this case, often this may not be possible. This is one of the areas where the new interoperability paradigm proposed may help, as further described.

Organisational interoperability is an essential aspect in disaster management, as task force participants typically exhibit significant structural diversity. The issues identified by Chen (2006) based on the EIF (2004), namely responsibility, authority and type of organisation, can all impact heavily on the functionality of a disaster management task force. Although in a crisis situation it would be beneficial to establish and agree upon the roles and hierarchy of all participants (who is in charge of and does what, etc.), as previously shown, in the disaster response phase some task force members and/or coordination may fail; therefore, the remaining participants must be able to dynamically reorganize (and, if necessary, renegotiate) in order to continue responding to the emergency in the most efficient way. This agility may be facilitated by the new interoperability paradigm proposed.

Cultural interoperability (described e.g. by Whitman and Panetto (2006)) appears to be one of the hardest obstacles to overcome. Previous research (Noran & Bernus, 2011; Noran & Panetto, 2013) has observed that within an integrated or unified interoperability approach, the only solution appears to be the regular immersion of the participant organisations in each other's cultures, which facilitates the transfer and conversion of tacit and explicit knowledge between the participants. However, in the new ubiquitous computing and connectivity context, the federated approach holds the promise of a possibly more efficient solution to promote interoperability and collaboration, leading to an agile and adaptive disaster management approach.

IV. THE INTERNET OF THINGS AND APPLICATIONS

Ubiquitous computing aims to provide a seamless interaction of humans with information and services by embedding specific artefacts into their environment as unobtrusively as possible (Estrin, Culler, Pister, & Sukhatme, 2002). An important aspect is that the ubiquitous computing artefacts (devices) that interact with humans and among themselves must be context-aware. Advances in Internet technology enabling pervasive Internet connectivity have facilitated the emergence of the so-called Internet-of-Things (IoT) paradigm, where all entities of interest (including humans) could be equipped with identifiers based e.g. on Radio Frequency Identification (RFID), barcodes, near field communication, digital watermarking (Magrassi & Berg, 2002), etc. This would help tackle significant and urgent challenges faced by human society, such as improving the quality of life of an ageing population and reducing greenhouse emissions by minimising waste. The most dynamic and promising area of IoT is the ubiquitous wireless connection to an Internet-like infrastructure using low-power devices, made possible by technological advances such as Wireless Sensor Networks (WSN).

IoT technology can be used to enhance disaster management prevention, preparation and response by WSN (Aziz & Aziz, 2011; da Silva, Del Duca Almeida, Poersch, & Nogueira, 2010), e.g. for bushfires (Angeles Serna, Bermudez, & Casado, 2013) or for providing emergency care in large disasters (Gao et al., 2008) using respondent assignment based on location information (Rastegari, Rahmani, & Setayeshi, 2011). Note that, while typically the physical location of people or objects is a deciding factor for promptly taking the right decision, it often needs to be interpreted in the context of other information, such as environmental factors (temperature, air composition, etc.). The synthesis of data acquired from the potentially large number of sensors is useful in disaster prevention by facilitating large scale field studies, for example to track the spread of diseases (Hanjagi, Srihari, & Rayamane, 2007).

V. INTEROPERABILITY IN THE INTERNET OF THINGS

One of the greatest challenges for the IoT is making the increasingly large number of heterogeneous connected devices exchange the relevant information so they can interoperate. In this context, 'interoperability' means that systems must negotiate 'on the fly' in order to interoperate, with no pre-determined assets or agreements for interoperation. This poses an essential problem for the current definitions of interoperability (see Section III), which assume 'coexistence awareness' of the interoperating systems and agreement of the involved actors in regard to their behaviours for a given interaction, typically derived from a (mandated) motivation to interoperate. Unfortunately however, such assumptions cannot hold true in the ad-hoc communication and interoperation required by the vast variety of systems involved in future ubiquitous computing. The current collaboration culture, which assumes sharing and a social context, may in fact become a barrier to interoperability, because it implies previous agreements between the interoperating systems.

A. INTEROPERABILITY AS A PROPERTY (IAAP)

Current use cases for the future IoT are typically based on pre-agreements of the various devices to exchange information and to act upon this information. However, as the number of connected devices and their technological diversity grows, it will become more and more difficult to work on reaching these pre-agreements. Removing these agreements will effectively reduce interoperability to a semantic issue.

It is highly likely that the 'things' belonging to the future IoT will be required to receive ad-hoc signals and requests from other devices, interpret their meaning and act accordingly. That can be dealt with from an anthropomorphic perspective, where the systems can


sense, observe, perceive and, if necessary, act. Thus, interoperability will in fact become the property of a single system.

Let us consider an IoT scenario where an emergency response crew with an embedded GPS sensor is deployed on the ground, moving between response areas in the conditions of a chain of catastrophic events, e.g. an earthquake triggering a toxic/radioactive spill (see Fig. 3). This sensor (N1) is capable of sensing and perceiving any message received from its environment.

Fig. 3. IoT-enabled emergency response scenario

In the environment of N1, there are other sensors (e.g. low power wireless sensor nodes within a network), observing the environment and continuously transmitting the observed data. For example, temperature sensor N2 is continuously sending message AN2 with the air composition or radioactive level. This message is sensed and observed (ON1N2) by N1. In the meantime, the GPS sensor is continuously collecting its own observations (ON1N1). Perception of the crew position in the context of the air composition of the environment can lead to recognising a life-threatening situation for the crew. In this case, N1 creates a percept P1 based on two observations, namely ON1N2 and ON1N1. Based on this perception, N1 is capable of making a decision D1, e.g. to send an SMS to a command and control centre and/or other crews. Hence, N1 articulates and sends out a message AN1 with a request to send an SMS with the designated content and recipient. Finally, there is a device N3 (e.g. embedded in the crew in question, a ground-based station or another crew) with SMS sending capability, which observes this message and acts further upon it.
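For illustration only (this sketch is ours, not the authors'; the class names, toxicity threshold and message fields are hypothetical assumptions), the scenario can be traced as a small Python program in which N1 fuses its own position observation with N2's broadcast into a percept, makes a decision and articulates a request that an SMS-capable device such as N3 may act upon.

    # Hypothetical trace of the N1/N2/N3 flow; names and threshold are assumed.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Observation:
        source: str          # node that produced the data
        kind: str            # e.g. 'position' or 'air_composition'
        value: object

    @dataclass
    class Percept:
        meaning: str
        based_on: tuple      # the observations it fuses (here ON1N1 and ON1N2)

    def perceive(o_pos: Observation, o_air: Observation) -> Optional[Percept]:
        # P1: crew position interpreted in the context of air composition
        if o_air.value > 0.7:                     # assumed toxicity threshold
            return Percept("life-threatening situation at crew position",
                           (o_pos, o_air))
        return None

    def decide_and_articulate(p: Percept) -> dict:
        # D1 and AN1: an articulated request any SMS-capable thing can act upon
        return {"request": "send_sms", "recipient": "command_and_control",
                "content": p.meaning}

    o_n1n1 = Observation("N1", "position", (43.32, 21.90))   # own GPS observation
    o_n1n2 = Observation("N2", "air_composition", 0.92)      # sensed broadcast AN2
    p1 = perceive(o_n1n1, o_n1n2)
    if p1:
        print(decide_and_articulate(p1))          # observed and performed by N3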
B. ENABLING FACTORS FOR IAAP

Using the typical scenario in Fig. 3, we can set the basic requirements for the autonomous, intelligent, purposeful and social behaviour of a 'thing' in an interoperable environment (e.g. a Wireless Sensor Network (WSN) engaged in a disaster management Early Warning Network (FAO, 2003)).

Firstly, the thing must display awareness, more specifically self-awareness and environmental awareness. Self-awareness is related to the capability of the thing to sense a phenomenon or an event within itself. For example, WSN nodes need to be aware of the available energy levels. Environmental awareness is related to the capability of the thing to sense a phenomenon or an event from its environment, extended by the capability to receive a message from its environment. It is important to highlight that currently the awareness of nodes is functional in its nature and thus restricted; namely, the sensor is aware only of the environmental features matching its pre-determined interest. A similar point can be made regarding the capability of the thing to receive a message of a known format. Hence, we can also further distinguish between functional and universal environmental awareness.

A second important property would be perceptivity, i.e. the capability to assign a meaning to an observation. Note that observations can occur within the thing itself or in its environment; observations are also typically multi-modal (e.g. temperature, light, sound, etc.) and possibly multi-dimensional (e.g. they may be time and location dependent). While awareness and self-awareness have already been achieved by WSN nodes, it has been only in the restricted, functional scope; perceptivity goes one step further by facilitating universal awareness. Perceptivity enables things to observe based on arbitrary stimuli and interpret these observations, transforming the physical observations into a meaningful percept. Based on this perception, the thing should be able to decide on the consequent action.

The decision to act based on a perception should be the result of a cognitive process, consisting of the identification, analysis and synthesis of the possible actions to perform in response to the understood observation (i.e. the percept). Therefore, interoperability as a property must possess a third feature, i.e. intelligence, encompassing the assertion, storing and acquisition of behaviour patterns, based on the post-agreements in regard to the purposefulness of the performed actions.

Another required attribute of an artefact featuring IaaP would be extroversion, related to the willingness and capability of the artefact to articulate its actions. This would demonstrate its concern about the physical and social environment. An associated capability would be 'curiosity', i.e. articulating the request for additional information needed to perform a complete reasoning during the perception and decision processes.

VI. IAAP IN DISASTER MANAGEMENT

The impact of the IoT and IaaP paradigms on disaster management must be assessed in an integrated manner, i.e. taking into account the interoperability extent, approach and aspects identified in Section III.B in the context of the enabling IaaP attributes described in Section V.B.


In regard to the interoperability extent, the IoT and IaaP concepts would assist DMOs in becoming agile, thus being able to interoperate to a larger degree without having to become integrated within a specific negotiated framework or system of systems (see Fig. 1). Preserving organisation independence and resilience would prove crucial in emergency situations where task force partners may fail, with the rest of the team having to promptly reorganise / find replacements in order to recover the missing functionality (see Section III). This would require prompt, ad-hoc interoperation in areas not previously negotiated, which, as shown in Section V, can be facilitated by acquiring IaaP. Here, the DMOs will in fact be applying a federated interoperability approach, made feasible in the context of IoT and IaaP.

As described in Sections I and II, DMOs are highly heterogeneous and hierarchical, posing a variety of internal and external interoperability barriers as described by Noran and Panetto (2013). Thus, true and efficient collaboration is not possible unless the organisational cultures, processes and resources of the participants possess the required interoperability preparedness (Kapucu et al., 2010). Universal environmental awareness would greatly enhance the DMO's preparedness for cooperation, both inside and outside its own boundaries. Thus, on the internal level, collaboration between various departments would be dramatically improved if all staff understood the way the organisation they belong to works at all levels; this understanding should be supported by an enterprise-wide repository representing data and processes across the organisation. In addition, human resource strategies such as staff rotation (whereby roles are periodically changed) would enable staff to gather a large variety of skills and sensitivity to all aspects of the organisation. Thus, the lack of interoperability of the current human, machine and hybrid systems (some of which do not presently satisfy even the compatibility requirement) would be replaced by ubiquitous awareness and data sharing. On the external level, by displaying universal awareness the DMO would be able to seamlessly exchange information with other DMOs and relevant organisations and monitor heterogeneous disaster response crews' progress in real time, irrespective of location and taking into account ambient factors.

All DMOs feature some kind of knowledge management and/or business intelligence capability; however, typically they only cover the upper and possibly middle management levels. In the IaaP scenario, the knowledge management mechanism would evolve into an enterprise-wide expert system, extending from top management to the real-time response units, covering all relevant aspects, as shown in Fig. 2, and enabled by a pervasive ubiquitous computing framework integrating intelligent sensors and controllers. In effect, the DMO would now become a learning organisation that constantly adjusts, learns and improves its response to external challenges in an agile manner.

The social effect of an extrovert DMO, materialised by transparency towards other DMOs, relevant organisations (e.g. community, non-governmental and faith groups, etc.) and the general public, would bring significant benefits. In large scale catastrophic events, trust and communication are paramount to an effective response and to minimising negative effects ('t Hart et al., 2005; Waugh, 1993; Wray, Rivers, Whitworth, & Jupka, 2006). The above-mentioned 'curiosity' would manifest through internal and public requests for information pertaining to prevention, preparation and especially disaster response. Often, in a disaster situation, the population self-organises in novel and efficient ways; DMOs must tap into this resource and use it to optimize their operations. For this to happen however, in addition to gaining community trust (which cannot be rushed), DMOs must also be able to interoperate at short notice and without previous preparation and negotiation, in effect displaying IaaP.

The IaaP paradigm and its enabling factors would also benefit the technical aspect of IoT by resolving some of the issues specific to its application in disaster management. Thus, for example, although the concept of IaaP potentially implies more traffic between the artefacts composing the IoT, such an increase would be compensated by the intelligent processing capability. For example, in multi-hop WSNs, perceiving raw sensor data, interpreting it and transmitting the resulting meaningful percept (or acting upon it), as opposed to simply passing on this raw data, can significantly reduce the volume of data that needs to be communicated from the sensor nodes to the gateways or processing components. Therefore, allocating a processing capability to IoT artefacts may in fact reduce the number of components and thus the traffic.
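The traffic argument can be made concrete with a back-of-the-envelope sketch (ours; the message sizes, sampling rate and hop count are assumptions, not measurements): a node that perceives locally relays one small percept instead of a stream of raw samples.

    # Illustrative arithmetic only; sizes, rate and hop count are assumptions.
    RAW_SAMPLE_BYTES = 32       # one raw sensor reading
    PERCEPT_BYTES = 64          # one interpreted, meaningful percept
    SAMPLES_PER_MINUTE = 60
    HOPS = 5                    # multi-hop path from sensor node to gateway

    raw_traffic = RAW_SAMPLE_BYTES * SAMPLES_PER_MINUTE * HOPS   # relay everything
    percept_traffic = PERCEPT_BYTES * 1 * HOPS                   # relay one percept/min
    print(raw_traffic, percept_traffic)   # 9600 vs 320 bytes/min, a ~30x reduction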
VII. CONCLUSIONS AND FURTHER WORK

The current disaster management approaches seem to fall short in the context of the ever increasing occurrence and amplitude of disaster events. Previous research has identified DMO cooperation as a major culprit in these shortcomings and has proposed an interoperability-centric approach to improve the situation. In this paper, we aimed to take that research further by incorporating the progress in ubiquitous computing that enables a new, 'IoT' paradigm which promises to challenge but also revolutionise the concept of interoperability as we know it. Thus, in the new IoT context, pre-negotiated protocols and formats as a sine-qua-non basis for interoperability are no longer feasible. As a possible solution, we have proposed a new paradigm describing interoperability as a property of every system aspiring to efficiently interoperate (and thus survive) in the future IoT. Next, we have defined the enabling factors in the evolution of interoperability from a typical set of agreements shared between several interoperating parties to a property owned by a single system. Finally, we have investigated the applicability of the concept on a larger scale and the changes that 'interoperability as a property' may bring to the current disaster management scenario.


Future work will aim to test and refine the IaaP concept and the application of its enabling factors to disaster management in the context of the required interoperability extent, approach and aspects. Thus, the relevance and impact of IaaP on human-specific aspects, such as cultural interoperability and trust, have to be further clarified. Importantly, the life cycle of the DMOs and other relevant participants must also be incorporated so as to construct a more complete image of the IoT and IaaP effects towards achieving an agile and adaptive disaster management approach.

REFERENCES

't Hart, Paul, Boin, Arjen, Stern, Eric, & Sundelius, Bengt. (2005). The Politics of Crisis Management: Public Leadership under Pressure. Cambridge UK: Cambridge University Press.
Angeles Serna, M., Bermudez, A., & Casado, R. (2013). Circle-based approximation to forest fires with distributed wireless sensor networks. Wireless Communications and Networking Conference (WCNC), 4329-4334.
ATHENA. (2004). State of the art of Enterprise Modelling Techniques and Technologies to Support Enterprise Interoperability. Deliverable D.A1.1.1. Retrieved 2011, from http://www.athena-ip.org
Australian Government. (2011). Attorney-General's Office - Emergency Management in Australia. Retrieved 2011, from http://www.ema.gov.au/
Australian Psychological Society. (2013). Disaster Response Network (DRN). Retrieved April 2013, from http://www.psychology.org.au/medicare/drn/
Aziz, N.A.A., & Aziz, K.A. (2011). Managing disaster with wireless sensor networks. Paper presented at the 13th International Conference on Advanced Communication Technology (ICACT).
Brewin, Bob. (2011). Tsunami response reveals poor radio interoperability. Retrieved 2012, from www.nextgov.com/nextgov/ng_20110415_3972.php
Chen, D. (2005). Practices, principles and patterns for interoperability. INTEROP-NoE, Interoperability Research for Networked Enterprises Applications and Software Network of Excellence, n° IST 508-011 (Vol. 2011).
Chen, D. (2006). Framework for Enterprise Interoperability. Retrieved 2011, from http://www.fines-cluster.eu/fines/jm/Download-document/53-Framework-for-Enterprise-Interoperability-Chen.html
Clark, J. L. (2006). Practical aspects of federalizing disaster response. Critical Care, 10, 107-113.
Crawford, Susan. (2012). Why Cell Phones Went Dead After Hurricane Sandy. Retrieved 2013, from http://www.bloomberg.com/news/2012-11-15/why-cell-phones-went-dead-after-hurricane-sandy.html
da Silva, R.I., Del Duca Almeida, V., Poersch, A.M., & Nogueira, J.M.S. (2010). Wireless sensor network for disaster management. Network Operations and Management Symposium (NOMS), 810-873.
DoD Architecture Framework Working Group. (2004). DoD Architecture Framework Ver 1.0. Retrieved Feb 2007, from http://www.dod.mil/cio-nii/docs/DoDAF_v1_V_I.pdf, http://www.dod.mil/cio-nii/docs/DoDAF_v1_V_II.pdf
Donohoe, M. (2003). Causes and health consequences of environmental degradation and social injustice. Social Science and Medicine, 56(3), 573-587.
EIF. (2004). European interoperability framework for pan-European eGovernment services (Vol. 30). Luxembourg: Interoperable Delivery of European eGovernment Services to public Administrations, Businesses and Citizens (IDABC).
Ellis, S., Kanowski, P., & Whelan, R. (2004). National inquiry into bushfire mitigation and management (Vol. 2011). Canberra: Commonwealth of Australia.
Estrin, D., Culler, D., Pister, K., & Sukhatme, G. (2002). Connecting the Physical World with Pervasive Networks. IEEE Pervasive Computing, 1(1), 59-69.
FAO, Food and Agricultural Organization of the United Nations - Headquarters. (2003). Integrating Early Warning into Disaster Risk Reduction Policies. www.fao.org/giews/english/otherpub/ewdrd.pdf
Farquhar, A., Fikes, R., Pratt, W., & Rice, J. (1995). Collaborative Ontology Construction for Information Integration. Technical Report KSL-95-63. Stanford University: Knowledge Systems Laboratory.
Federal Emergency Management Agency. (2011). National Response Framework. Retrieved 2011, from www.fema.gov/pdf/emergency/nrf/about_nrf.pdf
Fielding, Jonathan E. (1999). Public Health in the Twentieth Century: Advances and Challenges. Annual Reviews in Public Health, 20, xiii-xxx.
Gao, T., Pesto, C., Selavo, L., Chen, Y., Ko, J., Lim, J.H., Terzis, A., & Welsh, M. (2008). Wireless medical sensor networks in emergency response: Implementation and pilot results. Proceedings of the IEEE Conference on Technologies for Homeland Security, 187-192.
Government of South Australia. (2004). Emergency Management Act 2004. Retrieved 2011, from http://www.legislation.sa.gov.au/LZ/C/A/EMERGENCY MANAGEMENT ACT 2004.aspx
Guédria, W., Chen, D., & Naudet, Y. (2009). A Maturity Model for Enterprise Interoperability. In R. Meersman, P. Herrero & T. Dillon (Eds.), Lecture Notes in Computer Science (pp. 216-225).
Hanjagi, A., Srihari, P., & Rayamane, A.S. (2007). A public healthcare information system using GIS and GPS: A case study of Shiggaon. GIS for Health and the Environment: Lecture Notes in Geoinformation and Cartography, 2007, 243-255.
Heart, F., McKenzie, A., McQuillian, J., & Walden, D. (1978). ARPANET Completion Report. In Bolt, Beranek & Newman (Eds.). Burlington, MA.
IDEAS. (2003). IDEAS Project Deliverables (WP1-WP7), Public reports. Retrieved 2011, from www.ideas-roadmap.net
IEEE. (1990). IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries. Institute of Electrical and Electronics Engineers.
Igarashi, Y., Kong, L., Yamamoto, L., & McCreery, C.S. (2011). Anatomy of Historical Tsunamis: Lessons


Learned for Tsunami Warning. Pure Applied Geophysics, 168, 2043-2063.
ISO. (2005). ISO14258 Industrial Automation Systems - Concepts and Rules for Enterprise Models.
Kapucu, N., Arslan, T., & Demiroz, F. (2010). Collaborative emergency management and national emergency management network. Disaster Prevention and Management, 19(4), 452-468.
Kilbourne, Edwin D. (2006). Influenza Pandemics of the 20th Century. Emerging Infectious Diseases, 12(1).
Kristensen, Margit, Kyng, Morten, & Palen, Leysia. (2006). Participatory Design in Emergency Medical Service: Designing for Future Practice. Paper presented at the Conference on Human Factors in Computing Systems - CHI 2006, Montréal, Québec.
Magrassi, P., & Berg, T. (2002). A World of Smart Objects: Role of Auto-Identification Technologies. Gartner Group.
Moghadas, S.M., Pizzi, N.J., Wu, J., & Yan, P. (2008). Managing public health crises: models in pandemic preparedness. Influenza and Other Respiratory Viruses, 3(2), 75-79.
Moore, Linda K. (2010). The Emergency Alert System (EAS) and All-Hazard Warnings. Retrieved 2013, from http://www.fas.org/sgp/crs/homesec/RL32527.pdf
Noran, O. (2013). Enhancing Collaborative Healthcare Synergy. IFIP Advances in Information and Communication Technology, 408, 459-467.
Noran, O., & Bernus, P. (2011). Effective Disaster Management: An Interoperability Perspective. Lecture Notes in Computer Science, 7046, 112-121.
Noran, O., & Panetto, H. (2013). Modelling Sustainable Cooperative Healthcare: An Interoperability-Driven Approach. Lecture Notes in Computer Science, 8186, 238-249.
OASIS. (2005). Common Alerting Protocol v1.1. Retrieved 2012, from www.oasisopen.org/committees/download.php/15135/emergency-CAPv1.1-Corrected_DOM.pdf
Panetto, H. (2007). Towards a classification framework for interoperability of enterprise applications. International Journal of Computer Integrated Manufacturing, 20, 727-740.
Pinto, H., Pérez, A., & Martins, J. (1999). Some Issues on Ontology Integration. Proceedings of the IJCAI-99 workshop on Ontologies and Problem-Solving Methods (KRR5). Stockholm, Sweden.
Prizzia, Ross, & Helfand, Gary. (2001). Emergency preparedness and disaster management in Hawaii. Disaster Prevention and Management, 10(3), 163-172.
Queensland Floods Commission of Enquiry. (2011). Submissions to the Enquiry. Retrieved 2011, from http://www.floodcommission.qld.gov.au/submission
Rastegari, E., Rahmani, A., & Setayeshi, S. (2011). Pervasive computing in healthcare systems. World Academy of Science, Engineering and Technology, 2011(59), 187-192.
Tierney, K., & Quarantelli, E. L. (1989). Needed Innovation in the Delivery of Emergency Medical Services in Disasters: Present and Future. Disaster Management, 2(2), 70-76.
Trakas, Athina. (2012). Interoperability - A key requirement for emergency and disaster management. Retrieved 2012, from http://www.un-spider.org/book/5143/4c-challenge-communication-coordination-cooperation-capacity-development
U.S. Dept of Health and Human Services. (2005). HHS Pandemic Influenza Plan. Retrieved May 2013, from http://www.flu.gov/planning-preparedness/federal/hhspandemicinfluenzaplan.pdf
United Nations International Strategy for Disaster Reduction Secretariat (UNISDR). (2011). Hyogo Framework for Action 2005-2015: Building the resilience of nations and communities to disasters. Retrieved 2011, from http://www.preventionweb.net/files/1037_hyogoframeworkforactionenglish.pdf
Utah Department of Health. (2007). Governor's Task Force for Pandemic Influenza Preparedness - Final report to Governor. Retrieved 2013, from http://pandemicflu.utah.gov/docs/PandInfluTaskforceFinalReport.pdf
Victorian Bushfires Royal Commission. (2009). Submissions to the Enquiry. Retrieved 2011, from http://www.royalcommission.vic.gov.au/Submissions/View-Submissions
von Lubitz, Dag K.J.E., Beakley, James E., & Patricelli, F. (2008). Disaster Management: The Structure, Function, and Significance of Network-Centric Operations. Journal of Homeland Security and Emergency Management, 5(1), Art. 42.
Waugh, William L. (1993). Coordination or Control: Organizational Design and the Emergency Management Function. International Journal of Disaster Prevention and Management, 2(4), 17-31.
Waugh, William L., & Streib, Gregory. (2006). Collaboration and Leadership for Effective Emergency Management. Public Administration Review, 66(s1), 131-140.
Whitman, L., & Panetto, H. (2006). The Missing Link: Culture and Language Barriers to Interoperability. Annual Reviews in Control, 30(2), 233-241.
Wickramasinghe, N., & von Lubitz, Dag K.J.E. (2007). Knowledge Based Enterprises: Theories and Fundamentals (Vol. 5). Hershey PA: IGP.
Wiese, C. R. (2006). Organizing Homeland Security after Katrina: is adaptive management what's missing? Public Administration Review, 66(3), 302-318.
Wilson, K., Coulon, L., Hillege, S., & Swann, W. (2005). Nurse Practitioners' Experience of Working Collaboratively with General Practitioners and Allied Health Professionals in NSW, Australia. Australian Journal of Advanced Nursing, 23(2), 22-27.
World Health Organisation. (2011). Pandemic Influenza Preparedness Framework. Retrieved 2013, from http://whqlibdoc.who.int/publications/2011/9789241503082_eng.pdf
World Health Organization. (1998). The world health report - life in the 21st century: a vision for all. Retrieved 2013, from http://www.who.int/whr/1998/en/whr98_en.pdf
Wray, R., Rivers, J., Whitworth, A., & Jupka, K. (2006). Public Perceptions About Trust in Emergency Risk Communication: Qualitative Research Findings. International Journal of Mass Emergencies and Disasters, 24(1), 45-75.


THE USE OF ONTOLOGIES IN CADASTRAL SYSTEMS

Dubravka Sladić, Aleksandra Radulović, Miro Govedarica, Dušan Jovanović, Dejan Rašić
[email protected], [email protected], [email protected], [email protected], [email protected]
Faculty of Technical Sciences, University of Novi Sad

Abstract – This paper presents the application of ontologies in the field of real estate cadastre. Ontologies can be seen as a form of metadata that provides a higher level of interoperability and integration within the Spatial Data Infrastructure, not only on the syntax level but on the semantic level as well. The application of ontologies in this domain is shown on the example of data integration of the Serbian national cadastre and the INSPIRE cadastral parcels, based on the Land Administration Domain Model defined in the ISO 19152 standard.

Keywords – ontologies, cadastre, LADM, SDI

1. INTRODUCTION

In modern Spatial Data Infrastructures (SDI) [1], the key issue is finding appropriate data and services and their integration into single usable information. For this purpose, catalogue services are used, which store and serve metadata about geospatial resources, with different catalogue information models, i.e. metadata formats [2]. Problems appear because of interoperability issues, where information created in one context is often of limited use in another context, due to insufficient means for meaningful interpretation [3]. This problem is known as semantic heterogeneity. The standards in the field of GIS increase interoperability at the syntactic and structural level, since they standardize data structures and service interfaces, but these standards do not solve semantic problems. Searching for information is often affected by low recall and precision [4]. Low recall means that some relevant information sources may not be discovered, while low precision means that some of the discovered information may not be relevant. Semantic heterogeneity is caused by different conceptualizations of real world facts and can be divided into cognitive heterogeneities, in which the same names are given to different real world objects (homonyms), and naming heterogeneities, in which different names are given to the same real world objects (synonyms) [5].

The problem of semantic heterogeneities may be solved using technologies of the Semantic Web [6]. The Semantic Web is an extension of the World Wide Web in which information is given well-defined explicit meaning through the use of ontologies, which are used to communicate a shared and common understanding (between people and computers) of some domain of discourse, because they represent an explicit formal specification of a shared conceptualization of the domain [7]. Ontologies provide semantic representations of knowledge of the real world, allowing users to define a set of concepts, relationships between concepts, and rules of inference on a particular domain. Ontologies on the Semantic Web are represented using the Web Ontology Language (OWL) [8], a W3C standard built on top of RDF (Resource Description Framework) [9]. The technologies of the Semantic Web are used because the goal of SDI is to facilitate spatial data dissemination via the World Wide Web, and for that goal the service-oriented architecture is used. Geospatial web services are mostly based on OpenGIS Consortium implementation specifications of service interfaces for geospatial data access and processing.
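For illustration (our sketch, not code from the paper; the class names and namespace URIs are hypothetical), resolving a naming heterogeneity of the kind described above can be expressed in OWL by asserting class equivalence, e.g. with rdflib in Python:

    # Hypothetical example: mapping a national cadastre concept to an INSPIRE
    # concept via owl:equivalentClass to resolve a synonym-type heterogeneity.
    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF, OWL

    RGZ = Namespace("http://example.org/serbian-cadastre#")   # assumed URI
    INSPIRE = Namespace("http://example.org/inspire-cp#")     # assumed URI

    g = Graph()
    g.add((RGZ.Parcela, RDF.type, OWL.Class))
    g.add((INSPIRE.CadastralParcel, RDF.type, OWL.Class))
    # Different names, same real-world concept (a synonym across vocabularies):
    g.add((RGZ.Parcela, OWL.equivalentClass, INSPIRE.CadastralParcel))

    print(g.serialize(format="turtle"))

A reasoner loading such a mapping can then answer queries over either vocabulary, which is the kind of semantically enhanced discovery the paper targets.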
Ontologies are used to describe a certain domain and to reason about the properties of that domain by inferring new knowledge from the asserted facts. Their role is to provide a shared vocabulary within a certain domain, such as land administration. The Land Administration Domain Model (LADM), specified in the ISO 19152 international standard [10], provides a base for building ontologies in the land administration domain, enabling involved parties, both within one country and between different countries, to communicate based on the shared vocabulary (ontology) implied by the model. Therefore the core ontology for the cadastre should be developed according to this domain model in order to achieve the intended goal of the standard, while domain ontologies of different cadastral systems should be based on this core ontology.

This paper is organized as follows: the next Section presents related work in this field of research. Then, the ontology architecture in the domain of land administration is presented in Section 3. The purpose of that Section is to describe how a domain ontology for a national cadastre can be developed, shown on the example of the Serbian real estate cadastre, using LADM as the basis for the development of the core ontology for cadastre. Section 4 presents a case study of a semantically enhanced discovery process based on the proposed ontology. After that, conclusions are discussed.

2. RELATED WORK

The importance of ontologies for solving semantic problems during the discovery, retrieval and integration of geospatial data and services has been widely recognized in the geospatial community. The concept of the Geospatial Semantic Web has been introduced in [11]. The importance of the Geospatial Semantic Web is also recognized by the OpenGIS Consortium (OGC) [12], where there are several OGC initiatives considering


development of the Geospatial Semantic Web. In [13], attempts to extend existing OGC services, encodings, and architectures with Semantic Web technologies in order to achieve semantic interoperability are presented. In [14], semantic annotations at three different levels are discussed: geospatial service metadata, data models and process descriptions, and actual data instances in the database. In [15], a proposal for an OGC catalogue service based on a catalogue information model of RDF, RDF Schema and OWL elements is described.

Some research efforts in this area are focused on the development process of ontologies itself, proposing different ontology architectures for geospatial ontologies [16] or domain ontologies for different fields of application such as environment, land cover and topography, observations and measurements, and land administration [17, 18]. Other researchers are focused on the application of ontologies in the discovery and retrieval, composition and integration of geospatial resources [19].

These research results provide a significant input for the application of ontologies in the field of real estate cadastre. They are either general for the geospatial domain or focused on specific domains such as environment, but there are few results in the field of land administration, especially considering the use of current standards. In this paper the authors present a model and implementation of ontologies in cadastral systems based on LADM, using a case study of the Serbian cadastre, whose main application is the integration of existing spatial data and services through an automated discovery and integration process of cadastral resources. A recent research effort in [20] also uses LADM to build ontologies in this domain, but it is mostly focused on representing roles in land administration and is not based on an upper ontology.

3. ONTOLOGY MODEL

Based on the degree of generality, ontologies can be divided into three levels [21]: top-level ontologies, domain ontologies and application ontologies. According to this classification, the authors developed an ontology model that consists of four layers, as described in [22]. This paper gives an overview of the model with more practical examples. The first layer is an upper level ontology that is used to connect ontologies from different domains. The next layer is the ontology that describes concepts in the geospatial domain, such as feature, geometry, topology, etc. The third layer contains the ontology of basic concepts related to the real estate cadastre used in different countries and is based on the ISO 19152 international standard. The final layer is the ontology that describes concepts related to the cadastre in a specific country.

The proposed ontology is a knowledge model in the field of real estate cadastre for a specific country. It specifies the concepts that should be referenced by the concepts from the application ontology. Application ontologies are used in different applications to describe the specificity of the application and are mapped to the domain ontology. The reason for such a proposal is to achieve the appropriate level of granularity at each layer, so that concepts are not too specific or too generic for use in real applications. This allows better acceptance of the proposed ontology by users and also facilitates the maintenance of the ontology.

The authors used the open source ontology editor Protégé [23] for the development of the cadastral ontology. Protégé allows the specification of ontologies in the OWL and OWL 2 languages. It also allows automated reasoning using an inference engine. The proposed ontology for the real estate cadastre has been implemented using OWL.

The Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE) [24] has been used as the upper level ontology, since it best suits the application in a web environment for information discovery and retrieval. Its advantages also include a relatively small number of basic concepts and an implementation in the OWL language. The Legal ontology [25], developed on top of the Description and Situation ontology (DnS), an extension of DOLCE, also provides a good basis for the development of the ontology for the real estate cadastre, since land administration represents the lawful relationship between people and land.

The second layer of the ontology for the real estate cadastre is the ontology that describes concepts in the geospatial domain, such as feature, geometry, topology, etc. This ontology is called the Geospatial Feature Ontology and it is based on the ISO 19100 series of standards [26]. These standards define the basic structure and semantics of geospatial data and services to enable interoperability between different GIS systems. The basic concept of the ISO 19100 series is a feature, representing an abstraction of a real world phenomenon which has spatial and non-spatial characteristics. The spatial characteristics are the geometry and topology of objects in some coordinate system related to the Earth, while the non-spatial characteristics can be thematic or temporal.

The third layer of the ontology architecture is the Core Ontology for Cadastre. It contains the ontology of basic concepts related to the real estate cadastre used in different countries and is based on the ISO 19152 international standard. The focus of this standard is on that part of land administration that is interested in rights, responsibilities and restrictions affecting land, and in the geometrical (spatial) components. The central part of the LADM model consists of four classes: LA_Party, representing the property owner or the person that is given certain rights over real estate; LA_RRR, representing rights, restrictions and responsibilities; LA_BAUnit, containing administrative data on spatial units with equal rights, restrictions and responsibilities; and LA_SpatialUnit, representing territorial units, parcels, buildings, etc. Concepts in the core ontology for cadastre follow the meaning of these classes defined in the standard.

Page 257 of 478


ICIST 2014 - Vol. 1 Regular papers

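To make the layering concrete, the following minimal sketch (Python with rdflib; the namespace URIs are hypothetical placeholders, not the ontologies' real URIs) shows how a lower layer is wired to the layer above it only through subclass axioms:

from rdflib import Graph, Namespace, RDF, RDFS, OWL

# Hypothetical namespace URIs standing in for the real ontologies.
DOLCE = Namespace("http://example.org/dolce#")   # layer 1: upper level ontology
LADM  = Namespace("http://example.org/ladm#")    # layer 3: core ontology for cadastre
KN    = Namespace("http://example.org/kn#")      # layer 4: Serbian national cadastre

g = Graph()
for c in (LADM.Party, LADM.RRR, LADM.BAUnit, LADM.SpatialUnit):
    g.add((c, RDF.type, OWL.Class))

# Core concepts are subsumed by upper-level concepts (the paper notes,
# for instance, that RRR is a DnS:Description).
g.add((LADM.RRR, RDFS.subClassOf, DOLCE.Description))

# National concepts specialize core concepts.
g.add((KN.Parcela, RDFS.subClassOf, LADM.SpatialUnit))

print(g.serialize(format="turtle"))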
The concept Party represents a party, i.e. a person or organization that plays a role in a rights transaction. The concept RRR represents legal aspects over real estates; its subclasses are rights, restrictions and responsibilities. Rights are formal or informal entitlements to own or to do something. Restrictions are entitlements to refrain from doing something. Responsibility is an obligation to do something. The concept Basic Administrative Unit represents an administrative entity consisting of zero or more spatial units against which unique and homogeneous rights, responsibilities and restrictions are associated to the whole entity. The concept Spatial Unit represents an area of land or a volume of space structured in a way to support the creation and management of basic administrative units. These four concepts are subsumed by the concepts from the DOLCE and DnS ontology.

The final layer of the ontology contains concepts present in the land administration system of the specific country, and it is the domain ontology for the national cadastre. In Serbia this layer contains concepts related to geodetic reference, cadastral parcels, parts of cadastral parcels according to land use, buildings, network utilities, spatial units, elevation model of terrain and topography [27]. The real estate cadastre contains data about real estates and the holders of rights on the real estate. The basic concepts from the real estate cadastre are: Real Estate, Right Holder and Real Right.

Figure 1 shows an example from all layers of the ontology, containing concepts related to the roles in legal affairs related to land administration and to legal aspects of land administration. RRR is a DnS:Description that defines the concepts Party and Basic Administrative Unit. In this way the relationship between people and land linked by (ownership) rights is expressed, as it is established in a land registry or cadastre. Party is a Role played by the Agentive Figure, which can be a person (the concept Natural Person) or an organization (Organization). The role of the party can also be played by a basic administrative unit. Basic Administrative Unit is a role played by Spatial Unit (parcel or building). The holder of the right (Nosilac Prava) is a natural or a legal person that has acquired a right over a real estate. Real estate includes land, buildings and parts of buildings. The concept Stvarno Pravo (real right) is subsumed by the concept LADM:Right, the concept Nosilac Prava (right holder) is subsumed by LADM:Party, and the concept Nepokretnost (real estate) is subsumed by LADM:Basic Administrative Unit. Real right defines holders of the rights and real estates the same way as right defines parties and basic administrative units in LADM. Real right includes ownership rights and rights of usage.
[Figure 1: UML class diagram relating the LADM administrative concepts (RRR, Right, Restriction, Responsibility, Party, Basic Administrative Unit, Share) through edns:defines and edns:played-by relations to the Serbian concepts Nosilac Prava, Stvarno Pravo, Nepokretnost, Obim Prava, Ogranicenja, Teret, Pravo Svojine and Pravo Koriscenja.]

Figure 1. The basic cadastral concepts


4. CASE STUDY

The use of ontologies in cadastral systems can be shown in the process of integration and harmonization of the Serbian national cadastre with the cadastral model of the INSPIRE directive [28], using the domain ontology for cadastre based on LADM. The INSPIRE directive defines data specifications for various themes to facilitate cross-border discovery and access of data in European countries, primarily intended for users in the field of environment. Cadastral parcels are one of the datasets which are harmonized in INSPIRE, and they serve as a generic information locator for environmental applications, such as discovery and retrieval of other spatial information [29]. The INSPIRE data model for cadastral parcels has been developed in parallel with the LADM model, which has resulted in concept consistency and compatible definitions of common concepts. In that way the consistency of these models is provided. The difference arises from the different scopes and targeted application areas. INSPIRE focuses on the application in the field of environmental protection, whereas LADM has a multi-purpose character, such as providing support to legal certainty, the formation of taxes, planning, real estate valuation, etc. LADM also deals with 3D cadastral objects, like buildings, which is beyond the scope of INSPIRE. The basic concepts related to cadastral parcels in INSPIRE are Cadastral Parcel, Basic Property Unit, Cadastral Boundary and Cadastral Zoning. These concepts are subsumed by the concepts Spatial Unit, Basic Administrative Unit, Boundary Face String and Spatial Unit Group, respectively.

Table 1 shows a comparative review of the names and attributes of feature types that represent a land parcel according to three different data models: the national cadastre in Serbia, INSPIRE and LADM. These feature types represent outputs from WFS services. The feature types are comprised of different attributes, as well as similar attributes with different names. Keyword-based search is not able to determine the relationship between these three outputs from the Web Feature Service (WFS), a standard interface developed by the OpenGIS Consortium for Web services that access geospatial data in vector format [30]. But if these WFS services are semantically annotated, it is possible to perform a semantic search and determine the correct relationship among them. The usage of semantic annotations with geospatial Web services is described in [31].
WFS output          SERBIAN CADASTRE   INSPIRE                      LADM
Feature type name   Parcel             CadastralParcel              SpatialUnit
Number              number             nationalCadastralReference   x
Subnumber           subnumber          nationalCadastralReference   x
Geometry            gometry            geometry                     ass. class
Area                area               areaValue                    area
Land use            wayOfUse           ass. class                   ass. class
Unique identifier   x                  inspireID                    suID
Dimension           x                  x                            dimension
Description         description        label                        label
Reference point     x                  referencePoint               referencePoint
External address    potesOrAddress     x                            extAddressID

Table 1. Comparative review of feature types representing a land parcel in the Serbian cadastre, INSPIRE and LADM
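The mismatch is easy to see programmatically; the short Python sketch below (values copied from Table 1) shows that the three models do not even share the feature type name, which is exactly what defeats keyword-based search:

# Feature type names copied from Table 1: a naive keyword comparison
# finds no overlap between the three schemas.
names = {
    "Serbian cadastre": "Parcel",
    "INSPIRE": "CadastralParcel",
    "LADM": "SpatialUnit",
}
print(len(set(names.values())))  # 3 - no two models use the same name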

In order to harmonize data about cadastral parcels in INSPIRE and the national cadastre, it is necessary to semantically annotate the feature types Parcel and CadastralParcel. Semantic annotations of the feature type Parcel reference the ParcelFeatureType application ontology, whereas semantic annotations of the feature type CadastralParcel reference the CadastralParcelFeatureType application ontology. These two application ontologies semantically describe the output from WFS services delivering data according to the national cadastre and INSPIRE schemas.

Application ontologies ParcelFeatureType and CadastralParcelFeatureType are subsumed by the concepts from the domain ontology for cadastre. ParcelFeatureType is subsumed by the concept Parcel, whereas CadastralParcelFeatureType is subsumed by SpatialUnit. Listing 1 shows the application ontology for the feature type Parcel that references concepts from the domain ontology for cadastre.

The result of subsumption reasoning [32] on the application ontologies shows that the application ontologies ParcelFeatureType and CadastralParcelFeatureType are both sub concepts of the concept SpatialUnit from the domain ontology. In this way the link between WFS services whose outputs are these feature types is established during the semantic search.
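A minimal sketch of this subsumption check (Python with rdflib; the URIs are hypothetical, and a simple transitive closure over rdfs:subClassOf stands in here for a full DL reasoner):

from rdflib import Graph, Namespace, RDFS

EX = Namespace("http://example.org/cadastre#")  # hypothetical namespace
g = Graph()
g.add((EX.ParcelFeatureType, RDFS.subClassOf, EX.Parcel))
g.add((EX.Parcel, RDFS.subClassOf, EX.SpatialUnit))
g.add((EX.CadastralParcelFeatureType, RDFS.subClassOf, EX.SpatialUnit))

def subsumed_by(cls, ancestor):
    # transitive_objects walks the rdfs:subClassOf chain upwards
    return ancestor in set(g.transitive_objects(cls, RDFS.subClassOf))

print(subsumed_by(EX.ParcelFeatureType, EX.SpatialUnit))           # True
print(subsumed_by(EX.CadastralParcelFeatureType, EX.SpatialUnit))  # True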
Another example of how ontologies and the semantic search can be useful in cadastral systems is creating the party portfolio defined in ISO 19152. This standard defines interface classes whose purpose is to generate and manage products and services. Interface classes represent views on aggregated data from other classes, and do not contain data themselves. An example of such an interface class is PartyPortfolio, which contains an overview of all rights, restrictions and responsibilities, all basic administrative units and all spatial units for one specific party. This concept is similar to the real estate deed in the Serbian real estate cadastre that contains data about real estates and real rights on them for one specific holder of the rights.


The real estate deed contains all the data about real estates belonging to the same party. Another kind of real estate deed contains data concerning one specific real estate, and it is similar to the interface class containing the overview of all parties, rights, restrictions and responsibilities and all basic administrative units for one specific spatial unit.
kn:ParcelaFeatureType
      a owl:Class ;
      rdfs:subClassOf owl:Thing , kn:Parcela ;
      rdfs:subClassOf
              [ a owl:Restriction ;
                owl:allValuesFrom <http://www.owl-ontologies.com/OntologyISO19107.owl#GM_Object> ;
                owl:onProperty default:hasGeometry
              ] ;
      rdfs:subClassOf
              [ a owl:Restriction ;
                owl:cardinality "1"^^xsd:int ;
                owl:onProperty default:hasNumber
              ] ;
      ...
      rdfs:subClassOf
              [ a owl:Restriction ;
                owl:onProperty default:hasGeometry ;
                owl:someValuesFrom <http://www.owl-ontologies.com/OntologyISO19107.owl#GM_Object>
              ] .

Listing 1. Application ontology for the feature type Parcel
In order to create the party portfolio it is necessary to perform a semantic search on all real estates, which may be in different cadastral municipalities, and on the real rights on them for a specific person. Concerning land ownership, the semantic search is done similarly to the previous example. During the semantic search, subsumption reasoning is used to infer the hierarchy of concepts representing WFS outputs and to determine the relationship between them.

While subsumption reasoning is a kind of type reasoning, i.e. reasoning on description logic concepts (or OWL classes) that inheres a hierarchy of concepts, there is also instance reasoning using the query language SPARQL [33], whose purpose is to retrieve individuals of certain OWL classes. Using SPARQL, it is not only possible to discover the appropriate WFS service containing the data, but also to retrieve the data itself.

In order to use SPARQL to create the party portfolio, it is necessary to convert data from the database relational model into an RDF graph model (e.g. using the DataMaster plugin for Protégé). As a demo example, a database from the software package Terrasoft for the area of the cadastral municipality Agino Selo is used [34]. Listing 2 shows a SPARQL query that retrieves data for the party portfolio, i.e. all properties (parcels, buildings, parts of buildings...) on which the party has certain rights (ownership, co-ownership, the right of use...). The convenience of this kind of distributed query is that it collects data from different sources using general concepts to obtain individuals of all their sub concepts (e.g. individuals of Spatial Unit will be parcels, sub parcels, buildings, networks...), so that only one query is enough instead of many.

PREFIX db: <http://biostorm.stanford.edu/db_table_classes/[email protected]#>
PREFIX edns: <http://www.loa-cnr.it/ontologies/ExtendedDnS.owl#>
PREFIX ladm: <http://www.owl-ontologies.com/LADM.owl#>
SELECT ?rrr ?baunit ?spatialunit
WHERE {
  ?rrr rdf:type ladm:RRR.
  ?baunit rdf:type ladm:BAUnit.
  ?rrr edns:defines ?baunit.
  ?spatialunit rdf:type ladm:SpatialUnit.
  ?baunit edns:played-by ?spatialunit.
  ?rrr edns:defines db:V_N_RS_OWNER_Instance.
}

Listing 2: SPARQL query
4. CONCLUSION integration with international frameworks such as
INSPIRE, semantic search based on reasoning to find
This paper presents the ontology model for the real estate party portfolio, and distributed queries in SPARQL to
cadastre in Serbia. Ontologies are useful for data retrieve data. Future work will include the alignment with

Page 260 of 478


ICIST 2014 - Vol. 1 Regular papers

REFERENCES

[1] Nebert, D., 2004. Developing Spatial Data Infrastructures: The SDI Cookbook. Online access: http://www.gsdi.org/docs2004/Cookbook/cookbookV2.0.pdf, (28.01.2014).
[2] Govedarica, M., Bošković (Sladić), D., Petrovački, D., Ninkov, T., Ristić, A., Metadata Catalogues in Spatial Information Systems, Geodetski list, 64 (87) 4, 313-334, 2010.
[3] Bernard, L., Einspanier, U., Haubrock, S., Hubner, S., Kuhn, W., Lessing, R., Lutz, M., Visser, U., Ontologies for intelligent search and Semantic Translation in Spatial Data Infrastructures, Photogrammetrie-Fernerkundung-Geoinformation 6: 451–462, 2003.
[4] Klein, M., Bernstein, A., Toward high-precision service retrieval. IEEE Internet Computing 8 (1): 30–36, 2004.
[5] Bishr, Y., Overcoming the semantic and other barriers to GIS interoperability, International Journal of Geographical Information Science 124: 299–314, 1998.
[6] Berners-Lee, T., Hendler, J., Lassila, O., The Semantic Web, Scientific American 184 (5): 34–43, 2001.
[7] Gruber, T.R., A translation approach to portable ontology specifications, Knowledge Acquisition 5: 199–220, 1993.
[8] Antoniou, G., Van Harmelen, F., Web ontology language: OWL. In: Staab, S., Studer, R. (editors), Handbook on Ontologies, Springer: 91–110, 2009.
[9] Pan, J. Z., Resource Description Framework. In: Staab, S., Studer, R. (editors), Handbook on Ontologies, Springer: 71-90, 2009.
[10] ISO 19152:2012 Geographic information – Land Administration Domain Model (LADM), http://www.iso.org/iso/catalogue_detail.htm?csnumber=51206, (28.01.2014).
[11] Egenhofer, M. J., Toward the semantic geospatial web, Proceedings of the 10th ACM International Symposium on Advances in Geographic Information Systems, ACM Press, New York: 1–7, 2002.
[12] OpenGIS Consortium, http://www.opengeospatial.org/, (28.01.2014).
[13] Lieberman, J., Geospatial Semantic Web Interoperability Experiment Report, OGC 06-002r1. Open Geospatial Consortium, Inc. 70 pages, 2006.
[14] Maué, P., Semantic annotations in OGC standards, OGC 08-167r1, Open Geospatial Consortium, Inc. 50 pages, 2009.
[15] Stock, K., Catalogue Services – OWL Application Profile of CSW, OGC 09-010. Open Geospatial Consortium, Inc. 70 pages, 2009.
[16] Klien, E., Probst, F., Requirements for Geospatial Ontology Engineering, 8th Conference on Geographic Information Science, Portugal: 251-260, 2005.
[17] van Oosterom, P., Zlatanova, S., Creating Spatial Information Infrastructures: Toward the Spatial Semantic Web. Taylor & Francis. 216 pages, 2009.
[18] Stuckenschmidt, H., Stubkjær, E., Schlieder, C., The Ontology and Modelling of Real Estate Transactions, Ashgate Publishing, 170 pages, 2004.
[19] Andrei, M., Berre, A., Costa, L., Duchesne, P., Fitzner, D., Grcar, M., Hoffmann, J., Klien, E., Langlois, J., Limyr, A., Maue, P., Schade, S., Steinmetz, N., Tertre, Vasiliu, L., Zaharia, R., Zastavni, N., SWING: An Integrated Environment for Geospatial Semantic Web Services, The Semantic Web Research and Applications, 3: 767-771, 2008.
[20] Soon, K. H., Representing Roles in Formalizing Domain Ontology for Land Administration, 5th Land Administration Domain Model Workshop, Kuala Lumpur, Malaysia, pp. 203-222, 2013.
[21] Guarino, N., "Formal ontology and information systems", Proceedings of the First International Conference on Formal Ontologies in Information Systems, FOIS'98, Trento, Italy, pp. 3–15, 1998.
[22] Sladić, D., Govedarica, M., Pržulj, Đ., Radulović, A., Jovanović, D., Ontology for Real Estate Cadastre, Survey Review, Vol. 45, No. 332, pp. 357-371, 2013.
[23] Knublauch, H., Fergerson, R.W., Noy, N.F., Musen, M.A., The Protégé OWL Plugin: an open development environment for semantic web applications, Lecture Notes in Computer Science 3298, Springer: 229-243, 2009.
[24] Masolo, C., Borgo, S., Gangemi, A., Guarino, N., Oltramari, A., Ontology Infrastructure for the Semantic Web, WonderWeb Deliverable D18, IST WonderWeb Project 2001-33052. 349 pages, 2003.
[25] Gangemi, A., Sagri, M., Tiscornia, D., A Constructive Framework for Legal Ontologies, International Semantic Web Conference - ISWC: 97-124, 2003.
[26] ISO / TC211 Geographic Information / Geomatics, www.isotc211.org, (28.01.2014).
[27] The Law on State Survey and Cadastre (Zakon o državnom premeru i katastru), http://www.rgz.gov.rs/web_preuzimanje_datotetka.asp?FileID=321, (28.01.2014).
[28] INSPIRE, http://inspire.jrc.ec.europa.eu/ (17.08.2012).
[29] D2.8.I.6 INSPIRE Data Specification on Cadastral Parcels – Guidelines, http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/INSPIRE_DataSpecification_CP_v3.0.1.pdf, (28.01.2014).
[30] Vretanos, P., Web feature service (WFS) implementation specification. Version 1.1.0. OGC 04-094. Open Geospatial Consortium, Inc. 131 pages, 2005.
[31] Sladić, D., Govedarica, M., Ristić, A., Petrovački, D., 2012a. Semantičko označavanje OGC baziranih geoservisa (in Serbian). InfoM, 42, pp. 29-36, 2012.
[32] Baader, F., McGuinness, D., Nardi, D., Patel-Schneider, P.F., Description Logic Handbook: Theory, Implementation and Applications, Cambridge University Press. 574 pages, 2002.
[33] SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/, (28.01.2014).
[34] Govedarica, M., Ristić, A., Sladić, D., Pržulj, Đ., LADM profil za Republiku Srpsku (in Serbian). Kongres o katastru u BiH, Sarajevo, Bosna i Hercegovina, 2011.


An approach for the development of context-driven Web Map solutions based on interoperable GIS platform

Miloš Bogdanović*, Aleksandar Stanimirović*, Leonid Stoimenov*
* University of Niš, Faculty of Electronic Engineering, Niš, Serbia
[email protected], [email protected], [email protected]
Abstract— In this paper we will define and describe a novel approach for the development of context-driven Web Map solutions. Our approach relies on an architecture we define and present in this paper as an enhancement of GIS application interoperability platforms. The enhancement is performed through the introduction of a specific architectural layer which enables the development of context-driven Web Map solutions. The novel architectural layer we introduce consists of two central components: Web Map Context Service and Context Proposal Service. These services take advantage of the existing GeoNis framework for interoperability of GIS applications and enable users to get appropriately visualized geospatial data depending on their context. The enhanced platform is capable of adapting to different users' needs without changing its internal structure and improves the level of Web Map solution usability.

I. INTRODUCTION

Interoperability has long been foreseen as an ultimate means for resolving geospatial data source heterogeneity. Over the years, scientists and engineers have struggled to develop an interoperable geo-information dissemination environment through the development of ontology-driven geo-information integration architectures (platforms) [1]. Ontology-driven geo-information integration architectures (platforms) are designed to describe the semantics of geo-information sources and to make their content explicit through means of ontologies. They provide powerful semantic reasoning capabilities and utilize ontologies for the discovery and retrieval of geo-information [1][2]. Mostly, the retrieval of geo-information is based on utilizing connections (mappings) between ontologies and geo-information sources [3][4][5]. Geo-information sources used within these architectures can be accessed through means of geospatial services conforming to OGC specifications. Thus, if the geo-information sources within these architectures expose their interface as Web services conforming to OGC specifications, then the problem of geo-information service discovery can be transferred into the problem of discovering geo-information sources within ontology-driven geo-information integration architectures.

Nevertheless, if this approach is used, searching for a suitable geo-information source for a particular user is still a challenging task. Such a task is particularly hard when implemented within Web Map solutions which have a significant number of different users and rely on a number of heterogeneous information sources. Each user expects a Web Map (Web GIS) solution to be capable of displaying a particular subset of geo-information and maps – the geo-information and maps he/she is currently interested in. Among all available data (services, geospatial layers, documents, etc.), Web GIS users need a mechanism to easily find (discover) what they are searching for – using their own words, their own language [6]. This information determines the user context within a Web Map (Web GIS) solution in terms of displayed geo-information and maps. For that reason, a Web Map (Web GIS) solution should rely on an infrastructure which enables discovery and usage of appropriate geo-information sources, integration of information from appropriate geo-information sources, and storing of user context information in terms of displayed geo-information and maps.

A novel approach, which we define and describe in this paper, foreseen to be used for these purposes, is an enhancement of GIS application interoperability platforms through the introduction of a specific architectural layer which enables the development of Web Map context-based Web GIS solutions. We have defined this layer by specifying and developing its two main components: Web Map Context Service (WMCS) and Context Proposal Service (CPS). Web Map Context Service is foreseen as a mediator between users and GIS application interoperability platforms. In particular, we will present an architecture which takes advantage of the existing GeoNis framework for interoperability of GIS applications [7][8] to demonstrate the advantage introduced by the WMCS and CPS services. WMCS helps users get appropriately visualized geospatial data depending on their context. User context information is stored in a context document which is created according to the Open Geospatial Consortium (OGC) specification. Context documents are created, maintained and manipulated through Web Map Context Service operations. The creation of the initial map context proposal of a new Web-based GIS user, based on the description of the data that the particular user is interested in, is the basic functionality of the Context Proposal Service.

The rest of this paper will present the WMCS and CPS services and their specification used for selection of the most appropriate geospatial content for a particular user. We will also present a brief overview of the most prominent similar solutions and conclude with a discussion and an outlook to future work.

II. RELATED WORK

Context-driven Web Map solutions can be observed as a member of a group of personalized software. The fundamental problem of personalized software
development is an approximation of user preferences with a little amount of relevant information [9]. This information represents the foundation of the user context. The reported techniques used for user context extraction are mostly based on determination of user preferences and categorization of users according to their behavior [10][11][12]. In the field of GIS methodologies, context-driven GIS have been studied mostly within the development of mobile applications [13][14]. These proposals emphasize the need for different levels of adaptation within the geospatial data presentation process [14][15], as well as the need for the development of methodologies that would consider different contextual dimensions together [7]. All together, these approaches share a goal – to make GIS able to automatically determine and derive its content.

Previously reported contextual cartographic visualization system proposals are in most cases based on client–server architecture. A solution for adaptive visualization of geospatial information on mobile devices proposed in [16] performs adaptive cartographic visualization on the server side. The limitations introduced by the environment of this system resulted in the client being responsible only for the presentation of geospatial data [16]. The context types used by this solution are predefined. Another proposal based on client-server architecture can be found in the GiMoDig project [17]. The architecture of the GiMoDig project uses extensions of the OGC Web Map Service and Web Feature Service specifications. These extensions are introduced for the purpose of establishing communication between client and server sides. The elementary context types used by the GiMoDig solution are invariant.

An implementation encountered in the field of contextual cartographic visualization which we consider to some extent similar to our proposal is named Sissi – Contextual Map Service [18]. Sissi is a Web-based server application which provides context-aware maps for Web GIS clients. Although it is also based on client-server architecture, Sissi differs in more than a few characteristics when compared to the previously described solutions. We consider these characteristics to be very significant. Sissi does not have a predefined set of elementary context types, which is how it differs from the previously described solutions. This characteristic makes Sissi capable of supporting different contexts. The Sissi specification represents an extension of the Web Map Service specification with extending requests – GetElementaryContextType and GetMapWindows. Another difference compared to the Web Map Service specification is the modification of the GetCapabilities request in order to include an additional context parameter. The context parameter is used for user context encoding in the form of comma-separated context values. Symbology used for the rendering of adapted (contextual) maps is an integral part of Sissi and is defined using the Styled Layer Descriptor styling language [19].

The hereby presented contextual cartographic visualization solutions, which we consider to be the prominent ones, indicate that though significant research and development results exist in this field, a significant effort should be put into improving the usability of contextual cartographic visualization systems. For instance, although a majority of these systems rely upon the usage of OGC standards (mostly the Web Map Service and Web Feature Service implementation specifications), user context information is not created and maintained according to the existing (OGC) standards, which decreases the interoperability level of the presented systems. Also, a majority of adaptive cartographic visualization systems impose a tight coupling between map rendering services and the symbology used for the visualization of geospatial information. Therefore, the evaluated systems do not provide their users with the ability to determine the styles which should be used for the visualization of geospatial information that they are interested in. Rather, the presented systems use internal style development formats or integrated Styled Layer Descriptor documents. Further, the usage of WFS services is not provisioned in the majority of these solutions. The direct usage of WFS services can be very significant if clients are capable of adapting geospatial data presentation according to the style provided on the basis of the user context.

The focus of our research was the definition and development of a general architecture of contextual Web Map solutions which overcomes the determined problems, with the purpose of improving the level of usability of contextual geospatial information visualization systems. The architecture we have developed takes advantage of the existing GIS application interoperability platforms for user context creation purposes. Our architecture relies on the GeoNis interoperability platform and its taxonomy to determine user preferences and perform 'on-demand' integration of selected information from multiple heterogeneous information sources. Also, the architecture we present in this paper utilizes existing Web Map solution components and introduces an additional architectural layer which contains Web GIServices capable of supporting contextual geospatial visualization. The newly added layer does not influence the existing Web Map solution architectures, i.e. the omission of this layer will not influence the usual functioning of the existing Web Map solutions. Therefore, this layer will add contextual geospatial information visualization capabilities to the existing Web Map solutions without introducing any modification of the existing functionalities.

III. ARCHITECTURE

The main goal while specifying and developing the architecture for contextual geospatial data visualization was to design the Web Map Context Service (WMCS) as a Web service that has the ability to integrate itself into existing GIS environments in order to transform such systems into contextual geospatial visualization environments. Most of the existing GISs are built upon service-oriented architecture (SOA) principles and use GIServices which provide geospatial data (such as Web Feature Service), perform visualization (such as Web Map Service) and maintain styles. WMCS is a Web service designed as a mediator between these services and end-users. The main purpose of WMCS is to maintain the user context document and to combine the existing services according to user context information in order to provide users with the appropriate maps and features. The design of WMCS and its operating environment, along with additional Web services, was the main objective of our research and development. The result was named after the specification used for the development of context documents – Web Map Context Service.
Figure 1: An architecture for contextual geospatial data visualization

Web Map Context Service is also designed as a context document repository and it does not have the capability to match the user's preferences with the existing contexts. In order to allow context approximation, we propose another service that allows third parties to customize this service with their own matching algorithm. The architecture of the system that WMCS can operate in is shown in Figure 1.

A GIS capable of performing map adaptation should consist of several components that perform all the tasks needed to help users get appropriately visualized geospatial data depending on the user context:
- Clients (desktop, mobile, Web GIS) – GIS applications capable of displaying geospatial data in the form of electronic maps. Since clients should be capable of visualizing geospatial data appropriately, these applications have to be able to perform the following tasks: acquire context documents from WMCS, extract contextual data from the received documents, create appropriate requests to services on the basis of the extracted data, and properly visualize the received data.
- Web Map Context Service – Stand-alone Web service responsible for maintaining information considering all registered services and style repositories in the system. Furthermore, WMCS maintains information considering registered user contexts and provides them to the clients and to the Context Proposal Service.
- Context Proposal Service – This service is capable of providing clients with a specific context that describes the data and maps relevant to the situation which is of the users' interest.
- GeoNis – GIS application interoperability platform. Semantic interoperability in GeoNis, resolved by the Semantic Mediator [20], is the ability of sharing geospatial information at the application level, without knowing or understanding the terminology of other systems.
- OGC Web Map Services (WMS) [21] and Web Feature Services (WFS) [22] – Services developed according to the OGC WMS and WFS standards. The geospatial data provided from these services is used in different contexts. Clients can request data from these services only if these services are registered within Web Map Context Service.
- Symbology Encoding Repository Services (SER Services) – Services that provide styling documents developed using the Styled Layer Descriptor [23] or Symbology Encoding styling language [24]. The information contained within these documents is used for adaptation of the geospatial data visualization process. Coupled with geospatial data, these
documents are used for the purpose of creating and registering contexts within Web Map Context Service.

Context-driven GIS architectures, including the proposed one, have shifted towards an agreement on a common interoperability architecture based on emerging GIS interoperability specifications, most of them issued by OGC. These specifications follow SOA principles and move GIS applications towards a distributed architecture based on interoperable GI services. Also, services developed according to OGC standards have standardized interfaces which provide GIS developers with the possibility to easily combine several services capable of processing and visualizing geospatial data. Combined with services that provide user context, e.g. WMCS instances, these services represent a solid foundation for the development of a distributed context-driven GIS. In this context, we consider our architectural proposal to be a significant step forward in terms of usability and modularity of contextual geospatial visualization environments.

IV. WMCS AND CPS – A FOUNDATION FOR THE DEVELOPMENT OF CONTEXT-DRIVEN WEB MAP SOLUTIONS

Web Map Context Service (WMCS) is the major component used for the development of context-driven Web Map solutions. The main purpose of the WMCS is to provide users with appropriate geospatial content relevant to the user's context. WMCS is used as a mediator in the process of adaptation of geospatial data representation. For these purposes, WMCS is combined with GIS clients and distributed Web services that provide geospatial data and styling documents. These services need to be registered within WMCS before they can be used.

Figure 2: Communication between a client and the system in cases when a WMS which supports geospatial data styling is used

In cases where a context document contains a description of WMS service layers, a client creates an appropriate GetMap request according to the extracted information, sends the request to the WMS service and displays the resulting images. WMCS enables the usage of WMS services which support geospatial data styling according to the Styled Layer Descriptor implementation specification, as well as WMS services which do not support user-defined geospatial data styling. In cases where a WMS service which does not support SLD styling is used, users are not able to choose styles for layers, and they will receive images with a default style applied. If a WMS service supports SLD styling, clients need to embed the obtained symbology in the form of an SLD document into the WMS GetMap request in order to receive an image with the appropriate symbology applied. This process is shown in Figure 2.

In order to visualize geospatial data acquired from the WFS service, clients need to have a mechanism which enables visualization of data acquired from WFS according to styles obtained from one of the Symbology Encoding Repository Services. This process is shown in Figure 3.

Figure 3: Communication between a client and the system in cases where a WFS service is used

First, a client needs to send a GetContext request to the WMCS. After receiving a context document, according to the extracted information, the client creates and sends a GetFeature request to WFS services and a GetStyle request to SER Services. Finally, the data received from WFS services is visualized according to styles received from SER Services and displayed to the user.
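The interaction just described can be sketched as a minimal client (a sketch, not the reference implementation: the endpoint URLs, the WMCS version number and the CONTEXT parameter name are assumptions for illustration; the GetMap parameters follow the standard WMS KVP encoding; the requests library is assumed to be available):

import requests
import xml.etree.ElementTree as ET

WMCS_URL = "http://example.org/wmcs"  # hypothetical WMCS endpoint

# 1. Fetch the user's context document from WMCS.
ctx = requests.get(WMCS_URL, params={
    "SERVICE": "WMCS", "REQUEST": "GetContext", "VERSION": "1.0.0",  # assumed version
    "CONTEXT": "user42",                                             # assumed parameter name
})
doc = ET.fromstring(ctx.content)
print(doc.tag)  # root element of the context document

# 2. For a WMS layer listed in the context document, build a GetMap request
#    (layer, style and bounding-box values would normally be read from the document).
png = requests.get("http://example.org/wms", params={  # hypothetical WMS endpoint
    "SERVICE": "WMS", "REQUEST": "GetMap", "VERSION": "1.3.0",
    "LAYERS": "parcels", "STYLES": "", "CRS": "EPSG:4326",
    "BBOX": "43.2,21.8,43.4,22.0", "WIDTH": 800, "HEIGHT": 600,
    "FORMAT": "image/png",
})
open("map.png", "wb").write(png.content)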
WMCS creates a context document for each registered context according to the OGC Web Map Context Documents implementation specification [25]. Basic information considering users' contexts is stored in a database, while the created context documents are stored on the file system. An example of a context document is shown in APPENDIX A.

As previously stated, the OGC Web Service Common Standard [26] was used as a starting point for the development of the WMCS specification. According to this specification, the following operations have been specified:
- operation used in order to provide metadata regarding capabilities provided by the WMCS service
- operation used in order to provide context documents to the clients
WMCS specifies additional operations that are not specified by the OGC Web Service Common Standard:
- operation used for registering services that provide geospatial data and styles (WMS, WFS, SER Services)


- operations used for manipulating user contexts
- operation used to obtain a temporary context document from WMCS

These operations represent the minimal WMCS operation set. All WMCS operations have the following parameters inherited from the Web Service Common Standard [26]:
- SERVICE – Service type identifier
- REQUEST – Operation name
- VERSION – Specification version for operation

If WMCS encounters an error while processing user requests, it shall return an exception report message as specified in Clause 8 of the OGC Web Service Common Standard [26].

WMCS defines the following operations:
- RegisterService operation – WMCS allows registration of distributed Web services that provide geospatial data and styling documents through the RegisterService operation.
- GetCapabilities operation – The GetCapabilities operation provides a client with metadata regarding the capabilities provided by the WMCS service.
- RegisterContext operation – The RegisterContext operation provides users with the ability to register their context within WMCS.
- GetContext operation – The GetContext operation allows retrieval of a single or all context documents from WMCS.
- UpdateContext operation – The UpdateContext operation provides users with the ability to update an existing context within WMCS.
- DeleteContext operation – The DeleteContext operation allows removing an existing context from WMCS.
- GetLayers operation – The GetLayers operation allows retrieval of a temporary context document from WMCS as the result of a comparison of two term sets – a term set received as an argument of the GetLayers operation, and a term set which consists of the names of data layers that can be obtained from the WMSs and WFSs registered within WMCS. The resulting context document is not stored within the WMCS service (a possible request encoding is sketched below).
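A possible KVP encoding of a GetLayers call (the operation name and its term-set argument come from the specification above; the exact parameter encoding, version number and endpoint shown here are assumptions):

from urllib.parse import urlencode

terms = ["parcel", "land lot", "cadastral parcel"]
query = urlencode({
    "SERVICE": "WMCS",
    "REQUEST": "GetLayers",
    "VERSION": "1.0.0",        # assumed version
    "TERMS": ",".join(terms),  # assumed parameter name for the term set
})
print(f"http://example.org/wmcs?{query}")  # hypothetical endpoint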
Besides the ability of obtaining the context document directly from the Web Map Context Service, a client can obtain the context document from the Context Proposal Services (CPSs). CPS instances are considered to be integral parts of the proposed service-oriented architecture. CPS is a customizable service that can be implemented by a third party. The basic functionality of a CPS instance is the creation of the initial map context proposal of a new Web GIS application user, based on the description of the data that the particular user is interested in.
The textual descriptions of geographical entities which appear in WMS and WFS data layers, exposed through the data layer name, can be very different. These textual descriptions have to match user-defined keywords. Since textual descriptions of geographical entities, e.g. data layer names and keywords, will be used to identify the content suitable for a particular user, this raises the well-known problems of using text strings in order to identify a geographical entity, related mostly to synonymy and ambiguity. In order to partly overcome these problems, in the current development stage, a user-defined set of keywords is expanded by the CPS using the WordNet lexical database. For each of the user-defined terms, CPS expands the user-defined set of terms with synonyms, first-level hyponyms and all terms in the hypernym tree obtained from the WordNet lexical database. The resulting set of terms is compared with the GeoNis taxonomy, i.e. it is compared to the names of GeoNis global ontology concepts. Since GeoNis ontology concepts are mapped to geospatial information sources, which in turn can be accessed through WMS and WFS interfaces, the matching process result will contain the names of WMS layers and/or WFS feature types which correspond to global ontology concepts whose names are similar to the user-defined terms. The resulting expanded set of terms is submitted to the WMCS service by invoking the GetLayers() operation of the WMCS service, whose argument is the resulting term set.
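A sketch of this expansion step using NLTK's WordNet interface (the paper only names the WordNet lexical database; the use of NLTK here is an assumption):

from nltk.corpus import wordnet as wn

def expand(term):
    expanded = {term}
    for synset in wn.synsets(term):
        # synonyms
        expanded.update(l.name().replace("_", " ") for l in synset.lemmas())
        # first-level hyponyms
        for hypo in synset.hyponyms():
            expanded.update(l.name().replace("_", " ") for l in hypo.lemmas())
        # all terms in the hypernym tree
        for path in synset.hypernym_paths():
            for hyper in path:
                expanded.update(l.name().replace("_", " ") for l in hyper.lemmas())
    return expanded

print(sorted(expand("parcel")))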


The core of the geospatial data source discovery is a matching process, based on a similarity measurement performed between the terms extracted from the user-defined geospatial data description and expanded by CPS, and GeoNis global ontology concepts. The matching process is performed by the Context Proposal Service. The Context Proposal Service will load the GeoNis global ontology, extract ontology concepts and determine the similarity between the expanded term set and the ontology concepts. The similarity measurement is based on the use of a combination of unsupervised word sense disambiguation methods, which utilize the WordNet computational lexicon. The GeoNis global ontology concepts whose similarity with user-defined terms exceeds a predefined threshold value will be used to determine the names of WMS layers and/or WFS feature types which will be added to the resulting context document. This process is automatic due to the GeoNis platform, which contains mappings between concepts and data sources, which in turn expose the data through WMS and WFS services.

The similarity measurement between the terms extracted from the user-defined geospatial data description and GeoNis global ontology concepts is performed through the following steps. For each pair of terms T_EX (from the expanded term set) and T_C (from the concept term set), measure the similarity between the terms T_EX and T_C:
- Compute the edit distance similarity for terms T_EX and T_C. The edit distance similarity is measured according to the Levenshtein distance [27] and is given by dist(length(T_EX), length(T_C)), whereas the formula used to calculate dist(length(T_EX), length(T_C)) is given in APPENDIX A.
- Compute the semantic similarity sim(T_EX, T_C) between the terms T_EX and T_C according to the algorithm described in [28]. According to this algorithm, sim(T_EX, T_C) is determined by considering the depths of the T_EX and T_C synsets in the WordNet computational lexicon, along with the depth of their least common subsumer (LCS). The LCS of synsets T_EX and T_C is the most specific synset that is an ancestor of both synsets T_EX and T_C:

sim(T_{EX}, T_C) = \frac{2 \cdot depth(LCS_{T_{EX},T_C})}{depth(T_{EX}) + depth(T_C)}

- Determine the final semantic similarity according to the following equation:

semsim(T_{EX}, T_C) = \max\big(dist(length(T_{EX}), length(T_C)),\; sim(T_{EX}, T_C)\big)
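A sketch of this per-pair measurement (assumptions: the edit distance is normalized into [0, 1] before being compared with the Wu-Palmer score, which the formula above does not state explicitly, and NLTK's wup_similarity is used as the implementation of the Wu & Palmer measure of [28]):

from nltk.corpus import wordnet as wn

def levenshtein(a, b):
    # classic dynamic-programming form of the recurrence in APPENDIX A
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[len(b)]

def semsim(t_ex, t_c):
    # normalized edit-distance similarity (normalization is an assumption)
    edit_sim = 1 - levenshtein(t_ex, t_c) / max(len(t_ex), len(t_c), 1)
    s1, s2 = wn.synsets(t_ex), wn.synsets(t_c)
    wup = max((a.wup_similarity(b) or 0) for a in s1 for b in s2) if s1 and s2 else 0
    return max(edit_sim, wup)

print(semsim("parcel", "land"))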
form of styling document repositories [30]. These
After a matching set of ontology concepts is calculated,
capabilities are also integrated into Web Map Context
CPS will utilize GeoNis semantic mediator to determine
Service specification.
OGC Web services (WMS and WFS instances) used as
interfaces of geo-information sources connected to The architecture presented in this paper should be
ontology concepts from the matched set. This process uses considered an excellent starting point for the development
existing connections (mappings) between global ontology of service oriented GIS capable of supporting contextual
and geo-information sources within GeoNis cartographic visualization. Future research and
interoperability platform. Once OGC Web services development of the presented service and its environment
instances are determined, a set of layer names and/or should cover an extension of WMCS specification in
feature type names is sent back to WMCS for context terms of new operations. These operations will provide
document creation purposes in the form of an argument of WMCS with the ability to use Context Proposal Services
GetLayers operation of WMCS service. (CPS) developed by a third-party. This operation
extension will be based on Web Processing Service OGC
Based on the received term set, GetLayers operation of
Standard (WPS). Currently, coupled with our
WMCS service will create a temporary contextual
implementation of Context Proposal Services, WMCS
document and apply appropriate ordering of results. For
enables users to be introduced with the already-existing
example, if a match is found among the keywords used in
similar contexts which lead to a faster adaptation of
one or more of the existing contextual documents, WMCS
geospatial data visualization and improve reusability of
adds all data layers from each of the contextual documents
the existing symbology. Once extended according to WPS
into the resulting set of data layers. For this reason, a
standard, WMCS will be able to use CPSs developed for a
preference in result ordering is given to data layer name
particular domain which can be very significant for users
matches.
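A sketch of this ordering preference (a simple two-tier ranking is assumed: exact layer-name matches first, then layers contributed by keyword matches in existing context documents):

def order_layers(name_matches, keyword_matches):
    ranked = list(name_matches)  # preferred: data layer name matches first
    ranked += [l for l in keyword_matches if l not in ranked]
    return ranked

print(order_layers(["Parcel"], ["CadastralParcel", "Parcel"]))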
which consider themselves experts for the observed
domain. Further, the WMCS specification will be
V. CONCLUSION AND FUTURE WORK extended with operations which will allow users to
In this paper, our objective was to define and develop a register style transformation scripts. Registered scripts
general architecture of contextual Web Map solutions will perform transformation between a custom styling
which will improve the level of scalability and usability of document and a styling document developed according to
contextual geospatial information visualization systems. OGC specification. Thus, WMCS will be able to
The main result within this research and development is transform styling documents developed according to third-
the ability of our proposal to apply different contexts and party styling languages into styling documents developed
styles for viewing maps. This is achieved by introducing according to OGC specification. Each styling language
an additional architectural layer into the Web-based GIS developer will use WMCS operations in order to register a
application interoperability platforms. This layer consists XSLT [30] or a procedural transformation of its styling
of Web GIServices capable of supporting contextual language into SLD or SE styling language. We are
geospatial visualization and we envision Web Map convinced that these improvements will lead to our
Context Service as its most important component. proposal becoming a solution highly applicable within any
There are few other proposals comparable to WMCS, at existing geospatial data visualization environment and its
least not many of them which cover all functionalities usage will turn such environment into an adaptive
specified and implemented within WMCS. However, geospatial data visualization environment.
there are prominent proposals which can be considered
similar to WMCS to some extent. As we previously stated, ACKNOWLEDGMENT
we consider Contextual Map Service named Sissi to be the Research presented in this paper was funded by the
most similar solution compared to our proposal. However, Ministry of Science of the Republic of Serbia, within the
WMCS differs from Sissi in more than few characteristics project "Technology Enhanced Learning", No. III 47003.
in terms of both surrounding architecture and specified
functionalities. Although Sissi does not have a predefined REFERENCES
set of elementary context types, a variety of user context
types could be more efficiently covered if the user context [1] A. Buccella, A. Cechich, and P. Fillottrani, Ontology-driven
geographic information integration: A survey of current
is stored as a separate document and developed according approaches. Computers & Geosciences, 35 (4), 2009
to Web Map Context Documents specification. This [2] H. Wache, T. Vogele, U. Visser, H. Stuckenschmidt, G. Schuster,
capability is supported by WMCS. Unlike Sissi, WMCS H. Neumann and S. Hubner, Ontology-based integration of
does not perform any rendering in terms of merging information—a survey of existing approaches. In IJCAI-01
images from different Web Map Services. Rather, WMCS Workshop: Ontologies and Information Sharing, Seattle, USA,
uses the rendering capabilities of the existing Web Map 108–117, 2001
Service rendering, therefore does not multiply requests [3] N. Cullot, R. Ghawi and K. Yetongnon, DB2OWL: a tool for
towards the existing Web Map Services. This automatic database-to-ontology mapping. Proceedings of 15th
Italian Symposium on Advanced Database Systems (SEBD 2007),
characteristic can be significant in terms of performances Fasano, Brindisi, Italy, 491–494, 2007


[4] M. Baglioni, M. V. Masserotti, C. Renso and L. Spinsanti, Building geospatial ontologies from geographical databases. In Lecture Notes in Computer Science: Proceedings of The Second International Conference on GeoSpatial Semantics (GeoS 2007), Vol. 4853. Springer: Berlin/Heidelberg, 195–209, 2007, DOI: 10.1007/978-3-540-76876-0_13
[5] A. Stanimirović, M. Bogdanović and L. Stoimenov, Methodology and intermediate layer for the automatic creation of ontology instances stored in relational databases, Software: Practice and Experience, 2012, DOI: 10.1002/spe.2103
[6] A. Tellez-Arenas, Best Practice Report on Geoportals, ECP-2007-GEO-317001, OneGeology-Europe, 2009
[7] L. Stoimenov, S. Đorđević-Kajan, Framework for Semantic GIS Interoperability, FACTA Universitatis, Series Mathematics and Informatics, Vol. 17, pp. 107-125, 2002
[8] L. Stoimenov, S. Đorđević-Kajan, An architecture for interoperable GIS use in a local community environment, Computers & Geoscience, Elsevier, 31, 211-220, 2005
[9] M. Petit, C. Ray, C. Claramunt, A User Context Approach for Adaptive and Distributed GIS. In: Proceedings of the 10th International Conference on Geographic Information Science (AGILE'07), March. Aalborg, Denmark: Springer-Verlag, LN series on Geoinformation and Cartography, pp. 121-133, 2007
[10] M. Petit, C. Ray, C. Claramunt, A contextual approach for the development of GIS: Application to maritime navigation. In J. Carswell and T. Tekuza, editors, Proceedings of the 6th International Symposium on Web and Wireless Geographical Information Systems, number 4295 in LNCS, pages 158–169. Springer-Verlag, 2006
[11] S. Shearin, H. Lieberman, Intelligent profiling by example. In Proceedings of the International Conference on Intelligent User Interfaces, pages 145–152. ACM Press, 2001
[12] Y. Yang, C. Claramunt, A hybrid approach for spatial web personalisation. In C. Vangenot and K. Li, editors, Proceedings of the 5th international workshop on Web and Wireless Geographical Information Systems, number 3833 in LNCS, pages 206–221. Springer-Verlag, 2005
[13] M. Hampe, V. Paelke, Adaptive maps for mobile applications. In Proceedings of the Mobile Maps Workshop at MobileHCI'05, 2005
[14] T. Reichenbacher, Adaptive methods for mobile cartography. In Proceedings of the 21st International Cartographic Conference, pp. 1311–1322, 2003
[15] A. Zipf, Using Styled Layer Descriptor (SLD) for the dynamic generation of user- and context-adaptive mobile maps – a technical framework, 5th International Workshop on Web and Wireless Geographical Information Systems (W2GIS), Lausanne, Switzerland, 2005
[16] T. Reichenbacher, Mobile Cartography - Adaptive Visualisation of Geographic Information on Mobile Devices. Ph.D. Dissertation, Institute of Photogrammetry und Cartography, Technical University, Munich, Germany, 198pp., 2004
[17] T. Sarjakoski, M. Sester, L. T. Sarjakoski, L. Harrie, M. Hampe, L. Lehto, T. Koivula, Web generalisation services in GiMoDig – Towards a standardised service for real-time generalisation, in Proceedings of 8th AGILE International Conference on Geographic Information Science 2005, Estoril, Portugal, 26.-28. May, pp. 509-518, ISBN: 972-8093-13-6, 2005, http://plone.itc.nl/agile_old/Conference/estoril/themes.htm
[18] J. Kozel, R. Stampach, Practical Experience with Contextual Map Service. In M. Konecny, S. Zlatanova, T. Bandrova: Geographic Information and Cartography for Risk and Crisis Management. London: Springer. ISBN: 978-3-642-03441-1, 2010
[19] M. Lupp, (Ed.) Styled Layer Descriptor Profile of the Web Map Service Implementation Specification, v1.1.0. OGC 05-078r4. Open Geospatial Consortium, Inc., 53pp., 2007, http://portal.open-geospatial.org/files/?artifact_id=22364
[20] L. Stoimenov, A. Stanimirović and S. Đorđević-Kajan, Realization of Component-Based GIS Application Framework, Proceedings printed as book, Eds. F. Toppen, P. Prastacos, 7th AGILE Conference on Geographic Information Science, AGILE 2004, Heraklion, Crete, Greece, April 29 – May 1, 2004, ISBN 960-524-176-5, Crete University Press, pp. 113-120, 2004
[21] J. de la Beaujardiere, (Ed.) OpenGIS Web Map Server Implementation Specification, v1.3.0. OGC 06-042. Open Geospatial Consortium, Inc., 85pp., 2006, http://portal.open-geospatial.org/files/?artifact_id=14416
[22] P. A. Vretanos, (Ed.) Web Feature Service Implementation Specification v1.1.0. OGC 04-094, Open Geospatial Consortium, Inc., 131pp., 2005, http://portal.open-geospatial.org/files/?artifact_id=8339
[23] M. Lupp, (Ed.) Styled Layer Descriptor Profile of the Web Map Service Implementation Specification, v1.1.0. OGC 05-078r4. Open Geospatial Consortium, Inc., 53pp., 2007, http://portal.open-geospatial.org/files/?artifact_id=22364
[24] M. Müller, (Ed.) Symbology Encoding Implementation Specification, v1.1.0. OGC 05-077r4, Open Geospatial Consortium, Inc., 63pp., 2006, http://portal.open-geospatial.org/files/?artifact_id=16700
[25] J. Sonnet, (Ed.) Web Map Context Documents v1.1.0. OGC 05-005, 30pp., 2005, Available from: https://portal.opengeospatial.org/modules/admin/license_agreement.php?suppressHeaders=0&access_license_id=3&target=http://portal.opengeospatial.org/files/%3fartifact_id=8618
[26] A. Whiteside, J. Greenwood, (Ed.) OGC Web Service Common Standard, v2.0.0. OGC 06-121r9, Open Geospatial Consortium, Inc., 207pp., 2010, http://portal.open-geospatial.org/files/?artifact_id=38867
[27] V. Levenshtein, Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics-Doklady, Vol. 10, No. 8, 707–710. Original in Russian in Dokl. Akad. Nauk SSSR 163, 4, 845–848, 1965
[28] Z. Wu and M. S. Palmer, Verb Semantics and Lexical Selection. Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, pp. 133-138, 1994, Las Cruces, New Mexico
[29] M. Bogdanović, D. Vulović, L. Stoimenov, Symbology Encoding Repository, in Proceedings of 13th AGILE International Conference on Geographic Information Science 2010, Guimaraes, Portugal, 10.-14. May, ISBN: 978-989-20-1953-6, 2010, http://agile2010.dsi.uminho.pt
[30] J. Clark (Ed.), XSL Transformations (XSLT) v1.0., World Wide Web Consortium (W3C), 1999, http://www.w3.org/TR/1999/REC-xslt-19991116

APPENDIX A

\[
\operatorname{dist}_{T_{EX},T_C}(i,j) =
\begin{cases}
\max(i,j), & \text{if } \min(i,j) = 0 \\[4pt]
\min \begin{cases}
\operatorname{dist}_{T_{EX},T_C}(i-1,\,j) + 1 \\
\operatorname{dist}_{T_{EX},T_C}(i,\,j-1) + 1 \\
\operatorname{dist}_{T_{EX},T_C}(i-1,\,j-1) + \bigl(T_{EX}[i] = T_C[j] \;?\; 0 : 1\bigr)
\end{cases} & \text{if } \min(i,j) \neq 0
\end{cases}
\]

Levenshtein distance for terms T_EX and T_C
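For readers who prefer code to the recurrence, the sketch below is a direct Python transcription of the formula, assuming 0-based string indexing; the function name and the tabulated (bottom-up) evaluation order are our choices, not part of the original text.

```python
def levenshtein(t_ex: str, t_c: str) -> int:
    """Edit distance between terms t_ex and t_c, per the recurrence above."""
    m, n = len(t_ex), len(t_c)
    # dist[i][j] = distance between the first i chars of t_ex and first j of t_c
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i                  # min(i, j) == 0: distance is max(i, j)
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if t_ex[i - 1] == t_c[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[m][n]

assert levenshtein("layer", "lager") == 1
```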


An example of a WMCS context document created according to the OGC Web Map Context Documents implementation specification


SocIoTal: Creating a Citizen-Centric Internet of Things

Nenad Gligoric*, Srdjan Krco*, Ignacio Elicegui**, Carmen López**, Luis Sánchez**, Michele Nati***, Rob van Kranenburg****, M. Victoria Moreno*****, Davide Carboni******

* DunavNET, Research and Development, Antona Cehova 1, 21000 Novi Sad, Serbia
** University of Cantabria, 39005 Santander, Cantabria, Spain
*** Centre for Communication Systems Research, University of Surrey, Guildford, GU2 7XH, Surrey, UK
**** University of Liepaja, Lielā iela 14, Riga, Latvia
***** Computer Science Faculty, University of Murcia, Espinardo Campus, 30100 Murcia, Spain
****** Information Society Research, CRS4, Parco Tecnologico, 09010, Pula (CA), Italy

{nenad.gligoric, srdjan.krco}@dunavnet.eu, {iemaestro, clopez, lsanchez}@tlmat.unican.es, [email protected], [email protected], [email protected], [email protected]

Abstract—In this paper the vision and the main objectives of the FP7 SocIoTal project are described, together with a description of the initial activities and results related to the scenario definition. Contrary to the general approach of creating Internet of Things (IoT) services from a business perspective, the project addresses the design of citizen-centered IoT solutions. For this, it is required to create new IoT solutions from the citizens' point of view and for the immediate benefit of the citizens, without necessarily involving city or commercial service providers. The initial period of the project focused on the definition and analysis of potential scenarios and use cases. A co-creation approach towards the definition of user scenarios was adopted, resulting in a series of workshops with groups of citizens in two cities, Santander and Novi Sad, discussing the issues they face and how IoT could help and improve the quality of life in their home, building or neighborhood. The results of the workshops, i.e. the user scenarios, are presented, as well as the requirements derived by using the methodology defined by the IoT-A Architecture Reference Model (ARM). In addition, a survey of existing citizen-centric applications, services and platforms is provided. This survey shows that there is justified motivation for fostering the creation of a privacy-preserving architecture for the citizen-centric Internet of Things, in which several existing platforms can be used as a foundation for attaining this goal.

I. INTRODUCTION

The growing citizen expectation of Smart Cities is placing increasing pressure on European cities to provide better and more efficient infrastructures and services, ranging from the use of discrete new technology applications such as RFID and the Internet of Things (IoT) [1] to a more holistic conception integrating users and technology that is closely linked to the concept of Living Labs and user-generated services. This also fosters a growing number of initiatives focused on the development of new services that interconnect citizens and city infrastructures.

In previous work [2] we have highlighted the challenges that the creation of a privacy-aware framework needs to face in order to envision the social perspective of citizen-centric services based on the IoT paradigm. For this, a taxonomy is provided to define concepts such as smart object, entity, partial entities, access control capabilities, and relations among different objects cooperating to provide a requested service, as well as trust and reputation of entities.

In this paper a step further is taken, by analyzing existing solutions and platforms and by deriving relevant use cases based on citizens' feedback from the City of Santander and the City of Novi Sad.

The final goal is to create a consolidated architecture enabling increased citizen participation in the Internet of Things (IoT) while lowering the barriers to the creation of novel citizen-centric and citizen-generated services.

The structure of the paper is as follows: in Section II an analysis of previous work is given and existing solutions are analyzed. Section III defines the main features of the SocIoTal project that fill the gap identified in previous solutions. Section IV describes the scenarios and use cases selected to identify the new requirements that need to be solved from the user-centric perspective of the project. Finally, Section V presents conclusions and future directions of this work.

II. PREVIOUS WORK

In order to envision a platform that will enable a large scope of supported applications, we have surveyed existing work in this area. Accordingly, this section presents an analysis of existing solutions, based on the information collected about FP7 projects, platforms and commercial/non-commercial applications that provide services for a city or a citizen.


A. Citizen-centric applications and services

The analysis of the applications and services was centered on the functionalities offered, the security and trust capabilities, and the way the citizens are involved. The following criteria were identified:

(1) Functionalities, related to the different features the services provide, but focused on who is providing data (producer agent) and who is demanding the information (consumer agent). This way, we may find several options: city to citizen, where the city council information system gathers data from different sources, compiles it and "publishes" it for the final users (citizens, tourists, etc.) to access; citizen to citizen, where a citizen (or a producer user) provides data from its own information sources directly to other citizens (or consumer users), through its own created services or through some kind of platform; citizen to city, where the citizen provides data to the city information system, coming from its sensors, databases or services; and combined, where the service itself takes information from producer citizens and information sources (such as the city council or the utilities' data services) to provide information to both the consumer citizen and the city.

(2) Application domain, as the scope of applicability of the services analyzed. Currently interesting areas for smart cities are sustainability, efficiency and public services improvement, which promotes the development of services for citizens on urban transport, touristic information, waste management, environment, energy savings and efficiency, maintenance of urban resources, safety and so on. Citizens (as producers) are instead fostering the sharing information services and platforms, which allow users to publish data from their own sources to be shared with other consumer users.

(3) Privacy and trust management, a criterion that searches for privacy and trust features in the analyzed solutions, as one of the main SocIoTal application areas. This involves (but is not limited to) security, privacy, policy definition, identity management (including identification and authentication), and trust and reputation management features.

(4) The business model analysis aims at the way services are delivered and marketed. Here we can find free access services, premium licenses, pay-per-use models, etc. The different business models and related options offered inside SocIoTal will foster the development of new services and involve stakeholders: the more business opportunities, the more actors working.

(5) Proprietary code vs. open source, an issue related also to the business models, which addresses the nature of the solution: is it an open source service that can be modified and adapted by other users, or does it consist of a closed service.

Another aspect taken into account during the initial analysis was the level of accessible documentation for each solution, which directly impacts the level of technical detail available for elaboration and comparison.

The existing solutions are presented by the application domain criterion.

The cities of Barcelona and Santander have been making significant efforts to bring "smart" into their cities. Barcelona City Council launched the "Barcelona Ciutat Intelligent" project [3], aiming to define, design and develop a reference model of a network management platform and sensor data, and to finally validate it in Barcelona, with the ultimate goal of being able to adapt this model to any other city around the world.

The SmartSantander project [4] provides a city-scale experimental facility, unique in the world, for the research and experimentation of architectures, as well as services and applications for a smart city (parking management, participatory sensing, augmented reality, environmental monitoring).

Citizens' opinion has been recognized as important in related projects such as UrbanSensing [5], Live+Gov [6] and Nomad [7], as feedback from the citizen to the municipality and citizen participation in decision making can have a positive impact on the life of the people in the city. Different crowdsourcing and participatory sensing applications [4][5][6][8] already provide the citizens with tools to point out to the authorities different events and issues in the city. These services are mainly based on user-generated reports, feedback or information extracted from user devices, social networks, sensors, etc. Similar to these are SmartSantander's environmental monitoring application and other eco crowdsourcing applications [9][10][11][12].

Transport services [10][13][14] offer the user different information about traffic: road cameras, the nearest bus and metro positions and stops over a map; rental (e.g. bike rental, car rental); or they even motivate citizens to use alternative, environmentally friendly transport and to compete with other citizens.

Safety applications range from disaster relief [15][16] and alert systems [17], over health and emergency services [18][19], to citizens' auto-protection applications [20].

The majority of the above solutions, as graphically presented in Figure 1, do not satisfy many of the predefined criteria: functionalities are only provided or consumed by one side; privacy and trust management are not employed; the business model is limited or the solution is not "meant to last"; and, most importantly, there is no open source code that would enable further progress of the service.

Figure 1. Citizen centric applications and services


B. Citizen-centric platforms

The analysis of the existing platforms will serve as evaluation guidelines for the selection of the platforms facing the early trials, the pilots and the final SocIoTal architecture. The main criterion followed was platform openness and extensibility. As SocIoTal aims firstly to help the existing eco-system to grow, thus allowing its vertical scalability, by introducing new concepts in the IoT world and tools to support them, platform openness and extensibility is an important dimension for the evaluation of existing platforms. The SocIoTal project will target platforms satisfying the following requirements: (1) Extensibility, allowing the addition of new functionalities defined as part of the SocIoTal research and innovation efforts; (2) Open source is preferred, in order to allow ease of modification, while closed source could still be possible as long as the platform is extensible around the closed core, the closed core is already very well aligned with the SocIoTal requirements, and it does not prevent other desired but missing features from being realized; (3) A non-commercial license is essential, as no available money can be allocated to fund such a platform.

There are many existing platforms that can, to some extent, be utilized for building an IoT ecosystem [21][22][23][24][25][26][27][28][29][30][31][32][33][34][35][36][37][38][39][40][41][42][43][44][45]; but after selecting the solutions that are open sourced, extendable and in their final phase, we have found that Webinos, BUTLER, OpenIoT, DI.ME, FI-LAB and DeviceHive are the best candidates for further analysis.

BUTLER is an open platform provided under a non-commercial license. The modular approach of the BUTLER platform makes it easily extensible, which would allow straightforwardly enhancing it with SocIoTal's security solutions. Its service-oriented architecture would allow the creation of service composition tools that enable developers' and users' applications to reuse BUTLER smart objects and services, which provide open, well-documented APIs. The support of various IoT protocols is an advantage for demonstrating SocIoTal solutions on a wide variety of IoT devices. Adding new IoT devices is possible by creating the necessary protocol bridge with quite a small effort.

OpenIoT is focused on building open source solutions, researching and providing the means for formulating and managing environments comprising IoT resources, which can deliver on-demand utility IoT services such as "sensing as a service". Smart city services/applications are a big segment of the OpenIoT portfolio and thus this project should be referenced in SocIoTal. The OpenIoT project has also developed and implemented a Security Architecture [46] for authorization and trust mechanisms for sensor-based readings. The currently implemented method for calculating the reputation score introduces many risks, so a more reliable and efficient method would need to be included. The implemented trust mechanisms for event-based entities should also be further developed to satisfy the requirements of SocIoTal.

The DI.ME platform supports multiple identities by implementing the concept of different user-defined profile cards, which represent multiple identities of the same individual. The privacy-enhancing technologies proposed by the platform could serve as a useful starting point for the SocIoTal view regarding the security and privacy requirements for information sharing and their application to the use cases being envisioned by SocIoTal. Additionally, the platform can be extended by activating additional connectors and services.

FI-WARE provides a Security Architecture specification to address both the security properties of the FI-WARE platform itself and the applications that will be built on top of it. As such, the Security Architecture focuses on four main modules: Security Monitoring; Generic Security Services (Identity Management, Privacy, Data Handling); Context-Based Security and Compliance; and Optional Generic Security Services (Secure Storage Service, Morphus antivirus, DB Anonymiser). Another relevant aspect of FI-LAB/FI-WARE is its Developer Community and Tools (DCT) architecture, designed to offer a multi-functional development environment, enabling the development and management of the applications and services built on top of FI-LAB.

The DeviceHive platform provides the necessary communication framework and tools for the efficient integration of IoT devices with the cloud server, and it already comprises the concepts of Data Producer (Device) and Consumer (Client). Therefore, its scope is considered quite relevant to the SocIoTal concept, which focuses on enabling active citizen integration and participation in the IoT. The richness of its APIs would allow extending the client side by building a SocIoTal dashboard and customized SocIoTal user environments. The privacy, trust and reputation management schemes should be significantly enhanced: by introducing more sophisticated policies for data access and sharing on the client side, by building a reputation framework, and by adding policies to control the data generation from the user side, thus implementing context-based privacy and security mechanisms.

C. State of the art summary

After looking into the previous solutions, it can be stated that the privacy, trust and reputation aspects are only partially treated (as is done for the OpenIoT, DI.ME and FI-WARE platforms), or are not considered at all. In many of these services, the final user that consumes the service is ignorant about the user that provides the service, and there is no layer that secures the users' stored data and provides logic for the trustworthy validation of the involved data sharing. In addition, although some of the services have very strong utility value, it can be noted that there is a gap that SocIoTal can fill by providing an appropriate architecture that will support the identified use cases, empowered by trust and privacy mechanisms that can be built from scratch or based upon an existing framework, i.e. one or several open source platforms.

III. SOCIOTAL FILLING THE GAP

SocIoTal has four main target groups of stakeholders: (1) a technical ecosystem of enablers: IPv6, RFID, sensors, QR codes, barcodes, large service providers (telcos and data integrators, corporate IT, startups, etc.); (2) a policy ecology of local neighborhood groups, city councils, regional incubators, national and EU policy makers; (3) developer communities: Arduino, Raspberry Pi, Internet of Things Meetups (20,000 members globally), 3D printing etc., accelerated by inexpensive open hardware, software, database storage and data analytics; and (4) citizens that are co-creators of scenarios.

The SocIoTal use cases are very much in line with the driving location-based aspects that underlie many smart city applications. As we iterate and refine the use cases with input from our four stakeholder groups, we believe that "sharing" and "facilitating sharing", as well as "collaboration", will become the main drivers for the adoption of smart city applications.


Taking into account the analysis made in Section II, it can be concluded that the current trend in citizen-centric services, applications and platforms is mainly oriented towards concentrating information from different sources (provided by citizens, extracted from authorities or specific information systems, etc.) and presenting it through some kind of web portal or event reporting system, such as a Publish/Subscribe service, keeping the citizen aware of public transport, parking, the environment and events, and the corresponding municipality up to date about what is going on in the city. In the main, privacy and trust issues are not addressed enough in current solutions, mainly because these solutions do not involve sensitive or personal information from users and are not oriented to user-to-user interaction, but to user-to-community or user-to-municipality interaction, where identity and privacy are not the most important targets.

Here is where SocIoTal goes a step beyond, focusing on this user-to-user interaction (including user-to-communities as an extension of it), which will allow the users to offer their devices (or "Things") and information sources, sharing interesting information sets within a well-identified community.

This approach will bring new business models, based on the data that can be provided and the services the users can create. But this requires a solution that provides security, trust and reliability to the interactions in this environment. SocIoTal envisions this by defining a set of atomic (and generic) core services that will comprise the basic enablers to further build the use cases and scenarios for the trials. These core services are described in the following subsections using the methodology defined by the IoT-A Architecture Reference Model (ARM) [46].

A. Users' group management

This service allows the users to create, configure and manage community membership, personal trust circles and user identities in an easy way. A community is a group of people trusted by the user. The user can create a group, be added to a group, add the people they want (all after being authenticated within the SocIoTal framework), and select the resources (e.g. access to some sensors owned by the person) they want to share or not. Once the group is created, all people from the same community will have access to the resources shared by each person in the community. Also, if the user does not want to continue sharing resources with a member of the group, they can leave the group or they can oust the person from the group, and all access to the shared resources is revoked.
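As an illustration of the membership and revocation behavior just described, the following minimal Python sketch models a community as a set of authenticated members with per-member shared resources; all names and data structures are our own assumptions, not SocIoTal APIs.

```python
class Community:
    """Minimal sketch of a trusted group with shared-resource revocation."""

    def __init__(self, owner):
        self.members = {owner}
        self.shared = {}                    # member -> set of shared resource ids

    def add_member(self, member, authenticated):
        if not authenticated:               # only authenticated users may join
            raise PermissionError("member must be authenticated first")
        self.members.add(member)
        self.shared.setdefault(member, set())

    def share(self, member, resource_id):
        if member in self.members:
            self.shared.setdefault(member, set()).add(resource_id)

    def oust(self, member):
        """Removing a member also revokes all access to their shared resources."""
        self.members.discard(member)
        self.shared.pop(member, None)

    def accessible_resources(self, member):
        if member not in self.members:
            return set()                    # outsiders see nothing
        return set().union(*self.shared.values()) if self.shared else set()
```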
B. Discovering People and Devices

This service enables checking a user/device's location and discovering other users/devices around them. According to the IoT-A nomenclature, the physical entities within this service are the users to be discovered, associated with their smartphones as the devices in charge of sensing the position of their owners. When a device/user detects the presence of another device/user which belongs to the same community/bubble, both devices receive a notification indicating the presence of a community/bubble member and basic data from their profiles. Once this has been received, they can establish a secure communication link to share data, services or chat.

C. Sharing Entities

This service supports communication between two users who want to share physical objects. The physical entity here is the object to be shared. First, the object/entity that the producer user wants to share is registered in a resource directory, becoming a virtual entity. The information provided in this registration process describes the object, according to a specified resource model. The consumer user who wishes to access an object checks this resource directory for the available virtual entities (e.g. searching for an object "car" with the desired attribute "route"), creating the proper request. The service registers and delivers this request to the producer user, the owner of the requested object. Then, a communication link among these users is created to arrange the use of the object.

D. Sharing Information

This service is intended to cover both M2M and User-2-User communication to share, send, look up or visualize information provided by the users or IoT devices. The process includes interaction between a set of remote sensors or a user's information source (coming from social networks, cloud storage, etc.) and other users, machines or servers, whether through a platform or through a "dedicated" link between the source and the sink. Virtualized physical entities of different natures are monitored by sensors, and the raw data they generate is formatted according to a defined data model to make it homogeneous. Afterwards, authorized users (humans or active digital artefacts) get access to them in order to create new services for other users. In case of physical proximity of consumer and producer devices, when no remote consumers exist, direct communication should also be provided, by instantiating the required resources on the consumer device.

Figure 2. Sharing information service
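The homogenization step can be illustrated with a short sketch: each producer supplies a field mapping from its native format to a common flat model. The field names and the mapping mechanism are illustrative assumptions, not the project's actual data model.

```python
from datetime import datetime, timezone

# Hypothetical common data model: every raw reading is mapped to the same
# flat structure before authorized consumers may access it.
COMMON_FIELDS = ("entity_id", "phenomenon", "value", "unit", "timestamp")

def normalize(raw, mapping, defaults=None):
    """Reformat one raw reading according to a per-device field mapping."""
    reading = dict(defaults or {})
    for field, source_key in mapping.items():
        if source_key in raw:
            reading[field] = raw[source_key]
    reading.setdefault("timestamp", datetime.now(timezone.utc).isoformat())
    missing = [f for f in COMMON_FIELDS if f not in reading]
    if missing:
        raise ValueError("reading does not fit the common model: %s" % missing)
    return reading

# Two producers with different native formats end up in one common shape.
home_station = normalize(
    {"id": "ws-17", "temp": 21.4},
    {"entity_id": "id", "value": "temp"},
    defaults={"phenomenon": "temperature", "unit": "degC"},
)
```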


E. Data Access Control

This service provides access control to the system components through the implementation of authentication, authorization and accounting (AAA) mechanisms. Access to the virtual entities is provided for registered users only. Data access is restricted to registered users, allowing only those having a digital profile with sufficient privileges to retrieve certain information. Data access control handles several user roles. User creation and management is one of the key components of the platform. The platform should manage several types of user accounts, as well as the user authentication in the system. Data access control should be performed in such a way as to limit the access to within the trusted group only.

Figure 3. Data Access Control
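A toy illustration of the authorization and accounting parts of such an AAA scheme follows (authentication is assumed to have happened upstream); the roles, privilege sets and log format are hypothetical, not SocIoTal definitions.

```python
# Illustrative role-based access check with a simple accounting trail.
PRIVILEGES = {
    "owner": {"read", "write", "share"},
    "community_member": {"read"},
    "anonymous": set(),
}

audit_log = []  # accounting: every access decision is recorded

def authorize(user, action, entity, trusted_group):
    role = ("owner" if user == entity.get("owner")
            else "community_member" if user in trusted_group
            else "anonymous")
    allowed = action in PRIVILEGES[role]
    audit_log.append((user, action, entity.get("id"), role, allowed))
    return allowed

entity = {"id": "sensor-42", "owner": "alice"}
assert authorize("alice", "share", entity, trusted_group={"alice", "bob"})
assert not authorize("mallory", "read", entity, trusted_group={"alice", "bob"})
```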
F. Notification service

This service implements notifying capabilities (notifications). Several types of notification mechanisms should be implemented, such as push notifications, SMS and email. A consumer user can register their preferred notification mechanisms based on the notification importance level and availability. The same notification could be sent to multiple users (e.g. a notification for water leak detection in the apartment could be sent to both the apartment owner and the public water company at the same time; the owner can be notified via SMS, whilst the water company can be notified via e-mail). All reported events (virtual entities) are available via the web portal, where access to a more detailed description of the virtual entities is provided only to the authorized persons/companies.

Figure 4. Notification service
portal where an access to a more detailed description of in the previous section and the information gathered from
the virtual entities is provided only to the authorized the citizens of Novi Sad and Santander, this section
person/companies. presents the different scenarios to be deployed in these
two cities that will group the different sets of use cases
and core services designed to achieve the project goals.
Information is gathered in Santander by using
Santander City Brain platform portal provided by
Santander‟s City Council and in Novi Sad by organizing
the workshops. From the sample of 850 registered user in
Santander platform and 120 people attending 9 workshops
in Novi Sad, a number of 825 ideas are collected. The
security, trust and reliability technical aspects of the
provided sample were analyzed to form SocIoTal
scenarios and use cases
In the following text are presented some of the
scenarios to be deployed in Santander and Novi Sad which
group the different sets of use cases and core services
designed to achieve the project goals.
Figure 4. Notification service
A. Scenario Enabling Santander
Over the smart city platform, this scenario creates an
G. Reputation service applications environment oriented to disabled people
(although any user could also benefit from it). The main
This service describes the trust and reputation idea is to create a way for disabled people to point out
mechanisms that will enable generation of a reputation incidences (e.g. traffic light for pedestrians doesn‟t work,
score for the entity in the global storage; e.g. a reputation open ditch at this street, etc.) over a city map, so the rest
score for the sensed phenomenon; a trustworthiness score of the users can get access to the current status of how
for a given user trust, etc. A reputation score can be seen “enabled” is the city. The location data of these reported
either as an indicator that a particular device has provided incidences can be used together with other information
reliable measurements in the past, or as a measure of how sources (such as city information, public transport, taxis,
trustworthy is a given user for another. Together with etc.) to calculate alternative routes suitable for disabled
other metrics, as measure of trust, the degree of social people, avoiding obstacles from one point of the city to
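A minimal sketch of such a route calculation: a standard Dijkstra search over a street graph in which user-reported incidences inflate the cost of the affected segments. The graph encoding and the penalty scheme are illustrative assumptions.

```python
import heapq

def best_route(graph, penalties, start, goal):
    """Dijkstra over a street graph; reported incidences inflate edge costs.

    graph: node -> {neighbor: base_cost}
    penalties: (node, neighbor) -> extra cost for a reported incidence
    """
    queue, seen = [(0.0, start, [start])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, base in graph.get(node, {}).items():
            extra = penalties.get((node, nxt), 0.0)  # e.g. open ditch reported
            heapq.heappush(queue, (cost + base + extra, nxt, path + [nxt]))
    return float("inf"), []

graph = {"A": {"B": 1, "C": 4}, "B": {"C": 1, "D": 5}, "C": {"D": 1}, "D": {}}
penalties = {("B", "C"): 10.0}          # a reported incidence on segment B->C
print(best_route(graph, penalties, "A", "D"))   # routes around the incident
```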


IV. SCENARIOS AND USE CASES

Based on the SocIoTal development objectives described in the previous section and the information gathered from the citizens of Novi Sad and Santander, this section presents the different scenarios to be deployed in these two cities, which group the different sets of use cases and core services designed to achieve the project goals.

Information was gathered in Santander by using the Santander City Brain platform portal provided by Santander's City Council, and in Novi Sad by organizing workshops. From the sample of 850 registered users on the Santander platform and 120 people attending 9 workshops in Novi Sad, a total of 825 ideas were collected. The security, trust and reliability technical aspects of the provided sample were analyzed to form the SocIoTal scenarios and use cases.

In the following text, some of the scenarios to be deployed in Santander and Novi Sad are presented, grouping the different sets of use cases and core services designed to achieve the project goals.

A. Scenario Enabling Santander

Over the smart city platform, this scenario creates an application environment oriented to disabled people (although any user could also benefit from it). The main idea is to create a way for disabled people to point out incidences (e.g. a traffic light for pedestrians does not work, an open ditch at this street, etc.) over a city map, so the rest of the users can get access to the current status of how "enabled" the city is. The location data of these reported incidences can be used together with other information sources (such as city information, public transport, taxis, etc.) to calculate alternative routes suitable for disabled people, avoiding obstacles from one point of the city to another. Also, the users can check the locations of disabled parking spaces and the number of free available lots.

SCENARIO_SOC_001: Enabling Santander Scenario
Use Cases: Incidence Reporting & Tracking Service; Accessible Routes; Accessible Parking; Enabled Information System
Services: Sharing information; Route calculation

Figure 5. Scenario Enabling Santander

In this line, extra information can be shared in this scenario, like the locations of disabled accesses, adapted premises, bars, pubs, restaurants, etc., or events related to this collective. All this shared information will make it interesting to create a community made up of disabled people, their family, assistants, caretakers, etc. This means setting up a registration utility and a profile creation and management tool, so that every incidence, event or comment uploaded can be assigned to a profile identified (only by community members).

1) Incidence reporting & tracking

The main functionality of this use case is to register a geo-located event in a specified environment (e.g. a broken bench, a broken handrail, etc.) reported by a user. The user data could contain plain text and/or multimedia files (audio, video, photo). As part of this service, the platform will also offer the capability of tracking the uploaded events (incidence properly registered, status, actions taken, etc.).

2) Accessible Routes

This use case will offer the citizen an accessible route from one point to another in the city. This route will avoid barriers of different natures, such as inaccessible infrastructures, works in the city, blocked access to streets, broken traffic lights, etc. The information will be gathered from different sources, such as municipality information, public transport, local authorities, utilities, incident reporting applications, etc.

3) Accessible parking

This use case has been designed to provide the user with information about the location of disabled parking spaces and their availability. To make this information available to the user, the data gathered from the parking sensors installed under the pavement in the disabled parking spaces are modelled and stored, so that users will be able to select an area and check how many free lots are available.

4) Enabled Information System

This use case allows disabled users to have updated information about accessible locations, adapted premises, bars, pubs, restaurants, etc., or events related to this collective. After being authenticated, the user will be able to look for an adapted location around their position or in another place they select in the city. The information will be provided and updated by the municipality, the restaurant/pub/bar owners or other users. The users of the service will be able to rate the locations, taking into account different aspects such as the grade of adaptability, the quality of the service/products, etc. Also, leisure events reported through the reporting service could be added to a list or to an information map.

B. Scenario Santander Citizen

Santander Citizen tries to create an environment for users to share their resources/data/sensors. The Santander platform will mainly provide the identity management infrastructure and the security and trust mechanisms, whilst users provide information, resources and services.

1) Car pooling

This use case will allow citizens to share car trips. Within the use case, the users will be able to offer their car for giving others a lift, or vice versa. The information that citizens should provide would be their starting points and destinations and the date/time. The process would start when people who want to drive their car find other people with the same or a similar route. The next step would be to select the partners and to send them an invitation. After that, a meeting is established, and when people are arriving at the place the discovery mechanisms are triggered and they can meet in an easy and trusted way. A secure communication between each travel partner and the shared car is initiated when the meeting point is reached, in order to discover the car. While traveling together, day after day, the same partners get to know each other better, and the same holds for their mobile devices. After a few trips together, their devices start to share data by using a relaxed secure communication mechanism, i.e., music from one of the partners' phones can be directly streamed to the car stereo without the need to encrypt it, thus saving device battery. These tasks will elapse within a trust framework, where all users have been authenticated and their profiles have been validated, so that the rest of the users can trust the veracity of the main information provided by the users of the service.

At the same time, to compensate the owner of the shared car, their private parking lot can be automatically rented during the time it is not used. For this purpose, the parking lot position can be shared with interested citizens when the car is not there. To prevent a malicious user from using the system to track the presence at home of a specific user, the information should be provided only to registered users traveling by car in the proximity of the parking lot.

2) Sharing sensors – Weather forecast

This use case will allow users to share information gathered with their own weather sensors. Also, they will be able to create new services using data from those sensors or from the sensor information sources of other users. To create a uniform framework for this use case, a common data model has to be developed, to ensure that all data have a homogeneous format and all devices can understand the data received. Different ways of receiving data can be described. First, people can be subscribed to a set of weather sensors and receive data from them every so often. Also, they could request the information directly from the global storage in charge of storing all the data from the sensors. Another way could be that someone creates a service which estimates a weather forecast with the data gathered from the sensors, sending more concrete information to other users.
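The subscription and direct-request patterns just described can be sketched as follows; the hub class, its method names and the reading format are illustrative assumptions rather than SocIoTal interfaces.

```python
class WeatherSensorHub:
    """Sketch of the two access patterns: subscribe (push) vs. request (pull)."""

    def __init__(self):
        self.latest = {}          # sensor_id -> last normalized reading
        self.subscribers = {}     # sensor_id -> list of callbacks

    def publish(self, sensor_id, reading):
        self.latest[sensor_id] = reading
        for callback in self.subscribers.get(sensor_id, []):
            callback(sensor_id, reading)          # push to subscribed users

    def subscribe(self, sensor_id, callback):
        self.subscribers.setdefault(sensor_id, []).append(callback)

    def request(self, sensor_id):
        return self.latest.get(sensor_id)         # pull from the global storage

hub = WeatherSensorHub()
hub.subscribe("ws-17", lambda s, r: print(f"{s}: {r['value']} {r['unit']}"))
hub.publish("ws-17", {"value": 21.4, "unit": "degC"})
```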


C. Scenario Novi Sad

This scenario creates different domains of application environments that will enable children and elderly monitoring, watering of the green surfaces, reporting of different categories of problems in the city and in buildings, and monitoring of different crowdsourced information from social networks and local TV and news stations.

Figure 6. Scenario Novi Sad

1) Guardian - Children and elderly monitoring application

This use case proposes a system that will enable remote location monitoring of children and the elderly, and alerting in case of emergency. The user location may be obtained either using GPS coordinates or by checking in at control points with NFC- and/or QR-tag-enabled places (schools, playgrounds, bus stations, home, etc.). Several types of users should be supported by the use case. The final user will be able to create groups. Users can subscribe to specific (expected) routes in order to receive notifications if the monitored user deviates from the expected route. Elders/children in emergency situations can use a panic button to alert their person of trust. The monitoring of elderly people is more focused on user activity detection, thus the appropriate mobile application will upload the observed acceleration data. Subscribed users may be notified if a lack of activity is detected. The mobile application should include a function for fall detection that is reported to the system. The mobile application can also implement a heart rate (HR) interface. The HR may be estimated either using the mobile device's camera with the flash placed near the camera objective, or by using a cheap ECG device. Additional processing within the mobile application may also provide useful intuitive messages to the end user. Additionally, the processing may provide stress level estimation on smartphones, with or without an additional device.
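A minimal sketch of the route-deviation check described above, assuming the expected route is a list of GPS checkpoints; the flat-earth distance approximation, the 150 m threshold and all names are our own illustrative choices.

```python
import math

def distance_m(p, q):
    """Approximate metres between two (lat, lon) points over short distances."""
    dlat = (q[0] - p[0]) * 111_320
    dlon = (q[1] - p[1]) * 111_320 * math.cos(math.radians(p[0]))
    return math.hypot(dlat, dlon)

def deviates(position, expected_route, threshold_m=150):
    """True if the monitored user is farther than the threshold from every
    checkpoint of the expected route (GPS or NFC/QR control points)."""
    return all(distance_m(position, cp) > threshold_m for cp in expected_route)

route = [(45.2551, 19.8452), (45.2567, 19.8431), (45.2580, 19.8410)]
if deviates((45.2610, 19.8500), route):
    print("Notify subscribed users: deviation from the expected route")
```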
2) Watering of the green surfaces

The Public City Utility Company (PCUC) may provide a list of green surfaces which are equipped with soil moisture sensors. Citizens may register for some areas and receive reminders for watering. Using the sensor data, the PCUC should detect the watering and provide a discount on the worthy citizens' monthly bill.

This use case is mainly composed of the sharing information service, together with data access control for users and the service for providing notifications.

3) Building Council and Janitor

The main purpose of the Building Council and Janitor use case is the reporting of building defects and problems with tenants. Reported issues are sent to the person in charge. Defects can be either detected automatically or reported manually by tenants. The automatic detection of a problem can be illustrated through the example of water leak detection. In such a case, the report is sent automatically either to the public water company, if the leak takes place in the common areas, or to the apartment owner, if the leak is detected inside the apartment. Manual reporting is done by tenants for situations like lift failure, fuse failure, etc. Users register their preferred notification mechanisms based on the notification importance level. The same notification can be sent to multiple users. For example, a notification for water leak detection in a flat can be sent to both the flat owner and the public water company at the same time. The flat owner will be notified via SMS, whilst the water company will be notified via email to the appropriate person from the company responsible for monitoring alarms and handling emergency situations. Similarly, in case of a lift failure, the reporting will be done to the responsible maintenance company. All messages should contain details about the reported issues. This use case is mainly composed of the sharing information service, together with data access control for users and the service for providing acknowledgement for users of trust.

4) Public reporting application

This use case proposes a reporting application for different categories of problems, reported directly to the municipalities. These reports are available to other citizens, who can rate the published events. The rating mechanism is based on the trust and privacy mechanisms that will be developed later within this scenario. Reports related to communal issues are directly forwarded to the communal department. There are many applications with similar functionalities; the added values of this use case are the privacy, trust and reputation mechanisms that will enable the efficient collection of relevant information and the presentation of the collected issues in the application. Users may subscribe to specific types of events or city areas. It is mainly composed of the sharing information service, together with data access control for users and the service for providing acknowledgement for users of trust.

5) City Visualization Dashboard

The SmartCity dashboard visualizes the general opinion of the people in a city about upcoming events, by data mining and extracting patterns from online news stations, social networks and other sources. A devoted mobile application enables users to express their current feeling using a mood meter. The final statistics are visualized using an IoT LED lamp driven by a very popular ARM-based Linux embedded system, the Raspberry Pi. All fields of the lamp are independent and can be controlled individually, visualizing different letters, numbers or even pictures. The lamp is controlled through an SPI interface on the Raspberry Pi. Based on the current mood in the city, the lamp can visualize different types of information, such as: a smiley or a sad face; statistics about the city; measurements collected from sensors, such as environmental parameters, etc. It is mainly composed of the sharing information service, together with data access control for users and the service for providing notifications.
service together with data access control for users and control for users and service for providing notifications.
service for providing notifications.
3) Building Council and Janitor V. CONCLUSION
The main purpose of the Building Council and Janitor The services and use cases defined in this paper will
Use Case is reporting the building defects and problems enable contribute with high value along the following key
with tenants. Reported issues are sent to the person in socio-economic dimensions: increased resource
charge. Defects can be either detected automatically or efficiency to build more sustainable and leaner
reported manually by tenants. Automatic detection of a
problem can be illustrated through example of the water communities; increased public and community
leakage detection. In such a case the report is sent accountability; increased safety of vulnerable in the


V. CONCLUSION

The services and use cases defined in this paper will contribute high value along the following key socio-economic dimensions: increased resource efficiency to build more sustainable and leaner communities; increased public and community accountability; increased safety of the vulnerable in the community; increased social inclusion and awareness of citizens. Among others, SocIoTal will provide an architecture including an intuitive and secure environment inspired by social media tools and focused on empowering citizens to easily manage access to IoT devices and information, while allowing IoT-enabled citizen-centric services to be created through open community APIs which will run on top of an IoT service framework.

ACKNOWLEDGMENT

This paper describes work undertaken in the context of the SocIoTal project (http://sociotal.eu/). The research leading to these results has received funding from the European Community's Seventh Framework Programme under grant agreement n° CNECT-ICT-609112.

REFERENCES

[1] International Telecommunication Union 2005, "The Internet of Things", ITU Internet Reports. Available at: http://www.itu.int/osg/spu/publications/internetofthings/InternetofThings_summary.pdf
[2] M. Victoria Moreno, José L. Hernández, Antonio F. Skarmeta, Michele Nati, Nick Palaghias, Alexander Gluhak, Rob van Kranenburg, "A Framework for Citizen Participation in the Internet of Things", Pervasive Internet of Things and Smart Cities (PitSac) workshop, 2014.
[3] Barcelona Ciutat Intel·ligent, http://smartbarcelona.cat/en/
[4] SmartSantander FP7 project, http://www.smartsantander.eu/
[5] UrbanSensing FP7 project under grant agreement n° 314887 (FP7/2007-2013), http://urban-sensing.eu/
[6] LiveGov FP7 project, http://liveandgov.eu/
[7] Nomad FP7 project, http://www.nomad-project.eu/
[8] APLAWS open source Collaboration and Content Management System, https://fedorahosted.org/aplaws/
[9] SMARTIP Project, http://www.smart-ip.eu
[10] The Green Watch: Crowdsourcing Air Quality Measurements, http://readwrite.com/2009/12/08/the_green_watch_project_crowdsourcing_air_quality_measurements
[11] COBWEB FP7 project, reference number: 308513, 2008, http://cobwebproject.eu/
[12] WeatherSignal, http://weathersignal.com/
[13] Bicing, https://www.bicing.cat/
[14] RingRing, http://ring-ring.nu/
[15] Chorist FP6 project, http://www.chorist.eu/
[16] Ushahidi, http://www.ushahidi.com/
[17] Japan Geigermap, http://japan.failedrobot.com/
[18] INSIGHT - Intelligent Synthesis and Real-time Response using Massive Streaming of Heterogeneous Data, http://www.insight-ict.eu/
[19] MedWatcher, https://medwatcher.org/
[20] SafetyGps, http://www.safetygps.com/
[21] Webinos, http://www.webinos.org/
[22] BUTLER FP7 project, http://www.iot-butler.eu/
[23] Di.me, http://www.dime-project.eu/
[24] OpenIoT FP7 project
[25] Gatesense, http://www.gatesense.com/
[26] FI-LAB, http://www.fi-ware.eu/
[27] Intoino, http://www.intoino.com
[28] Evrythng, http://www.evrythng.com
[29] Allthingstalk, http://www.allthingstalk.com/
[30] ThingWorx, http://www.thingworx.com/
[31] Waag, http://waag.org/en/project/smart-citizen-kit
[32] Nimbits, http://www.nimbits.com/
[33] Thethingsystem, http://thethingsystem.com/
[34] M2m.eclipse, http://m2m.eclipse.org
[35] Nodered, http://nodered.org/
[36] Kinesis, http://aws.amazon.com/kinesis/
[37] Devicehive, http://www.devicehive.com/
[38] Opensensors, http://opensensors.io/
[39] Iot-toolkit, http://iot-toolkit.com/
[40] Sensdots, http://www.sensdots.com/
[41] Social-iot, http://www.social-iot.org/
[42] ConnectedLiverpool, http://www.connectedliverpool.co.uk/
[43] Ciseco, http://shop.ciseco.co.uk/about-us/
[44] OsiIoT, http://osiot.org/?q=node/29
[45] Openremote, http://www.openremote.org/display/HOME/OpenRemote
[46] Internet-of-Things Architecture (IoT-A), "Project Deliverable D1.2 – Initial Architectural Reference Model for IoT", http://www.iot-a.eu/public/public-documents/d1.2
[47] C.T. Chou, A. Ignjatovic, W. Hu, "Efficient Computation of Robust Average in Wireless Sensor Networks using Compressive Sensing", Technical Report: UNSW-CSE-TR-0915.
[48] S. Ganeriwal, M. Srivastava, "Reputation-based framework for high integrity sensor networks", ACM Transactions on Sensor Networks (TOSN), vol. 4, no. 3, May 2008.
[49] Kuan Lun Huang, Salil S. Kanhere, Wen Hu, "On the need for a reputation system in mobile phone based sensing", Ad Hoc Networks, Volume 11, Issue 2, Pages 611-732, March 2013
[50] HaoFan Yang, Jinglan Zhang, Paul Roe, "Reputation modelling in Citizen Science for environmental acoustic data analysis", Social Network Analysis and Mining, 2012, http://eprints.qut.edu.au/57569/2/57569.pdf
[51] A. Ignjatovic, N. Foo, and C. T. Lee, "An analytic approach to reputation ranking of participants in online transactions," in The 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Volume 01, Washington, DC, USA, pp. 587–590, IEEE Computer Society, 2008.
[52] A. Flanagin, M. Metzger, R. Pure, and A. Markov, "User-generated ratings and the evaluation of credibility and product quality in ecommerce transactions," in System Sciences (HICSS), 2011 44th Hawaii International Conference on, pp. 1–10, IEEE, 2011.
[53] Mohammad Allahbakhsh, Aleksandar Ignjatovic, Hamid Reza Motahari-Nezhad, Boualem Benatallah, "Robust evaluation of products and reviewers in social rating systems", World Wide Web, 2013
[54] HaoFan Yang, Jinglan Zhang, Paul Roe, "Using Reputation Management in Participatory Sensing for Data Classification", Procedia Computer Science 5 (2011) 190–197
[23] Di.me http://www.dime-project.eu/


Enhancing BPMN 2.0 Informational Perspective to Support Interoperability for Cross-Organizational Business Processes

Marija Jankovic*, Miroslav Ljubicic*, Nenad Anicic* and Zoran Marjanovic*

* Faculty of Organizational Sciences, University of Belgrade, Jove Ilica 154, 11000 Belgrade, Serbia
{jankovic.marija, ljubicic.miroslav, anicic.nenad, marjanovic.zoran}@fon.bg.ac.rs

Abstract — Business Process Modeling Notation (BPMN) is a standard for business process modeling, within an organization or outside of its boundaries. BPMN analyzes a business process as a set of interrelated activities, focusing primarily on the functional perspective of the process. In contrast, BPMN pays very little attention to the data/information flow, hence neglecting the informational perspective of the process while modeling. In this light, the paper proposes an approach for the formal modeling and specification of the information requirements used and generated in cross-organizational business processes. A UML View Profile is introduced to specify information requirements as views over a common reference ontology. A BPMN 2.0 extension is performed in order to enable connecting the defined views and the corresponding process activities. Eventually, the proposed information requirements specification enables the generation and automation of message instance transformation at the implementation level. The paper discusses the benefits of the presented approach and the practical advantages supporting cross-organizational interoperability provisioning.

I. INTRODUCTION

Business processes often take place between multiple independent partners, crossing organizational boundaries. The modeling of cross-organizational business processes (CBPs) focuses on defining process views describing the interaction between two or more business partners [1]. For comprehensive CBP modeling, a three-level approach should be applied [2][3]:

Business process level: This level primarily supports a set of shapes and symbols easily understood by business analysts. The CBP models on this level are not executed.

Technical process level: This level represents the full control flow of the process, e.g. the events and messages exchanged between collaborating partners. CBP models are still not executable at this level.

Executable process level: This level involves platform-specific interaction. It illustrates how data, services, messages, etc. are specified and executed in an appropriate execution engine.

Currently, the Business Process Modeling Notation (BPMN) version 2.0 is the only modeling technique that could theoretically support all three levels [4]. Executable modeling is a brand new capability of BPMN 2.0. Executable details are fully captured in the BPMN standard attributes. An additional advantage of BPMN 2.0 is the capability to represent four important process modeling perspectives: functional (what activities are being performed), behavioral (when and how activities are performed), organizational (where and by whom activities are performed) and informational (informational entities/data produced or manipulated by a process) [5].

The problem is that, in the description of the process, BPMN focuses on the sequence flow, e.g. the sequence of tasks, gateways, and events [4]. Informational aspects relevant for the process execution are treated as less important than the functional process perspective. In BPMN 1.x it was not possible to define the process semantics for informational elements such as data or data flow. They were classified as artifacts, e.g. simple annotations of the diagram. In BPMN 2.0 data has been upgraded to a process variable, while an XML schema defines the data semantics. The main disadvantage of complex data logic expressed as XML is the lack of a diagrammatic representation. Only a small part of the information specified by the semantic model is represented in the diagram, e.g. a text label, data and data store icons.

Moreover, in the specification of CBPs, and in addition to defining the process flow, it is necessary to define the detailed information requirements they use and generate. The descriptions of document types, e.g. the informational and message models, and especially the descriptions of their connections, should be an integral part of the business processes' informational aspect. The BPMN 2.0 notation is not meant to allow data modeling and the breakdown of data information into specific data models [4]. Instead, it provides extension points to accommodate diverse technologies. Therefore, we propose an approach for the formal modeling and specification of the information requirements used and generated in CBPs.


The approach is based on the idea that information requirements should be specified in terms of a mutual reference ontology. According to [6], there are many different meanings of the word ontology. Gruber [7] has defined the ontology as an "explicit specification of a conceptualization". This definition seems to be the most cited in the literature; however, other useful definitions can be found in [8-10]. In the context of this work, a reference ontology is used as an explicit and formal representation of a set of business concepts and business concept relationships for a specific business domain. Moreover, it is a shared vocabulary and a shared conceptual model of the data exchanged between the collaborating business partners [11]. The UML View Profile is introduced in order to specify information requirements as views over the common reference ontology. The BPMN 2.0 extension is performed in order to enable connecting the defined views and the corresponding process activities. Finally, the proposed information requirements specification enables the generation and automation of the message instance transformation at the implementation level.

The remainder of the paper is structured as follows. The following section discusses the related work. The third section proposes the approach to solve the identified problems. The fourth section demonstrates the approach on an illustrative example. The final section concludes the paper.

II. RELATED WORK

Barkmeyer and Denno address the shortcomings of information requirements specification for cross-enterprise modeling in [12]. They propose a methodology for specifying the needed information requirements in a joint business process. The term joint process denotes a shared viewpoint of the joint actions between collaborating partners. The methodology includes developing three major components: a reference ontology for the business entities, a formal specification of the joint process, and a binding between process elements and business entities [12]. However, they do not propose a notation for information requirements specification.

Deutch and Milo point out that the process flow affects the application data and vice versa [13]. Another important issue discussed in [13] is the modeling of the data manipulations and transformations performed by business processes. A number of database researchers [14-16] have strived to establish a connection between the database and the corresponding processes. An interesting approach to information requirements modeling based on a relational database is proposed in [17]. In research works such as [16-18] the data manipulated by the process are specified as artifacts. In [18] Abiteboul et al. have introduced Active XML semi-structured data models.

Several authors have proposed BPMN 2.0 extension approaches focusing on different process modeling perspectives [19, 20]. An interesting approach relevant to our work is presented in [21]. The authors have proposed to extend the BPMN 2.0 process models with the following views: organizational, data, control and function. They have annotated the Data View with concepts from a domain-specific ontology. However, our approach is more general, since their solution is designed for the ARIS enterprise system modeling methodology based on the Linked Data principle.

III. DETAILS OF THE APPROACH

As already stated in the introduction, our approach is based on the reference ontology as a shared vocabulary of the data exchanged between the collaborating partners. The reference ontology is specified by means of UML, since this ontological language enables defining the models at different levels of abstraction. On the other hand, it provides easy understanding and use of the language concepts, and it is most frequently used for describing integrative standards (OAGIS, RosettaNet, Universal Business Language (UBL)). Despite these standards being fundamentally based on the Extensible Markup Language (XML) format for exchanging messages (i.e. they define types of business documents through XML Schema), their specifications at a conceptual level are most frequently presented by means of the UML graphic notation, given that in this way they ensure easier understanding of the standard by business analysts. Therefore, the description of all documents/messages exchanged in the business process is provided via the UML models and in accordance with the reference ontology.

An issue arising at this point is how to formally specify the information requirements of the activities within the business process. Barkmeyer and Denno [12] point out that an information requirement arises when an agent uses a property of an entity or relationship in conducting a modeled activity. Therefore, the information requirements of the activity are only those properties of the business entities from the reference ontology that are involved in the execution of the activity. The business entities will also contain other properties that are not involved in the activity and therefore do not represent the information requirements of that particular activity. To specify the information requirements as a subset of the business entities' properties, we propose the UML View Profile, a UML extension defined using the profile mechanism. This approach is similar to the concept of a database view, where the reference ontology with its business entities corresponds to the database tables, while the view model defined using the UML View Profile contains the derivation rules of the information requirements from the business entities.
approaches focusing on different process modeling Finally, having the activity information requirements
perspectives [19, 20]. An interesting approach relevant to defined as the view model, and messages used and
our work is presented in [21]. The authors have proposed produced by the activity described by the reference
to extend the BPMN 2.0 process models in the following ontology model, it is necessary to associate them with the
views: organizational, data, control and function. They activity within the BPMN model. To enable this, we
have annotated the Data View with concepts from domain propose BPMN extension based on the extension
specific ontology. However, our approach is more general mechanism introduced in BPMN 2.0.
since their solution is designed for ARIS enterprise system
modeling methodology based on Link Data principle. The procedure for specifying the information
requirements comprises the following steps:
III. DETAILS OF THE APPROACH a. Reference Ontology Development. The reference
ontology for a specific business domain is created,
As already stated in the introduction, our approach is or an existing reference ontology is chosen, if
based on the reference ontology as a shared vocabulary of appropriate. Ontology specification at the
the messages exchanged between the collaborating conceptual level is presented using the UML
business partners. The ontologies may vary, not only in models.
terms of the content, but also according to the structure,
implementation, level of description, conceptual range and b. Business Process Model Development. If not
language specification. The number of languages could be already defined, the BPMN model of cross-
used for its construction. These languages can be organizational business process is created.
classified into graph-based languages (Semantic networks, c. Annotation and Association of Information
Conceptual graphs, Unified Modeling Language (UML)), Requirements. The BPMN model is annotated and
frame-based languages (Frame systems, Open Knowledge enriched by concepts defined in the BPMN
Base Connectivity (OKBC), XML-Based Ontology extension, to visually present use of the reference
Exchange Language (XOL)) and logical languages (Web ontology models and the view models with the
Ontology Language (OWL), Rule Markup Language process activities.
(RuleML), Knowledge Interchange Format (KIF)). Our
approach is based on the UML description of the ontology

Page 279 of 478


ICIST 2014 - Vol. 1 Regular papers

d. View Model Specification. Detailed specification of extension mechanism that can be used by modelers to
the information requirements of the process define new concepts with their own semantics. The
activities is created using the UML View Profile. BPMN extension mechanism contains a set of extension
Derivation rules contained within the view model elements: ExtensionDefinition,
created in the last step form the basis for the automatic ExtensionAttributeDefinition, Extension and
generation of mapping rules between the elements of the ExtensionAttributeValue [4]. They have been used to
reference ontology model and the view model. Execution define the BPMN 2.0 metamodel extension, depicted in
of the generated mapping rules transforms the data Fig.1, which enables inclusion of the ontology document
contained within the exchanged message (instance of the definition and view definition into the BPMN process.
reference ontology model) into the information ExtensionDefinition and ExtensionAttributeDefinition
requirements of the activity (instance of the view model). elements are used to define structure of the proposed
extension, and Fig. 1 depicts them with the stereotypes of
The UML View Profile, BPMN extension and model
transformation process are described in the continuation of the same name, while the original BPMN elements are
marked with the stereotype BPMN.
this chapter.
The BPMN metamodel elements relevant for the
A. UML View Profile inclusion of the ontology document/view definitions are
This section lays out a UML profile proposal, called the DataObject, DataInput and DataOutput. All these are
UML View profile, as a formal mechanism for identifying subclasses of ItemAwareElement, selected as the BPMN
the information requirements of the process activities [22]. metamodel concept being extended. The
Using the proposed UML View Profile, information ItemAwareElement is extended either by the
requirements are defined as a subgroup of properties of OntologyElement or the ViewElement extension
the appropriate business entities of the reference ontology. definition (in Fig. 1 this is illustrated by the {xor}
The defined stereotypes of the UML View profile are constraint). An ItemAwareElement (e.g. a DataObject)
described in Table 1. can practically contain either the reference ontology
document (represented by the OntologyElement) or the
B. BPMN Extension view defined over the reference ontology document
(represented by the ViewElement). The original BPMN
The BPMN is designed to be extensible by a standard elements used to specify data structure contained by an
Table 1. UML View Profile

a) Stereotype: ViewPackage b) Stereotype: ViewClass


Description: Represents a package that contains view definition. Description: Represents a class defined within the view definition,
Constraints: The package members must be of one of the based on the reference ontology class. Contains ViewProperty
stereotypes: ViewClass, ViewAssociation or basedOn. properties which define subgroup of properties of corresponding
reference ontology class.
Tagged Values: expressionLanguage - the language used to define
the expressions within the package members. Constraints: It must contain at least one property with the
ViewProperty stereotype. It must be based on the reference ontology
class (represented by the dependency connection with the basedOn
stereotype).
Tagged Values: isEntryPoint - signifies whether ViewClass is the
entry point of the view, i.e. the initial point for the transformation
execution.

c) Stereotype: ViewProperty
Description: Represents a property defined within ViewClass whose
value is determined by an expression defined over the properties of the
reference ontology class.
Constraints: It must have a defined value for the tagged value d) Stereotype: ViewAssociation
expression. Description: Represents an association that connects two
Tagged Values: expression - the expression that defines the mapping ViewClasses.
of the ViewProperty to one or more properties of the reference ontology Constraints: The association ends owner must be ViewClass or
class. ViewAssociation itself (depending on the navigability of the association
end). It must have a defined value for the tagged value
refinementExpression.
Tagged Values: refinementExpression - the expression that defines
the condition for additional filtration of the set of ViewClass objects at
the ViewAssociation end.
e) Stereotype: basedOn
Description: Dependency relationship of this stereotype defines the
dependency of ViewClass from the reference ontology class, i.e. it
defines the reference ontology class whose properties are subsetted by
ViewProperties of the ViewClass.
Constraints: The basedOn dependency source must be ViewClass, f) Stereotype: Key
while the target must be Class. Description: Represents the ViewClass identifier.
Constraints: It must be applied to ViewProperty.

Page 280 of 478


ICIST 2014 - Vol. 1 Regular papers

Figure 1.BPMN Extension

ItemAwareElement are ItemDefinition and Import definition. Based on them we can define the
elements. Likewise, these BPMN elements are utilized to transformation rules of the reference ontology model
import data structure of the reference ontology document instance (i.e. message exchanged within the business
or the view used in the extension. By default, the data process) to the instance of the view model (these instances
structure is serialized in XML Schema format. are shown at M0 meta-layer in Fig.3). There are several
ViewElement has the OntElementRef property ways in which the transformation of the models can be
referencing the OntologyElement it depends on, i.e. it defined. Query/View/Transformation (QVT) specification
defines the reference ontology document to which the is one of the standard ways provided by Object
view is to be applied. For the visual representation of this Management Group (OMG) [23]. The other could be one
dependency the ViewOntologyAssociation extension is of the XML transformation languages (XQUERY,
defined, extending the original BPMN element Extensible Stylesheet Language Transformations (XSLT))
DataAssociation. This extension limits DataAssociation bearing in mind that business messages (M0 models) are
by defining the ViewElement (sourceRef property) as an usually exchanged in the XML format and that the model
association source and OntologyElement (targetRef description itself can be serialized in the XML format.
property) as the association target. When the The approach proposed here is based on the QVT
ViewOntologyAssociation extension is used within a transformations, while the XML-based transformations
DataAssociation element, sourceRef and targetRef can be generated from them if necessary.
properties of the DataAssociation must have the same Transformation rules for the instances of the reference
values as the respective properties of the ontology model can be automatically generated based on
ViewOntologyAssociation extension. the view model (which is in return based on the reference
Fig. 2 shows an illustrative example using the concepts ontology model). This is possible because the
defined in this BPMN extension (the extension concepts transformation definition itself can be presented as a
are marked with the appropriate stereotypes). model. The QVT defines the predefined operation
‘asTransformation’ which enables cast of the
C. Model Transformation transformation definition compliant with the QVT
As previously stated, the UML View Profile is used to metamodel as a transformation instance [23]. This is used
specify semantic mapping rules between the view model
and the reference ontology model (shown in Fig.3 at M1
meta-layer of the four-layered metamodel architecture). M2 UML Metamodel extends UML View Profile
These rules are contained within the view model
conformsTo conformsTo conformsTo

Reference
basedOn View Model
Ontology Model
M1
Transformation
Generation
conformsTo conformsTo
basedOn basedOn
generate

RO Document Data
M0 Instance in Transformation out
View Instance

Figure 2. Sample BPMN process with extension elements Figure 3. Data Model Transformation

Page 281 of 478


ICIST 2014 - Vol. 1 Regular papers

to invoke on the fly transformations definitions created B. Business Process Model Development
dynamically [23]. This QVT feature is used in our The second step creates formal specification for the
approach, as illustrated in Fig.3 with the Transformation collaborative business process identifying all activities and
Generation node. In this step, the QVT transformations shared information exchanged within the process. An
definitions are dynamically generated based on the rules example of a collaborative shipping business process is
defined within the view model. The generation algorithm given at Fig. 4, focusing solely on Supplier participant's
relies on the fact that both source and target models are activities.
instances of the same metamodel, i.e. the UML
metamodel. Since the model entry point is given for each C. Annotation and Association of Information
view definition, the algorithm relies on the definition of Requirements
the entry point ViewClass in further processing. The rules
for the generation of the QVT transformations from the In this step, the information requirements are associated
Object Constraint Language (OCL) expressions are with the appropriate BPMN elements by annotating them
primarily applied to the ViewProperties and in accordance with the proposed BPMN extension.
ViewAssociations of the entry point ViewClass [24]. Fig. 5 illustrates annotation and association of
Thereafter, they are successively applied to other information requirements for the Verify Shipment activity.
ViewClasses and their ViewProperties and Types of data objects are annotated through the
ViewAssociations. In the next step, the generated QVT OntologyElement stereotype defining the eKanban
transformations (represented in Fig.3 with the Data ShipmentSchedule document exchanged between Carrier
Transformation node) are executed. and Supplier, and ViewElement stereotype, which defines
The transformation of data for the validation of the the activity information requirements as a view over
ShipmentSchedule, named ShipmentScheduleView, i.e. it
proposed approach is done within the Eclipse Modeling
defines how to filter and generate the data from the
Framework (EMF) with the application of QVTo ontology document. Their mutual inter-dependency is
implementation. explicitly shown by the dependency relationship annotated
with ViewOntologyAssociation stereotype.
IV. EXAMPLE
Let us present our approach by an example. The D. View Model Specification
example models a generic eKanban scenario [25]. Final step creates the information requirements
specification using UML View Profile. Fig. 6 depicts the
A. Reference Ontology Development ShipmentScheduleView definition. The view is defined
The first step creates the reference ontology. This over the ShipmentSchedule document of the eKanban
ontology definition represents a key part of the reference ontology. For the purpose of clarity, only the
architecture and contains information about business elements of the ShipmentSchedule document relevant for
concepts, the connections between them and the the definition of the view are shown (ShipmentSchedule
contextual description which describes in what way the and ScheduleLine classes from the eKanbanShipment
information entities (whether basic or aggregating) can be package). The view is defined within the ViewShipment
used in a specific business scenario. The eKanban package with the ShipmentScheduleView ViewClass as
reference ontology was chosen to illustrate our approach, the entry point of the transformation. The
at this step [26]. ShipmentScheduleView is mapped to the
ShipmentSchedule class of the eKanban ontology
Carrier

ShipRequest ShipmentSchedule

Stage good for


Shipment
Supplier

Evaluate Shipment Request for Shiping Verify Shipment Generate ASN Data
Rules Verification Shipment

Prepare Shipment
Data

Figure 4. Sample Shipping Process


Carrier

submodel
ShipmentSchedule eKanbanOntology
<<OntologyElement>> <<OntologyElement>>

<<ViewOntologyAssociation>>
ShipmentRequest
Stage good for
Shipment ShipmentScheduleView
Supplier

<<ViewElement>>
Evaluate Shipment Request for Shiping
Rules Verification
Verify Shipment Generate ASN Data
Prepare Shipment Shipment
Data

Figure 5. Annotated Shipping Process

Page 282 of 478


ICIST 2014 - Vol. 1 Regular papers

Figure 6. ShipmentScheduleView Definition


(basedOn dependency). Its ViewProperties are mapped to having all properties of the ScheduleLine class. If it is
the properties of the ShipmentSchedule class by the necessary to use only the subset of properties of
appropriate OCL expressions defined within the ScheduleLine class, a new ViewClass have to be defined
expression tagged value of each ViewProperty. The OCL (ScheduleLineView in Fig. 6) as well as a new
mapping expressions are defined in the form suitable for ViewAssociation with appropriate refinement expression
direct execution against the appropriate reference ontology (ViewAssociation between ShipmentScheduleView and
concept. The self keyword within the ViewProperty's ScheduleLineView in Fig. 6). It should be noted that now
OCL expression marks the reference ontology class to ShipmentScheduleView has ViewProperty
which the ViewClass, owner of the ViewProperty, is largeScheduleLines and ViewAssociation, both defined
mapped. For example, ViewProperty Type is defined by using the same expression (self.lines->select
the "self.scheduleType" expression which is executed (totalReceived.amount > 100)), but resulting in sets of
against the ShipmentSchedule instance and results in the different objects. ViewProperty largeScheduleLines will
value of its scheduleType property. contain the Set of ScheduleLine objects while the Set
A ViewClass, mapped to a reference ontology class by obtained through ViewAssociation will contain
a basedOn dependency, can also define its ViewProperties ScheduleLineView objects (that contains the subset of
over the related classes of the mapped class and their ScheduleLine properties relevant for the view definition).
properties. In line with the aforementioned, the Fig. 7 depicts this by an example of the ShipmentSchedule
ShipmentScheduleView contains allScheduleLines document instance (Fig. 7 a) and the appropriate
ViewProperty representing the Set of all ScheduleLine ShipmentScheduleView instance obtained as the result of
objects of ShipmentSchedule, and largeScheduleLines the transformation process (Fig. 7 b).
ViewProperty representing the Set of ScheduleLine
objects with amount greater than 100. In both cases, the V. CONCLUSION
Set will contain "full" ScheduleLine objects, i.e. objects This paper presents an approach in the formalization of

Figure 7. Ontology document instance and view instance examples

Page 283 of 478


ICIST 2014 - Vol. 1 Regular papers

the informational aspect of cross-organizational business Semantic Web Services and Business Applications (pp.3-22),
processes and the possibility for their automation, i.e. Berlin/Heidelberg: Springer, 2007
implementation. [11] M. Vujasinovic, E. Barkmeyer, N. Ivezic, Z. Marjanovic,
‘’Interoperable Supply-Chain Applications: Message Metamodel-
The key contributions of this paper are: based Semantic Reconciliation of B2B Messages’’, International
- the definition of the UML View Profile as a Journal of Cooperative Information Systems, 19(01n02), pp. 31-
mechanism to specify the information 69, 2010.
requirements in terms of the reference ontology. [12] E.J.Barkmeyer, and P. Denno. "On capturing information
requirements in process specifications", Enterprise
- the definition of the BPMN extension to allow Interoperability II, Springer London, pp.365-376, 2007., ISBN-
association of the information requirements to the 13:9781846288579
BPMN model activities. [13] D. Deutch, and T. Milo, ‘’Business Processes: A Database
- the definition of the QVT transformations enabling Perspective’’, Synthesis Lectures on Data Management,
Morgan&Claypool Publishers, 4(5), pp. 1-103, 2012.
the automation of the model instance
[14] C. Beeri, A. Eyal,T. Milo, and A. Pilberg.,’’Monitoring business
transformation for the purpose of their easier processes with queries’’, InProc. 33rd Int. Conf. on Very Large
implementation. Data Bases, pp. 603–614, 2007.
It can be concluded that the proposed approach is [15] A. Deutsch, M. Marcus, L. Sui, V. Vianu, and D. Zhou, “A
sufficiently general and flexible to describe cross- verifier for interactive, data-driven web applications’’, InProc. of
organizational business processes that include a detailed SIGMOD, pp. 539–550, 2005. DOI: 10.1145/1066157.1066219
specification of the informational content. [16] C. Fritz, R. Hull, and J. Su, ‘’Automatic construction of simple
artifact-based business processes’’, In Proc. 12th Int. Conf. on
The orientation of our future efforts is to define tools to Database Theory, pp. 225–238, 2009. DOI:
support the proposed manner of describing processes, and 10.1145/1514894.1514922
therewith facilitate the application of the steps presented [17] S. Abiteboul, V. Vianu, B. Fordham, and Y. Yesha, “Relational
in this approach, thus making the application of the transducers for electronic commerce”, InProc. 17th ACM
presented transformations possible. SIGACT-SIGMOD-SIGART Symp. on Principles of Database
Systems, p.p. 179–187, 1998.DOI: 10.1145/275487.275507
REFERENCES [18] S. Abiteboul, P. Bourhis, and V. Vianu, “Comparing workflow
specification languages: a matter of views”, InProc. 14th Int.
[1] ATHENA. “Deliverable Number: D.A2.1: Cross-Organisational Conf. on Database Theory, pp. 78–89, 2011. DOI:
Business Process requirements and the State of the Art in 10.1145/1938551.1938564
Research, Technology and Standards Version 2”, SAP, November [19] A. Grosskopf, “An extended resource information layer for
2005. BPMN”, Systems Engineering, 1-17, 2007
[2] S. Lippe, U. Greiner, and A. Barros, “A survey on state of the art [20] A. Rodriguez, E. Fernandez-Medina, M. Piattini, “A BPMN
to facilitate modelling of cross-organisational business extension for the modeling of security requirements in business
processes’’, XML4BPM 2005, 2nd Workshop of German processes”, IEICE Transactions vol. E90-D(4), April 2007
Informatics Society e.V. ( GI) in conjunction with the 11 th GI
Conference “BTW2005”, Gesellschaft für Informatik Bonn, pp. [21] F.Gao, W. Derguech, M. Zaremba, “Extending BPMN 2.0 to
7-22, 2005. Enable Links between Process Models and ARIS Views Modeled
with Linked Data”, BIS 2011 Workshops, LNBIP 97, pp. 41–52,
[3] http://athena.modelbased.net/wholesite.pdf (accessed 30 March 2011.
2014)
[22] Object Management Group (OMG), “Unified Modeling
[4] Object Management Group (OMG), “Business Process Model and LanguageTM (OMG UML) Version 2.4.1”, August 2011
Notation (BPMN) Version 2.0”, January 2011.
[23] Object Management Group (OMG), “Meta Object Facility (MOF)
[5] B. Curtis, M. Kellner, and J. Over, “Process modeling’’, 2.0 Query/View/Transformation Specification Version 1.1”,
Communication of the ACM, Vol. 35, No.9, pp. 75-90, September January 2011
1992.
[24] Object Management Group (OMG), “Object Constraint Language
[6] N. Guarino, D. Oberle, S. Staab, “What is an Ontology”, (OCL) Version 2.3.1”, January 2012
Handbook on Ontologies, International Handbooks on Information
Systes, Springer, 2009 [25] Automotive Industry Action Group (AIAG),”IBP-2 Invetory
Visibility & Interoperability Electronic Kanban Business Process
[7] T. R. Gruber, “A Translation Approach to Portable Ontologies”, Version 1”, March 2006.
Knowledge Acquisition, 1993.
[26] E.J.Barkmeyer, B. Kulvatunyou, “An Ontology for the e-Kanban
[8] N. Guarino, “Formal ontology, conceptual analysis and knowledge Business Process”, NIST Internal Report 7404, National
representation”, International Journal of Human-Computer Studies Institute of Standards and Technology (NIST), 2007.
43 (5–6) 625–640, 1995
[9] N. Guarino, P. Giaretta, “Ontologies and knowledge bases –
towards a terminological clarification”,Towards very large
knowledge bases: knowledge building and knowledge sharing. .
Amsterdam: IOS Press, 25-32, 1995
[10] M. Hepp, “Ontologies: State of the art, business potential and
grand challenges”, Ontology Management – Semantic Web,

Page 284 of 478


ICIST 2014 - Vol. 1 Regular papers

Towards interoperability properties for tooling a


software bus for energy efficiency
Alexis Aubry*,**, Hervé Panetto*,**
* CNRS, CRAN UMR 7039, France
** Université de Lorraine, CRAN UMR 7039, Boulevard des Aiguillettes
{alexis.aubry; herve.panetto}@univ-lorraine.fr

Abstract—This paper proposes a draft architecture for a gies on a large scale. This framework will propose the
software bus that deals with the interoperability challenges integration of any energy sources and sinks across the
when having to interconnect, in a software platform, some territory, seeking potential interconnections between in-
software components/tools/methods/approaches for optimis- dustries (territory scale), to optimise process efficiency
ing energy efficiency. This platform will be developed in the (plant/process scale) and to facilitate the optimal design of
framework of the French ANR Plate-Form(E)3 project. new technologies (component level). The platform will
After having highlighted the challenging interoperability interconnect some existing tools (open source or proprie-
issues that are inherent to this type of platform, we are ana- tary) implementing different specialised methods, models
lysing a state-of-the-art for identifying the candidate tech- and algorithms. The issue of interoperability is thus im-
nologies. An architecture based on one technology candidate portant.
for solving the basic interoperability challenge is then pro-
The goal of this paper is to present the analysis and syn-
posed.
thesis of the state-of-the-art, relevant for resolving the
different interoperability issues of the future Plate-
I. INTRODUCTION form(E)3 system – prototype.
The current context of increasing scarcity of fossil fuels The first part of this paper is related directly to the defi-
and the associated price volatility strongly encourage the nition of the interoperability problems in PFE3. Hence, it
society for energy saving. Although significant efforts presents theoretical foundations for interoperability, the
have been made in the industrial sector since 1973, ac- motivation for that work, scenarios and use cases that
cording to estimation from the French institute CEREN1, form the Plate-form(E)3 system architecture and its envi-
the potential energy power saving could be up to 12 Mtoe2 ronment. The second part analyses the state-of-the-art,
(about 23% of energy consumption in the industrial sec- namely candidate technologies, models, tools, resources
tor). These savings could be made on the following basis: and frameworks for the resolution of the identified in-
teroperability problems. Different types of candidate tech-
 About 2/3 of the savings can be made on plants nologies are presented. Each of the technology analysis
using local optimisation approaches, convention- will consist of two sections. The first section presents the
al or experimental technologies. technology in detail. The second section analyses the rele-
 The remaining 1/3 of the savings can be achieved vance of the technology for PFE3. Finally a discussion is
by conducting cross-cutting actions, using tech- proposed for presenting an architecture based on the use
nology for recovery and transport of residual en- of a standard for process simulation software (namely
ergy. CAPE-OPEN) that is considered as one of the candidates
The local optimisation approach (process/plant scale) is for dealing with our interoperability issues.
already extensively studied, while the global optimisation
approach (territorial area) is not addressed in the literature. II. INTEROPERABILITY AND THE ASSOCIATED
In fact, no tool exists that is capable to achieve a cross- PROBLEMS IN PFE3
scale optimisation of energy and environmental efficiency.
The aim of Plate-Form(E)33 project is to address this A. Theoretical foundations for interoperability
problem. The ANR Plate-form(E)3 (PFE3) project: Digi- IEEE defines interoperability as the ability of two or
tal Platform for computation and optimisation of Energy more systems or components to exchange information and
and Environmental Efficiency at different scales for indus- to use the information that has been exchanged [1].
try (component/process/plant/territory) must contribute to Hence, the diversity, heterogeneity, and autonomy of
the optimisation of energy and environmental efficiency software components, application solutions and business
of industry and territories. Plate-form(E)3 will be realized processes, must be considered as potential sources of non-
by a prototype for assessing the impact of new technolo- interoperability. In contrast to system integration, which
basically deals with formats, protocols and processes of
1
CEREN, French Centre for Studies and Economic Re- information exchange, the objective of interoperability is
to have two systems invoking each other’s functions or
search on energy
exchanging information with the consideration that they
2
Mtoe = Million Tonnes of Oil Equivalent are not aware of each other’s internal workings.
3
http://www.agence-nationale-recherche.fr/en/anr-
Furthermore, interoperability aims at correct and com-
funded-project/?tx_lwmsuivibilan_pi2[CODE]=ANR-12-
plete reasoning on the meaning of the information which
SEED-0002

Page 285 of 478


ICIST 2014 - Vol. 1 Regular papers

is exchanged between two systems. Hence, it is sometimes  Cross‐scale interoperation


called “semantic interoperability”. Main tools for the im-  Cross‐domain interoperation
plementation of the semantic interoperability are ontolo-  Cross‐feature interoperation
Tool
gies, languages for ontologies’ representation, inference feature
tools (engines) and semantic applications. 2
Semantic interoperability of systems means that the
precise meaning of the exchanged information is uniquely
Expert 
interpreted by any system not initially developed for the 1
domain
purpose of this interoperation. A formal definition of se-
mantic interoperability has been proposed in [2]. Modelling
Thermal

B. Interoperability problems in PFE3 Simulation 3

The goal of the Plate-Form(E)3 project is to provide a


first prototype of a software platform in which some exist- Optimisation Energetics
Chemistry
ing tools/methods/approaches have to be connected to
Scale
solve dedicated use cases. This platform may then be con-
sidered as a software bus where any used application can
connect. These use cases concern two types of scenario at
two extreme scales. Figure 1. Plate-Form(E)3 interoperation framework[4]
The first type considers the process scale; the objective
of the future interoperability solution at this scale is to identify the generic scenarios that realize this interopera-
facilitate engineering of a new component in the single tion.
process, to improve the general process performance, e.g.
concerning the energy costs. In this first scenario, interop- We have identified the following generic scenarios [4]:
erability concerns the efficient (optimised) interconnection cross-scale interoperation, cross-domain interoperation,
of unit operations (i.e. basic step in a process such as sepa- cross-feature interoperation (see Figure 1). Also, some
ration, crystallization, evaporation, filtration…) within a first general assumptions on the possible approaches to
single process. Beside material flows, these interconnec- address the problems at this generic level are given.
tions may also need to consider flows of information,  Cross-scale interoperation: the different scales
needed to optimise the process execution, taking into ac- concern the component (optimal design of new
count the cost of energy. technologies), the process/plant (optimisation for
The second type is related to the territory scale; the ob- efficient energy management) and the territory
jective at this scale is to facilitate integrated energy man- (optimisation of potential interconnections be-
agement with a final goal to optimise the energy consump- tween industries). The tools that will potentially
tion in the specific territory. This will be enabled by the be connected with the platform will be used at
collaboration of the different plants in one defined territo- these different scales, producing models that
ry. Two plants could collaborate to exchange the resources need to be exchanged compromising the overall
that are considered as excessive or even as a waste in one performance.
company, but could be used as energy source in another  Cross-domain interoperation: for model-
one (e.g. hot water, steam, heat, pressurized fluid, ..). In ling/simulating/optimising the physical systems
this second scenario, interoperability may also concern through the platform, users use knowledge and
both material and information flows and their sustainable domain-dependant tools that are specialised.
reuse by different facilities, inside a territory. Thus experts’ knowledge covers broad areas of
Thus, the tools that must be interconnected in Plate- physics for modelling thermal, thermodynamics,
Form(E)3 do not operate at the same scale, with the same chemistry and energetics processes. We must al-
business knowledge and on the same models but they so add the experts’ knowledge related to optimi-
must always be able to share information that they pro- sation. Hence, to some extent, cross-domain in-
duce, ensuring the overall coherency of the whole. That teroperability is related to semantic interoperabil-
means that these tools must be interoperable. ity.
The above context can thus be defined through three  Cross-feature interoperation: physical systems
generic and general scenarios that must be realised within modelled in Plate-Form(E)3 will be simulated
two classical interoperability levels (technical and concep- and optimised through these models. Tools for
tual) [3]. Some scientific problems then arise when inter- modelling, simulation and optimisation need to
secting those scenarios with the interoperability levels. be interconnected. This is the problem related to
So, we propose here to define these scientific problems, so-called syntactic interoperability. However, the
based on the consideration of the use cases at two levels of semantics is not a priori excluded as a possible
abstraction. First level considers the most generic interop- asset to achieve the interconnection of the above
erability scenarios. The second level takes into account the tools. Namely, semantics can be used to achieve
conceptual architecture of Plate-form(E)3. syntactic interoperability.
For making the previous scenarios effective, there exist
1) Generic scenarios and scientific problems some barriers (conceptual, technological and organisa-
In order to highlight the underlying problems in setting tional) [3] that define levels of interoperability to be stud-
interoperation between the specialised tools for modelling ied. While organisational barriers are an issue mainly from
physical systems and their optimisation, it is important to

Page 286 of 478


ICIST 2014 - Vol. 1 Regular papers

governmental and privacy perspectives, we will focus on tions within Plate-form(E)3 architecture: custom and na-
technical and conceptual barriers as follows: tive.
 Technical barriers are related to the incompatibil- Custom interoperations are related to a fact that diversi-
ity of information technologies (architecture & ty of tools, of both closed and open architecture (both,
platforms, infrastructure, exchange formats syn- commercial and custom), exists in the environment where
tax …). these interoperations need to be achieved. More important,
 Conceptual barriers are related to the semantic custom interoperations are related to a possibility to intro-
mismatches of information to be exchanged. duce some new tools of unknown capability to interoper-
These barriers concern the modelling at the high ate to the environment, probably even after the Plate-
level of abstraction. form(E)3 prototype is released.
For highlighting the scientific problems linked to the Native interoperations are the ones that occur between
Plate-Form(E)3 project, we propose to intersect the differ- known groups of software tools or modules, whether of
ent generic scenarios with the interoperability levels. open or closed architecture, with natively ensured interop-
eration capability; or between the tools or modules where
Technical interoperability problems appear in each sce- there is possibility to natively develop this capability.
nario. Solving these technical barriers is now easier and
partly achieved by standard techniques and implemented Obviously, native interoperability is more related to
interfaces. We can cite, for instance, XML (eXtensible software integration, while custom interoperability needs
Mark-up Language) and linked applications: SOAP (Sim- approaches with increased level of flexibility.
ple Object Access Protocol) and WSDL (Web Services 3) A preliminary architecture for the platform
Description Language). We must therefore assess whether A preliminary structure of the platform is represented in
the candidate tools for integration into Plate-Form(E)3 use Figure 2. This structure is a conceptual solution to the
existing standards (CAPE-OPEN4, ...) or we must define scientific problems presented before and can be function-
such a standard. When connecting a new tool, we must be ally defined as follows.
able to assess quickly its ability to interoperate with Plate- Coloured symbols represent different interoperability
Form(E)3 at the technical level. issues. While the dark colour (Process integration) indi-
Conceptual interoperability problems concern: cates the problems of higher priority, the light one (GIS
 for the cross-scale interoperation scenario, the integration) represents problems that are of lower priority
dynamics and the granularity of the used models and level of detail. This decision is justified by the esti-
that are not the same. Indeed, the models have mated level of complexity, which is greater when a pro-
not the same time scale when considering a terri- cess integration framework is considered.
tory or a component. Moreover, the different For each of the illustrated connections, two different per-
models represent heterogeneous aggregates of in- spectives are considered: models (unifying models that
formation depending of the scale of the modelled consider a common perspective to the interoperable mod-
system (from territory to component). It is there- ules) and technical interoperability (including technical
fore necessary to formalize and finally assess the approaches, data formats, interfaces).
correlation between models outputs at a given The user interface module allows one user to put some
scale and their use as inputs at another scale information into the system and to define the optimisation
(with the same tool or not). problem (optimisation criteria/objectives, constraints…).
 for the cross-domain or cross-feature interopera- Moreover through this interface, the user can access to the
tion scenarios, the knowledge of several specific different services offered by the platform and the other
domains that are managed by the different tools modules. This user interface must also give to the user all
to be connected through the platform. This heter- the information necessary for decision aiding by calling
ogeneous knowledge produce semantically het- the visualisation module that gives some relevant indica-
erogeneous models that must be exchanged, tors in a dashboard.
stored, processed consistently with the purposes The process integration module is able to connect to
for which they have been built. Moreover, this and to call some external specific tools for model-
raises the issue of the a priori evaluation of the ling/simulating processes defined by the user. Moreover,
ability to exchange ad-hoc models (related to a
specific domain or a particular tool feature) Plate-form(E)3

without knowing in advance the tools that will be User interface

connected to the platform to process these mod-


els (and thus the business semantics of the related
Process integration framework

models that are shared through the platform).


GIS integration framework

GIS database
and/or engine
Simulation tool 1
Optimization

2) General scenarios and scientific problems


Visualization

Simulation tool 2
General scenarios of interoperability are directly related MultiCriteria Analysis

to identified correlations between the different modules of


….

Solver

Plate-form(E)3. When scientific problems are defined at Simulation tool n LifeCycle Analysis
tool
this level, generic scenarios are taken into account, name-
ly, definitions of corresponding scientific problems are Persistence
LifeCycle
database
specialized with consideration of these correlations. It is
proposed to distinguish between two types of interopera-
Figure 2. Preliminary structure of Plate-Form(E)3
4
http://www.colan.org

Page 287 of 478


ICIST 2014 - Vol. 1 Regular papers

this module must be able – for solving the energetic inte-


Models and Integrated modeling Integrated
gration problem for processes – to integrate also all the ontologies tools simulation/optimizat
obtained models in a coherent way without loss and mis- ion tools
understanding. We can distinguish the integration when CAPE-OPEN Modelica CERES
we are in a territory scale or when we are in a pro-
cess/plant scale. Territory scale problem concerns the
integration in terms of considering multiple processes of OntoCAPE Jacaranda OSMOSE
the different plants. In contrast, process/plant scale prob-
lem focuses on a single process, while different compo-
ISO15926 COGents
nents (heat exchangers…) of the different existing simula-
tion tools and libraries may be taken into consideration for
its optimisation. CLiP
The optimisation module solves the mono-objective or
multi-objective problem defined by the user through the
user interface. For defining criteria and constraints, this
Figure 3. Overview of the proposed candidate technologies
module will connect with the model obtained by the inte-
gration module.
The visualisation module gives to the user, through the
neering) software tools; it is maintained by CAPE-OPEN
user interface, an illustration of the relevant indicators for
decision aiding and decision itself, based on the results of Laboratories Network (CO-LaN). It was developed in a
joint EU initiative Global CAPE-Open (1997-99), later
the optimisation and the models that are given by the pro-
cess integration module. This module can also connect to also endorsed by IMS (which gave it a global reach). Initi-
the GIS module if geographical data are relevant consider- ative combined similar efforts of BP (EU project PRIMA)
ing the predefined use case. and BASF (German consortium IK-CAPE).
The GIS integration module will be invoked for territo- CAPE-OPEN defines rules and interfaces that allow
CAPE applications or components to interoperate. This
ry scale use cases if geographical data are relevant for
solving the optimisation problem or for helping the user interoperation is achieved by combining the different so-
called Process Modelling Components (PMC) in model-
for taking its decision. Some examples of the relevant data
are plant locations, landscape features, such as declination ling the process in specific Process Modelling Environ-
or natural obstacles, energy network geo-data, transport ment (PME). PMC is a software component which is in-
routes, etc. tended to carry out a narrow, well-defined function such
as the computation of physical properties, the simulation
All information, regarding the functionality of Plate- of a particular unit operation, or the numerical solution of
form(E)3 is stored in a database for the persistence. certain types of mathematical problems arising in process
simulation or optimisation. Some examples of PMCs are
heat exchanger design models, pump models, distillation
III. STATE-OF-THE-ART models, mixer/agitator calculators, safety relief design
This section presents an overview of the list of candi- calculators, etc. Process Modelling Environment (PME) is
date technologies, models, tools, resources and approaches a software tool that supports the design of a process model
that are considered relevant for resolution of interoperabil- either from scratch or from libraries of existing models, or
ity problems, as defined in the previous section. The can- both. They then allow the user to perform a variety of
didate technologies are foreseen as the possible building different tasks, such as process simulation or optimisation,
blocks of the future interoperability solution. This over- using this single model of the process. Interoperation is
view only presents the basic features of the technologies supported by CAPE middleware, implemented by using
with arguments about possible relevance for the future Microsoft COM, OMG CORBA or .NET technology.
interoperability solution. CAPE-OPEN is the ultimate solution for syntactic in-
Figure 3 gives an integrated overview of the proposed teroperability of process modelling and simulation tools,
candidate technologies with regard to the different sub- endorsed by the industries. It is supported by the wide
problems. Indicated relationships illustrate already exist- range of the different existing tools, such as Aspen,
ing integration between technologies (“uses” relationship). ProSim, SimSci, Belsim and many others. It seems like a
It is important to highlight that this overview considers main candidate for a resolution of interoperability problem
only Process Integration Framework of Plate-form(E)3 at process scale.
and the parts of Plate-form(E)3 architecture that are con- 2) CLiP : Conceptual Lifecycle Process Model
sidered as core tools – related to integrated simulation and CLiP is a comprehensive data model for process engi-
optimisation. Each analyse of technology will conclude neering [6]. It is developed with an objective to general-
with a discussion about the relevance of this technology ize, extend and integrate different existing models for
for Plate-Form(E)3. chemical engineering [7].
Both interoperability problems are related to a process
A. Candidate technologies for Process Integration paradigm. CLiP seems like a prime candidate for model-
Framework ling chemical industry processes, since it generalize, ex-
1) CAPE-OPEN : Open industry standard for tends and integrates different existing models. CLiP is
process simulation software also used as a basis for development of OntoCAPE onto-
CAPE-OPEN [5] is an open industry standard for in- logical framework.
teroperability of CAPE (Computer Aided Process Engi-

Page 288 of 478


ICIST 2014 - Vol. 1 Regular papers

3) OntoCAPE : Large-scale ontology for the domain (optimisation, sensitivity analysis, Pareto curve analysis
of Computer-Aided Process Engineering (CAPE) ...).
OntoCAPE5 captures consensual knowledge of the pro- OSMOSE is a solution for integrated energy manage-
cess engineering domain in a generic way such that it can ment, which is a core of the interoperability problems at
be reused and shared. Some possible applications of On- the territory level (second type).
toCAPE include the systematic management and retrieval
of simulation models and design documents, electronic 3) CERES Platform
procurement of plant equipment, mathematical modelling, A CERES software platform is developed in scope of
as well as the integration of design data from distributed
sources. OntoCAPE can be characterized as a formal, CERES-2 project10, funded by ANR. Its objective is to
heavyweight ontology, which is represented in the OWL optimise waste and heat recovery in industrial processes
modelling language. OntoCAPE has been subdivided in and achieve energy integration. It is developed in C++ and
layers, which separate general knowledge from knowledge it is using OpenModelica, actually Modelica API as mod-
about particular domains and applications. elling and simulation environment.
OntoCAPE is exhaustive semantic information model CERES is already considered as one of the main can-
for data integration across the chemical process design. It didates for integration platform, in specific for addressing
can be a reference for integration and management of process-scale interoperability. Interfaces with simulation
distributed design data, namely process designs of the platforms will be additionally investigated. It seems that
different plants. Thus, it is considered as relevant for terri- the efficiency of these interfaces could be significantly
tory scale interoperability problem. Also, it is used as improved if CAPE-OPEN is considered as a wrapper.
reference ontology for automated decision making related
to configuration of the processes (see COGents).
C. Other candidate technologies, approaches and tools
1) ISO1592611 : Industrial automation systems and
B. Candidate technologies for core Plate-form(E)3
integration - Integration of life-cycle data for process
1) Modelica: Multi-domain modeling language for plants including oil and gas production facilities
component-oriented modeling of complex systems While the above models consider processes in process
Modelica6 is an object-oriented, declarative, multi-domain industries as focal modelling paradigms, ISO15926 aims
modelling language for component-oriented modelling of at providing artefacts for modelling technical installations
complex systems, e.g., systems containing mechanical, and their components.
electrical, electronic, hydraulic, thermal, control, electric The objective of ISO15926 (developed as extension of
power or process-oriented subcomponents. Modelica is a STEP principles to long-life process plants) is to facilitate
modelling language rather than a conventional program- effective and efficient exchange and reuse of complex
ming language. Its classes are not compiled in the usual plant and project information, or in specific to mitigate
sense, but they are translated into objects which are then the current high costs of rekeying and reformatting in-
exercised by a simulation engine. The simulation engine is formation to move it from one proprietary system to an-
not specified by the language, although certain required other. It is mostly related to providing models for equip-
capabilities are outlined. ment and their properties. ISO 15926 acts like an inter-
Modelica is used to develop platforms that could be preter between two otherwise incompatible systems, by
applied for integrated modelling and simulation. Hence translating the descriptions of plant objects from one
the relevance for territory scale interoperability problem. company’s database to that of another. In doing so, the
Examples of these platforms are OpenModelica7 and meaning of all the terms is being maintained, inde-
JModelica8. pendently of the context.
Setup for the process industries with large projects in-
2) OSMOSE : A tool for the design and analysis of volving many parties, and involving plant operations and
integrated energy systems maintenance could take a long time. Optimising existing
OSMOSE9 (Acronym for Multi-Objective Optimisation of processes by replacing an existing component (process-
integrated Energy Systems) is a Matlab platform designed scale interoperability problem) or by adding components
for the study of energy conversion systems. The platform which could facilitate energy integration (territory-scale)
allows linking several software, for flowsheeting (Belsim, assumes procurement of the installation component, or at
Vali, Aspen Plus), energy integration (Easy, GLPK), op- least exchange of the information which is sufficient to
timisation (MOO), and lifecycle impact assessment define the requirements for this component. Obviously,
(Ecoinvent). Among other features, OSMOSE offers a establishment of the correspondences between process
complete suite of computation and results analysis tools and equipment models could contribute to facilitating the
collaboration between the relevant systems (e.g. for
5
http://www.avt.rwth-
aachen.de/AVT/index.php?id=730&L=1 10
http://www.agence-nationale-recherche.fr/en/research-
6
https://www.modelica.org/ programmes/energie-durable/systemes-energetiques-
7
https://www.openmodelica.org/ efficaces-et-decarbones/funded-project-
8
http://www.jmodelica.org/ eesi/?tx_lwmsuivibilan_pi2[CODE]=ANR-10-EESI-0001
9
http://leni.epfl.ch/osmose 11
http://en.wikipedia.org/wiki/ISO_15926

Page 289 of 478


ICIST 2014 - Vol. 1 Regular papers

process modelling and procurement). Existing formal


Plate-form(E)3
representations of ISO15926 [8] could reduce the efforts
User interface
in making these correspondences. Registry
management

Unit Operations
PMCs

2) COGents : Semantic approach to CAPE web PMC registry functions


module

service choreography Simulation tool 1

Process integration framework


Num solvers
Core PIF API

sys PMCs
Property

GIS integration framework


PMCs
Simulation tool 2

COGents project proposed the approach to dynamic


Unit Operations

….
API module

CAPE-OPEN objects
Simulation tool n

Visualization
CAPE services composition [9], where a number of soft- Properties
API module

ware agents collaborate to configure a process model,


Flowsheet
analysis PMCs Flowsheet
API module

according to the users’ requirements, defined by using Solvers


API module

Optimization
OntoCAPE ontology. Namely, agents are used as CAPE Logging

web services choreographers. Typical use of this ap-


functions module

proach is as following: The user defines a Modelling


Task Specification (MTS) in OntoCAPE format to de- Persistence

scribe the unit he/she requires in term of functionality and


PMC
registry Process
log

parameters (of the underlying tool, e.g. HYSYS). Then,


library and match maker agents find the appropriate unit Figure 4. Process Integration Framework architecture in CAPE-
operation using the generated MTS file. OPEN context
COGents provide automated support for configura-
tion/generation of process model, on demand, based on tecture is only a potential architecture that is not a final
the user’s requirements. choice for the project.
In this section, a description of CAPE objects, by CAPE-
3) Jacaranda OPEN standard is provided, together with the methodol-
Jacaranda12 is a system for process synthesis, or automat- ogy and illustrations of some interfaces. The proposed
ed process design, intended for conceptual or early stage architecture is made on basis of the high-level architec-
design [10]. It aims to provide the support necessary for ture, presented in figure 2. It elaborates in more detail a
creative and exploratory design, helping the engineer to Process Integration Framework component of Plate-
identify the important issues and constraints for a given form(E)3, in context of possible use of CAPE-OPEN
design problem. interfaces to exploit the external process modelling and
Jacaranda is a solution for automated process design. simulation tools. This elaborated architecture is illustrated
Therefore, it may be a candidate technology for generat- on Figure 4.
ing cross-plant processes in territory scale interoperability Process Integration Framework (PIF) is a part of the
problem. It is also used in COGents project as optimisa- Plate-form(E)3 architecture whose role is to connect to
tion platform [9]. and invoke some services offered by the external tools,
used for process modelling and simulation. In context of
CAPE-OPEN integration, Process Integration Framework
IV. DISCUSSION should implement functions which are using CAPE-
Based on the previous State-of-the-art, this section OPEN interfaces to access the above services. The func-
shows how CAPE-Open should be used in PFE3 as one of tions are part of so-called Process Integration Framework
the candidate technologies for resolution of process-scale Application Programme Interface (PIF API). It is as-
interoperability problem of PlateForm(E)3. In fact, sumed that this approach is possible under condition that
CAPE-OPEN interfaces between CAPE (Computer-Aided the above tools are CAPE-OPEN compliant. This implies
Process Engineering) tools defined the primary means for that before final selection of the technology used to im-
establishing the systems interoperability in this domain. plement Process Integration Framework, a detailed analy-
They are defined in EC sponsored effort (within two con- sis of the CO-compliance of the final choice of process
secutive projects: CAPE-OPEN and Global CAPE- modelling and simulation tools (to embed to, or to use
OPEN), with participating major industries and labs, thus within Plate-form(E)3) must be carried out. Despite pos-
gaining the global reach. sible non-compliance situations, CAPE-OPEN must be
Today, CAPE-OPEN represents widely accepted ap- carefully considered, since it is today’s de facto industrial
proach, methodology and specification for making the standard for interoperability of process applications. In
different CAPE tools and components interoperable. Ref- this context, the PIF acts as Process Modelling Environ-
erence [11] provided the list of CO-compliant CAPE ment (PME), namely a client or a socket, as it uses the
tools. This list is not exhaustive because of the date of the CAPE-OPEN interfaces in order to request services from
publication. More detailed and updated list is maintained the external software. The process modelling and simula-
at co-lan.org website. Majority of the candidate tools for tion tools, namely their open components, act as Process
process modelling and simulation in Plate-form(E)3 ar- Modelling Components (PMC), or servers or plugs, since
chitecture already provide some level of support to they are applications, wrapped with the CAPE-OPEN
CAPE-OPEN integration. However, the following archi- interfaces in order to expose their functionality. The list
of these functions, namely contents of PIF API should be
defined based on the specific interoperability cases. At
12
http://www.ucl.ac.uk/~ucecesf/jacaranda.html

Page 290 of 478


ICIST 2014 - Vol. 1 Regular papers

this point, it is clear that they should be grouped accord- search ANR in the frame of the Plate-Form(E)3 pro-
ing to the PMC classes they are communicating with. ject. The authors would like also to thank Dr. Milan
Process modelling and simulation tools which are parts of Zdravkovic (University of Nis and Ingline D.O.O) for
Plate-form(E)3 landscape provide PMC classes which his support to this work.
can be used by Process Integration Framework, namely
respective API modules: Properties API module, Unit REFERENCES
operations API module, Numerical solvers API module [1] IEEE. IEEE Standard Computer Dictionary: A Compilation of
and Flowsheet Analysis API module. These modules are IEEE Standard Computer Glossaries. Institute of Electrical and
Electronics Engineers, 1990. ISBN: 1559370793
interfaces which are wrapping the native implementations [2] J. Sowa. Knowledge Representation: Logical, Philosophical, and
of the respective relevant functions in Optimisation mod- Computational Foundations, CA:Brooks/Cole Publishing Co.
ule. They are using CAPE-OPEN objects, such as Ther- 2000
mo, Unit, Numerics and Simulator Executive objects. [3] INTEROP, Enterprise Interoperability-Framework and knowledge
Two other modules are foreseen to provide supportive corpus - Final report, INTEROP NoE, FP6 – Contract n° 508011,
Deliverable DI.3, May 21st 2007.
functions to PIF API. PMC registry functions module [4] A. Aubry, J. Noel, D. Rahon, H. Panetto. A cross-scale models
facilitate adding, editing and deleting PMCs, available to interoperability problem: the Plate-Form(E)3 project case study.
Plate-form(E)3 platform. Logging functions module track 3rd industry Applications and Standard initiatives for Cooperative
and store all activities related to using the different PMCs Information Systems for Interoperable Infrastructures, Sep 2013,
Graz, Austria. Springer, OTM 2013 Workshops. LNCS 8186, pp.
of the different process modelling and simulation tools, 57-61, Lecture Notes in Computer Science. 2013
by the platform. [5] J.P. Belaud, M. Pons. Open software architecture for process
simulation: The current status of CAPE-OPEN standard. Comput-
er Aided Chemical Engineering. Volume 10, pp. 847–852, 2002.
V. CONCLUSIONS AND FUTURE WORKS [6] B. Bayer, W. Marquardt. Towards integrated information models
for data and documents. Computers & Chemical Engineering.
The work presented in this paper is a first step in deal- Volume 28, Issue 8, pp 1249–1266, 2004.
ing with interoperability issues when having to inter- [7] B. Bayer, R. Schneider, W. Marquardt. Integration of data models
connect some tools/methods/approaches in a software for process design - first steps and experiences. Computers &
bus for modelling/simulating/optimising processes for Chemical Engineering. Volume 24, Issues 2–7, pp 599–605, 2000.
energy efficiency at different scales: compo- [8] R. Batresa, M. Westb, D. Lealc, D. Priced, K. Masakia, Y. Shi-
madae, T. Fuchinof, Y. Nakag. An upper ontology based on ISO
nent/plant/territory. 15926. Computers & Chemical Engineering, Volume 31, Issues
Based on an extensive state-of-the-art, a first architec- 5–6, pp. 519–534, 2007.
ture has been proposed based on CAPE-OPEN stand- [9] B. Braunschweig, E. Fraga, Z. Guessoum, W. Marquardt, O.
ard. The next step is to finalise the choice of the archi- Nadjemi, D. Paen, D. Piñol, P. Roux, S. Sama, M. Serra, I. Stalk-
tecture (based on CAPE-OPEN standard or not), to de- er, A. Yang. CAPE web services: The COGents way. Computer
Aided Chemical Engineering. Volume 18, pp 1021–1026, 2004.
velop this architecture and finally to extend it for deal-
[10] E.S. Fraga, M.A. Steffens, I.D.L. Bogle, A.K. Hind. An object-
ing with interoperability issues when connecting some oriented framework for process synthesis and optimization. Fifth
tools/models related to the different scales presented International Conference on Foundations of Computer-Aided Pro-
above. cess Design. pp. 446 – 449, 2000.
[11] J.M. Nougues, D. Piñol, J.C. Rodríguez, S. Sama, P.Svahn.
CAPE-OPEN as a mean to achieve interoperability of Simulation
Aknowledgement Components. 45th International Conference of Scandinavian Sim-
This work has been partially funded by the program ulation Society, SIMS 2001.
SEED 2012 from the French National Agency for Re-

Page 291 of 478


ICIST 2014
Poster papers

Sections:
1. Computing

2. E-Society, E-Government and E-Learning

3. Hardware and Telecommunications

4. Information Systems

5. Integration and Interoperability

Page 292 of 478


ICIST 2014 - Vol. 2 Poster papers

Suitability of Data Flow Computing for


Number Sorting
Anton Kos
Faculty of Electrical Engineering, University of Ljubljana, Slovenia
Tržaška 25, 1000 Ljubljana, Slovenia
[email protected]

Abstract—In this paper we study the suitability of data flow only on the group of comparison based sorting algorithms.
computing for number sorting. We briefly present and All of the most popular sorting algorithms, as well as
discuss the properties of sequential, parallel, and network network sorting algorithms, are members of this group.
sorting algorithms. The major part of this study is dedicated Comparison based sorting algorithms examine the data
to the comparison of the most important network sorting by repeatedly comparing two elements from the unsorted
algorithms and to the most used sequential and parallel list with a comparison operator, which defines their order
sorting algorithms. We present the effects of sorting in the final sorted list. In this paper we divide comparison
algorithm parallelization and further discuss its impact on based sorting algorithms into three groups based on the
sorting algorithms implementation on control flow and data time order of the execution of compare operations:
flow computers. The obtained test results clearly show that
sequential sorting algorithms execute the comparison
under certain limitations, when measuring the time needed
operations in succession, one after another, parallel
to sort an array of numbers, data flow computers can
sorting algorithms execute a number of comparison
greatly outperform control flow computers. By finding
solutions to current problems of data flow sorting
operations at the same time, network sorting algorithms
implementation, important improvements to many
are essentially parallel algorithms; they have the property
applications that need sorting would be possible.
that the sequence of comparison operations is the same for
all possible input data.
I. INTRODUCTION A particular comparison based sorting algorithm can
have one or more versions belonging to one or more of the
Sorting is one of the most important computer above listed groups. For instance, merge sort can be
operations. Therefore, a constant quest for better sorting executed sequentially, it has its parallel version, and it can
algorithms and their practical implementations is be implemented as a network sorting algorithm.
necessary. Sorting is also an indispensable part of many
applications, often concealed from user. One of many A. Sequential Sorting
such examples is searching for information on the It has been proven [1] that comparison based sequential
Internet. Most common, search algorithms work with sorting algorithms require at least the time proportional to
sorted data and search results are presented as an ordered on average, where is the number of items
list of items matching the search criteria [3]. to be sorted. Properties of some of the most used
To date most computer systems use well studied comparison based sorting algorithms are listed in Table I.
comparison based sequential sorting algorithms [2] that
have practically reached their theoretical boundaries. TABLE I
Speedups are possible with parallel processing and the use PROPERTIES OF THE MOST POPULAR COMPARISON BASED SORTING ALG.
of parallel sorting algorithms. Sorting time – O(x) notation
We can achieve parallelization through the use of multi- Algorithm Average Best Worst
core or many-core systems that can speed up the sorting in Insertion
the order proportional to the number of cores. Recently a Selection
new paradigm called data flow computing re-emerged. It Bubble
offers immense parallelism by utilizing thousands of tiny Quicksort
simple computational elements, improving the Merge
performance by orders of magnitude. Heap

The motivation of this paper is to investigate the


possibilities of using the data flow computing paradigm It can be seen that the average, the best, and the worst
for sorting algorithms and their implementation on a data sorting times vary considerably among algorithms.
flow computer. We have the possibility to work with the Especially the best sorting time is heavily dependent on
Maxeler MAX2 data flow computer system on which we the configuration of input data. For instance, insertion sort
have carried out all our tests. has the average and the worst sorting time of , but
with the nearly sorted input data it needs only
II. SORTING ALGORITHMS operations, where is the number of needed inversions.
Sorting algorithms can be classified on different On the other hand, quick sort has the average and the best
criteria, such as computational complexity, memory sorting time of , but in some special cases it
usage, stability, general sorting method, and whether or has problems with the nearly sorted input data, where it
not they are comparison sorting [5].We will concentrate has the worst sorting time of [3].

Page 293 of 478


ICIST 2014 - Vol. 2 Poster papers

While on average, the best choice are Quicksort, Merge Sorting networks are the implementations of network
sort, Heap sort, and Binary tree sort, one would like to sorting algorithms and they consist only of comparators
avoid Quicksort as its worst sorting time in some rare and wires. Sorting networks can be described by two
cases can reach . On the other hand, if the properties: the depth and the size. The size is defined as
configuration of data is expected to be favourable (nearly the total number of comparators it contains. The depth is
sorted, for instance), the best choice could be one of the defined as the maximum number of comparators along
algorithms with sorting time that is linearly proportional to any valid path from any input to any output [2].
(Insertion, Bubble, Binary tree, and Shell sort). We see By inspecting the properties of network sorting
that the choice of the best sorting algorithm is not at all an algorithms in Table II, we can conclude that Bubble
easy task and depends on the expected input data size and network sorting is inferior to the others in both properties.
configuration. While the size of the Bitonic network is larger than the
B. Parallel Sorting size of Odd-even merge network, its constant number of
comparators at each stage can be an advantage in certain
Parallelization of sorting algorithms can be applications. If the later is not important, then the best
implemented by using multi-core and many-core choice would be the use of an Odd-even sorting network.
processors [6]. Generally the term multi-core is used for
processors with up to twenty cores and the term many- TABLE II
core for processors with a few tens or even hundreds of PROPERTIES OF SOME NETWORK SORTING ALGORITHMS.
cores. In most practical cases this approach is not optimal,
Sorting
as for a true parallel sorting, such a system would need the network
Depth Size
number of cores in the order of number of items to be
sorted ( ). In many applications grows into thousands, Bubble
millions and more.
Comparison based sorting algorithms are Bitonic
computationally undemanding as the computational
operations are simple comparisons between two items. To Odd-even
sort a set of items, we would need a set of /2 very merge
basic computational cores primarily designed to perform
the mathematical operation of comparison. In addition to Assuming that all the comparisons on each level of the
that, such computational cores would need some control sorting network are done in parallel, its depth defines the
logic in order to execute a specific sort algorithm. number of steps needed to sort numbers on the inputs
Data flow computing is a good match for parallel and thus defines the time needed to sort the input. The size
sorting algorithms because of its possibility of executing of the sorting network tell us how many comparison is
many thousands of operations in parallel, each of them needed, hence how many comparators we need to
inside a simple computational core. The only limitation is construct a sorting network. For instance, in hardware
the absence of control over the sorting process in implementations the size defines the required chip area.
dependence of intermediate results, meaning that the
sequence of operations of the sorting process must be III.
NETWORK SORTING VS. SEQUENTIAL AND
defined in advance. This fact prevents the direct use of PARALLEL SORTING
sorting algorithms from Table I as they are designed for Theoretically the number of sequential operations or
control flow computers; hence they determine the order of comparisons for Quicksort sorting algorithm is in the
item comparisons based on the results of previous order of and for the network version of the
comparisons. The possible solution is the adaptation of Bitonic or Odd-even merge sorting in the order of
those sorting algorithms in a way that ensures their [1]-[4]; i.e. theoretically, Quicksort is
conformance to data flow principles. For instance, if we better than Bitonic merge algorithm by a factor of .
can assure that the parallel sorting algorithm can be This statement is true when we disregard the influence of
modeled as a directed graph, then the sorting process sorting algorithm constants.
conforms to the data flow paradigm.
A. Sorting Algorithm Constants
C. Network Sorting
Considering the algorithm constants, the number of
Network sorting algorithms are parallel sorting operations for Quicksort algorithm is in the order of
algorithms with a fixed structure. Many network sorting and for the Bitonic merge algorithm
algorithms have evolved from the parallel versions of in the order of ; what gives us the
comparison based sorting algorithms and they use the
ratio of or , where algorithm
same sorting methods like insertion, selection, merging,
etc. Sorting networks structure must form a directed constants ratio is defined as . We expect,
graph, which ensures the output is always sorted, that for the discussed sorting algorithm pair, .
regardless of the configuration of the input data. Because Network sorting algorithms conform to the data flow
of this constraint, network sorting algorithms that are paradigm, they have practically no computational
derived from parallel sorting algorithms will in general overhead (they have no need for process control);
perform some redundant operations. This makes them therefore the Bitonic merge network sorting has a small
inferior to their originating parallel sorting algorithms in constant . On the other hand Quicksort decisions
the number of operations (comparisons) that they must depend heavily on the results of previous operations and
perform. hence Quicksort has a large algorithm constant .

Page 294 of 478


ICIST 2014 - Vol. 2 Poster papers

For small values, where , Quicksort prevails over algorithm constants and sequential
algorithm should be slower than Bitonic algorithm and for algorithms become faster.
large values, where , Quicksort algorithm
should become faster. To prove our assumptions we have B. Parallelization
run a series of tests where we measured the sorting times Despite the encouraging results from Figure 2, the
of the Quicksort algorithm and the network version of the following question remains: “Can we expect, that for any
Bitonic merge sorting algorithm. All results for both larger values of , network sorting would outperform
algorithms, presented in Figure 1, were obtained by sequential comparison based sorting?” Even if we exploit
sequential computation (no parallelism is employed) on parallelism, wouldn’t it decrease the computational time
the PC using algorithms written in C code. We can by the same factor for all algorithms (parallel execution of
observe, that sorting time curves cross at approximately sequential algorithms and parallel execution of network
. Below that number the sequential version of algorithms), and the performance ratio would stay the
Bitonic network sorting is faster than Quicksort and the same? The answer lies in the change of computational
opposite above that number. paradigm and moving to the domain of data flow
computing. Let us illustrate that through an example.
1000000
TABLE III
QuickSort PARALLELIZATION OF SORTING ALGORITHMS
Bitonic
100000 Values for the best algorithm of type
(expressions given in O(x) notation)
Measure Parallel (N) Parallel (P) Network
10000
Comparisons
Sorting time
1000

For a true parallel execution of a sorting algorithm we


100 need computational cores. With that ensured,
16 64 256 1024 4096 16384 sorting times for such a parallel algorithm are in the order
Figure 1. Comparison of sorting times for Quicksort and Bitonic of for classical algorithms and
Mergesort network algorithm in dependence on the number of items for network algorithms. Let us assume that the best
being sorted ( ). parallel control flow system has a maximum of
computational cores. Eventually, with the growing , we
0,020 will get to the point where and sorting times of
Bitonic
classical parallel algorithms will be in the order
Bubble
; again growing faster than linearly and
OddEvenMerge
0,015
not truly parallel. Since that is not desirable, the sorting
QuickSort
should move to the data flow computers that can ensure
BubbleSort
enough cores for a true parallel execution.
HeapSort
0,010 MergeSort
10000

0,005
1000

0,000
8 16 32 64 128 100
N
Sequential
Figure 2. Comparison of the average sorting times between the Network
popular sequential sorting algorithms (solid lines) and network sorting Parallel (N)
10
Sorting time

algorithms (dashed lines). Parallel (P=4)


Parallel (P=8)
After we have proved, that algorithm constants for Parallel (P=16)
network sorting algorithms can be considerably smaller 1
Parallel (P=32)
that the constants of sequential sorting algorithms, we 0 200 400 600 800 1000
have conducted similar tests and comparisons for the most N
popular sequential sorting algorithms and the most
Figure 3. Expected sorting times for the algorithms from Table III.
popular network sorting algorithms. The results are shown Sorting time is given in cycles (time to do one comparison) needed to
in Figure 2. Let us emphasize again that all the results for sort an array of N items.
all algorithms are obtained by the sequential computation
on a PC using C code. We can see that for the smallest Number of comparisons and sorting times for different
values of network sorting algorithms outperform any implementations of sorting algorithms and different
sequential sorting algorithms. When grows, the higher degree of parallelization are listed in Table III. For the
order of computational complexity of network algorithms sequential algorithms the sorting time is proportional to

Page 295 of 478


ICIST 2014 - Vol. 2 Poster papers

the number of comparisons. With parallel algorithms we D. Experimental Results


execute (true parallel) or (near parallel) comparisons To demonstrate the validity of the above conclusions,
at the same time and sorting times are for the we have devised a number of tests on a control flow
corresponding factor smaller. Network algorithms execute computer (PC) and on a data flow computer (Maxeler
all the comparisons of each step in parallel. MAX2 card). Figure 4 shows the speedup in sorting times.
In Figure 3 we plot the expected sorting times for the The speedup is the ratio between sorting times on a PC
algorithms from Table III without the consideration of the and sorting times on a MAX2 card. We can see that with
algorithm constant . The curves show, that when the growing number of arrays ( ), the speedup becomes
grows, the true parallel sorting algorithm is superior to all higher, what confirms our assumptions and we can state,
of them, followed by the network sorting algorithm and that under certain conditions, data flow computing is
the near parallel sorting algorithm. The sequential sorting suitable for number sorting and outperforms control flow
algorithm is the slowest. computing.
C. Control Flow vs Data Flow Computers 20
Based on the results in Figure 3, we can state that in a
control flow computer, true parallel sorting algorithm is 15
clearly the first choice. But if we move to a data flow
computer, things change considerably. In a data flow 10
computer data flows between operations organized as a
directed graph. In the case of network sorting algorithms, 5
the sorting network structure is a directed acyclic graph
with comparators organized to sort the input array of
0
values. 1 10 100 1000 10000 100000 1000000
When we sort one array, the sorting time is directly M
proportional to the depth of the sorting network and in Figure 4. The sorting speedup for arrays of size in
each cycle only one layer of comparators is active, the dependence from the stream size .
other stay idle. One cycle is defined as time needed to do
one comparison step. One layer of comparators represents CONCLUSION
all comparators of one step of the sorting algorithm. Such Not all algorithms are suitable for data flow computing.
a sorting scenario is more suitable for control flow In this paper we show that number sorting is suitable for
computers. Data flow computers are designed for data implementation on data flow computers and can, under
flows or data streams, in the case of sorting, that would be certain conditions, greatly outperform the control flow
a stream of arrays of values to sort. computers. There is a lot of work still to be done. One of
For instance, if we have ararys of values to be the main obstacles to date is the small array sizes that can
sorted, we can send them to the sorting network one after be implemented on data flow computers. We expect that
another. Arrays enter the sorting network in one cycle with the advances in data flow computers. By finding
intervals. Similarly, after the first array is sorted on the solutions to the above and other problems and obstacles,
output, the subsequent arrays exit the sorting network in a serious improvements to many applications that need
one cycle intervals. Each step of the algorithm operates on sorting, would be possible.
a different array. When reaches the depth of the sorting
network, all comparators of the network are active. In REFERENCES
such scenario the sorting time for the first array is in the [1] Donald E. Knuth, “The art of computer programming. Vol. 3,
order of , all the rest follow in one cycle Sorting and searching”, Addison-Wesley, 2002
intervals and their sorting time is essentially in the order [2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest,
of . Clifford Stein, “Introduction to Algorithms, Second Edition”,
Cambridge (Massachusetts), London, The MIT Press, cop. 2009
Comparing the sorting of ararys of values on a [3] Robert Sedgewick, “Algorithms in Java, Third Edition, Parts 1-4”,
data flow computer (sorting network) and on a control Addison-Wesley, 2010
flow computer (true parallel sorting) gives us interesting [4] K.E. Batcher, “Sorting networks and their applications”,
results. The sorting time for the control flow computer Proceedings of the AFIPS Spring Joint Computer Conference 32,
with the true parallel operation is in the order of 307–314, 1968
and for the data flow computer with sorting [5] “Sorting Algorithm”,
network in the order of . When http://en.wikipedia.org/wiki/Sorting_algorithm,
both sorting times are comparable, but when accessed 20.1.2014
, network sorting on a data flow computer [6] “Parallel computing”,
http://en.wikipedia.org/wiki/Parallel_computing,
becomes much faster. accessed 20.1.2014
The conclusion of this consideration is that for small [7] “Sorting Network”, http://en.wikipedia.org/wiki/Sorting_network,
and small , the best choice is parallel sorting algorithm accessed 20.1.2014
on a control flow computer, for large and small data [8] Nadathur Satish, Mark Harris, Michael Garland, “Designing
flow computer will always perform better, for large and Efficient Sorting Algorithms for Manycore GPUs”, IEEE,
small control flow computer will always perform International Symposium on Parallel & Distributed Processing,
2009
better, when both and are large, we can not be
[9] http://www.maxeler.com/technology/dataflow-computing,
conclusive because much depends on the ratio . accessed 27.4.2013

Page 296 of 478


ICIST 2014 - Vol. 2 Poster papers

Application of Digital Stochastic Measurement


over an Interval in Time and Frequency Domain
Boris Ličina1, Platon Sovilj1
1Department of Power, Electronics and Communication Engineering, Faculty of Technical Sciences,
University of Novi Sad, Trg Dositeja Obradovica 6, 21000, Novi Sad, Serbia
[email protected], [email protected]

Abstract—Widely employed strategy “measurements in


a point” has represented the backbone in measurement
evolution and has become a standard method. Time-
continuous signals are sampled (at time instant) and
converted into discrete digital variables with maximum
accuracy. An alternative approach called “digital Figure 1. Block diagram of application of a uniform random dither h
stochastic measurement over an interval” carries clear to the measured signal y
advantages in three challenging areas: measurement at
high frequencies, measurement of noisy signals and The motive of “measurement over an interval” strategy,
measurement that requires high accuracy and linearity. formulated in [3], was a very simple hardware (low
This overview paper summarizes all cases of application resolution flash A/D converter, hence lowering the
of this approach in time domain as well as in frequency number of systematic error sources) and easy parallel
domain which have been developed so far. Described processing (practically without additional delays in signal
measurement concepts enable simple designing of the processing). Measurement over an interval is an integral
instruments with advanced metrological characteristics. approach to measure a signal and its parameters - a signal
Keywords—Electrical measurements, Digital measurements, [4] or some of its parameters [5] are measured during a
A/D conversion, Probability, Stochastic processes. finite time interval of an arbitrary duration.
From theoretical point of view, the problem is highly
I. INTRODUCTION non-linear and stochastic, and therefore neither the
standard linear Theory of discrete signals and systems nor
The essence of the sampling method is as follows: in a the Theory of random processes can be applied. It was
theoretically infinitely short time interval (practically in an necessary to develop an alternative mathematical
instant), a sample of an analogue measured variable is approach. The time within the measurement interval is
taken and in a time interval t this sample is converted treated as a stochastic variable with a uniform distribution.
into a number in a device called A/D converter. This Consequently, the problem of measurement over the
commonly used strategy called “measurement in a point” interval can be classified in the Probability theory and the
(sampling method measurement) has been the backbone of area of Statistical theory of sampling.
the measurement instrumentation development in
metrology, control, telecommunications, etc. In the II. MEASUREMENT IN TIME DOMAIN
conversion process, accuracy and speed are opposing
requirements. The mathematics in the background that A. Measurement of constant voltage
explains this approach is algebra, while the applied theory
is the Theory of discrete signals and systems. The device called Stochastic additional A/D converter
with one dither generator (SAADK-1G) can be used for
The high speed of all electronic circuits implies the average value measurement. This device is shown on
viability of the Central limit theorem for practical Fig. 2:
measurements. This idea has been known since early
1960’ş [1], but in different context (stochastic computer Voltage ranges and decision thresholds associated with
design). It has been shown [2] that a singular process of measuring average input signal by uniform
measurement does not need to be maximally accurate, quantizer are represented graphically in Fig. 3.
while the measurement uncertainty is reduced by adding a
uniform random noise (dither) to the input signal prior
quantization (Fig. 1). Probability density function (PDF)
of uniform random distribution dither signal h is:
1 a
ph  for h  (1)
a 2
where quantum of uniform quantizer is labeled with
a  2g . Figure 2. Block diagram of SAADK-1G

Page 297 of 478


ICIST 2014 - Vol. 2 Poster papers

Figure 5. Measurement of the average signal value

obviously:
y  h  3g (8)
Possible values of  are    2 g ,0,2 g , and the
analytical term for  is:
Figure 3. Voltage ranges and decision thresholds associated with   2 g  (b1  b1 ) (9)
process of measuring 
where b1 , b1  0,1 and b1  b1  0 . It is never possible
For the subsequent exposure let us assume that the that b1 and b1 are equal to 1 simultaneously - it would
following conditions are satisfied: have meant that y  0 and y  0 simultaneously.
y  R, R  Z  a, (2)
The counter shown on Fig. 5 works like accumulator - it
Obviously: accumulates summands (b1i  b1i ) . At the end of
a (3) measurment interval accumulator output value is equal to
yh  R N
2
In the case of dithered constant voltage at the input of
 (b1i  b1i ) . Microprocessor gets information about
i 1
the converter y  const  n  a  a let determine average number of samples from clock. At the end of
value of the A/D converter’s output  (from Fig. 1). measurement interval microprocessor calculates average
  n1  P(n1 )  n  P(n )  n  a  a  y (4) value  as:
For upper conditions the variance of the variable  is: 2g N 1 N (10)
   (b1i  b1i )    i  y
 2  a  a  a (5) N i 1 N i 1
If the sampling frequency tends to infinity and the
It is obvious that for any y  R the variance is limited
registers are of infinite length, then   y . This relation
2
with  2  a . Consequently,  2 is limited as follows: is also valid if the sampling frequency is finite, but the
4 measurement time is infinitely long.
a2 (6) B. Measurement of the average value of limited
 2   2  y 2  2  y2 
4 function over closed time interval
This is a very important characteristic of above The crucial question for practical implementation of the
described measurment process. In this case input signal y measurement shown at Fig. 5 is what is the difference
is not a constant anymore but rather has its range of value. between  and y if both, sampling frequency and the
Fig. 4 shows dependence  2 of a (distance y to measurement interval, are finite? For assessing the
nearest quantum level). measurement uncertainty for the measurement from Fig. 5
Device for average signal value measurement is very we need to apply Central limit theorem.
simple, with the minimal hardware structure as shown in This result introduces the measurement time interval,
Fig. 5. As Z (number of positive quantization levels) is not a point in time, as an important variable for expressing
not specified, all the above is valid for Z  1 . In such a measurement results.
case (2) becomes: Let y  f (t ) (it is not limited to be constant signal) be
y  R, R  a  2g (7) a limited integrable function and h a uniform random
signal, both of which satisfy conditions given with (1). If
t is a random variable with a uniform distribution whose
PDF is: pt   1 /(t2  t1 ) , then y is random variable
dependent on t . Then the average value of the A/D
converter’s output  over the interval t  t1 ,t2  .
(applying of sifting property of Dirac delta functions) is
given as:
t2
1
Figure 4. Dependence of output value variance ADC upon distance to
nearest quantum level

t 2  t1  f t   dt  y (11)
t1

Page 298 of 478


ICIST 2014 - Vol. 2 Poster papers

This result introduces the measurement time interval,  takes its values from   0, 1  0, 2 g, so the
not a point in time, as an important variable for expressing converter output is now given with:
measurement results. The device from Fig. 5 measures t2
average value of the input signal over the interval. 1 (16)
If Z  1 , if error e of a single measurement of y is
 
t 2  t1  f t  dt
t1

defined with e    y , then variance of the error Corresponding measurement error is:
 e2   2   y2 of this measurement is: t
 1 t2 
2
2g 2
2g 2
t
1 2 2
t
 f t  dt    f t  dt
f t   dt  f t   dt (12) t 2  t1 t1  t 2  t1 t1 
 e2  
t2  t1 t1 t2  t1 t1  2  (17)
N
Quantity e is random, and e  0 , hence e 2   e2 . In the case of sinusoidal signal y(t )  A  sin t
converter output is:
Because both the Central limit theorem and Statistical
T
sampling theory can be applied to the individual 1 A T 2A (18)
measurements error e , average value of the A/D
 
T 0
A  sin t dt  
T 2
 (1  1) 

converter’s output is:
For each individual analytic sample  is
  ye (13) i  2 g  (b1i  b1i ) . The counter shown on picture is now
Because   y , average error is e  0 , hence: again in role of accumulator, but now it accumulates
summands b1i  b1i . At the end of measurement interval
 2   y2   e2 (14) N

As y is deterministic variable that characterizes the


accumulator output value is equal to  b1i  b1i .
i 1
signal, thus having error determined only with member Microprocessor gets information about number of samples
 e2 (not with  y2 ). from clock. At the end of measurement interval
microprocessor calculates average value  as:
Also, as Central limit theorem and statistical sampling
theory can be applied to the error e , next estimation 2g N 1 N
follows:    b1i  b1i    i  y (19)
N i 1 N i 1
 e2 Modified SAADK-1G can be used for sinusoidal signal
 e2  (15)
N y(t )  A  sin t RMS measurement as described before.
where N is the number of samples within the time Measurement accuracy is then:
interval T  t2  t1 . 2 A 4 A2
2g   2
More details on this measurement case are given in [3,
σ Ψ2  π π  4 A g  A (20)
9, 10] and according to them standard measurement
uncertainty can be defined as  e .
N π2N
Depedence of output value variance (which is also the
absolute measurement error) is shown on Fig. 7.
C. Measurement of amplitude and RMS value the
sinusoidal signal
If we use device from Fig. 5 to measure sinusoidal
signal y(t )  A  sin t average value, measurement value
will always be 0, because the average value of sine or
cosine signals on integer number of periods is equal to 0.
For this purpose the structure of device from Fig. 5 is
modified by adding one “or” circuit together with using
„up” counter input only so providing „a two-way Figure 7. Error diagram for modified SAADK-1G in sinusoidal regime
rectifier“. Microprocessor gets number of periods from
“Zero Crossing Detector” (ZCD) - Fig. 6. D. Measurement of RMS value of complex periodic
signal
Device for this measurement is called Stochastic
additional A/D converter with two dither generators
(SAADK-2G). It is shown on Fig. 8.
When performing this measurement the same signal is
introduced to both device inputs y1 (t )  y2 (t )  u(t ) and
RMS value of the signal is calculated as:
T
Figure 6. Implementation of sinusoidal signal measuring device based 1 2
u t  dt (21)
T 0
on modified SAADK-1G U eff 

Page 299 of 478


ICIST 2014 - Vol. 2 Poster papers

Device working on this base is called Stochastic Let’s determine the variance of measurement error for
additional A/D converter with two dither generators product of signals y1 and y2 when signals power is
(SAADK-2G) as shown on Fig. 8. measured using the device from Fig. 8. If   y1  y 2  e is
This measurement is practically reduced to the special an instantaneous value of the multiplier output, where e is
case of power measurement. the product’s measurement error, the variance
E. Measurement of power  e2   2   y2 y of the error e is:
1 2

SAADK-2G device shown on Fig. 8 is used also for 2 g  2 t2


1 2 2
t

f1 t  f 2 t  dt  f1 t  f 2 t dt (25)
t2  t1 t1 t2  t1 t1
2
active power measurment. We have voltage signal  e2 
y1 (t )  u(t ) on the one imput of SAADK-2G and current
signal y2 (t )  i (t ) on the other one. Power of signal is The consequence of above, is that the Central limit
now calulated as: theorem and the theory of samples are both valid for the
quantity e , hence:
T
1
ut   it  dt  e2
T 0
P (22)
 e2  (26)
N
Device from Fig. 8 have 2 two-bit flash A/D converters where N is the number of samples within the time
from Fig. 5, with inputs y1  f1 (t ) and signal h1 , and interval T  t 2  t1 .
y 2  f 2 (t ) and signal h2 , respectively. h1 and h2 are Estimation (26) is correct if the discrete sets of samples
mutually uncorrelated random uniform dither signals. 1  1 (1), 1 (2),  , 1 ( N )
can represent function
Outputs 1 and 2 are passed to a multiplier; the y1  f1 (t ) and 2  2 (1), 2 (2),  , 2 ( N ) can represent
multiplier output is   1  2 , and it can assume values:
function y2  f 2 (t ) , which means that the Nyquist’s

   2 g  ,0,2 g  .
2 2
 conditions, regarding a uniform sampling of the signals
If, during one measurement interval, N A/D y1 and y2 , is satisfied.
conversions are performed by each A/D converter, then The limit of precision is analyzed theoretically, via
the accumulator from Fig. 8 accumulates the sum of N simulation and experimentally in [5, 7].
N
subsequent multiplier’s outputs:  1 (i)  2 (i) . F. Measurement of definite integral product of two or
i 1
more signals
This accumulation can be simply used for calculation of
the average value of the multiplier output  over the The device from Fig. 8 can be extended to S inputs
measurement interval as: ( yi  fi (t ), i  1,2,..., S ), and the two-input multiplier can
be replaced with a S-input multiplier, thus being adjusted
1 N (23)
   1 (i)  2 (i) for measurement of the averaged multiplier’s output as
N i1 follows:
Over the time interval T  t2  t1 , the average value of 1 N
   1 (i)  2 (i)  S (i) (27)
the multiplier output  , is determined as follows: N i1
This can be generalized for a product of S signals, and in
t
1 2 (24)
t 2  t1 t1
 f1 (t )  f 2 (t )  dt  y1  y2
such a case, the averaged output of the multiplier is also:
t2
The practical consequence of this is that device from 1
  f1 (t )  f 2 (t )    f S (t )  dt (28)
Fig. 8 can be used for measurement of signal power. t 2  t1 t1

Variation of the measurement error is:


2 g S t f t   f t     f (t )  dt 
2

t t 
 e2  1 2 S
2 1 t1
t2
(29)
1
 f1 t   f 2 t     f S
2 2 2
 (t ) dt
t 2  t1 t1

III. MEASUREMENT IN FREQUENCY DOMAIN


The RMS instruments in [8] and [9] measure the same
input signal in two internal channels in low resolution but
at a very high speed. Two uncorrelated stochastic dither
functions are superimposed onto input signals in channels
1 and 2, as shown in Fig. 9. The high speed of both
sampling and further processing facilitates the RMS
Figure 8. Block diagram of SAADK-2G for realization of RMS measurement by the very definition of the RMS value of a
measuremet signal, giving a higher resolution of the result and, hence,
very accurate measurements.

Page 300 of 478


ICIST 2014 - Vol. 2 Poster papers

which stems from a combined effect of quantization


within the A/D converter and the introduced dither, i.e:
  1  2  y1  y2  e (32)
The first term of this multiplier output is the signal that
has to be measured. The first and the second terms in (32)
are independent; hence, their average values over the
measurement period and their variances are also
Figure 9. Schematic of the stochastic instrument for one (sine or uncorrelated. The average value of the second term in (32)
cosine) harmonic component is zero, as shown in [9], and hence does not feature in the
average value of the expected output  over the
As the aim of the instrument presented in this paper is
measurement period. Hence:
to measure the harmonics of mains voltages and currents,
the RMS instruments in [8] and [9] are adapted for such a 1
T
(33)
T 0
purpose. The first modification is that the input to channel  y1  y 2 dt
2 is not a measured signal but a dithered sine or cosine
function that is generated in advance, stored in the In digital measurements, the average value is obtained as
MEMORY, and quickly retrieved during measurement, as 1 N
shown in Fig. 9. Second, such a structure is implemented   k
N k 1
(34)
in parallel for each sine and cosine component of each
harmonic to be measured, thus enabling very high The summing of the samples during the measurement
processing speed (Fig. 10). interval is done by the instrument itself, and this sum is
Dithering signals h1 and h2 are random, uniform, and the output of the structure shown in Fig. 9. The division
by the number of samples N is performed in a
mutually uncorrelated [2]. They are generated in a way to microprocessor, which also calculates each sine (or
satisfy the following conditions that limit their amplitude cosine) component of the k-th harmonic from the
and define their probability density function: appropriate channel as:
0  hi  i / 2 (30)
ak  2cos k / R , bk  2sin k / R (35)
phi   1 /  i , for i  1,2 (31) The calculation of measurement uncertainty is an
They are also superimposed onto the measured signals extension of the theory developed in [6]. Relative
y1 (any continuous function of time) and y2 (a measurement uncertainty for the digital multiplier output
 is limited by:
continuous base function in the generalized case or a
prestored dithered sine or cosine function in a specific 1 (36)
u ( )  (Y2  )/ N
case of Fig. 9), respectively. Their sampled values at every 2
time instant within measurement interval T are 1 and 1
u  (Y2  ) /(  N ) (37)
2 , respectively. These sums are then processed by two 2
flash A/D converters, which perform the A/D conversion where Y2 is the RMS value of the dithered base function
within a single clock cycle. The A/D converters can be of
a two-bit structure, as in [8], or a multi-bit structure in the signal in channel 2, 1 is the A/D converter quantum,
generalized case. and N is the number of samples over a measurement
The sampled digital signals 1 and 2 are multiplied, interval [6]. The relative standard uncertainty is related to
and their product  is numerically integrated in the the standard measurement uncertainty u () by:
accumulator over measurement period T . Finally, a
microprocessor (not shown in Fig. 9) will pick up the u  u() /  (38)
accumulated value at the end of the measurement interval If R is the amplitude of the dithered base function Y2 ,
to perform the final processing.
then:
The measured value (i.e., multiplier output)  differs
from the input signal’s product by measurement error e , Y2  R / 2 (39)
According to (36), (37), (38) and (39) and the relation
between the Fourier coefficients ak and bk and 
presented in [12], the standard measurement uncertainty
of any Fourier coefficients measured with this method is
limited by:
1
u (ak )  u (bk )  2  / N (40)
2
and standard measurement uncertainty of harmonic
amplitude is given by:
2 2
u ( ak  bk )  1 / N (41)

Figure 10. Schematic of the stochastic instrument for one (sine or Therefore, the system can have very good accuracy
cosine) harmonic component when increased number of samples N . If the A/D

Page 301 of 478


ICIST 2014 - Vol. 2 Poster papers

converter would be an ideal one, then 1  0 , and the derived values. All values have variable resolution in
time and value. The basic accuracy of these values is
right side of (40) was 0. 0.2% of full scale. This device is remotely controlled.
It can be seen that the measurement uncertainty is
influenced by the RMS value of the signal in channel 2 Y2 ACKNOWLEDGMENT
and the resolution in channel 1 1 , as well as by the This work was supported in part by the Provincial
number of samples within the measurement period N . Secretariat for Science and Technological Development of
Autonomous Province of Vojvodina (Republic of Serbia)
At first sight, it seems that the reduction in Y2 will under research grant No. 114-451-2723, and supported in
reduce the measurement uncertainty. However, Y2 also part by the Ministry of Science and Technological
Development of the Republic of Serbia under research
(via 2 and  ) defines the value of  ; hence, the grant No. TR32019.
measurement uncertainty (41) is not sensitive to the
amplitude of the signal in channel 2. On the contrary, Y2 REFERENCES
should be as large as possible, so that the channel-2 range [1] J. von Neumann, “Probabilistic logic and the synthesis of reliable
is fully utilized, thus maximizing the measurement result organisms from unreliable components,” in Automata Studies, C.
E. Shannon, Ed. Princeton, NJ: Princeton Univ. Press, 1956.
 . Signal y2 is within the range ± R ; quantum 1 is
[2] M.F. Wagdy and W. Ng, "Validity of uniform quantization error
defined by the chosen A/D converter resolution in channel model for sinusoidal signals without and with dither." IEEE
1, whereas the number of samples N can be a Transactions on Instrumentation and Measurement, vol. 38, no. 3,
compromise between the necessary instrument speed and June 1989. pp.718-722, doi: 10.1109/19.32180.
the required accuracy. [3] V.V. Vujicic, I. Zupunski, Z. Mitrovic and M.A. Sokola,
"Measurement in a point versus measurement over an interval."
Measurement uncertainty, prototype devices and Proc. of the IMEKO XIX World Congress; Lisbon, Portugal. Sep.
experimental results of this measurement case are 2009. pp. 1128-1132 no. 480.
discussed in [4, 6]. [4] V. Pjevalica and V.V. Vujicic, "Further Generalization of the
Low-Frequency True-RMS Instrument." IEEE Transactions on
CONCLUSION Instrumentation and Measurement, vol. 59, no. 3, March 2010. pp.
736-744.
This is an overview paper which summarizes all cases [5] D. Pejic, M. Urekar, V. Vujicic and S. Avramov-Zamurovic,
of application of “digital stochastic measurement over an "Comparator offset error suppression in stochastic converters used
interval” approach in time domain as well as in frequency in a watt-hour meter.", in Proc. CPEM 2010, Proceedings; Korea.
domain, which have been developed so far. Described June 2010.
measurement systems enable simple designing of the [6] V. Pjevalica and V.V. Vujicic, "Further Generalization of the
instruments with advanced metrological characteristics. Low-Frequency True-RMS Instrument." IEEE Transactions on
Instrumentation and Measurement, vol. 59, no. 3, March 2010. pp.
Several prototypes and small-series of commercial 736-744.
instruments have been developed and their measurement [7] B.M. Santrac, M.A. Sokola, Z. Mitrovic, I. Zupunski and V.V.
uncertainty is being kept extremely low [7-9]. Fig. 11 Vujicic, "A Novel Method for Stochastic Measurement of
shows one practical implementation of the device from Harmonics at Low Signal-to-Noise Ratio." IEEE Transactions on
Fig. 8 - the four-channel three-phase power analyzer [11], Instrumentation and Measurement, vol. 58, no. 10, pp. 3434-3441,
Oct. 2009.
that works on the principles elaborated in the paper. The
unit measures 16 values of RMS of current, 3 values of [8] D. Pejic, V. Vujicic, “Accuracy limit of high-precision stochastic
Watt-hour meter,” IEEE Transactions on Instrumentation and
RMS of voltage, distortion factor of three-phase voltage, Measurement, vol. 49, pp. 617-620, June 2000.
12 active powers, 3 frequencies, and over a hundred [9] V. Vujicic, S. Milovancev, M. Pesaljevic, D. Pejic and I.
Zupunski, "Low frequency stochastic true RMS instrument," IEEE
Transactions on Instrumentation and Measurement, vol. 48, no. 2,
pp.467-470, Apr. 1999.
[10] V. Vujicic, “Generalized low frequency stochastic true RMS
instrument”, IEEE Transactions on Instrumentation and
Measurement, vol. 50, pp. 1089-1092, Oct. 2001.
[11] I. Zupunski, V.V. Vujicic, Z. Mitrovic, S. Milovancev and M.
Pesaljevic, "On-line determination of the measurement uncertainty
of the stochastic measurement method." Proc. of the IMEKO XIX
World Congress; Lisbon, Portugal, Sep. 2009, pp. 1048-1051 no.
278.
[12] V. Vujicic, D. Davidovic, N. Pjevalica, V. Pjevalica, D. Pejic, I.
Zupunski, M. Urekar, P. Sovilj, Z. Mitrovic, S. Milovancev, B.
Vujicic, Z. Beljic, „New product: Four-channel three-phase power
analyzer with the functions of power quality measurements – type
MM4“, Technical documentation database of Department of
Electrical Measurements at Faculty of Technical Sciences Novi
Sad, Novi Sad, 2012
[13] A. Papoulis, Probability, Random Variables and Stochastic
Figure 11. One practical implementation of the device - the four- Processes, ser. McGraw-Hill Series in Systems Science. McGraw-
channel three-phase power analyzer Hill, New York, 1965.

Page 302 of 478


ICIST 2014 - Vol. 2 Poster papers

Performance comparison of Lattice Boltzmann


fluid flow simulation using OpenCL and CUDA
frameworks
Jelena Tekić, Predrag Tekić, Miloš Racković
University of Novi Sad, Faculty of Sciences, Novi Sad, Serbia
{radjenovic, tekic} @uns.ac.rs, [email protected]

Abstract—This paper presents performance comparison, of


the lid-driven cavity flow simulation, with Lattice
Boltzmann method, example, between CUDA and OpenCL
parallel programming frameworks. CUDA is parallel
programming model developed by NVIDIA for leveraging
computing capabilities of their products. OpenCL is an
open, royalty free, standard developed by Khronos group
for parallel programming of heterogeneous devices (CPU’s,
GPU’s, … ) from different vendors. OpenCL promises
portability of the developed code between heterogeneous
devices, but portability has performance penalty. We
investigate performance downside of portable OpenCL code
comparing to similar CUDA code run on the NVIDIA
graphic cards. Lid-driven cavity flow benchmark code, for both examples, has been written in the Java programming language and uses open source libraries to communicate with OpenCL and CUDA. Results of simulations for different grid sizes (from 128 to 896) have been presented and analyzed. Simulations have been carried out on an NVIDIA GeForce GT 220 GPU.

Keywords: CUDA, OpenCL, Lattice Boltzmann, Java, GPU

I. INTRODUCTION

In recent years multi-core and many-core processors have been replacing single-core processors; Graphics Processing Units (GPUs) in particular have greatly outperformed CPUs in memory bandwidth (Figure 1) and in the number of arithmetic operations per second. GPUs have an important role in today's high performance computing applications. GPUs brought high performance computing, which used to be the privilege of a small group of people and scientists and reserved for large computer clusters, to every commodity desktop/personal computer.

Due to the large processing power potential of GPUs, researchers and developers are becoming increasingly interested in exploiting this power for general purpose computing.

Specific scientific fields, like computational fluid dynamics (CFD), have benefited from this trend of increasing GPU processing power. Algorithms that can be relatively easily parallelized, like the Lattice Boltzmann method, are gaining popularity.

In this paper we investigate the performance differences between CUDA and OpenCL implementations of a well-known CFD benchmark problem, the one-sided lid driven cavity flow. The code was developed using the Java programming language and the open source Java bindings libraries for OpenCL and CUDA, JOCL [4] and JCUDA [5].

Figure 1. Memory bandwidth for the CPU - GPU

A. CUDA

Compute Unified Device Architecture (CUDA) [1] was introduced by NVIDIA in 2006 as a proprietary, vendor-specific API and set of language extensions for programming NVIDIA products. Considering that CUDA has been developed by the same company that produces the hardware devices, it would be expected for CUDA code to perform better on their hardware products. Since CUDA was yet another device-specific API and language that developers were forced to learn in order to utilize NVIDIA products, that fact caused a rise in demand for a single language and API capable of dealing with any device architecture.

CUDA provides two different APIs, the Runtime API and the Driver API. Both APIs are very similar regarding basic tasks like memory handling, and starting with CUDA 3.0 the APIs are interoperable and can be mixed to some level. The most important difference between these two APIs is how kernels are managed and executed.

B. OpenCL

Open Computing Language (OpenCL) [2] is an open, royalty free standard developed by the Khronos group [3] for parallel programming of heterogeneous devices (CPUs, GPUs, DSPs) from different vendors. OpenCL has attracted vendor support, with implementations available from NVIDIA, AMD, Apple and IBM. It was introduced in late 2008. Because the standard has been designed to reflect the design of contemporary hardware, there are a lot of similarities with the CUDA programming model.

The execution model for OpenCL consists of the controlling host program and kernels which execute on OpenCL devices. To scientific programmers, the OpenCL standard may be an attractive alternative to CUDA, as it offers a similar programming model with the prospect of hardware and vendor independence.

OpenCL code (a kernel) can be compiled at runtime, which is not the case with the CUDA compile model, and that adds to OpenCL execution time. On the other hand, this just-in-time compile model allows the compiler to generate code for the specific device (GPU), leveraging the device's architectural advantages.

C. Similarities of CUDA and OpenCL

CUDA and OpenCL are parallel computing frameworks. CUDA is supported only on NVIDIA products, while OpenCL takes a more general approach: it is cross-platform and supported on heterogeneous devices from different vendors. Since the OpenCL standard has been designed to reflect contemporary hardware, there are a lot of similarities between the CUDA and OpenCL frameworks. OpenCL shares a set of core ideas with CUDA. These frameworks have similar platform models, memory models, execution models and programming models. Therefore, it is possible to transfer CUDA programs to OpenCL programs, and vice versa. The mapping between CUDA and OpenCL terminology, regarding the memory and execution model, is presented in Table 1.

TABLE I. CUDA VS. OPENCL TERMINOLOGY

  CUDA              OpenCL
  thread            work-item
  block             work-group
  global memory     global memory
  constant memory   constant memory
  shared memory     local memory
  local memory      private memory

The mapping between CUDA and OpenCL syntax terminology is given in Table 2.

TABLE II. CUDA VS. OPENCL SYNTAX

  CUDA                      OpenCL
  __global__ (function)     __kernel
  __device__ (function)     /
  __constant__ (variable)   __constant
  __device__ (variable)     __global

There are a lot of similarities in every aspect of these two programming frameworks. Almost every CUDA term can be mapped to OpenCL terminology. This fact led to the creation of tools [8] for porting CUDA to OpenCL.

In this work, existing OpenCL code [6, 7] was ported to CUDA manually, following the syntax and other mappings presented here.

The mapping between CUDA and OpenCL thread/work-item indexing is given in Table 3.

TABLE III. CUDA VS. OPENCL THREAD/WORK-ITEM INDEXING

  CUDA                            OpenCL
  gridDim                         get_num_groups()
  blockDim                        get_local_size()
  blockIdx                        get_group_id()
  threadIdx                       get_local_id()
  threadIdx + blockIdx*blockDim   get_global_id()
  gridDim*blockDim                get_global_size()

The mapping between CUDA and OpenCL API objects is shown in Table 4.

TABLE IV. CUDA VS. OPENCL API OBJECTS

  CUDA          OpenCL
  CUdevice      cl_device_id
  CUcontext     cl_context
  CUmodule      cl_program
  CUfunction    cl_kernel
  CUdeviceptr   cl_mem
  /             cl_command_queue

II. IMPLEMENTATION DETAILS

To the CUDA/OpenCL programmer, the computing system consists of a host (often a CPU) and one or more devices (often GPUs) that are massively parallel processors equipped with a large number of arithmetic execution units. The programs that have been developed use the Java programming language for the host part of the computing system, and CUDA and OpenCL kernels for programming the NVIDIA device that has been used.

A. Java and CUDA

In order to use Java as the host programming language, we have used the open source Java CUDA library (JCUDA ver. 0.5.5) [5]. This library provides a level of abstraction between host and device calls/commands. The Eclipse IDE has been used to create the Java project and add the JCUDA .jar files to the project; after that, the .dll files have to be copied to a location on the environment "path". Also, the installation of the CUDA toolkit (5.5) required an installation of MS Visual Studio (because of its bundled C compiler).

Kernel source code has to be compiled using the NVCC compiler. As a result we get a file that we can load and execute using the Driver API. There are two options for how the kernel can be compiled: as a PTX file or as a CUBIN file. We have compiled our kernels as PTX files, which are human readable (which is not the case with CUBIN files).

B. Java and OpenCL

In order to use Java as the host programming language, we have also used the open source Java OpenCL library (JOCL ver. 0.1.3) [4]. We have used the Eclipse IDE to create the Java project, as with the CUDA code. The JOCL Java archive files have been added to the project path, and the JOCL.dll file has been put on the environment "path". OpenCL is able to compile kernels at runtime.
In both cases (CUDA and OpenCL) three kernel files have been created, for: "streaming", "collision" and "boundaries". Execution of these kernels is called from the host program written in Java (for both frameworks). Performance results of these simulations are presented in the next section.

III. PERFORMANCE RESULTS AND DISCUSSION

We have tested the CUDA and OpenCL versions of the lid driven cavity (Figure 2) numerical simulation on an NVIDIA GeForce GT 220. In Table 6 the testing device (GPU) details are listed. The latest CUDA drivers (320.57) and CUDA toolkit (5.5) have been used.

Figure 2. Lid driven cavity model

In Table 5 the device characteristics which are in direct connection with the performance of the simulation are listed. The parameter number of compute units (CL_DEVICE_MAX_COMPUTE_UNITS) has proved to be crucial for the execution speed of parallel algorithms.

TABLE V. COMPUTE DEVICE CHARACTERISTICS

  CL_DEVICE_NAME                  GeForce GT 220
  CL_DEVICE_GLOBAL_MEM_SIZE       1 034 485 760
  CL_DEVICE_LOCAL_MEM_SIZE        16384
  CL_DEVICE_MAX_WORK_ITEM_SIZES   512 512 64
  CL_DEVICE_MAX_CLOCK_FREQUENCY   1360
  CL_DEVICE_ADDRESS_BITS          32
  CL_DEVICE_MAX_COMPUTE_UNITS     6

Platform and device information are given in Table 6. This information was obtained using an application written in Java, leveraging the JOCL library and OpenCL API functions. The first column contains the parameter names, and the following column the values obtained from the selected device for the requested parameters.

TABLE VI. TESTING DEVICE DETAILS

  Vendor                       NVIDIA
  CL_DEVICE_NAME               GeForce GT 220
  CL_PLATFORM_NAME             NVIDIA CUDA
  CL_PLATFORM_VENDOR           NVIDIA Corporation
  CL_DEVICE_VENDOR             NVIDIA Corporation
  CL_DEVICE_TYPE               GPU
  CL_DEVICE_OPENCL_C_VERSION   OpenCL C 1.0
  CL_PLATFORM_VERSION          OpenCL 1.1 CUDA 4.2.1
  CL_DEVICE_VERSION            OpenCL 1.0 CUDA
  CL_DRIVER_VERSION            320.57

The grid resolution used for this model ranges from 128x128 (16384 nodes) to 896x896 (802816 nodes). Steady state of the simulation is achieved after approximately 20 000 time steps. Streamlines of the steady state of the simulation, for Reynolds number 100, are shown in Figure 3.

Figure 3. Streamlines Re=100

Table 7 displays the completion times (in milliseconds) for the CUDA and OpenCL versions of the lid driven cavity numerical simulation.

TABLE VII. SIMULATION DURATION IN MILLISECONDS

  Mesh size   CUDA      OpenCL
  128          43 934    22 564
  256          77 512    59 031
  368         140 527   118 648
  512         221 731   199 347
  768         443 835   478 044
  896         558 605   566 954

Figure 4 represents a graphical interpretation of the results given in the previous table (Table 7). From this figure we can conclude that for smaller mesh sizes the OpenCL implementation of the lid driven cavity numerical simulation is faster than the CUDA implementation. The OpenCL implementation's performance dropped when the mesh size was increased and became bigger than the OpenCL local work size.

Simulation results show that there is no substantial difference in execution times between the almost identical CUDA and OpenCL programs. Nevertheless, the OpenCL program is portable and can be executed on heterogeneous devices from different vendors without
any modification, which makes it the better choice for most developers.

Figure 4. Execution time graph: milliseconds / mesh size

IV. CONCLUSION

In this paper we have compared the performance results of two implementations of the same benchmark problem in two different GPU programming frameworks/interfaces, CUDA and OpenCL. We have used a numerical simulation of a well-known benchmark problem (often used in CFD), the lid driven cavity flow. Both implementations have used Java as the "host" programming language from which both implementations made GPU calls.

The simulation has been carried out on an NVIDIA GeForce GT 220 GPU. It has been shown that, although CUDA is a proprietary framework developed for NVIDIA products and OpenCL is portable between heterogeneous devices, the simulation performance (time duration) is almost the same, and in some cases the OpenCL program shows even better performance results than the similar CUDA program.

REFERENCES

[1] CUDA, http://www.nvidia.com/object/cuda_home_new.html, January 2014.
[2] OpenCL, http://www.khronos.org/opencl/, January 2014.
[3] Khronos Group, http://www.khronos.org/, January 2014.
[4] Java OpenCL Library - JOCL, http://www.jocl.org/, January 2014.
[5] Java CUDA Library - JCUDA, http://www.jcuda.org/, January 2014.
[6] Tekić, P., Rađenović, J., Lukić, N., Popović, S., Lattice Boltzmann simulation of two-sided lid-driven flow in a staggered cavity, International Journal of Computational Fluid Dynamics (ISSN 1061-8562), Vol. 24, Issue 9, pp. 383-390, 2010.
[7] Tekić, P., Rađenović, J., Racković, M., Implementation of the Lattice Boltzmann Method on Heterogeneous Hardware and Platforms using OpenCL, Advances in Electrical and Computer Engineering (ISSN 1582-7445), Vol. 12, No. 1, pp. 51-56, 2012.
[8] Harvey, M.J., De Fabritiis, G., Swan: A tool for porting CUDA programs to OpenCL, Computer Physics Communications (ISSN 0010-4655), Volume 182, Issue 4, pp. 1093-1099, 2011.
AN APPROACH TO RANKING HOTELS' WEBSITES BY APPLYING THE MULTIMOORA METHOD

Dragisa Stanujkic1, Andjelija Plavšić1, Ana Stanujkić2
Fakultet za menadžment Zaječar, Megatrend univerzitet, Beograd1
Independent researcher2

Abstract – This paper presents the usability of the MULTIMOORA method for evaluating the quality of hotels' websites, with a particular focus on hotels in some Serbian rural areas, and the possibility of promoting the tourism potential of these rural areas using the hotels' websites.

INTRODUCTION

In some rural areas of Serbia, tourism has been identified as one of the major segments of the economy on which future sustainable development can be based. It is evident that the area known as Timočka Krajina has considerable tourism potential. However, this raises a question: How much of this potential is utilized?

It is well known that the Internet has provided significant opportunities for the promotion of tourism potential, as well as in terms of business in the hotel industry. A hotel's website now represents a reality and a necessity, similar to a personal identity card in real life. In a competitive environment, hotels which do not have their own websites probably belong to the past, or will very soon.

However, here arises another question: How well are these sites adapted to the requirements of potential customers? This question also initiates the following question: How to measure the quality of hotels' websites?

In the area of multiple criteria decision making (MCDM), numerous studies have been devoted to identifying criteria for evaluating websites, as well as to determining the quality of websites. In the mentioned area a number of MCDM methods have also been formed.

It can be assumed that, from the standpoint of hotels' managers, it can be useful to have an effective, but also easy to use, MCDM model which enables comparison with the competition.

Therefore, in this paper, a model based on the use of proven evaluation criteria and a relatively simple to use MCDM method is presented. For this reason, this paper is organized as follows: After reviewing some of the important studies which have been devoted to the identification of criteria important for evaluating the quality of websites, with particular emphasis on the hotel industry, one characteristic MCDM method is shown in simplified form. Finally, the usability of the presented MCDM model to evaluate the quality of hotels' websites is shown in a numerical example.

1. CRITERIA FOR EVALUATING THE QUALITY OF THE HOTELS' WEBSITES

In many published papers a number of different procedures for evaluating the quality of websites have been proposed. As a result, a number of evaluation criteria have also been proposed.

Compared to other types of e-Commerce, the hotel industry has its own specificities. Therefore, for the evaluation of hotels' websites, an appropriate set of criteria should be selected.

The approach proposed by Chung and Law [1] can be identified as one of the earliest approaches proposed with the aim of determining a set of specific criteria for determining website quality. In this approach, they proposed a framework for evaluating hotels' websites, which has been based on five basic evaluation criteria: facilities information, customer contact information, reservation information, surrounding area information, and management of websites, as well as their sub-criteria.

Law and Cheung [2] have also identified five criteria which are important for determining the quality of hotels' websites, and for each of these criteria they gave a proper weight. For the evaluation of website quality Law and Cheung have chosen the following criteria: reservations information, facilities information, contact information, surrounding area information, and website management. In order to determine the quality of the website more precisely, within each dimension they identified further attributes, as well as their weights.

Zafiropoulos and Vrana [3], based on a survey of Greek hotels' websites, have identified six criteria that are relevant for measuring the quality of hotels' websites. These criteria are: facilities information, guest contact information, reservation and prices information, surrounding area information, management of the website and company profile. Within these dimensions they also identified attributes.
2. THE MULTIMOORA METHOD

In the field of multiple criteria decision-making (MCDM), a number of theories and approaches have been formed. Among the most prominent are: the Weighted Sum (WS) method [4], the Analytic Hierarchy Process (AHP) method [5], the Technique for Ordering Preference by Similarity to Ideal Solution (TOPSIS) method [6], the Preference Ranking Organisation Method for Enrichment Evaluations (PROMETHEE) method [7], the Minkowski distance metric [8, 9], and the ELimination and Choice Expressing REality (ELECTRE) method [10].

The Multi-Objective Optimization by Ratio Analysis plus Full Multiplicative Form (MULTIMOORA) method has been proposed by Brauers and Zavadskas [11], based on their previous research [12, 13]. This method has been proposed in order to cope with the subjectivity problems arising from the usage of weights in many known MCDM methods [14]. The MULTIMOORA method is also characteristic because it integrates three specific approaches, namely:
− the Ratio System approach,
− the Reference Point approach, and
− the Full Multiplicative Form.

The Ratio System approach. The basic idea of the Ratio System approach of the MULTIMOORA method is to determine the overall performance index of an alternative as the difference between its sums of normalized performance ratings of benefit and cost criteria, as follows:

$Q_i = \sum_{j \in \Omega_{\max}} r_{ij} - \sum_{j \in \Omega_{\min}} r_{ij}$,   (1)

where Qi denotes the ranking index of the i-th alternative, rij denotes the normalized performance of the i-th alternative with respect to the j-th criterion, and Ωmax and Ωmin denote the sets of benefit and cost criteria, respectively; i = 1, 2, ..., m; m is the number of compared alternatives; j = 1, 2, ..., n; n is the number of criteria.

The compared alternatives are ranked on the basis of their Qi in descending order, and the alternative with the highest value of Qi is the best ranked. The best ranked alternative, based on the Ratio System approach of the MULTIMOORA method, A*RS, can be determined as follows:

$A^{*}_{RS} = \left\{ A_i \;\middle|\; \max_i Q_i \right\}$.   (2)

The Reference Point approach. For optimization based on the Reference Point approach, Brauers and Zavadskas [12] have proposed the following form:

$\min_i \left\{ \max_j |r_j - r_{ij}| \right\}$,   (3)

where rj denotes the normalized performance of the j-th coordinate of the reference point, which can be determined as follows:

$r_j = \max_i r_{ij}$ for $j \in \Omega_{\max}$, and $r_j = \min_i r_{ij}$ for $j \in \Omega_{\min}$.   (4)

The best ranked alternative, based on the Reference Point approach of the MULTIMOORA method, A*RP, can be determined as follows:

$A^{*}_{RP} = \left\{ A_i \;\middle|\; \min_i \max_j |r_j - r_{ij}| \right\}$.   (5)

The normalized performance ratings in the MULTIMOORA method are calculated as follows:

$r_{ij} = x_{ij} \Big/ \left( \sum_{i=1}^{m} x_{ij}^{2} \right)^{1/2}$,   (6)

where xij denotes the performance rating of the i-th alternative with respect to the j-th criterion.

The Full Multiplicative Form. The Full Multiplicative Form method embodies maximization as well as minimization of a purely multiplicative utility function [14]. The overall utility of the i-th alternative, based on the Full Multiplicative Form of the MULTIMOORA method, can be determined as follows:

$u_i = A_i / B_i$,   (7)

where:

$A_i = \prod_{j \in \Omega_{\max}} x_{ij}$, and   (8)

$B_i = \prod_{j \in \Omega_{\min}} x_{ij}$.   (9)

Numerous extensions have been proposed for the MULTIMOORA method; these extensions enable the usage of MULTIMOORA in a fuzzy environment and/or enable a group decision-making approach. A very simple approach which allows group decision-making with the MULTIMOORA method involves the transformation of individual performance ratings into group performance ratings before applying Eqs. (5), (7) or (8).

For a group which contains K decision-makers, the transformation of individual into group performance ratings can be made using the geometric mean, as follows:

$x_{ij} = \left( \prod_{k=1}^{K} x_{ij}^{k} \right)^{1/K}$,   (10)

where x_ij^k denotes the performance rating of the i-th alternative in relation to the j-th criterion obtained from the k-th decision maker.

The usability of the MULTIMOORA method has been successfully demonstrated in a number of cases, such as regional development [15, 16, 17], the choice of a bank loan [18], personnel selection [19, 20], and forming a multi-criteria decision making framework for prioritization of energy crops [21].
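To make the computational procedure concrete, the following self-contained Java sketch (our illustration, not code from the cited studies) implements Eqs. (1)-(9) for the case where all criteria are benefit criteria, which is the situation in the numerical example that follows; the demo data are the group performance ratings that appear below in Table 5.

import java.util.Arrays;

// Minimal sketch of the three MULTIMOORA rankings (benefit criteria only).
public class MultiMoora {

    // Eq. (6): vector normalization, r_ij = x_ij / sqrt(sum over i of x_ij^2).
    static double[][] normalize(double[][] x) {
        int m = x.length, n = x[0].length;
        double[][] r = new double[m][n];
        for (int j = 0; j < n; j++) {
            double norm = 0;
            for (int i = 0; i < m; i++) norm += x[i][j] * x[i][j];
            norm = Math.sqrt(norm);
            for (int i = 0; i < m; i++) r[i][j] = x[i][j] / norm;
        }
        return r;
    }

    // Eq. (1): Ratio System index Q_i (no cost criteria, so nothing is subtracted).
    static double[] ratioSystem(double[][] r) {
        return Arrays.stream(r).mapToDouble(row -> Arrays.stream(row).sum()).toArray();
    }

    // Eqs. (3)-(5): maximal deviation from the reference point (smaller is better).
    static double[] referencePoint(double[][] r) {
        int m = r.length, n = r[0].length;
        double[] ref = new double[n];
        for (int j = 0; j < n; j++)
            for (int i = 0; i < m; i++) ref[j] = Math.max(ref[j], r[i][j]);
        double[] d = new double[m];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++) d[i] = Math.max(d[i], ref[j] - r[i][j]);
        return d;
    }

    // Eqs. (7)-(9): Full Multiplicative Form utility (B_i = 1 without cost criteria).
    static double[] fullMultiplicative(double[][] x) {
        double[] u = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            u[i] = 1.0;
            for (double v : x[i]) u[i] *= v;
        }
        return u;
    }

    public static void main(String[] args) {
        // Group performance ratings of the four hotels' websites (Table 5 below).
        double[][] x = {
            {4.64, 4.64, 3.63, 4.64, 4.64},
            {2.62, 2.62, 3.63, 1.00, 2.62},
            {3.63, 1.26, 3.63, 4.31, 3.63},
            {4.31, 3.63, 4.64, 4.31, 4.31}};
        double[][] r = normalize(x);
        System.out.println("Q = " + Arrays.toString(ratioSystem(r)));        // ~ {2.97, 1.67, 2.15, 2.81}
        System.out.println("d = " + Arrays.toString(referencePoint(r)));
        System.out.println("u = " + Arrays.toString(fullMultiplicative(x))); // ~ {1686.9, 65.4, 260.6, 1349.5}
    }
}

Run on these data, the Q values reproduce the Si column of Table 7 and the u values match the ui column of Table 9 up to rounding.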
3. NUMERICAL EXAMPLE

In order to demonstrate the usability of the MULTIMOORA method for evaluating the quality of hotels' websites, in this paper an example adopted from [22] has been used.

The criteria used for the evaluation of the hotels' websites, on the basis of Law and Cheung [2], are shown in Table 1.

Table 1. The criteria for hotels' websites evaluation

  C1   Reservations information
  C2   Facilities information
  C3   Contact information
  C4   Surrounding area information
  C5   Website management

In the adopted example the performances of hotels' websites are measured on a five-point Likert scale. The performances of the examined hotels' websites based on the selected criteria, obtained from the three stakeholders, are shown in Tables 2, 3 and 4.

Table 2. The performance ratings obtained from the first stakeholder

       C1  C2  C3  C4  C5
  A1    5   5   3   5   5
  A2    3   3   3   1   3
  A3    4   1   3   4   4
  A4    5   4   5   5   5

Table 3. The performance ratings obtained from the second stakeholder

       C1  C2  C3  C4  C5
  A1    5   5   4   5   5
  A2    3   2   4   1   3
  A3    3   2   4   5   3
  A4    4   3   5   4   4

Table 4. The performance ratings obtained from the third stakeholder

       C1  C2  C3  C4  C5
  A1    4   4   4   4   4
  A2    2   3   4   1   2
  A3    4   1   4   4   4
  A4    4   4   4   4   4

The group performance ratings, obtained using Eq. (10), are shown in Table 5.

Table 5. The group performance ratings of alternatives

       C1    C2    C3    C4    C5
  A1   4.64  4.64  3.63  4.64  4.64
  A2   2.62  2.62  3.63  1.00  2.62
  A3   3.63  1.26  3.63  4.31  3.63
  A4   4.31  3.63  4.64  4.31  4.31

The normalized performance ratings of alternatives, obtained using Eq. (6), are shown in Table 6.

Table 6. Normalized performance ratings

       C1    C2    C3    C4    C5
  A1   0.60  0.71  0.46  0.60  0.60
  A2   0.34  0.40  0.46  0.13  0.34
  A3   0.47  0.19  0.46  0.56  0.47
  A4   0.56  0.55  0.59  0.56  0.56

The ranking orders of alternatives, obtained by using the three approaches contained in the MULTIMOORA method, are shown in Tables 7, 8 and 9.

Table 7. Ranking results obtained on the basis of the Ratio System approach

       C1    C2    C3    C4    C5    Si    Rank
  A1   0.60  0.71  0.46  0.60  0.60  2.97  1
  A2   0.34  0.40  0.46  0.13  0.34  1.67  4
  A3   0.47  0.19  0.46  0.56  0.47  2.15  3
  A4   0.56  0.55  0.59  0.56  0.56  2.81  2

Table 8. Ranking results obtained on the basis of the Reference Point approach

       C1    C2    C3    C4    C5    di    Rank
  RP   0.60  0.71  0.59  0.60  0.60
  A1   0.00  0.00  0.13  0.00  0.00  0.00  1
  A2   0.26  0.31  0.13  0.47  0.26  0.13  4
  A3   0.13  0.51  0.13  0.04  0.13  0.04  3
  A4   0.04  0.15  0.00  0.04  0.04  0.00  1

Table 9. Ranking results obtained on the basis of the Full Multiplicative Form

       C1    C2    C3    C4    C5    ui       Rank
  A1   4.64  4.64  3.63  4.64  4.64  1686.87  4
  A2   2.62  2.62  3.63  1.00  2.62    65.42  1
  A3   3.63  1.26  3.63  4.31  3.63   260.58  2
  A4   4.31  3.63  4.64  4.31  4.31  1349.49  3

The comparative results of ranking the alternatives, obtained using the different approaches integrated in the MULTIMOORA method, as well as the final ranking order of alternatives, obtained on the basis of the MULTIMOORA procedure for ranking alternatives and selecting the most appropriate one, are shown in Table 10.

Table 10. Ranking orders obtained on the basis of different MULTIMOORA approaches

       RS  RP  MF  Resulting
  A1    1   1   4   1
  A2    4   4   1   4
  A3    3   3   2   3
  A4    2   1   3   2

The obtained ranking order of alternatives is the same as the one obtained in the example adopted from [22], where Grey Relational Analysis (GRA) was used for ranking, which confirms the correctness of the proposed approach.

CONCLUSION

In comparison with other MCDM methods, the MULTIMOORA method is characteristic because it does not explicitly require the use of criteria weights, and it consolidates three approaches that are used to rank the alternatives. Nevertheless, the computational procedure of MULTIMOORA is simple to use, and its applicability has been demonstrated in numerous studies.

The proposed model for evaluating the quality of hotels' websites is simple to use, but it can be very useful to hotels' management for the purpose of comparison with competing firms.
In rural areas with less developed tourism and many insufficiently exploited tourism potentials, the promotion of these potentials and the acquisition of new customers can be of very great significance. This is especially important if we accept the fact that the hotel industry is a very competitive environment, and that the competition in this area will grow continuously in the future.

REFERENCES

[1] Chung, T., Law, R. (2003). Developing a performance indicator for hotel websites. International Journal of Hospitality Management 22(1), 343–358.
[2] Law, R., Cheung, C. (2005). Weighing of hotel website dimensions and attributes. In A. J. Frew (Ed.), Information and communication technologies in tourism, New York: Springer Wien, pp. 327–334.
[3] Zafiropoulos, C., Vrana, V. (2006). A framework for the evaluation of hotel websites: The case of Greece. Information Technology & Tourism 8, 239–254.
[4] MacCrimon, K.R. (1968). Decision making among multiple-attribute alternatives: a survey and consolidated approach. RAND memorandum, RM-4823-ARPA.
[5] Saaty, T.L. (1980). Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation. McGraw-Hill, New York.
[6] Hwang, C.L., Yoon, K. (1981). Multiple Attribute Decision Making – Methods and Applications. Springer, New York.
[7] Brans, J.P., Vincke, P. (1985). A preference ranking organization method: The PROMETHEE method for MCDM. Management Science 31(6), 647–656.
[8] Minkowski, H. (1896). Geometrie der Zahlen. Leipzig, Teubner.
[9] Zeleny, M. (1973). Compromise programming. In: Multiple Criteria Decision Making, eds: J. L. Cochrane & M. Zeleny, pp. 262–301, University of South Carolina Press, Columbia, SC.
[10] Roy, B. (1991). The outranking approach and the foundation of ELECTRE methods. Theory and Decision 31(1), 49–73.
[11] Brauers, W.K.M., Zavadskas, E.K. (2010). Project management by MULTIMOORA as an instrument for transition economies. Technological and Economic Development of Economy 16(1), 5–24.
[12] Brauers, W.K.M., Zavadskas, E.K. (2006). The MOORA method and its application to privatization in a transition economy. Control and Cybernetics 35(2), 445–469.
[13] Brauers, W.K.M. (2004). Optimization Methods for a Stakeholder Society, a Revolution in Economic Thinking by Multi-Objective Optimization. Kluwer Academic Publishers, Boston, USA.
[14] Balezentis, A., Valkauskas, R., Balezentis, T. (2010). Evaluating situation of Lithuania in the European Union: structural indicators and MULTIMOORA method. Technological and Economic Development of Economy (4), 578–602.
[15] Brauers, W.K.M., Zavadskas, E.K. (2010). Robustness in the MULTIMOORA model: the example of Tanzania. Transformations in Business and Economics 9(3), 67–83.
[16] Brauers, W.K.M., Zavadskas, E.K. (2011). From a centrally planned economy to multiobjective optimization in an enlarged project management: the case of China. Economic Computation and Economic Cybernetics Studies and Research 1(1), 167–188.
[17] Brauers, W.K.M., Ginevicius, R. (2010). The economy of the Belgian regions tested with MULTIMOORA. Journal of Business Economics and Management 11(2), 173–209.
[18] Brauers, W.K.M., Zavadskas, E.K. (2011). MULTIMOORA optimization used to decide on a bank loan to buy property. Technological and Economic Development of Economy 17(1), 174–188.
[19] Balezentis, A., Balezentis, T., Brauers, W.K.M. (2012). MULTIMOORA-FG: A multi-objective decision making method for linguistic reasoning with an application to personnel selection. Informatica 23(2), 173–190.
[20] Balezentis, A., Balezentis, T., Brauers, W.K.M. (2012). Personnel selection based on computing with words and fuzzy MULTIMOORA. Expert Systems with Applications 39(9), 7961–7967.
[21] Balezentiene, L., Streimikiene, D., Balezentis, T. (2013). Fuzzy decision support methodology for sustainable energy crop selection. Renewable and Sustainable Energy Reviews 17(1), 83–93.
[22] Stanujkic, D., Djordjevic, B., Ivanov, S. (2012). Measuring web site quality in the hotel industry using GRA: A case of the Serbian rural area. In Proc. of the International Scientific Conference – UNITECH '12, 16–17 November 2012, Gabrovo, Bulgaria, pp. 245–254.
MULTI-CRITERIA MODEL FOR EVALUATING QUALITY OF WEBSITES OF THE REGIONAL TOURISM ORGANIZATIONS

Dragisa Stanujkic1, Milica Paunović1, Goran Stanković2
Fakultet za menadžment Zaječar, Megatrend univerzitet, Beograd1
Policijska uprava Bor2

Abstract – This paper presents the use of a multiple criteria decision-making model for evaluating the quality of regional tourism organizations' websites from some Serbian rural areas, in order to increase their efficiency in terms of promotion of tourism potentials. The proposed model is based on the use of pairwise comparisons and the ARAS method.

INTRODUCTION

In many rural areas, the development and improvement of tourism have been identified as very important activities for achieving sustainable development. This is the case in the area known as Timocka Krajina, in Eastern Serbia, which is located on the borders with Bulgaria and Romania.

The mentioned rural area has a number of attractive, but also almost unknown, tourist destinations. Among the many are: Stara Planina (Old Mountain), Soko Banja (Sokobanja Spa), Gamzigradska banja (Gamzigrad Spa), Borsko jezero (Bor Lake), the ancient complex of Roman palaces and temples Felix Romuliana, and so on. The list of potential tourist locations is too long to be given here, which is why many significant destinations have to be omitted.

After mentioning the names of these potential tourist destinations, someone probably wants to learn more about them. The Internet has brought significant opportunities for the promotion of less known tourist destinations. However, here arise some questions: How much do the regional tourism organizations from the rural areas use the benefits that the Internet provides, and how much do these websites provide the information necessary for attracting the attention of potential tourists, especially tourists from other countries?

The answers to the above questions can be obtained by measuring the quality of the websites of some regional tourism organizations that are located in Timocka Krajina.

For these reasons, this paper is organized as follows. In the first section, the criteria for evaluating the websites of regional tourism organizations are considered. The second section of the paper is devoted to multiple criteria decision-making (MCDM), with an emphasis on group decision-making. In the first subsection of this section an effective and simple to use MCDM method is presented, and in the second subsection the use of the pairwise comparisons process for determining the weights of criteria is considered. Based on the considerations made in the previous sections, in the third section the websites of some regional tourism organizations have been evaluated. Finally, the conclusions are given.

1. THE CRITERIA FOR EVALUATING QUALITY OF WEBSITES OF THE REGIONAL TOURISM ORGANIZATIONS

A website can help in obtaining an advantage over the competition. However, the mere existence of a website does not automatically provide competitive advantages. So, here also arise two questions: How well does a website actually meet the requirements of its users, and how to measure the level of satisfaction of those requirements, i.e. the quality of the website?

In the literature, numerous studies have been devoted to the evaluation of website quality. Boyd Collins developed the first formal approach to the evaluation of websites in late 1995. His model, intended for librarians, has been based on six criteria, developed by combining evaluation criteria for printed media and considering what was relevant for websites [1]. These criteria are: Contents, Authority, Organizations, Searchability, Graphic design and Innovation use.

Studies intended for the identification of key evaluation criteria, and/or their significances, are still current. For example, Dumitrache [2] gave an overview of the criteria used for the evaluation of e-Commerce sites in Romania during the period from 2006 to 2009, stating navigability, response time, personalization, tele-presence and security as very important criteria. Davidaviciene and Tolvaisas [3] identified a list of criteria for the quality evaluation of e-Commerce websites. They have also provided a comprehensive overview of the criteria that have recently been proposed by different authors. In accordance with [3], the criteria ease of use, navigation, security assurance, help (real time) and design have been discussed by numerous authors, such as [4, 5, 6].

Zafiropoulos and Vrana [7], based on a survey of Greek hotels' websites, have identified six dimensions, i.e. criteria, that are relevant for measuring the quality of hotels' websites. These criteria are: facilities information, guest contact information, reservation and prices
information, surrounding area information, management of the website and company profile. For these criteria they have also identified a number of sub-criteria.

Compared to other types of e-Commerce, the websites of regional tourism organizations have their own peculiarities. Therefore, for their evaluation an appropriate set of criteria, and their significances, have to be identified and determined.

In previously published papers, significant attention has not been devoted to determining the quality of websites of regional tourism organizations. Therefore, in this paper, such a type of website has been considered as a website positioned between websites primarily intended to provide information and hotels' websites.

It is known that a larger number of criteria allows a more precise evaluation of alternatives. However, a larger number of criteria can lead to the formation of complex MCDM models, which may be too complex for ordinary users. In contrast, too few criteria can lead to the formation of too simple, and/or practically unusable, MCDM models.

Assuming that one of the main goals of the regional tourism organizations is the promotion of new tourist destinations to potential tourists from other countries, i.e. probably first-time visitors to these websites, on the basis of previously published papers the criteria shown in Table 1 have been selected for evaluating the quality of regional tourism organizations' websites.

Table 1. Criteria for websites evaluation

  C1   Design
  C2   Authority
  C3   Accuracy
  C4   Adaptability
  C5   Currency
  C6   Navigability

The meaning of the selected criteria is the following:

Design. The design of a website is one of the frequently used criteria for evaluating the quality of websites, but for different types of websites this criterion has a different significance. In the case of websites that promote natural beauties and new tourist destinations, the design can be of great importance, especially for first-time visitors.

Authority. The criterion authority refers to the ability of easy and reliable identification of the website owner. On the Internet, a significant number of sites engaged in the promotion of the tourism potentials of Serbia can be found. However, it is also evident that some of them have not been updated for a long time, and that some information on them is outdated or even inaccurate.

Accuracy. The criterion accuracy refers to the accuracy of the information contained on the site. It is known that, in order to increase the popularity of a tourist destination, some unconfirmed or partially accurate information can be placed on websites. For visitors of the website, hyperlinks that enable verification of the information provided on the site can be very useful.

Adaptability. Some websites, in order to increase the satisfaction of their visitors, allow some adjustment in accordance with the needs and requirements of their visitors. If the target group of the regional tourism organizations is visitors from other countries, then it may be very useful to allow them to obtain information in their spoken languages, for example English, German, Russian, etc.

Currency. The criterion currency refers to how up to date the website is, and two sub-categories can be identified, namely:
 - the up-to-dateness of the information provided on the website, and
 - dead links.

The websites which promote tourism potentials and tourist destinations mainly contain a large amount of static information. However, they may also contain some types of dynamic information, which is often related to events of a local character, such as manifestations, festivals and so on. For the visitors of the websites it is useful not only that such information is up to date; sometimes it can be very helpful to also get a visible proof of its accuracy, for example the date when the information was updated.

Dead links can occur as a result of badly organized modification of the structure of the website, and they are also a very undesirable occurrence.

The existence of out-of-date information, or a lack of evidence of its accuracy, as well as the existence of dead links, can negatively affect the satisfaction of the website visitors.

Navigability. Ease of finding the necessary information on the selected website can have a positive effect on the growth of interest of the website visitors. In contrast, the inability, or difficulty, to find the required information may lead to the abandonment of the website.

An adequate and well-organized menu system, well-organized hyperlinks, a site map and the ability to search the entire website can help to ensure that potential tourists become real visitors.

2. MULTIPLE CRITERIA GROUP DECISION-MAKING

Ordinary MCDM models are usually based on the opinion of a single decision maker, and they can be precisely shown in the following form:

$D = [x_{ij}]_{m \times n}$, $W = [w_j]$,   (1)

where D is a decision matrix, W is a weight vector, xij is the performance rating of the i-th alternative with respect to the j-th criterion, wj is the weight of the j-th criterion, i = 1, 2, ..., m; m is the number of alternatives; j = 1, 2, ..., n; n is the number of criteria.

For solving a number of complex decision-making problems, it is necessary to take into account the opinions of
more decision makers, i.e. usually of relevant experts. In such cases, the Multiple Criteria Group Decision Making (MCGDM) approach is commonly used, and it can be precisely shown in the following form:

$D = [x_{ij}^{k}]_{m \times n \times K}$, $W = [w_{j}^{k}]_{n \times K}$,   (2)

where x_ij^k is the performance rating of the i-th alternative with respect to the j-th criterion given by the k-th decision maker; k = 1, 2, ..., K; K is the number of decision makers and/or experts involved in the MCGDM.

2.1 ADDITIVE RATIO ASSESSMENT METHOD

The Additive Ratio ASsessment (ARAS) method has been proposed by Zavadskas and Turskis [8]. The process of solving decision-making problems using the ARAS method, similarly to the use of other MCDM methods, begins with forming the decision matrix and determining the weights of criteria. After these initial steps, the remaining part of solving MCDM problems using the ARAS method can be precisely expressed using the following steps:

Step 1. Determine the optimal performance rating for each criterion. In this step, the decision maker sets the optimal performance rating for each criterion. If the decision maker does not have preferences, the optimal performance ratings are calculated as:

$x_{0j} = \max_i x_{ij}$ for $j \in \Omega_{\max}$, and $x_{0j} = \min_i x_{ij}$ for $j \in \Omega_{\min}$,   (3)

where x0j denotes the optimal performance rating of the j-th criterion, Ωmax denotes the set of benefit criteria, i.e. the higher the value, the better; and Ωmin denotes the set of cost criteria, i.e. the lower the value, the better.

Step 2. Calculate the normalized decision matrix. The normalized performance ratings are calculated as follows:

$r_{ij} = x_{ij} \Big/ \sum_{i=0}^{m} x_{ij}$ for $j \in \Omega_{\max}$, and $r_{ij} = \dfrac{1/x_{ij}}{\sum_{i=0}^{m} 1/x_{ij}}$ for $j \in \Omega_{\min}$,   (4)

where rij denotes the normalized performance rating of the i-th alternative in relation to the j-th criterion, i = 0, 1, ..., m.

Step 3. Calculate the overall performance rating for each alternative. The overall performance ratings can be calculated as follows:

$S_i = \sum_{j=1}^{n} w_j r_{ij}$,   (5)

where Si denotes the overall performance rating of the i-th alternative, i = 0, 1, ..., m.

Step 4. Calculate the degree of utility for each alternative. The degree of utility can be calculated as follows:

$Q_i = S_i / S_0$,   (6)

where Qi denotes the degree of utility of the i-th alternative, and S0 is the overall performance index of the optimal alternative, i = 1, 2, ..., m.

After that, the alternative with the largest value of Qi is the most acceptable alternative.

2.1.1 Group decision-making approach based on the ARAS method

The ARAS method can be classified as a relatively new MCDM method. Therefore, in comparison with other MCDM methods, a smaller number of extensions has been proposed for it. However, some extensions formed with the aim of enabling its usage in a fuzzy environment and/or enabling a group decision-making approach have also been proposed for the ARAS method.

One of the simplest approaches, which provides the adaptation of the ARAS method to allow a group decision-making approach, can be formulated as follows: For a group that contains K decision-makers, the transformation of individual into group performance ratings can be made using the geometric mean, as follows:

$x_{ij} = \left( \prod_{k=1}^{K} x_{ij}^{k} \right)^{1/K}$,   (7)

where x_ij^k denotes the performance rating of the i-th alternative in relation to the j-th criterion obtained from the k-th decision maker. After that, the previously mentioned procedure of the ARAS method remains as stated above.

2.2 DETERMINING THE WEIGHTS OF CRITERIA

In multiple criteria group decision-making, it is very important how individual criteria weights and individual performance ratings are aggregated into the group (aggregated) criteria weights and performance ratings. Many published papers have also indicated that the use of group decision-making approaches and the pairwise comparison procedure provides an efficient approach for precisely determining the relative importance of criteria, i.e. the weights of criteria.

2.2.1 Pairwise comparison

The pairwise comparison procedure is quite simple and understandable, even for decision makers who are not familiar with MCDM. For a decision-making problem that contains n criteria, the process of determining
the weights of criteria begins by forming the reciprocal square matrix

$A = [a_{ij}]_{n \times n}$,   (8)

where A denotes a pairwise comparison matrix, aij is the relative importance of criterion Ci in relation to criterion Cj, i = 1, 2, ..., n, j = 1, 2, ..., n, and n is the number of criteria. In the matrix A, aij = 1 when i = j, and aji = 1/aij.

The nine-point scale, shown in Table 2, proposed by Saaty [9], is used to assign the relative importance of criteria.

Table 2. The scale of relative importance for pairwise comparison

  Intensity of importance   Definition
  1                         Equal importance
  3                         Moderate importance
  5                         Strong importance
  7                         Very strong importance
  9                         Extreme importance
  2, 4, 6, 8                For interpolation between the above values

After forming the matrix A, the weights of criteria can be calculated by using one of several available procedures. Using the Normalization of the Geometric Mean of the Rows procedure, the weights of criteria are calculated as follows:

$w_i = \left( \prod_{j=1}^{n} a_{ij} \right)^{1/n} \Bigg/ \sum_{i=1}^{n} \left( \prod_{j=1}^{n} a_{ij} \right)^{1/n}$.   (9)

While forming the matrix A, it is very important that each decision maker performs his comparisons consistently. The decision about the consistency of the performed comparisons, and their acceptability, is made on the basis of the Consistency Ratio. If the consistency ratio is higher than 0.1, then the pairwise comparison matrix A is inconsistent, and therefore the comparisons should be reviewed and improved.

The Consistency Ratio is calculated as follows:

$CR = CI / RI$,   (10)

where CR denotes the consistency ratio of the pairwise comparison matrix A, CI is the Consistency Index and RI is the Random Consistency Index.

The value of CI can be calculated as follows:

$CI = (\lambda_{\max} - n) / (n - 1)$,   (11)

where λmax denotes the maximum eigenvalue of the pairwise comparison matrix, and it can be calculated as follows:

$\lambda_{\max} = \sum_{j=1}^{n} \left( \sum_{i=1}^{n} a_{ij} \right) w_j$,   (12)

where wj is the weight of criterion Cj and n is the number of criteria.

The values of RI are determined based on the matrix size n. Table 3 shows the value of the Random Consistency Index RI for different matrix sizes [9].

Table 3. The Random Consistency Index for different matrix sizes

  Matrix size (n)   1     2     3     4     5     6     7     8     9     10
  RI                0.00  0.00  0.58  0.90  1.12  1.24  1.32  1.41  1.46  1.49

Thanks to this controlling mechanism, the above mentioned procedure for the calculation of criteria weights has become very popular and frequently used.

2.2.2 Group decision-making approach to determine criteria weights

In many published papers, the use of different group decision-making approaches to determine the group criteria weights has been considered. In this approach, the simplest and most efficient one is accepted and used.

For a group that contains K decision makers, the group weight of each criterion wj is calculated using the geometric mean, as follows:

$w_j = \left( \prod_{k=1}^{K} w_{j}^{k} \right)^{1/K}$,   (13)

where w_j^k is the weight of criterion Cj, obtained on the basis of the pairwise comparisons performed by decision maker k.

3. NUMERICAL EXAMPLE

To present the effectiveness of the ARAS method, partial results adopted from a study performed to determine the quality of the websites of the regional tourism organizations from Timocka Krajina have been used in this paper. The aim of this paper is not to promote any of them, which is why they are, in this example, simply labeled as alternatives A1, A2, A3 and A4.

The criteria weights, obtained on the basis of the opinions of the three stakeholders through the use of pairwise comparisons, are shown in Tables 4, 5 and 6.

Table 4. The criteria weights obtained from the first stakeholder

  Criteria    D    Au   Ac   Ad   C    N    wi
  C1  D       1    5    3    1    3    1    0.26
  C2  Au      1/5  1    1/3  1/5  1/5  1/3  0.04
  C3  Ac      1/3  3    1    1    3    3    0.20
  C4  Ad      1    5    1    1    3    3    0.25
  C5  C       1/3  5    1/3  1/3  1    1/3  0.09
  C6  N       1    3    1/3  1/3  3    1    0.15
  CR = 0.095 < 0.10
Table 5. The criteria weights obtained from the second stakeholder

  Criteria    D    Au   Ac   Ad   C    N    wi
  C1  D       1    5    3    1    3    3    0.15
  C2  Au      1/5  1    1/3  1/5  1/5  1/3  0.03
  C3  Ac      1/3  3    1    1    3    3    0.24
  C4  Ad      1    5    1    1    3    3    0.14
  C5  C       1/3  5    1/3  1/3  1    1/3  0.12
  C6  N       1/3  3    1/3  1/3  1/3  1    0.31
  CR = 0.081 < 0.10

Table 6. The criteria weights obtained from the third stakeholder

  Criteria    D    Au   Ac   Ad   C    N    wi
  C1  D       1    5    3    1    3    3    0.30
  C2  Au      1/2  1    1/3  1/5  1/5  1/3  0.04
  C3  Ac      1/3  3    1    1    3    3    0.20
  C4  Ad      1    5    1    1    3    3    0.25
  C5  C       1/3  5    1/3  1/3  1    1/3  0.10
  C6  N       1/3  3    1/3  1/3  1/3  1    0.12
  CR = 0.084 < 0.10

The resulting criteria weights, obtained by Eq. (13), are shown in Table 7.

Table 7. The resulting criteria weights

  Criteria   wi
  C1         0.24
  C2         0.05
  C3         0.21
  C4         0.21
  C5         0.11
  C6         0.18

The performance ratings of the examined regional tourism organization websites on the basis of the selected criteria, obtained from the three stakeholders, are shown in Tables 8, 9 and 10.

Table 8. The performance ratings obtained from the first stakeholder

       C1  C2  C3  C4  C5  C6
  A1    4   3   4   2   4   4
  A2    4   3   4   2   5   4
  A3    3   3   3   4   4   4
  A4    2   3   2   2   1   3

Table 9. The performance ratings obtained from the second stakeholder

       C1  C2  C3  C4  C5  C6
  A1    5   4   4   3   5   3
  A2    4   4   3   3   5   5
  A3    3   4   3   4   4   4
  A4    2   4   2   2   2   3

Table 10. The performance ratings obtained from the third stakeholder

       C1  C2  C3  C4  C5  C6
  A1    4   4   4   4   5   3
  A2    4   4   3   3   4   4
  A3    3   4   3   4   5   4
  A4    3   4   2   2   2   3

The group performance ratings, obtained using Eq. (7), are shown in Table 11.

Table 11. The group performance ratings of alternatives

       C1    C2    C3    C4    C5    C6
  A1   4.31  3.63  4.00  2.88  4.64  3.30
  A2   4.00  3.63  3.30  2.62  4.64  4.31
  A3   3.00  3.63  3.00  4.00  4.31  4.00
  A4   2.29  3.63  2.00  2.00  1.59  3.00

Table 12 gives the weighted normalized performance ratings, the overall performance rating and the degree of utility for each alternative.

Table 12. The weighted normalized performance ratings, overall performance ratings and degrees of utility of alternatives

       C1    C2    C3    C4    C5    C6    Si     Qi    Rank
  A0   0.06  0.01  0.05  0.05  0.02  0.04  0.231
  A1   0.06  0.01  0.05  0.04  0.02  0.03  0.207  0.90  1
  A2   0.05  0.01  0.04  0.04  0.02  0.04  0.200  0.86  2
  A3   0.04  0.01  0.04  0.05  0.02  0.04  0.197  0.85  3
  A4   0.03  0.01  0.03  0.03  0.01  0.03  0.125  0.54  4

From Table 12 it can be seen that a significant number of the considered regional tourism organizations have a high value of Qi, which indicates that their managers have become aware of the benefits that can be achieved by using a website.

However, of the four evaluated websites only one had an English version. The remaining significant languages were also not present.

If the promotion of tourism potential outside of Serbia is one of the important goals of tourism organizations, then their managers should seriously consider the implementation of multilingual web presentations. Similar websites in the surrounding countries already have this.

CONCLUSION

It is known that some Serbian rural areas have significant tourism potentials. Therefore, in this paper, an approach primarily intended to evaluate the quality of regional tourism organizations' websites, based on the combined use of a proven MCDM method - the ARAS method - and pairwise comparisons, has been presented.

The proposed model is effective and easy to use, and it can also be used to compare the quality of the website of a regional tourism organization with the sites of competing organizations, as well as to determine the degree of satisfaction of the requirements of website users.

This model may be particularly useful if it is used for comparison with similar organizations from the surrounding countries, and for comparison with competitors, in order to perform the corrections that will lead to forming websites that are to a greater extent aligned with the needs of users, and thus better promote tourism potentials.

REFERENCES

[1] Merwe, R. and Bekker, J. (2003). A framework and methodology for evaluating e-commerce web sites, Internet Research: Electronic Networking Applications and Policy 13(5), 330-341.
[2] Dumitrache, M. (2010). E-Commerce applications ranking, Informatica Economica 14(2), 120-132.
[3] Davidaviciene, V. and Tolvaisas, J. (2011). Measuring quality of e-commerce web sites: Case of Lithuania, Economics and Management [Ekonomika ir vadyba] 16(1), 723-729.
[4] Loiacono, E. T., Watson, R. T. and Goodhue, D. L. (2007). WebQual: An instrument for consumer evaluation of Web sites, International Journal of Electronic Commerce 11(3), 51-87.
[5] Parasuraman, A., Zeithaml, V. A. and Malhotra, A. (2007). E-S-QUAL: a multiple-item scale for assessing electronic service quality, Journal of Service Research 7(3), 13-33.
[6] Cao, M., Zhang, Q. and Seydel, J. (2005). B2C e-commerce web site quality: an empirical examination, Industrial Management & Data Systems 105(5), 645-661.
[7] Zafiropoulos, C. and Vrana, V. (2006). A framework for the evaluation of hotel websites: The case of Greece, Information Technology & Tourism 8, 239-254.
[8] Turskis, Z. and Zavadskas, E. K. (2010). A novel method for Multiple Criteria Analysis: Grey Additive Ratio Assessment (ARAS-G) method, Informatica 21(4), 597-610.
[9] Saaty, T. L. (1980). Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation. McGraw-Hill, New York.
Information Flow in Parking Areas Management in the Enterprise Information System

Zoran Nešić*, Leon Ljubić**, Miroslav Radojičić*, Jasmina Vesić Vasović*
* University of Kragujevac, Faculty of Technical Sciences, Čačak, Serbia
** JKP "Parking Service Kragujevac", Kragujevac, Serbia
[email protected], [email protected], [email protected], [email protected]

Abstract— This paper presents contemplations on the information flow in the parking areas management activity. The specified activity is one of the primary activities of companies engaged in charging for parking services. In the paper, the emphasis is on this element, the analysis of information flow, which is a basic element in the development of an information system. A detailed analysis of the observed information segment, data flow diagrams and a relational database have been presented here. Also, the information flows that are the basis of the information system of the parking service, in the function of integration of the entire enterprise management, have been clarified.

I. INTRODUCTION

Modern management conditions of a company require increasingly effective management of large amounts of data or information. The dynamics of change in the environment emphasizes understanding of the relevance of informative facts and defining the relationships between them. For the successful performance of tasks in the field of management at all levels it is necessary to have an information system that will serve as an information framework for all types of information inputs and outputs generated by the company.

This article discusses the specific and current problems related to information flow in the management of parking areas, which are limited resources that do not follow the increase in the number of motor vehicles. All larger cities in Serbia are trying to solve this problem in different ways; all those attempts have something in common – they are aimed at forming a company that will take care of the urban space for parking - a parking service. The City of Kragujevac is no exception, and the local government of the City of Kragujevac founded JKP Parking Servis Kragujevac (PSKG) as a company which would manage the city's parking area as a very important resource [1].

PSKG has implemented the mechanism of parking management through the legal system of parking and charging [2], [3] and the experiences of other cities [4] - [8].

The aim of PSKG is to use optimal management of parking space to justify its social purpose – that means to provide a parking spot to every citizen in a reasonably short period of time and at the location closest to the user's needs – more precisely, to provide a quality service.

From the aspect of the need for information, four interested parties can be identified:
• users of parking services
• PSKG employees (including the management of the company)
• the Supervisory Board
• the local government

Parking service users' expectations in terms of information include a uniform set of information of appropriate quantity and quality, which is a precondition for the provision of quality services.

Employees in PSKG are the second interested party definitely interested in obtaining information. The hierarchical organization of employees calls for information in terms of quantity and quality.

The Supervisory Board has a controlling role; accordingly, its need for the information used for decision-making is similar to that of the company management.

The local government has a legislative role and a need for information similar to that of the Supervisory Board, but with a different aim, which is to start the process of making decisions that will have an impact on the legislative framework in which the company must operate. However, there is another goal that the local government wants to implement – that is to have PSKG acting as a part of a broader system of public companies of the city. Therefore, for optimal decision making at the city level it is necessary to connect the parking service with other companies in the system, on the information level, in order to obtain a unique and optimal legal framework.

The aim of this paper is to analyze the flow of basic information on the management of parking areas, which will form the basis for this company. The presented analysis is a basis for the development of the integrated information system of the company [9] - [13]. The displayed consideration enables the formation of the relational database model [14] - [17] as the basis for the use of information by applying a wide range of software solutions.

II. ANALYSIS OF THE INFORMATION FLOW

The function of parking areas management includes all processes and activities related to managing parking areas as a resource. Parking areas are an essential asset and they are maintained and managed by PSKG (Parking Service Kragujevac). According to this function, all the changes that occur in parking areas or parking spots must be monitored, preferably in real time. Thereat, the terms parking area and parking spot should be distinguished, in the sense that the parking spot is a part of the parking area which is provided for parking one car. A parking area includes all parking spots in a particular area with
Parking area includes all parking spots in a particular area with additional space for access and vertical signalization. A parking area can include one or more parts of streets or car parks in front of buildings. All parking areas in a particular district make up the zone, while all zones make up the parking payment area. There are three zones – the zero, first and second zone.

As previously stated, the parking areas are divided into open and special parking areas. The parking area which belongs to open car parks is a part of street or off-street area.

Parking areas consist of parking spots. One parking spot is a parking space for one car. The parking spots have not been numbered yet, but that will be done in the near future. Vertical signs are used for marking a particular parking area in open car parks, thus indicating a part of the street or off-street space that is provided for parking. An individual parking spot is marked by a horizontal sign, i.e. a contour on the ground. This is also the case with the special parking areas. Depending on the configuration of the area and the shape and size of parking space, the arrangement of parking places and their numbers are obtained.

Parking spots are constantly in the process of supervision, regarding whether they are being used as foreseen by the function. This primarily relates to whether there is misuse by users for purposes other than those intended (unforeseen use by placing objects that are not vehicles, illegal preventing of other users from parking by use of certain obstacles, etc.). In case of detecting such irregularities, PSKG would take measures by ordering perpetrators to eliminate them within a specified period. If they are not eliminated, the request is submitted to the communal inspection or communal police, depending on the nature of the offense. Also, if so needed by the local government or by third persons (legal entities or individuals), a certain number of parking places may be excluded from the payment process and given for use under special conditions prescribed by the Regulations. This does not only apply to parking, but also to other activities (construction works, storage of certain facilities for public or private events, etc.).

Maintenance of parking spots is the process by which physical marking of new parking areas and parking spots is performed, as well as the re-marking of existing parking spaces and spots. This refers to painting of areas, i.e. drawing of contours foreseen by traffic regulations, as well as setting up and repair of traffic and light signals.

According to the aforesaid, this function includes the following processes:
• Maintenance of basic information on car parks
• Monitoring of car parks use
• Maintenance of car parks

The process of maintaining basic information on car parks consists of the following activities:
• Creation and monitoring of basic data on parking spots
• Creation and monitoring of basic data on signalization

The process of monitoring car parks use consists of the following activities:
• Applications forming
• Forming of orders for the elimination of irregularities
• Forming of requests to the competent authority
• Forming of permits for the intended use

The process of maintenance of car parks consists of the following activities:
• Creation of orders for parking space marking
• Creation of orders for setting up and repair of signalization

Figure 1 presents the context diagram which contains all the inputs, outputs, controls and mechanisms that are essential for car parks functioning at the highest level.

The input of this function consists of two groups of data and information - field data and users' applications. Field data are obtained by the controllers, collectors and officers who are directly responsible for reporting. These data are various observations related to developments in the actual parking areas or in their immediate surroundings. They can be in written or verbal form, but they should all contain common elements - what was observed, when it was observed, how much was observed, who has observed it and where.
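These five common elements map directly onto a field-report record. The following minimal sketch (in Python; the class and attribute names are hypothetical, since the paper does not prescribe a concrete structure) illustrates one way such a report could be captured:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class FieldReport:
    """One observation reported by a controller, collector or officer,
    carrying the five common elements: what, when, how much, who, where."""
    what: str                # what was observed
    observed_at: datetime    # when it was observed
    quantity: int            # how much was observed
    reported_by: str         # who observed it
    location: str            # where it was observed (zone / street / spot)

# Hypothetical example of a verbal report transcribed by an officer:
report = FieldReport("non-vehicle object on a parking spot",
                     datetime(2013, 12, 29, 9, 30), 1,
                     "controller 12", "first zone, street X")
```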
An application is a document that is generated by car park users and is usually in the form of a request for removing some irregularities. The diagram shows all the applications and data from the field - in fact, a generic abstraction of real information and documents that come from the outside.

Car park function output includes generically different permits, orders and requirements. Permits are related to the issuance of various approvals; orders are related to requests given by the Parking Service regarding rectifications; the requirements are related to seeking assistance or forwarding the matter to the competent state authorities for further proceedings. Control is exercised through a variety of laws and regulations which the Parking Service must comply with in the management of car parks and parking areas. Laws and regulations are issued by the state and local governing bodies. Also, the internal Rules of Procedure of the Parking Service are an element of control. Officers, controllers and collectors are the mechanism.

In Figure 2, three processes that reflect the structure of the parking management function can be seen.

Figure 1. Context diagram for the function of management of parking areas
Figure 2. Decomposition diagram for the function of parking areas management

The process of maintaining basic data is used to combine all the activities related to the supervision of the parking areas state and the state of traffic signalization equipment. Input into the process consists of data from the field. Field data are obtained by the controllers, collectors and officers who are directly responsible for reporting. Data from the field are various observations concerning developments in the parking areas or in their immediate surroundings. They can be in written or verbal form, but they should all contain common elements - what was observed, when it was observed, how much was observed, who has observed it and where. The output from the process consists of updated information on the status of the parking area which is to be used as a basis for other processes in this function. The control elements are laws and regulations that set the legal framework for the implementation of this process. Officers, controllers and collectors are the mechanism.

Controllers and collectors are engaged in observing events on the ground and forwarding data to the competent officers who make decisions on further steps. Supervision is a process which involves the activities of giving permits and orders to users and requirements to state agencies. Input is the application by the user regarding the observed irregularities. The output is in the form of generic documents - permit, order and requirements.

Maintenance is a process in which the activities of forming orders for maintenance of parking areas and signalization are carried out. The input to this process is the order from the supervision process according to which the maintenance process – maintenance of areas and signalization – must take place. Also, input can be a copy of the order for the user that is also generated in the supervision process and by which, in case the user fails to execute the order, the maintenance process is required to carry out its activities. The output is a maintenance order by which a specific department is required to perform that maintenance. Control elements are generic documents: regulations and law. Officers, controllers and collectors make up the mechanism.

Within the activity of car parks supervision, Figure 3, procedures and steps that register any changes to the actual parking areas are implemented. The input to this activity consists of field data obtained by the employed controllers and collectors and other employees. Output consists of updated information on the situation. The control elements are generic documents - regulations and laws. Officers, controllers and collectors make up the mechanism.

Figure 3. The activity of car parks supervision

Within the activities of signalization supervision, procedures and steps that record all the changes in signalization relevant for the functioning of the parking areas are undertaken. Input consists of data from the field and the updated state of the parking area obtained by the activity of car parks supervision. The output is in the form of updated data on the signaling equipment. The control elements are generic documents - regulations and laws. Officers, controllers and collectors make up the mechanism.

Within the activities of the application formation, Figure 4, procedures and steps that are used for forming applications for detected irregularities are carried out. The input to this activity consists of the updated state and the application submitted by the user. The output is a document - an application that is forwarded to the activity of forming an order for the user. The control element is a generic document - the law which is acted upon in the event of detected irregularities. Officers, controllers and collectors make up the mechanism.

An order forming activity involves the implementation of procedures and steps necessary for forming an order for the user with the aim of correcting the detected deficiencies. The output is a document – an order for the user. The control element is a generic document - the Rules. The mechanism consists of officers, controllers and collectors.

A requirement forming activity involves the implementation of procedures and steps for forming a request to state authorities in case it is necessary to carry out the correction of irregularities over which the Parking Service does not have jurisdiction. The input to this activity is the order to the user for the elimination of irregularities. The output is a generic document - a request to the state agency. The control element is a generic document - the Rules. The mechanism consists of officers, controllers and collectors.

Figure 4. The activities of the application formation
Figure 5. The activities of order for marking

A permit forming activity includes the implementation of procedures and steps forming a generic document - the user's permit - for the temporary use of parking space. The input to this activity is a request received from the requirement forming activity. The output is a generic document - a permit. The control element is a generic document - the Rules. The mechanism consists of officers, controllers and collectors.

The activities of order for marking, Figure 5, include steps and procedures for establishing the order for parking space marking. Input is an order for maintenance obtained from the supervision process. Output is the order for marking that is forwarded to the Marking works service. The control elements are generic documents - the Law and the Regulations. The mechanism consists of officers, controllers and collectors.

The activities of order for signalization include steps and procedures for establishing the order for signalization maintenance. Input is the order for maintenance obtained from the supervision process. Output consists of an order for repair or setting up of signalization that is forwarded to the Service for signalization maintenance. The control elements are generic documents - the Law and the Regulations. The mechanism consists of officers, controllers and collectors.

Figure 6. A relational data model for the function of parking areas
The relational data model shown in Figure 6 is a model which consists of the following entities:
• Parking Spot
• Street
• Traffic Sign
• Worker
• Zone
• Application
• Photos For Application
• Request For Permit
• Permit
• Order - Signalization
• Request To Authorities
• Order For The User
• Zone / Street
• Traffic Sign / Street
• Street / Parking Spot
• Order - Marking

Independent entities are: Parking Spot, Street, Traffic Sign, Worker and Zone. These entities are candidates for code-list tables in the database.

Dependent entities are: Application, Photos For Application, Request For Permit, Permit, Request To Authorities, Order For The User, Order - Marking and Order - Signalization. This group of entities is a candidate for those tables in the database that will represent documents in this function.

Derived entities are: Zone/Street, Traffic Sign/Street and Street/Parking Spot. This group of entities is derived from other entities, and they are candidates for tables in the database.

The code-list tables are: Parking Spot, Street, Traffic Sign and Worker. The other tables store data on the documents that exist in this function; these tables contain photos for the application, requests for a permit, requests to the authorities, orders for marking and orders for signalization.

The remaining tables - Zone/Street, Traffic Sign/Street and Street/Parking Spot - implement the belonging structure.
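As a purely illustrative sketch of how this classification could be turned into a physical schema (shown in Python with the standard sqlite3 module; all column names are hypothetical, since the paper lists the entities but not their attributes), one code-list table, one belonging-structure table and one document table might look as follows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Code-list (independent) entities
CREATE TABLE Zone (
    zone_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL            -- 'zero', 'first' or 'second'
);
CREATE TABLE Street (
    street_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);

-- Derived entity implementing the belonging structure Zone/Street
CREATE TABLE ZoneStreet (
    zone_id   INTEGER REFERENCES Zone(zone_id),
    street_id INTEGER REFERENCES Street(street_id),
    PRIMARY KEY (zone_id, street_id)
);

-- Dependent (document) entity: an application submitted by a user
CREATE TABLE Application (
    application_id INTEGER PRIMARY KEY,
    street_id      INTEGER REFERENCES Street(street_id),
    submitted_on   TEXT NOT NULL,
    description    TEXT
);
""")
```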
III. CONCLUSION

This paper analyzes the basic flow of information in the system for parking areas management. The presented analysis is the basis for the establishment and operation of the information system of the considered company. Management of parking areas is the main business segment, which can be upgraded by management functions - user files, personnel, equipment maintenance, purchasing, customer requirements, etc. In this sense, the analysis presented in this paper is a contribution to the development of the whole information system of the company. Based on the basic element of information system development - information flow analysis - the presented consideration has enabled the formation of the relational database model. The presented relational database model contains all the information necessary for doing business in this segment. Also, this method provides the starting point for creating a wide range of software solutions for data analysis in all segments.

It is undeniable that the presented consideration is a contribution to the enterprise integration and interoperability of the entire company and the local government. PSKG is a company that not only takes care of the parking of passenger cars, but also participates in the regulation of other issues in the field of road transport and city traffic, at the request of the local government. The zone system of utilization and charging makes the user choose the most convenient parking method, which relieves parking space in the city centre, as shown in practice. The information system of the Parking Service company, in this regard, has an important role in providing all necessary information to users of parking services, the supervisory board, the local government, as well as the employees and management of the company.

ACKNOWLEDGMENT

The research presented in this paper was supported by the Ministry of Education and Science of the Republic of Serbia, Grant III-44010, Title: Intelligent Systems for Software Product Development and Business Support based on Models.

REFERENCES

[1] The public company „Parking service" Kragujevac, Available at: www.parkingservis.rs/ (Accessed: 29.12.2013.)
[2] Assembly of the City of Kragujevac, Available at: www.kragujevac.rs/Skupstina_grada-40-1 (Accessed: 29.12.2013.)
[3] Law on Traffic Safety on Roads, Available at: www.parkingservis.rs/images/pdf/zakon_o_bezbednosti_saobracaja_na_putevima.pdf (Accessed: 29.12.2013.)
[4] The European Parking Association (EPA), Available at: http://www.europeanparking.eu/ (Accessed: 29.12.2013.)
[5] Scope of Parking in Europe, Data Collection by the European Parking Association, Available at: http://www.europeanparking.eu/cms/Media/Taskgroups/Final_Report_EPA_Data_Collectionort_final_web1%20.pdf (Accessed: 29.12.2013.)
[6] Parking service Belgrade, Available at: http://www.parking-servis.co.rs/cir/parkiranje (Accessed: 30.09.2013.)
[7] Zoned parking system in Belgrade, Available at: http://www.parking-servis.co.rs/cir/parkiranje/zone (Accessed: 29.12.2013.)
[8] JKP Parking service Novi Sad, Available at: http://www.parkingns.rs/ (Accessed: 29.12.2013.)
[9] R. Stair and G. Reynolds, Principles of Information Systems, Course Technology Cengage Learning, Australia (2012)
[10] D. L. Olson and S. Kesharwani, Enterprise Information Systems: Contemporary Trends and Issues, World Scientific (2010)
[11] J. G. Maria van der Heijden, Designing Management Information Systems, Oxford University Press (2009)
[12] K. C. Laudon and J. P. Laudon, Management Information Systems: Managing the Digital Firm, Prentice Hall (2002)
[13] J. A. O'Brien and G. Marakas, Management Information Systems, McGraw-Hill Companies, Incorporated (2006)
[14] A. Silberschatz, H. F. Korth and S. Sudarshan, Database System Concepts, McGraw-Hill, New York (2010)
[15] T. Halpin and T. Morgan, Information Modeling and Relational Databases, Morgan Kaufmann (2010)
[16] H. Garcia-Molina, J. D. Ullman and J. Widom, Database Systems: The Complete Book, Pearson Prentice Hall (2009)
[17] T. J. Teorey, S. S. Lightstone, T. Nadeau and H. V. Jagadish, Database Modeling and Design: Logical Design, Fourth Edition, Academic Press (2010)
ICT infrastructure at sports stadium: requirements and innovative solutions

L. Petrović*, M. Desbordes**, D. Milovanović***
* ALFA University, Faculty of Management in Sport, Belgrade, Serbia
** Université Paris-Sud 11 & ISC School of Management, Paris, France
*** University of Belgrade, Faculty of Electrical Engineering, Belgrade, Serbia
[email protected], [email protected], [email protected]

Abstract—Design of a sports stadium seeks to maximize the benefits of information and communication technologies (ICTs), with new innovations becoming available all the time. At the same time, developers need to understand requirements, objectives and priorities. The UEFA requirements which reflect current standards for football stadiums are reviewed in the first part of the paper. A case study of system integrators that provide expandable ICT-based solutions specially designed for the 2014 FIFA World Cup is presented in the second part. Finally, one of the conclusions is that clever use of multimedia and interactive technologies can be harnessed to enhance the spectators' experience and generate revenue.

I. INTRODUCTION

The UEFA (Union des Associations Européennes de Football) has published extensive catalogs of requirements for football stadiums relating to their construction, infrastructure, organization and operation. There are also more and more specialist companies and system integrators that provide adapted ICT-based solutions specially designed for use in stadiums. Therefore, sports clubs are looking for competent providers with whom they could form a long-term strategic partnership for the adaptation and implementation of an ICT solution package: basic equipment, media systems, ICT network within the stadium and other locations, mobile solutions, security concept [1, 2, 3]. The developers need to understand requirements, objectives and priorities. Decisions taken at the beginning of any project are vital for its future success. In this regard, it is of paramount importance to fully understand the needs, objectives and limitations. The selection of ICT specialists, consultants and contractors should be carefully managed to ensure that every stage of the project is implemented to the highest possible standards, on time and within budget. Each stadium is its own unique case. In addition to a specific set of current and future needs, each is defined by its traditions and the community it represents [4, 5, 6, 7, 8].

The paper is organized as follows. In the first part, the categorization of stadium infrastructure and technology program requirements is reviewed. The requirements reflect current Digital TV standards and state-of-the-art design of surveillance systems. In the second part, the ICT infrastructure of projects for the 2014 FIFA World Cup, advanced video/audio and security systems, and the best business model are presented.

II. REQUIREMENTS

Stadium infrastructure is categorized in UEFA regulations and, as such, rated as category one, two, three, or four (for elite 27 stadiums) in ascending ranking order (the previous method of ranking stadiums on a one to five-star scale has been obsolete since 2006). Basic differences among categories are in the structural criteria: field of play (100-105m long, 64-68m wide), minimum floodlighting (800-1400lux), minimum seated capacity (200-8000), and minimum space for press/TV [4, 5].

Technology program development. The development of a program can help identify all systems, users and applications necessary for the stadium facility, as well as interoperability, convergence and network allocation in system implementation [7]. The development of the program should depend on:
• systems and applications implemented,
• level of system convergence to IP,
• support of systems, users and applications,
• allocation of services,
• system reliability and redundancy,
• loss prevention,
• uninterruptible service and connectivity,
• future expansion and growth potential.

The escalating demand for a wide and reliable implementation of open-architecture electronic communications systems requires immediate planning of core infrastructure. This should take place at the same time as the development of the architectural building program. Most electronic building systems are converging to a common and open Internet data protocol, IP, which uses Ethernet-based links and networks: telephone, administrative data, wireless data (WiFi), building management systems, electronic access control and intrusion detection, video surveillance, television and other low-voltage electrical systems. Electronic building systems will continue to evolve using IP, making the planning of these systems increasingly important. Given the increase in system convergence and integration, planning for both present and future is vital to ensure the longevity of systems.
These criteria must be developed by following existing communication industry standards (ISO/IEC, ANSI/TIA/EIA, IEEE and BICSI) that help to anticipate future technologies.

Fundamental elements of communication systems which need to be reviewed and evaluated are, namely:
• core infrastructure (dedicated communications rooms, raceways and containment);
• support systems (heating, ventilation and air conditioning, electrical power and lighting);
• cable infrastructure (facility backbone and horizontal cabling);
• system electronics (telephone switch, data switches, servers and computers);
• implementation (support, applications, network allocation and services);
• administration (management, maintenance, upgrades).

Communications systems, applications and users that need to be considered and coordinated during the program development and design of a stadium are as follows:
• administrative data system;
• broadcast television;
• building management systems;
• cash point/ATM machines;
• clock system;
• fire alarm systems;
• food service point of sale;
• lighting control;
• mobile telephone service;
• police and fire radio;
• public telephones;
• retail point of sale;
• roof controls;
• scoreboard;
• secured telephone system;
• security electronic access control;
• security electronic intrusion detection;
• security video surveillance;
• signage;
• sound systems;
• telecommunications utility service;
• telephone system;
• ticketing;
• video boards;
• wireless Internet and data.

Operational safety and security. The important aspects in the planning, design, construction, running and management of any stadium are safety and security. Experience has demonstrated the need to have in place a stringent but people-friendly safety strategy. Personal safety of those inside the venue is very important and no expense should be spared to ensure that all spectators are able to watch and enjoy the match in a safe environment. Safety aspects of the design and construction should always be prioritized, even where this may be detrimental to factors such as comfort. Every section of the stadium, including access and exit points, turnstiles, the main concourse, fire doors, VIP areas, and all player and media areas, should comply fully with national and local safety regulations and standards, with regard to both fire protection and health and safety [6].

All stadiums need to have a fully integrated safety and security strategy that covers the entire structure and its surroundings. It is vital for security to be centralized and that those responsible for implementing the strategy have a full view of all major sections of the venue. For ease of operation, the stadium staff needs to ensure that CCTV (closed-circuit television) cameras are correctly positioned. The audio quality of the PA (public address) system needs to be high in order to ensure that important or emergency announcements are clearly audible throughout the venue. The stadium design must include control rooms and meeting rooms for security staff, as well as adequate facilities for the police and first aiders. Furthermore, provision must be made for easy, direct vehicle access for emergency services.

Most modern stadiums have large video walls or digital scoreboards that are used to broadcast match highlights and other announcements. They also serve a vital purpose in terms of safety, as they can be used to transmit video and text instructions to the public in the case of an emergency.

CCTV surveillance cameras should be installed in all internal and external public areas inside and outside the stadium, and should be used to monitor any areas where there is potential for security problems. During the design stage, the security consultant should provide a concise layout of the CCTV camera positions and requirements inside and around the stadium. The stadium should have a centralized control room located in a prominent position in the bowl. The control room should have an unrestricted view of as many spectator areas as possible, as well as of the pitch. The control room is the hub from which the stadium security officer and their team, together with representatives of the local authorities and emergency services, monitor and control all aspects of crowd safety and stadium management. The control room must be fitted out with a full range of communication equipment, including the PA system and access control and counting systems. Control room operators should be able to monitor non-visible areas by means of a network of CCTV cameras and screens. The surveillance cameras should be linked to color monitors and should have pan, tilt, and zoom functions, as well as the inbuilt facility to take still pictures.

A. Digital television infrastructure

The FIFA (Federation Internationale de Football Association) requirements reflect current DTV standards in stadiums. However, exact capacities and quantities will be determined in each case by the organizing authorities, the media services and the broadcasting organizations. In television in particular, flexibility is required in order to accommodate newly developing technologies to maximize coverage [7].
Lighting requirements change according to technical developments, such as the introduction of high-definition television (HDTV). For a new stadium, it is advisable to consult a leading television company or the appropriate continental television consortium.

Depending on the importance of the game, many installations (such as seats for radio and television commentators) may be temporary. They will be erected for a short time and then be dismantled. It is essential to provide easy access to and from these areas and an adequate electricity supply.

TV/audio system. All hospitality areas should be equipped with audio and video equipment and networks. The required number of television sets is: 1 in each private area; 1 per 50 guests in the commercial affiliate hospitality village areas; 1 per 100 guests in the prestige areas; 1 per skybox.
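Applied mechanically, the rule gives a quick sizing estimate; the short sketch below (in Python, with purely hypothetical occupancy figures) illustrates the calculation:

```python
import math

def required_tv_sets(private_areas, hospitality_guests, prestige_guests, skyboxes):
    """FIFA rule of thumb: 1 set per private area and per skybox,
    1 per 50 hospitality-village guests, 1 per 100 prestige-area guests."""
    return (private_areas
            + math.ceil(hospitality_guests / 50)
            + math.ceil(prestige_guests / 100)
            + skyboxes)

# Hypothetical venue: 20 private areas, 2000 hospitality guests,
# 500 prestige guests, 40 skyboxes -> 20 + 40 + 5 + 40 = 105 sets
print(required_tv_sets(20, 2000, 500, 40))
```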
approximately 2m x 2m per camera. In both cases, the
Television studios. Provision should be made for at
least three television studios for major matches, each of exact number of positions should be determined by the
approximately 25m2 and a minimum height of 4m, to organizers and broadcasters. Further positions may be
allow for television sets and lighting. They should be located beside or behind the commentary area, as
located in such a way that players and coaches can reach determined by the organizers and broadcasters. Observer
them easily from the dressing rooms at the end of the seats without desks for broadcaster personnel should also
match. In addition, one television studio should afford a be located in this sector. Where possible, space should be
panoramic view over the pitch. For major international provided at specified places near the players’ entrance to
events, up to four such studios may be required. the field. The allocation and use of this space, especially
Multilateral coverage. All camera positions are for interviews and presentations, will be subject to
subject to a joint agreement between the organizers and regulations [7].
broadcasters (Fig. 1). Attention must be paid to avoiding
cameras being impeded by the public. Main cameras in B. Television surveillance system
the central stand must be situated at the halfway line, at A modern stadium should be equipped inside and
the point of intersection between the line and the nearest outside with public surveillance color television cameras,
touch line. The exact position of the multilateral cameras mounted in fixed positions with pan and tilt facilities.
will be determined by the host broadcaster on inspection These cameras should monitor all of the stadium’s
of the stadium. These cameras must face away from the approaches and all of the public areas inside and outside
sun, giving an unhindered view of the whole playing the stadium [7].
surface. The commentators’ positions have to be situated The television surveillance system should have its
on the same side of the ground. A space of approximately own independent power supply and private circuit. It
2m x 3m should be allowed for each camera. One goal should be operated and controlled from the stadium
camera should be situated behind each goal, on the control room where the monitor screens should be
longitudinal axis of the pitch, at a height which permits situated. It should also be capable of taking photographs
the penalty mark to be seen above the crossbar of the both inside and outside the stadium.
goal. The angle of the line of sight to the horizontal
Each stadium must have a control room which has an
should be between 12° and 15° and a space of 2m x 3m is
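The 12°-15° constraint translates directly into a mounting height once the horizontal distance from the camera to the penalty mark is known: height = distance × tan(angle). A small illustrative sketch follows (the 40 m distance is a hypothetical value, not a figure from the requirements):

```python
import math

def camera_height(distance_m, angle_deg):
    """Height needed for a given line-of-sight angle to the horizontal."""
    return distance_m * math.tan(math.radians(angle_deg))

# Hypothetical goal camera 40 m from the penalty mark:
low, high = camera_height(40, 12), camera_height(40, 15)
print(f"mounting height between {low:.1f} m and {high:.1f} m")
# -> mounting height between 8.5 m and 10.7 m
```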
Unilateral coverage. At each unilateral camera position in the main stands and behind the goals, a feed of the international sound should be available. Space of approximately 2m x 3m per camera should be provided alongside the multilateral cameras. There should be clearly defined and separate sectors behind the advertising boards behind each goal, measuring approximately 2m x 2m per camera. In both cases, the exact number of positions should be determined by the organizers and broadcasters. Further positions may be located beside or behind the commentary area, as determined by the organizers and broadcasters. Observer seats without desks for broadcaster personnel should also be located in this sector. Where possible, space should be provided at specified places near the players' entrance to the field. The allocation and use of this space, especially for interviews and presentations, will be subject to regulations [7].

B. Television surveillance system

A modern stadium should be equipped inside and outside with public surveillance color television cameras, mounted in fixed positions with pan and tilt facilities. These cameras should monitor all of the stadium's approaches and all of the public areas inside and outside the stadium [7].

The television surveillance system should have its own independent power supply and private circuit. It should be operated and controlled from the stadium control room where the monitor screens should be situated. It should also be capable of taking photographs both inside and outside the stadium.

Each stadium must have a control room which has an overall view of the inside of the stadium and which must be equipped with public address facilities and television surveillance monitor screens. The size, configuration and furnishing of the control room should be agreed upon in consultation with the local police.

The stadium commander should have the capability of overriding and cutting into the public address system whenever necessary. The system governing the arrest, detention and indictment of offenders may differ from country to country, or even from city to city, so stadium designers should consult the local police and civic authorities to determine whether it is necessary to include facilities such as a police muster room, a charge room and detention cells for male and female prisoners within the stadium itself.

Figure 1. DTV infrastructure: a) possible TV camera position, b) standard fixed/field camera views.
A second control room and emergency command centre is desirable. It should have a location which is convenient for arriving emergency personnel and their vehicles.

Sufficient CCTV surveillance monitors and control systems should be installed in the VOC (venue operation centre) to properly undertake proactive and reactive surveillance monitoring and control of the cameras. Furthermore, the system should contain digital video recorders (DVRs) of sufficient capacity to record and store images for a minimum of 60 days.
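The 60-day retention requirement drives DVR storage sizing directly. An illustrative calculation follows (the camera count and per-camera bitrate are hypothetical assumptions, not figures from the regulations):

```python
def dvr_storage_tb(cameras, mbit_per_s, days):
    """Raw storage needed to retain continuous footage, in terabytes."""
    seconds = days * 24 * 3600
    total_bits = cameras * mbit_per_s * 1e6 * seconds
    return total_bits / 8 / 1e12

# Hypothetical: 250 cameras at 2 Mbit/s each for 60 days -> about 324 TB
print(f"{dvr_storage_tb(250, 2, 60):.0f} TB")
```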
There should be a robust and comprehensive communications system for all aspects of stadium safety and security. Standard commercial mobile phone networks often become overloaded during an incident and therefore cannot be relied upon as a means of communication for the purposes of safety and security. As such, the following systems should be in place in the VOC:
• external fixed landline, direct dial;
• intercom or internal fixed landlines between key locations around the stadium and the VOC;
• radio network for all safety and security functions;
• Internet/data facilities.

Communications rooms should include: telecommunications utility demarcation rooms, a main cross-connect room (main communications room), computer equipment rooms (data centre or server rooms) and intermediate cross-connect rooms (communications distribution rooms). The location of the communications distribution room is critical to ensure that the length limitations of horizontal cables are maintained. Communications rooms should be located to ensure that the total cable length to any outlet device does not exceed 90m. Strict adherence to this is required. Segments exceeding this length will not function and certainly will not support future technologies. Communications rooms should be dedicated and separate from electrical rooms. Rooms should align vertically to form risers to ease the installation of cable throughout the facility. Co-locating or sharing rooms with communications and other low-voltage systems is recommended. The sizes of all communications rooms will depend on the type of room, the equipment supported and distribution densities. The communications cable infrastructure system should be planned to support voice and data applications/systems operated over a multi-media cabling plant including fiber optics and twisted pair copper [7].
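A planning check against the 90 m rule is trivial to automate; the sketch below (with hypothetical outlet names and lengths) flags offending runs in a cable schedule:

```python
# Hypothetical cable schedule: outlet -> total horizontal run in metres
cable_runs = {"TV-OUT-01": 72.5, "WIFI-AP-17": 91.2, "POS-CONC-03": 88.0}

MAX_HORIZONTAL_RUN_M = 90  # limit stated above

for outlet, length in sorted(cable_runs.items()):
    if length > MAX_HORIZONTAL_RUN_M:
        print(f"{outlet}: {length} m exceeds the {MAX_HORIZONTAL_RUN_M} m limit")
```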
sound system and large-scale video screens rivaling those
III. INNOVATIVE SOLUTIONS of the largest stadiums in the world. NEC will provide a
Modern stadium designs seek to maximize the wide range of integrated systems that include the latest IP
benefits of information and communication technologies, data and communications networks, security solutions
with new innovations becoming available all the time. If featuring 250 security cameras, sound systems, fire
cleverly used, multimedia and interactive technologies detection systems and building automation systems that
can be harnessed to enhance the spectators’ experience are designed, deployed and monitored by ICT
and enjoyment of the match. Smaller stadiums are likely infrastructure from NEC [10].
to be more restricted in their budgets but should still be in Brazil's preparations for hosting include the
a position to take advantage of some, if not all, construction of new sports arenas, as well as the
technological advances. Stadium designs should always development and implementation of next-generation

In Brazil, which is undergoing rapid economic growth, the smart city concept is being developed in urban districts, starting with the areas around the stadiums, to help ensure development continues in the host regions even after this large-scale national event is over. Creative solutions, cutting-edge technology, and integrated security, environmental and energy savings are combined so that, in a given locality, services such as communication, transport, energy, and security function as efficiently as possible. The resources of smart cities are managed optimally. Once you build a football stadium, for example, the goal is then to transform the surrounding area. A smart city project can also be focused on rejuvenating a city, like what is happening in Rio de Janeiro. NEC is working on the electronics for the Olympic Village in Rio de Janeiro. NEC is building two large areas, one for housing the games and another for the athletes, and both must meet the minimum requirements of the IOC (International Olympic Committee). After the Olympic Games, these regions will be turned into large residential condominiums for the middle class. In this case, the onboard electronics implemented by the company will meet the needs of the 2016 Olympic Games, while also anticipating post-Olympics demand. Another smart city project will have an entertainment venue, including a new intelligent and multi-purpose arena. It must also be prepared to accommodate corporate and residential areas, as well as research and development facilities, universities, hotels, and shopping malls.

B. Advanced video/audio and security systems

After almost 20 years, NEC began realizing the limitations of its analog-based system: it provided deteriorating image quality and was limited in terms of coverage due to a lack of cabling infrastructure. NEC had some very specific criteria for its new technology platform: an IP-based and open-architecture system that would be able to integrate with other systems, a scalable system designed to adapt to future changes, and a very user-friendly one.

NEC is providing solutions featuring highly networked and integrated technologies that help to ensure security and provide high-quality visual and audio systems. The networked system will contribute to securing the stadium with hundreds of cameras that monitor for emergencies and may perform facial recognition, enabling stadium operators to better understand the needs of audience members based on characteristics that include age, gender and location [10].

Also, NEC has developed crowd behavior analysis technology that utilizes security camera footage to understand the behavior of crowds. This technology is expected to contribute to greater safety and security through early detection of unusual conditions and accidents. NEC's newly developed technology analyzes the influence that an unexpected incident, or the signs of an incident, has on the behavior of a crowd in order to accurately identify changes in the crowd's behavior. As a result, existing security cameras can be used to detect disturbances, such as a sudden change in the flow of foot traffic or a crowd formed around a fallen individual, and to accurately estimate the degree of congestion, even in an extremely crowded environment. In a crowded environment, where hundreds of people are interacting in a complex mixture of activity, it is difficult for conventional technologies to quickly detect and identify the characteristics of a disturbance [11, 12].

Key features of this technology include the following:

1. Accurate understanding of crowd conditions. This technology creates pseudo-images through simulations that demonstrate a wide range of different conditions, such as varying degrees of crowd congestion and each individual's behavior. Image recognition technologies then employ a proprietary algorithm to match and analyze security camera images against pseudo-images. This enables crowd conditions to be understood with a high degree of accuracy even when people in the images appear to be heavily overlapped.

2. Fast and accurate analysis of changing crowd conditions. Special focus is placed on the behavior of people in the vicinity of any unusual occurrence. For example, if two individuals begin to quarrel in a public space, the technology closely analyzes the behavioral patterns of people surrounding the incident, such as temporary pauses or large gatherings. Moreover, the level of an incident can be estimated based on a pre-determined threshold value that helps measure the magnitude of a crowd's change.
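NEC's actual algorithm is proprietary, but the threshold idea in feature 2 can be illustrated with a deliberately simplified sketch (the density metric, values and threshold are all hypothetical):

```python
def detect_crowd_change(density_series, threshold=0.25):
    """Flag frames where the crowd-density estimate changes by more than
    a pre-determined relative threshold -- a toy stand-in for the
    incident-level estimation described above."""
    alerts = []
    for t in range(1, len(density_series)):
        prev, curr = density_series[t - 1], density_series[t]
        if prev > 0 and abs(curr - prev) / prev > threshold:
            alerts.append((t, prev, curr))
    return alerts

# Hypothetical per-frame density estimates (people per square metre):
print(detect_crowd_change([1.0, 1.1, 1.1, 1.6, 1.5, 0.9]))
# -> [(3, 1.1, 1.6), (5, 1.5, 0.9)]  (a sudden gathering, then a dispersal)
```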
IV. CONCLUSIONS

ICTs have become the hidden backbone of industrial systems, commerce, transport, government, health, entertainment – defining how we live, work and play the world over. They are fundamental to every aspect of modern society, cutting across all industry sectors and all areas of life, visible or not. ICTs are behind the incredibly accurate timekeeping technology which measures sporting performance to the thousandth of a second, and the highly complex technological and logistical broadcasting operations that bring results all but instantaneously to sport lovers everywhere.

ICTs are set to play an increasingly prominent part in stadium design and construction in the future. While smaller stadiums may not have the financial resources to take full advantage of every advance and innovation, experience shows that new technology which is initially expensive eventually comes down in cost, making it affordable to more and more stadium developers. Five out of the 12 stadiums of the 2014 FIFA World Cup are partnerships between public and private sector initiatives. It is believed that this is the best business model in terms of legacy, since it is a long-term agreement and the company responsible for the stadium will be able to get some return on investment. However, the winners of many auctions are companies that presented lower costs; therefore, it is necessary to check what kind of ICT equipment they are using.
Major and mega sports events such as the 2014 FIFA World Cup or the Olympic Games can change entire cities, even countries. Older experiences have shown internal positive impacts. Big events act as a catalyst for new technologies and ICT innovation: the enhancement and optimization of international gateway links, construction of data centers, fiber optics in metropolitan areas, increase in broadband speed and capacity, 4G wireless services and local content production. However, several ICT specialists are worried that the ICT infrastructure of stadiums could be outsourced, under an arrangement that would have a company deploy a container with everything that is needed to support the event, which it could pull out once the event ends. This would fit the FIFA requirements, but may not be good for country development, since it would be short-term. Another question is how the country will use the whole infrastructure built for the World Cup, not only telecommunications.

Technology has advanced enormously in recent years, and there are now many applications that can be used in stadiums to increase revenue generation. In addition to online shops from which fans can buy team merchandise, the websites of some clubs and national associations now even allow you to make stadium restaurant reservations. As the influence of websites and social networking sites such as Twitter and Facebook continues to grow, so does the scope for commercializing an online presence. In WiFi-enabled stadiums, spectators have access to a wide variety of online information on match days. They can access statistics and match reports and in some cases, where allowed, can even replay the match itself online, via computers, mobile telephones, and tablets. Advertising revenue has become increasingly important for stadiums and new technology has revolutionized the ways in which this can be delivered. On match days, large video walls, TV screens, LED displays and digital hoardings can all be used to help deliver a striking visual message to fans in the stadium as well as TV viewers. The scope for expanding online commerce within the context of football events is huge. Many fans already purchase their match tickets online. However, there will come a time when spectators will even be able to order refreshments and have them delivered without even leaving their seats, thus avoiding the often rushed and stressful process of trying to purchase food and drink during the half-time interval.

In order to exploit all of these technological opportunities, stadium ICT infrastructure should be configured to incorporate data cabling and fiber-optic networks. The ability to offer state-of-the-art solutions will be an attractive facet of the commercial packages offered by a stadium. It should also be future-proofed, i.e. designed to adapt to future changes, so that the latest technological advances can always be embraced.

To conclude, it is obvious that ICTs already underpin all aspects of modern life, including sport, and have the potential to create enormous societal change for the good. Current and future innovators and businesses will reap the benefit of the legacy infrastructure of the largest – mega and major – sports events long after the last trophy and medal have been awarded, as ICTs and innovative solutions in the sports industry combine fruitfully to produce not only sporting success, but long-lasting socio-economic transformation.

REFERENCES

[1] L. Petrović, D. Milovanović, M. Desbordes, "Emerging technologies and sports events: Innovative information and communications solutions", Sport, Business, Management: An International Journal, Emerald Group Publishing Limited, UK, accepted for publication.
[2] L. Petrović, D. Milovanović, "Analysis of digital video in sports event: research, technology development and implementation", in Proc. of YuINFO, March 2012.
[3] K. R. Rao, Z. Bojković, D. Milovanović, Wireless Multimedia Communications, CRC Press, 2009.
[4] UEFA Regulations of the Europa League, Stadiums and match organisation, 2010/11, pp. 17-20.
[5] UEFA Stadium infrastructure regulations, Edition 2010, 24 pages.
[6] UEFA Guide to quality stadiums, 2011, 160 pages.
[7] FIFA Football stadiums: Technical recommendations and requirements, 2007, 248 pages.
[8] FIFA Stadium safety and security regulations, 2010, 112 pages.
[9] T-Systems International, Integrated, expandable ICT innovation solution in the DFL (available online: www.t-systems.com)
[10] NEC builds ICT systems for more soccer stadiums in Brazil (available online: www.nec.com)
[11] NEC technology detects changes in crowd behavior and promotes security, C&C User Forum & iEXPO 2013 at the Tokyo International Forum, 14-15 November 2013.
[12] NEC's pursuit of imaging and recognition technologies, NEC Technical Journal, vol. 6, no. 3, 2011, pp. 10-16.
Sakai CLE in Serbian Higher Education

Goran Savić, Milan Segedinac, Nikola Nikolić, Zora Konjović
Faculty of Technical Sciences, University of Novi Sad
{savicg, milansegedinac, nikola.nikolic, ftn_zora}@uns.ac.rs

Abstract - The paper presents experiences with Sakai CLE at the Faculty of Technical Sciences of the University of Novi Sad. Sakai CLE has been used at the Faculty in a blended learning setting since 2011. Currently, 369 students are using Sakai within 4 courses. The paper analyzes the usability of the Sakai tools and their impact on the courses. In addition, two new Sakai tools developed at the Faculty of Technical Sciences are described. The first tool is developed as a support for SCORM course personalization, while the second one provides the semantic search of learning resources stored in the Sakai CLE.

I. INTRODUCTION

Computer-supported learning has usually been implemented within specialized software applications commonly named Learning Management Systems (LMS). Ferris and Minielli in [1] explain that an LMS is a software application for the creation, storage, management and use of learning resources. An LMS provides an infrastructure for the management and delivery of learning material, the definition and evaluation of individual and institutional learning objectives, and learning process monitoring [2]. So far, various LMSs have been developed. A list of 383 publicly available LMSs (with user interface in English) is given in [3]. According to [4], the most widely used commercial LMSs are Blackboard, Desire2Learn and eCollege. When it comes to free LMSs, the most popular are Moodle, Sakai and Canvas. The source code of these three systems is publicly available. Using an open-source LMS enables modifying existing functionalities and developing completely new functionalities adjusted to the educational institution's needs.

This paper describes the experiences with the Sakai LMS. From the beginning, Sakai has remained on the open-source LMS market in the shadow of the more popular Moodle LMS. In this paper we describe our experiences with the Sakai LMS to show how Sakai fits the e-learning needs of Serbian higher education.

II. SAKAI CLE

Sakai [5] is an LMS developed within the Sakai Foundation, which was formed by several educational institutions, commercial organizations and individuals in order to integrate independently created local LMSs into a single modern LMS. Sakai is a free and open-source system. Currently, it has two official distributions - Collaboration and Learning Environment (Sakai CLE) and Open Academic Environment (Sakai OAE).

Sakai CLE is a content management system which provides various specialized e-learning tools. Apart from e-learning purposes, it may be used for collaboration in general. Sakai OAE provides a personalized learning environment organized as a social network. The system encourages establishing a user community for the creation, sharing, use and evaluation of learning material. Regarding the implementation, Sakai is implemented using Java technologies.

This paper analyzes Sakai CLE. As mentioned above, Sakai CLE provides the creation of web pages containing a specific set of Sakai tools. The Sakai tools are independent components that encapsulate specific functionalities and interact with a user. There are three main categories of tools in Sakai: learning tools (learning resources management, online exams ...), administration tools for teachers (grading, calendar, student management) and collaboration tools (DropBox, chat, forum ...).

Sakai CLE is accessible by registered users only. There is a separate account for each student and teacher. Users participate in courses with different roles. A user may be an instructor, a teaching assistant or a student in the course. An instructor manages users and content in the course. A teaching assistant is allowed to manage content in most course sections. A student may access learning material and create content using tools designed for students. For example, he/she may post messages on the forum, upload files in the drop-box section, edit wiki pages etc. In addition to these three user categories, there is an administrator as a super user who may access all content and functionalities in the system.

Besides web pages containing e-course content, for each user there is a separate set of pages that represent his personal workspace. The workspace may contain tools for the administration of the user's personal data, tools related to some course-independent Sakai functionalities and tools that provide centralized access to data from all of the user's Sakai courses.

III. SAKAI AT THE FACULTY OF TECHNICAL SCIENCES

Sakai CLE ver. 2.7 has been used at the Faculty of Technical Sciences (FTN) at the Chair of Informatics since February 2011 (hereinafter FTN Sakai). FTN Sakai is publicly available at the internet address [6]. So far, 4 courses have been set up on the FTN Sakai system. 369 students and 5 teachers have participated in the courses. Table 1 presents the details about the courses in FTN Sakai.

The presented courses have been used in a blended learning environment as a support for the traditional face-to-face learning at the Faculty. The home page of the Web programming course in the FTN Sakai system is shown in Figure 1.
TABLE I. FTN SAKAI COURSES

Course                                     | Study program                         | Semester | No. of students | No. of teachers
Web programming                            | Computers and automation              | VI       | 328             | 3
Web programming                            | Software and information technologies | III      | 41              | 1
Platforms for object-oriented programming  | Software and information technologies | III      | 41              | 1
Web design                                 | Software and information technologies | IV       | 41              | 1

IV. FTN SAKAI EXTENSIONS

Beside the tools that are part of the standard Sakai distribution, FTN Sakai has been extended with two tools developed at the Faculty of Technical Sciences.

As mentioned, the Sakai source code is publicly available. The code has been written in the Java programming language and organized in more than 80 projects for the Eclipse IDE [7]. The Sakai system runs on the Apache Tomcat web server and uses the MySQL database management system. Regarding the presentation layer, different front-end technologies, such as servlets, Velocity, JSP and JSF, are used.

A. Sakai tool for the SCORM courses personalization

The first tool has been developed as a support for the personalization of SCORM courses. Learning material in SCORM-compliant courses has been personalized based on the student's learning style. The learning style has been defined according to the Felder-Silverman model of learning styles. In this model, a student's learning style can be determined using the Index of Learning Styles (ILS) questionnaire. The personalization principles are described in detail in [8].

We extended the Sakai system with a tool that generates a personalized SCORM course for each student. The tool contains a module for determining the student's learning style. The module has been implemented using JSF technology. Within the module, a student can fill out an ILS questionnaire. The module contains three web pages for displaying all completed ILS questionnaires, filling out a new questionnaire and displaying questionnaire results, respectively. A newly created web page for filling out an ILS questionnaire is shown in Figure 2.

After completing the questionnaire, the student's learning style is calculated. Figure 3 shows our page for displaying the results of an ILS questionnaire.
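The ILS instrument itself is public: 44 forced-choice (a/b) items, 11 per Felder-Silverman dimension, with the dimension score being the difference between the two answer counts. A minimal sketch of that calculation (independent of the paper's JSF module, whose code is not shown):

```python
# Each of the four Felder-Silverman dimensions (active/reflective,
# sensing/intuitive, visual/verbal, sequential/global) has 11 ILS items.
def ils_score(answers):
    """answers: the 11 'a'/'b' answers for one dimension.
    Returns an odd score from -11 to 11; positive values lean towards
    the first pole of the dimension, negative towards the second."""
    return answers.count("a") - answers.count("b")

# Hypothetical answers for the active/reflective dimension:
print(ils_score(list("aababaaabba")))  # 7 a's, 4 b's -> score 3 (mildly active)
```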
As mentioned, the personalization is done on SCORM-compliant courses. The standard Sakai distribution doesn't contain a tool for working with SCORM-compliant courses, but there are a few third-party tools for this purpose. In our FTN Sakai extension, a SCORM tool developed by the UC Davis University [9] has been used. We have integrated the previously described personalization mechanisms into this tool. Hence, in the FTN Sakai system, the UC Davis SCORM tool displays SCORM courses personalized according to the results from the ILS questionnaire. The personalization process and the screenshot of the personalized web programming course in the UC Davis SCORM tool are shown in Figure 4 (the learning material in the course is in Serbian).

The quality of the personalization highly depends on the metadata that describe learning resources. A teacher defines metadata during the course creation phase. The XML syntax, used for specifying resources and their metadata, is primarily a machine-readable format.

Figure 1. Web programming course in FTN Sakai

Figure 2. Filling out a new ILS questionnaire in the FTN Sakai extension for SCORM courses personalization

Figure 3. Displaying the results of an ILS questionnaire in the Sakai FTN extension for SCORM courses personalization
specific technical knowledge may find it difficult to define resources' metadata. To facilitate specifying metadata, we have created a new page within the FTN Sakai system. The page displays resources graphically in the form of a tree; next to each resource, its metadata are displayed. This page is shown in Figure 5.

Figure 5. Web page for displaying resources in a FTN Sakai SCORM course

Beside the page for displaying metadata, we have implemented support for editing metadata on a separate web page. Figure 6 shows a screenshot of this page.

Figure 6. Web page for editing resource's metadata in a FTN Sakai SCORM course

B. Sakai extension for the formal representation of learning goals

The second extension developed within the FTN Sakai system is related to the representation of learning goals in a Sakai course. The following text gives a brief description, while all details are presented in [10]. The standard Sakai distribution contains a tool named Syllabus intended to describe the course structure. We may consider this structure as the learning goals that should be achieved in the course. The Syllabus tool represents learning goals as free text. Such a representation is not appropriate for machine processing. To overcome this constraint, we have developed a new tool for representing learning goals in the FTN Sakai system. The new tool provides a machine-readable description of the learning goals in a course. As the formalism, we have used the ontology described in [11] and [12]. We have modified the Sakai Resources tool (which provides learning resources management) to enable linking learning resources to formally represented learning goals. The extension incorporates a new field in the metadata set of the learning resource; this new metadata field defines the learning goal related to the learning resource. Figure 7 shows the page for displaying resource metadata, extended with a new textbox field for entering the corresponding learning goal.

Figure 7. Defining a learning goal related to the learning resource in FTN Sakai

Incorporating a machine-readable representation of learning goals has enabled the implementation of new features in FTN Sakai. We have added two more actions for a selected learning resource, which provide finding similar resources and finding precondition resources, respectively.

To achieve a specific learning goal, a student consumes a set of learning resources related to this goal. We use the term "similar resources" for all resources designed to
achieve the same learning goal. In FTN Sakai, we have extended the Resources tool from the standard Sakai distribution to support finding similar resources for a selected resource. Figure 8 shows the newly created page for displaying all learning resources similar to the resource "CSS pre-test" in the Web programming course.

Figure 8. Web page for displaying similar resources in FTN Sakai

Achievement of a specific learning goal usually implies some prior knowledge. We have extended the Sakai Resources tool to provide finding all learning resources that should be studied before a specific resource. We refer to such resources as precondition resources. Precondition resources can be found by exploring the ontology of learning goals previously stored in FTN Sakai. The ontology contains the "is-precondition" relation; this relation between two learning goals implies that the first learning goal should be achieved before the second one. For a selected learning resource, its precondition resources are all learning resources related to a learning goal which is a precondition for the learning goal related to the selected resource. Figure 9 shows the precondition resources for the resource "HTML tables" in the Web programming course.
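As an illustration of this lookup, the sketch below (with hypothetical names, not the actual FTN Sakai code) collects precondition resources by following the "is-precondition" relation of the learning goals ontology; for generality it follows the relation transitively, although the description above only requires the direct precondition.

import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class PreconditionFinder {

    /** For each goal, the goals that are its direct preconditions. */
    private final Map<String, Set<String>> isPrecondition;
    /** For each learning goal, the resources related to it. */
    private final Map<String, Set<String>> resourcesByGoal;

    public PreconditionFinder(Map<String, Set<String>> isPrecondition,
                              Map<String, Set<String>> resourcesByGoal) {
        this.isPrecondition = isPrecondition;
        this.resourcesByGoal = resourcesByGoal;
    }

    /** Returns the resources related to any precondition of the given goal. */
    public Set<String> preconditionResources(String goalOfSelectedResource) {
        Set<String> found = new LinkedHashSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> toVisit = new ArrayDeque<>();
        toVisit.push(goalOfSelectedResource);
        while (!toVisit.isEmpty()) {
            String goal = toVisit.pop();
            for (String pre : isPrecondition.getOrDefault(goal, Collections.emptySet())) {
                if (visited.add(pre)) {
                    found.addAll(resourcesByGoal.getOrDefault(pre, Collections.emptySet()));
                    toVisit.push(pre);
                }
            }
        }
        return found;
    }
}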
Figure 9. Web page for displaying precondition resources in FTN Sakai

So far, the described extensions have been used as a part of scientific research only. They have not been used in courses officially and are not deployed on the publicly available FTN Sakai server.

V. EVALUATION

This section analyzes the benefits of using FTN Sakai in teaching at the Chair of Informatics. Beside the Sakai system, the MoinMoin [13] software platform has been used as an environment for blended learning at the Chair. MoinMoin is a generic content management system based on wiki pages; we use it for the administration of course content. For each course, a set of wiki pages containing course announcements, assignments and learning material has been created. In contrast to Sakai, MoinMoin does not provide specialized tools for course administration, so course content is administered by editing wiki pages. Since there is no integrated wiki editor in MoinMoin, a user must be familiar with wiki syntax to be able to edit wiki pages.

MoinMoin has been used primarily as a repository of learning material. As previously mentioned, the most frequently used Sakai tool is the Resources tool for the administration of learning resources. Since Sakai has also been used primarily for sharing learning material, it has not brought significant improvement over the teaching practices established within the MoinMoin platform. Migration to Sakai has not shifted the focus of the teaching scenarios, since the focus has stayed on learning resources retrieval. Although Sakai has mostly been used for learning resources management, it is still a more comfortable software environment than the MoinMoin platform, since it provides specialized e-learning tools. Also, it is relatively easy to train new Sakai users because they do not need to learn wiki syntax as with MoinMoin. Still, we should mention that version 2.7 of Sakai CLE relies on Web 1.0 technology, which implies some constraints regarding the user experience. Sakai 2.7 tools do not use Ajax technology to communicate with the server, which makes content display slower than it might be.

Concerning the other tools in FTN Sakai, the DropBox tool has simplified communication with the students. This tool provides a centralized repository for sharing learning material between teachers and students, which has made it easier for a teacher to monitor students' engagement in the course. The Forum, as another tool used for communication between course participants, has been used very rarely. The reasons may differ, but we assume that social networks are nowadays the common environment for electronic communication between students.

VI. CONCLUSION

Sakai CLE has certainly improved the learning technologies at the Faculty of Technical Sciences, introducing more flexible tools for sharing learning resources and for communication between teachers and students. Also, Sakai's architecture is extendable, which allowed us to implement two new Sakai tools to meet the e-learning needs of our institution.

Still, the current Sakai tools suffer from a rather old-fashioned user interface and obsolete implementation technologies. Moreover, the future of the whole Sakai project is uncertain: the University of Michigan and Indiana University, the two founder institutions, dropped out of the project last year and will not participate in its development anymore.

All these reasons led us not to choose Sakai as a long-term e-learning solution for our institution. Currently, we are migrating our e-courses to the Canvas LMS, which seems to be a new rising star in the e-learning sky, providing a large set of functionalities combined with an attractive and easy-to-use user interface.
ACKNOWLEDGMENT

Results presented in this paper are part of the research conducted within the Grant No. III-47003, Ministry of Education, Science and Technological Development of the Republic of Serbia.

REFERENCES

[1] P.S. Ferris and M.C. Minielli, "Using Electronic Courseware: Lessons for Educators", in: S.J. Hoffman (Ed.), Teaching the Humanities Online: A Practical Guide to the Virtual Classroom, M.E. Sharpe, Armonk, N.Y., 2011.
[2] M. Szabo and K. Flesher, "CMI Theory and Practice: Historical Roots of Learning Management Systems", in: Proceedings of the 7th E-Learn 2002 World Conference on E-Learning in Corporate, Government, Healthcare & Higher Education, Montreal, Quebec, Canada, 2002.
[3] D. McIntosh, "Vendors of Learning Management and E-learning Products", http://www.trimeritus.com/vendors.pdf, 2012.
[4] Butler University, "Learning Management System (LMS) Evaluation 2011-2012", http://blogs.butler.edu/lms/files/2011/08/executive-summary.pdf, 2012.
[5] Sakai Project, "Sakai", http://sakaiproject.org, 2012.
[6] Chair of Informatics, Faculty of Technical Sciences, "FTN Sakai", http://www.enastava.informatika.ftn.uns.ac.rs, 2011.
[7] Eclipse Foundation, "Eclipse Platform", http://www.eclipse.org/platform, 2012.
[8] G. Savić and Z. Konjović, "Learning Style Based Personalization of SCORM E-learning Courses", Proceedings of the 7th International Symposium on Intelligent Systems and Informatics, pp. 316-320, Subotica, Serbia, 2009.
[9] UC Davis, "Sakai SCORM Player", https://confluence.sakaiproject.org/display/SCORMPLAYER/SCORM+in+Sakai, 2005.
[10] G. Savić, M. Segedinac and Z. Konjović, "Bringing Semantics to Sakai Content", Proceedings of the 2nd International Conference on Information Society Technology and Management (ICIST 2012), Kopaonik, Serbia, 2012.
[11] G. Savić, M. Segedinac and Z. Konjović, "Automatic Generation of E-Courses Based on Explicit Representation of Instructional Design", Computer Science and Information Systems, Vol. 9, No. 2, pp. 839-869, 2012.
[12] M.T. Segedinac, M.D. Segedinac, Z. Konjović and G. Savić, "Formal approach to the organisation of educational objectives", Psihologija, Vol. 44, No. 4, pp. 307-324, 2012.
[13] MoinMoin, "The MoinMoin Wiki Engine", http://moinmo.in, 2012.
Personalized design in interactive mapping as a part of information society

Mirjana Kranjac*, Uroš Sikimić**, Đorđije Dupljanin***, Slaviša Dumnić****
*, ***, **** Faculty of Technical Sciences, Department for Traffic Engineering, Novi Sad, Serbia
** Politecnico di Milano, Milan, Italy
[email protected], [email protected], [email protected], [email protected]

Abstract— Public administration has to be a leader in establishing the information society, because of its important role in raising awareness of the necessity to implement information and telecommunication technologies (ICT) in all areas of social activity. The Secretary for Economy of the Government of Vojvodina supports the activities of old handicrafts by giving them financial donations through competitions. Such support has proved to be good, but it has not improved the visibility of old handicrafts. It appeared that public administration could add value to the work of old handicrafts through an additional service: their promotion. Multimedia ICT tools were used for the promotion. The paper presents the elaboration of an interactive map of old handicrafts performed at the Secretary for Economy of the Government of Vojvodina. The interactive map includes handicraft descriptions and photos, and offers personalized tourist routes that enable visits to old handicraft sites and the practicing of traditional crafts.

I. INTRODUCTION

The globalized world is a world based on the use of ICT. Public administration has to deal with the changes and must not stay outside the information society; exclusion means loss of influence and of required results. Contemporary public administration has to build its activities on an ICT platform. Public administration has the area of tourism in its scope of work and must include new technologies in its promotion. Travel technology includes virtual tourism in the form of virtual tour technologies. Travel technology may also be referred to as e-travel or e-tourism, in reference to "electronic travel" or "electronic tourism". eTourism builds on the use of ICT and e-commerce solutions in the travel and tourism industry. It includes the analysis of market structures and customer relationship management.

This paper presents a project elaborated with the support of the Secretary for Economy of Vojvodina. The goal of the project was to increase the visibility of traditional crafts. The authors of the paper offered a new solution based on the use of InDesign software, which enabled the design of an old handicrafts interactive map with personalized tourism routes of traditional crafts.

The map should be web based and publicly available. The interactive map is still not in public use. The interest of the artisan crafts included in the map in being presented in an ICT-based map was impressive; their reaction shows big expectations from the implemented technology and the e-based promotion of their work. The future will show the real effects of the map.

II. LITERATURE REVIEW

When discussing the term information society, five definitions of it can be distinguished [1]:
• Technological
• Economic
• Occupational
• Spatial and
• Cultural.

In this paper the authors discuss the technological view of the information society. The volume of technological innovation which started to appear in the 1970s led to a reconstruction of the social world, because its impact was so profound. Moreover, the 1990s brought the merging of information and communications technologies (ICTs), which led to a society of new quality.

The Internet is revolutionising the distribution of tourism information and sales. Well-developed and innovative Web sites can have "equal Internet access" to international tourism markets [2].

The authors of Ref. [3] say that new technologies offer easy access to a large amount of tourism information, but that this causes difficulties for a decision-maker who has to assess many available alternatives for designing a customised trip; certain decisions may require the resolution of conflicting goals. They even offer a mathematical model and interactive multi-criteria techniques which take into account the schedule and the tourist's wishes and needs, along with the characteristics of the area.
In Ref. [4] the authors define e-tourism as the competitiveness of the organisation achieved by taking advantage of intranets for reorganising internal processes, extranets for developing transactions with trusted partners, and the Internet for interacting with all its stakeholders and customers. They explain that the e-tourism concept includes all business functions (i.e., e-commerce, e-marketing, e-finance and e-accounting, eHRM, e-procurement, eR&D, e-production) as well as e-strategy, e-planning and e-management. E-tourism bundles together three disciplines: business management, information systems and management, and tourism.

Some authors [5] propose a new e-tourism business model: Government to Business to Customer (G2B2C). Their model eliminates agents and enables easy communication with the tourist service provider, who plays the main role. The government provides a central information point (a web portal) about tour operators and packages available to tourists.

Electronic tourism is referred to as an innovative part of tourism. Tourism firms with well-made websites have access to a huge tourism market. Electronic procurement of tourism packages enables online travel services or products [6]. Some authors stress the importance of f-tourism, which is a tourism offer based on social networks, particularly on Facebook [7].

New intermediary organizations with ICT as their core activity, which set up "virtual business webs", are described by some authors. Such use of the Internet creates new value for customers and stakeholders [8]. Web site design is discussed and elaborated in detail by some authors and stressed as a powerful promotional tool which can be used in different sectors, including tourism [9], [10].

III. PROCESS OF INTERACTIVE MAP DEVELOPMENT

The project of an interactive map of traditional crafts routes adds value to a routine action of the provincial government, namely the competition for financial support to old handicrafts. It is a consequence of the need to enrich the services offered by traditional crafts. The new service should enable visits of domestic and foreign tourists to handicraft sites, where they can become familiar with the functioning of old crafts. During the visits, trying out old skills will be offered to visitors. That will open other possibilities, like investing in the enlargement of the products' scope and types, in services, or in the creation of old handicrafts villages. These will be enabled by the creation of so-called "handicrafts tourism routes". The map with routes contains:
- insight into all old handicraft sites
- Bačka, Banat and Srem tourism routes
- personalized routes according to the traveller's wish.

The project should support the economic development of Vojvodina in the area of tourism and crafts (green production). The need for such an ICT tourism product was recognized by the Government of Vojvodina, which imposed the realization of a project that would make handicrafts recognizable and offer information and data about them for use. Handicrafts' data will be offered to visitors, investors, traders, big companies, SMEs, experts, researchers, professors, students and pupils. Public administration took the role of a generator of a new tourism offer which will intensify the development of handicrafts and tourism in Vojvodina.

• General objective of the project:
- to draw attention to the achievements of traditional crafts and the possibility to visit craft sites and try or learn how they work.

• Specific objectives of the project:
- to collect, organize, present and make available to all stakeholders important handicrafts' information of the Autonomous Province of Vojvodina
- to give the possibility to visit craft sites and try or learn how they work
- to attract investments into this sector and to include the sector in supplying SMEs and industry.

• Methodology (presented in Table 1):

Phase 1 - preliminary activities (performed by tourism experts; the goal was the creation of a valid database structure):
- selection of attractive handicrafts of Vojvodina which are representative
- selection of the way of their presentation.

Phase 2 - collection activities (performed by tourism operatives who collected the data defined in the preliminary phase):
- collection of the data selected within the preliminary phase
- creation of handicrafts tourism routes: Bačka, Banat, Srem and three big Vojvodina handicrafts tourism routes.

Phase 3 - presentation activities (performed by external ICT experts who created the concept of the interactive handicrafts tourism map and entered the collected data):
- creation of the interactive handicrafts tourism map concept and entry of the collected data.

Phase 4 - Web activities (performed by internal ICT experts from the public administration, who included the new interactive handicrafts tourism map into the existing web portal of the Secretary for Economy, which was the investor of the whole project):
- fitting the new interactive handicrafts tourism map into the existing web portal.

• Target groups: tourists, travellers, investors, traders, big companies, SMEs, experts, researchers, professors, students and pupils.

• Multiplying effect: spreading the created ICT tool to other tourism and non-tourism areas.

• Sustainability effect: inclusion of handicrafts in production chains and in tourism development.

TABLE 1.
PHASES OF INTERACTIVE MAP CREATION

Phase | Description | Responsibility
1 | Creation of valid database structure | Tourism experts
2 | Collection of handicrafts data (handicraft sites and routes) | Tourism operatives
3 | Creation of the interactive handicrafts tourism map concept and collected data entry | External ICT experts
4 | Fitting the new interactive handicrafts tourism map into the existing web portal | Internal ICT experts

3.1. Phase 1 - preliminary activities

In the first phase, experts selected the handicrafts which are the most representative of the Vojvodina region. The main agreed criteria for rating the handicraft centres are:
• Tradition
• Attractiveness
• Typical for Vojvodina
• Rarity

Additional conditions are:
• Nearness of good roads
• Possibilities for transport, telecommunication services and accommodation

As a result of the carefully selected factors above, over 30 handicraft centres were chosen as important. The data declared as representative are:
• Description of the handicraft, with data about:
- type of traditional craft
- ownership
- who established it
- period of functioning
• Photos:
- of the working process
- of finished products.

3.2. Phase 2 - collection activities

Data collection was done by sending an email request with a description of the data which should be sent.

3.3. Phase 3 - presentation activities

External ICT experts were consulted to select the software which would best suit the required service. They elaborated the interactive map according to the terms of reference made by tourism professionals, while tourism experts collected the required data. The external experts entered the collected data into the map backbone. Adobe InDesign CS5 was evaluated as the best software to perform the given task. Adobe InDesign is a desktop publishing application produced by Adobe Systems. It is used to create works such as posters, flyers, brochures, magazines, newspapers and books; graphic designers and production artists are its basic users. It is intended for creating and laying out posters, publications and print media, and it supports export to the EPUB and SWF formats to create digital publications. It is a suitable platform for elaborating interactive navigable documents, offering designers the control over format and typography they demand, and enhancements like button navigation, video, live hyperlinks, slide shows and similar.

3.4. Phase 4 - Web activities

After finalization, the new interactive tourism map will be put on the existing web site of the Secretary for Economy (public administration) by the internal ICT experts in charge of maintaining the Secretary's web site. The idea for the project came from experts employed at the Secretary, which was the investor of the whole project. The realization of the project idea was done with the support of students from the Faculty of Technical Sciences in Novi Sad, and the project is a good example of synergy between science and public administration. It is the result of the joint work of experienced experts, who know what is needed, and young students with new energy and ICT skills. The web site that will contain the interactive old handicrafts tourism map of Vojvodina, in Serbian and English, will be: http://www.spriv.vojvodina.gov.rs.

IV. DESCRIPTION OF THE INTERACTIVE MAP OF HANDICRAFTS TOURISM ROUTES OF VOJVODINA

The map consists of (Figures 1, 2, 3):

A. General data about hadicrafts’activity in Vojvodina


B. Map which shoes position of Serbia in Europe and
position of the Autonomous Province of Vojvodina in
Serbia
C. Map of roads, rivers and channels with settlements
D. List of the selected handicrafts
E. Description of each handicraft with photos
F. Offer of handicrafts tourism routs
Figure 2. Interactive multimedia map – description and
A. General data about handicrafts’ activity in photos
Vojvodina region
It contains description of contemporary situation of
handicrafts in Vojvodina and history of its
development with explanation of the main areas of
this area and sources of them.
B. Map of AP Vojvodina
It shows position of Serbia on the map of Europe and
location of Vojvodina inside of Serbia. This is useful
for foreign tourists.
C. Map of roads, rivers and channels with settlements
This map is additional tool which helps to reach Figure 3. Interactive multimedia map – list of old
certain location of handicraft site, by road or any handicrafts
other way.
F. Offer of handicrafts tourism routs
D. List of the selected handicrafts Presents choice of few tourism routs. Routs connects
The list is located within map of Vojvodina and neighbouring crafts sites. This will enable visits of few
contains names of handicrafts with locations. Its aim is to handicrafts on the same route.
be user friendly for selection of the certain handicraft. Routs are geographically determined. Selected routs
are: Bačka, Banat and Srem. Big and small Vojvodina
E. Description of each handicraft with photos routs are offered, too.
When a handicraft point is selected according to its
name and location, data about it appear. Data consists V. PROMOTION OF HANDICRAFT OF
of: VOJVODINA REGION
-a story about handicraft business, beginning, roots, Creation of interactive handicrafts tourism maps was
why and how, where. What is the object of the done in cooperation of Vojvodina government with
business and sort of products or services Faculty of technical sciences, its master students. It is
good example of cooperation between public
- photos about working process and final products.
administration and research institution.
The intention was to present data and photos which
After the map will get its final shape it will be
will attract visitors, tourists and investors.
promoted as a pilot project which could be expanded to
other sectors and prolonged.
VI. CONCLUSION
The authors of the paper promote information society
services supported by public administration. The paper is
example of incentive for the use of ICT in public
administration business. The first attempt was with
elaboration of interactive logistics map which was very
successful. The authors of this paper describe their
further activities which lead to more information society
oriented public administration.

Figure 1. Interactive multimedia map - first screen The segment which is described in this paper is use of
ICT in area of tourism. A very attractive part of tourism,

Page 336 of 478


ICIST 2014 - Vol. 2 Poster papers

handicrafts tourism, is the main topic of investigation. The authors involve the use of personalized tourism routes created by the users of the web portal. The presented case shows public administration in the role of offering ideas to entrepreneurs in the area of tourism on how to use new technologies, such as ICT. The paper is a good example of cooperation with an educational institution, and it is a solid base for the sustainable use of services in the information society. The idea could be spread to other tourism services and to other business segments.

ACKNOWLEDGMENT

This paper is a result of research within the technological development project TR36040, funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia.

REFERENCES

[1] F. Webster, Theories of the Information Society, Taylor & Francis Group, London, 2006.
[2] C. Kim, E-tourism: An Innovative Approach for the Small and Medium-sized, OECD, London, 2009, pp. 145-150.
[3] B. Rodriguez, J. Molina, F. Perez and R. Caballero, "Interactive design of personalised tourism routes", Department of Applied Economics (Mathematics), University of Malaga, Spain, Tourism Management, Volume 33, Issue 4, 2012, pp. 926-940.
[4] D. Buhalis and S. Jun, E-tourism, Goodfellow Publishers, Oxford, 2011.
[5] M. Kabir, K. Jahan, M. Adnan and N. Khan, "Business Model of E-Tourism for Developing Countries", International Journal on Information and Communication Technology, Research and Publication Unit (RPU), vol. 03, pp. 214-220.
[6] A. Nadelea and A. Balan, "E-tourism and tourism consumer protection", Amfiteatru Economic, vol. 4, pp. 75-82, 2011.
[7] E. Pantano and L. Di Pietro, "From e-tourism to f-tourism: emerging issues from negative tourists' online reviews", Journal of Hospitality and Tourism Technology, Vol. 4, Iss. 3, pp. 211-227.
[8] D. Tapscott, D. Ticoll and A. Lowy, Digital Capital, Harvard Business School Press, Boston, Massachusetts, 2000, pp. 124-128.
[9] D. Vrontis, D. Ktoridou and Y. Melanthiou, "Website Design and Development as an Effective and Efficient Promotional Tool: A Case Study in the Hotel Industry in Cyprus", Journal of Website Promotion, Volume 2, Issue 3-4, Taylor & Francis Online, 2007, pp. 125-139.
[10] "Where Theory Meets Practice", Budapest, Hungary, pp. 113-119.
Design and Implementation of Software Architecture for Public e-Procurement System in Serbia
Vjekoslav Bobar*, Ksenija Mandic**
* Administration Agency for Common Services of Government Authorities/Department for Information and
Telecommunication Technologies, Belgrade, Serbia
** Faculty of Organizational Science, Belgrade, Serbia

[email protected], [email protected]

Abstract—Public e-procurement is recognized as the main area of government-to-business services that needs to be exploited by the governments of developing nations. In this paper we describe public e-procurement in Serbia with a view to offering an improved approach. The main phases of the transformation from the traditional form of public procurement to the electronic form are described. Using the three-tier architecture approach of software engineering, a new logical model for the public e-procurement system in Serbia was designed and developed to eliminate the bottlenecks of existing systems. Parts of the use cases, class diagrams, decomposition levels and sequence diagrams from the development of the public e-procurement system in Serbia are given. At the end, we point out possibilities for further development in the field of public e-procurement systems, mostly concerning development in a cloud environment.

I. INTRODUCTION

E-procurement is the business-to-business, business-to-consumer or business-to-government purchase and sale of supplies, work and services through the Internet as well as other information and networking systems, such as Electronic Data Interchange and Enterprise Resource Planning [1]. It is an excellent way for governments and businesses to cut overhead costs and reach a larger customer base. E-procurement systems allow users to search for products and services from pre-selected suppliers, verify product availability and route approvals according to policy or statute.

In the past few years, many governments have been systematically working on realizing e-government policies and frameworks, which include the delivery of electronic services for businesses and citizens [2]. Public electronic procurement (PeP), as one of the significant electronic services of e-government, which refers to the relation of government towards businesses, should be implemented as a web electronic service because it increases transparency and reduces costs in the work of public administration. PeP is defined as the online application of information technology and infrastructure to the management, processing, evaluation and reporting of government procurement.

Government procurement represents 18.42% of the world GDP [3]. It is used by government authorities in conducting all activities of the government procurement process cycle for the acquisition of goods, works, and consultancy services with enhanced efficiency in procurement management.

The main objective of this paper is to automate and streamline the procurement process in order to: reduce the time and cost of doing business for both businesses and government, realize better value for money spent through increased competition and the prevention of cartel formation, standardize the procurement processes across government authorities, allow equal opportunity to all suppliers and bring a higher level of transparency.

II. PUBLIC E-PROCUREMENT

In this paper, public procurement means the procurement of goods, services and works by a government authority, in the manner and under the conditions prescribed by the Law on Public Procurement in Serbia [4]. The development of ICT, and especially the Internet, enabled the implementation of public procurement as a specific web service over the Internet, i.e., it contributed to the implementation of public procurement by electronic means (e-procurement). With the more recent commercial uses of the Internet, companies have been computerizing their procurement processes using emerging technologies and moving their corporate purchasing to the World Wide Web [5]. The situation is the same in the public sector in Serbia, where public procurement can be realized over the Internet. According to Article 42 of the Law [4], the Administration Agency for Common Services of Government Authorities is the state authority responsible for centralized public procurement and for the PeP system.

In that sense, PeP in Serbia is the process of purchasing goods, works or services electronically, usually over the Internet. In order to successfully implement the PeP process as a specific Web service, it is necessary to determine the phases of this process, and then the methodology and possibilities for implementation, taking into account all objectives, characteristics, strategies, functions and activities of government authorities. Considering that the traditional form of the public procurement process is very expensive and requires a great expenditure of time, PeP can be seen as an opportunity for re-engineering the public procurement process. That is the reason why e-procurement receives special attention in most countries. The disadvantages of the traditional form of public procurement (for example, high costs of preparation and publication of tender documentation, lack of transparency, etc.) can be overcome by transformation
into the electronic form through the phases [6] given in Figure 1.

Figure 1. PeP phases

PeP in Serbia consists of two main phases: the pre-award phase and the post-award phase.

The pre-award phase has the following sub-phases [6]: call preparation for public procurement, e-notification, e-submission, e-evaluation and selection of the most suitable bid.

Call preparation for public e-procurement is the phase in which the contract authority creates the tender documentation with all conditions and criteria.

E-notification is the phase that provides online publication of the call for public procurement, a review of all public procurement calls (previous, current and future), and a review of all contracts awarded in past procurement processes. The bidder can read the information online or download it. This phase can be integrated into a website or into the software of a government institution.

E-submission of bids is the phase that ensures online access to the tender documentation, whether through the Internet site of a government institution or through separate software created for that purpose. Bidders can review the documentation online or download it. This phase also enables additional clarification of the tender documentation at the bidder's request. After completing the documentation and creating the bid, the bidder sends it to the contract authority, also electronically (upload). Due to security considerations, it is necessary for the entire system to be based on digital signatures and PKI (Public Key Infrastructure) technology. Such a system, of course, should have two-way communication between the contract authorities and the bidders, in terms of the contract authority providing additionally required explanations of the tender documentation.
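The paper does not show the signing step itself (the PeP server side is PHP-based), but the PKI requirement can be illustrated with a minimal, hypothetical sketch in Java using the standard java.security API; a real deployment would additionally validate the bidder's certificate chain against a trusted store.

import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

public class BidSignatureSketch {

    /** The bidder signs the bid document before uploading it. */
    public static byte[] sign(byte[] bidDocument, PrivateKey bidderKey) throws Exception {
        Signature signature = Signature.getInstance("SHA256withRSA");
        signature.initSign(bidderKey);
        signature.update(bidDocument);
        return signature.sign();
    }

    /** The contract authority verifies the signature on receipt,
     *  using the public key from the bidder's certificate. */
    public static boolean verify(byte[] bidDocument, byte[] sigBytes,
                                 PublicKey bidderPublicKey) throws Exception {
        Signature signature = Signature.getInstance("SHA256withRSA");
        signature.initVerify(bidderPublicKey);
        signature.update(bidDocument);
        return signature.verify(sigBytes);
    }
}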
E-evaluation is the most important phase of PeP, as it ensures maximum uniformity in the evaluation of bids based on predefined criteria. In evaluating bids it is possible to use an electronic auction or multi-criteria decision making methodologies integrated into the background of the PeP system. The result of the evaluation of bids is a recommendation of the most acceptable bid and the awarding of the contract to the bidder who offered it.

Selection of the most acceptable bid and contract awarding is the phase in which the contract authority decides on a bid based on the results of the e-evaluation phase and afterwards creates a contract with the selected bidder.

The post-award phase has the following sub-phases [6]: e-ordering, e-invoicing and e-payment.

E-ordering represents the phase in which a contract is drafted, after which the bidder must supply an electronic catalogue of his products or services. Based on the catalogue, the contract authority will place an order by submitting it to the bidder, who will confirm the order electronically.

E-invoicing and e-payment are the phases that ensure a uniform link between the accounting systems of the contract authority and the bidder, allowing the invoice to be forwarded directly from the bidder's accounting to the contract authority's accounting for payment. This can be carried out through e-mail, web based, or through a completely integrated information system.

III. RELATED WORK

The concept of software architecture has been recognized in the last decade as a means to cope with the growing complexity of software systems [7]. Software architectures have received recognition as an important subfield of Software Engineering [8].

In the field of the design of e-procurement software architectures, many different papers have been published. For example, reference [9] presented a model of e-procurement software upon which a Web service is based and which offers the implementation of public procurement according to a supply chain management scenario. The presented system provides a hybrid architecture combining the SOA (Service Oriented Architecture) approach and the EDA (Event Driven Architecture) approach. In an SOA context, their approach acts as BPM (Business Process Management) software based on the SOA paradigm, facilitating the creation and execution of highly transparent and modular process-oriented applications and enterprise workflows. In an EDA context, their approach provides a software infrastructure designed to support a more real-time method of integrating event-driven application processes that occur throughout existing applications and are largely defined by their meaning to the business and their granularity.

Also, reference [10] presented an SOA-oriented model for e-procurement software, suggesting three types of model which can be taken into consideration in the process of implementing the architecture of e-procurement software: the public model, the private model and the combined model.

A very significant description of an e-procurement architecture designed using the three-tier approach is the e-procurement system described in reference [11]. This system is based on Microsoft Windows and uses a mobile web user interface as the presentation tier, while the business logic tier encapsulates the business rules for public e-procurement. The supplier-product data and WSDL entries provided by suppliers are packaged in the data service layer, i.e., the Web services registry. The registry functionality can be accessed by a set of public Web
services operations which can be consumed by the e-procurement agent [11].

The three-tier architecture approach has also been used for the architecture of the e-procurement software in the Government of Andhra Pradesh, which consists of a presentation tier (with two load-balanced web servers running the Microsoft Windows 2000 Advanced Server operating system and Internet Information Services (IIS) version 5.0), a business logic tier which is encapsulated using Microsoft COM+ technology and handles a range of tasks including authentication, authorization and workflow management, an XML data layer tier, and a database tier which runs on Microsoft SQL Server 2000 Enterprise Edition [12].

Also, the e-procurement software in Armenia, for example, follows a three-tier architectural approach comprising the web/presentation tier, the business logic tier and the Enterprise Information System (EIS) tier. The client tier consists of a client graphical user interface that allows users to invoke activities within the context of the system for completing procurement processes. The activities and the data made available to users are regulated by an advanced security module associating user accounts with user roles and access rights. The requests initiated in the client tier are transmitted to the business tier (Web servers), which in turn invokes the respective business logic services (application servers). The system, following the execution of the built-in business logic, responds to the client, returning appropriate results. The EIS tier provides the required system infrastructure and software for storing data and communicating with various hardware/software components such as user authentication servers (LDAP) and mail servers [13].

IV. METHODOLOGY OF DEVELOPMENT AND IMPLEMENTATION

A good selection of solutions for public e-procurement software should provide a higher level of transparency, better efficiency and lower costs of the public e-procurement process in Serbia. In this paper, the software development methodology adopted to develop the PeP system was the incremental model. With respect to industry standards, the Unified Modeling Language (UML) approach was used to capture the system requirements and design. The developed PeP system is a web application. The client side was implemented using HTML (Hypertext Markup Language) and JavaScript, while the server side was implemented with the Hypertext Preprocessor (PHP). The database management system used is MySQL, and the web server used is Apache.

A. Designing the Logical Model of Architecture for the Public e-Procurement System

In the creation of the logical model, we used the approach of three-tier client-server architecture, because the PeP system is a web-oriented application, which is, in nature, a distributed information system.

Three-tier is a client-server architecture in which the user interface, the functional process logic (business rules), computer data storage and data access are developed and maintained as independent modules, most often as separate software [14]. In a three-tier architecture, distributed software applications contain the following three types of components [15]:

• User interface and presentation processing. These components are responsible for accepting inputs and presenting the results. They belong to the client tier and are commonly called the presentation layer.

• Computational function processing. These components are responsible for providing transparent, reliable, secure and efficient distributed computing. They are also responsible for performing the processing necessary to solve a particular application problem. These components belong to the application tier and are commonly called the business logic tier.

• Data access processing. These components are responsible for accessing data stored on external storage devices, for example, hard disk drives. They belong to the back-end tier and are commonly called the data persistence tier.

Figure 2 shows a simple overview of the three-tier architecture.

Figure 2. Three-tier Architecture

The Presentation Layer is basically the Graphical User Interface (GUI) and all of the components associated with the interface. At this layer the data are given a presentation structure that the browser will be able to display.

The Business Logic Layer implements the domain-specific business processes and rules, and deals with managing the model, computing services and authorization. This layer is mainly composed of all objects that are a meaningful part of the problem's universe of discourse. Usually, it contains a wide set of persistent classes, i.e., those which must survive system stops by keeping their instances in some kind of database. Objects in this tier are hidden from the user, who can only manipulate them via a set of screens located in the presentation tier, which is composed of all available user screens and has direct access to business objects through their public members [16].

The Data Persistence Layer comprises all the persistent data stores and the data access middleware.
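As a schematic illustration of this separation (with hypothetical names, not the actual PeP code, which is written in PHP), the Java sketch below shows the dependency direction between the tiers: the presentation tier talks only to the business logic tier, which in turn talks only to the data persistence tier.

import java.util.List;

public class ThreeTierSketch {

    /** Data persistence tier: data access hidden behind an interface. */
    interface TenderRepository {
        List<String> findOpenCalls();
    }

    /** Business logic tier: domain rules, no UI code and no SQL. */
    static class TenderService {
        private final TenderRepository repository;

        TenderService(TenderRepository repository) {
            this.repository = repository;
        }

        List<String> openCallsVisibleTo(String bidderId) {
            // authorization and other business rules would be applied here
            return repository.findOpenCalls();
        }
    }

    /** Presentation tier: accepts input and renders results. */
    static class TenderController {
        private final TenderService service;

        TenderController(TenderService service) {
            this.service = service;
        }

        String renderOpenCalls(String bidderId) {
            return String.join("\n", service.openCallsVisibleTo(bidderId));
        }
    }
}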
B. Logical Model of Architecture for the PeP System

By applying the approach described in the previous section, we designed the logical model of the architecture of PeP in Serbia, which is shown in Figure 3.
Figure 3. The Logical Model of Architecture for the PeP System

The architecture from Figure 3 is a Web service which, in its essence, represents a distributed web application whose components are connected via the Internet and usually communicate using the HyperText Transfer Protocol (HTTP) and the Simple Object Access Protocol (SOAP).

The presentation tier of this architecture consists of the GUI, which provides the realization of all activities necessary for users to complete the procurement process. Activities and data within the PeP system should be available to users through an appropriate security level of authentication, with defined user roles and access rights. Requests initiated through this tier are transmitted via SOAP and HTTP to the business logic tier, which resides on the web servers.

Services at the business logic tier are divided into basic and advanced services. The basic services are: e-notification, content management, user management, monitoring, searching, reporting, time stamping, digital certification, encryption and online help. The advanced services are: e-submission, e-evaluation, e-auction, e-awarding, e-ordering, e-invoicing, e-payment, Dynamic Procurement System (DPS) and statistics.

Data exchange in the suggested PeP system from Figure 3 is realized in such a way that a user of the system (contract authority or bidder) calls a certain function located at the remote server of the service provider. This function refers to procedures or functions of the PeP system which are located on a certain web server and which can be called over the Internet. The caller sees this function as any other local function and very often does not need to know that it is a remote method of a web service. The call and its parameters are formatted in SOAP-XML and sent to the Web service. The Web service receives the SOAP-XML request, reads the information about the method which should be called and the parameters which should be passed, executes the method, and the result is again formatted as SOAP-XML and returned as the answer. The response is returned to the caller, who reads the information from the XML and returns the result as a reply value.
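This call pattern can be sketched with JAX-WS, the standard Java SOAP stack; the service and method names below are hypothetical (the real PeP implementation is PHP-based), but the SOAP-XML marshalling round trip is the one just described.

import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.ws.Endpoint;

@WebService
public class ENotificationService {

    /** Called remotely: JAX-WS unmarshals the incoming SOAP-XML
     *  request, invokes this method and marshals the return value
     *  back into a SOAP-XML response. */
    @WebMethod
    public String getCallForTenders(String procurementId) {
        return "Call for tenders " + procurementId;
    }

    public static void main(String[] args) {
        // Publish the service; a client stub generated from its WSDL
        // can then invoke getCallForTenders as if it were local.
        Endpoint.publish("http://localhost:8080/pep/enotification",
                new ENotificationService());
    }
}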
The Data Persistence Tier in the model from Figure 3 provides the necessary infrastructure and software for data storage and for communication with the different hardware and software components located on the user (bidder) side, as shown in Figure 3.

Using the described architecture model, we have implemented the PeP system as a distributed information system in the form of a web service.

C. PeP System Design

Generally, use case diagrams graphically depict the interactions between the system, external systems and users. In other words, a use case diagram shows who will use the system and in what ways the user expects to interact with it.

The use case diagram in Figure 4 illustrates the activities of a Procurement Officer on the government authority's side. This officer is responsible for the creation and management of calls for tenders. A Procurement Officer can interact with the system in the following ways: creation of a new call for tenders, administration of an existing call for tenders, preparation and publication of a new call for tenders, publication of additional documents for tenders, and reports visualization.

Figure 4. The Example of Use Case for PeP System

Figure 5 shows the class diagram for the PeP system, which consists of the classes of the system, their interrelationships, and the methods and attributes of the classes. The procuring entity, for instance, can accept contractors, reject contractors, accept contracts, reject contracts and approve contracts. The contractors, on the other hand, can bid for contracts, update their profile and download contract documents from the database.


ICIST 2014 - Vol. 2 Poster papers

Figure 5. The Example of Class Diagram for PeP System

Figure 6 shows an example of a sequence diagram for the PeP system.

Figure 6. The Example of Sequence Diagram for PeP System

Figure 7 shows an example of the decomposition level for the PeP system.

Figure 7. The Example of Decomposition Level for PeP System

D. PeP System Implementation

A few interfaces from the software application development are captured in the figures below. Figure 8 shows the GUI of the PeP system with basic information about a call in the public procurement.

Figure 8. The Example of GUI for PeP System

The PeP system allows the entry of all the information needed for creating a new call in public procurement (name of the procurement, value, important data, participation and evaluation criteria, etc.). At the same time, the software enables the bidders to submit their bids, and then supports the opening of the bids and their evaluation based on the predefined criteria, after which the most suitable bid is selected. Selection of the most suitable bid can be treated as the objective of a decision making problem [17]. After entering the basic information about the PeP into the web system, we come to the part of the PeP system where it is necessary to define and enter the criteria for the evaluation of bids, which is shown in Figure 9.

Figure 9. The Example of GUI for PeP System

The software has predefined types of criteria in accordance with the Law on Public Procurement in Serbia: financial, business, technical and employee capacity [4]. The contract authority has the freedom to determine the importance of each criterion, depending on the type of public procurement, by assigning a corresponding weight, where the sum of the weights must be exactly 1 (one).
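In its simplest form, the weighted evaluation then reduces to a weighted sum; the sketch below (illustrative Java, not the PHP production code, and simpler than the multi-criteria methods discussed in [17]) also enforces the constraint that the weights sum to exactly 1.

import java.util.Map;

public class BidEvaluator {

    /**
     * @param weights weight per criterion, e.g. financial, business,
     *                technical and employee capacity; must sum to 1
     * @param ratings the bid's rating per criterion, normalized to [0, 1]
     */
    public static double score(Map<String, Double> weights, Map<String, Double> ratings) {
        double sum = 0.0;
        for (double w : weights.values()) {
            sum += w;
        }
        if (Math.abs(sum - 1.0) > 1e-9) {
            throw new IllegalArgumentException("Criterion weights must sum to exactly 1");
        }
        double score = 0.0;
        for (Map.Entry<String, Double> entry : weights.entrySet()) {
            Double rating = ratings.get(entry.getKey());
            score += entry.getValue() * (rating == null ? 0.0 : rating);
        }
        return score; // the bid with the highest score is the most suitable
    }
}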
V. CONCLUSION

PeP represents one of the key areas where electronic methods are used to significantly simplify the procurement process, particularly in the government sector, which is known for lengthy procurement procedures (up to 6 months) [17]. We have shown that it is possible to transform the traditional form of public
procurement into the electronic form by using ICT. This contributes to cost reduction, enables efficiency and a high level of transparency, and as a result brings savings for economic subjects and government authorities. The basic benefit of using the PeP system in Serbia is the possibility of consolidating numerous information systems in a single place by establishing a standardized purchasing method and supplier interface. XML-based Web services modernize this process, enabling cheaper applications than were possible a few years ago, and the result is a reduction of costs for business and government subjects.

Nevertheless, it can be concluded from this paper that the design and implementation of a PeP system is not simple. It requires technical knowledge on one side, i.e., knowing the methodology of designing distributed information systems, as well as knowing the procurement processes and regulations on the other side. The three-tier architecture approach is suitable for designing a PeP system. The architecture of the PeP system described in this paper has been used as a model for the implementation and practical application of the PeP system in Serbia.

The described model can be improved by applying the concept of cloud computing, where an architecture based on the cloud computing concept means that contract authorities and interested bidders share infrastructure, databases and software applications. Figure 10 shows this.

Figure 10. The improved logical architecture model of the PeP system using the concept of cloud computing

The improvement of the described model will be the subject of future research.

REFERENCES

[1] P.J.H. Baily, "Procurement Principles and Management", Harlow, England: Prentice Hall Financial Times, 2008, pp. 394.
[2] C. Vassilakis, G. Lepouras and A. Katifori, "A heuristics-based approach to reverse engineering of electronic services", Information and Software Technology, vol. 51, 2009, pp. 325-336.
[3] E. Auriol, "Corruption in Procurement and Public Purchase", International Journal of Industrial Organization, vol. 24, 2006, pp. 867-885.
[4] National Assembly of the Republic of Serbia, "Public Procurement Law", Official Gazette, No. 124/12, 2012.
[5] Q. Dai and R.J. Kauffman, "To be or not to B2B: Evaluating managerial choices for e-procurement channel adoption", Information Technology and Management, vol. 7, 2006, pp. 109-130.
[6] V. Bobar, "Metodološki i institucionalni okvir za razvoj elektronske javne nabavke kao G2B servisa elektronske uprave" ("A methodological and institutional framework for the development of public e-procurement as a G2B service of e-government"), InfoM Journal, vol. 47, 2013, pp. 10-15.
[7] F.C. Filho, P.H. da S. Brito and C.M.F. Rubira, "Specification of exception flow in software architectures", The Journal of Systems and Software, vol. 79, 2006, pp. 1397-1418.
[8] E.Y. Nakagawa, F.C. Ferrari, M.M.F. Sasaki and J.C. Maldonado, "An aspect-oriented reference architecture for Software Engineering Environments", The Journal of Systems and Software, vol. 84, 2011, pp. 1670-1684.
[9] A.G. Hernandez et al., "A Hybrid Architecture for e-procurement", Computational Collective Intelligence, vol. 5796, 2009, pp. 685-696.
[10] L.M.S. Oliveira and P.P. Amorim, "Public E-Procurement", International Financial Law Review, vol. 20, no. 3, 2001, pp. 43-47.
[11] M. Chen and M.J. Meixell, "Web Service Enabled procurement in the extended enterprise: an architectural design and implementation", Journal of Electronic Commerce Research, vol. 4, no. 4, 2003, pp. 141-155.
[12] K. Bikshapathi and P. Raghuveer, "Implementation of e-procurement in the Government of Andhra Pradesh: A Case Study", Publications of the Computer Society of India, eGovernance Case Studies, edited by A. Agarwal, 2006, pp. 270-285.
[13] K. Baghdasaryan, "ARMENIA: Case Study on e-Government Procurement Development", Technical Assistance Consultant's Report, Asian Development Bank, 2011.
[14] W.W. Eckerson, "Three Tier Client/Server Architecture: Achieving Scalability, Performance, and Efficiency in Client Server Applications", Open Information Systems Journal, vol. 10, 1995.
[15] W. Jia and W. Zhou, "Distributed Network Systems: From Concepts to Implementations", Springer, New York, vol. 15, 2005, pp. 22-24.
[16] M. Polo, J.A. Gomez, M. Piattini and F. Ruiz, "Generating three-tier applications from relational databases: a formal and practical approach", Information and Software Technology, vol. 44, 2002, pp. 923-941.
[17] V. Bobar, "Methodology of Concept Application for Multi-criteria Decision Making in the Public e-Procurement Process", Metalurgia International, vol. 13, no. 4, 2013, pp. 128-138.
HADOOP AND PIG FOR INTERNET CENSUS DATA ANALYSIS
Aleksandar Nikolić, Goran Sladić, Branko Milosavljević, Stevan Gostojić, Zora Konjović
{anikolic, sladicg, mbranko, gostojic, ftn_zora}@uns.ac.rs
Faculty of Technical Sciences, University of Novi Sad

Abstract – Internet Census 2012 data provides a very comprehensive look at the state of the Internet as a whole. The dataset is the result of extensive network scans covering the available public IP space. In this paper we present a platform based on Hadoop, a large scale data processing framework, for analysing this dataset. Custom Pig Latin User Defined Functions were developed which present a data-specific querying interface allowing faster and more flexible analysis.

1. INTRODUCTION

Internet Census 2012 [1] was made publicly available in late 2012. The data is the result of comprehensive Internet-wide scans performed by the Carna Botnet. Created by an anonymous researcher, it exploited insecure credentials on embedded devices.

Many devices on the Internet are unprotected and left with the default username and password, exposing them to abuse. Carna scanned the Internet for such devices, adding them to its botnet and using them for further scans. The botnet consisted of 420 thousand embedded devices, which represents 25% of the total number of vulnerable devices. Figure 1 shows the geographical distribution of the infected embedded devices.

Figure 1. Carna Botnet distribution.

Programs installed on infected devices were non-intrusive and a device would return to its prior state upon reboot. The scanning rate was limited so as not to overload the device. As all infected devices were available over the Internet, the botnet could be designed without a Command & Control server, which has the advantage of not having a single point of failure. Scan data was coalesced from the infected machines by middle nodes, from where it was downloaded by the master server. The bot binary was built for 9 different architectures and was 40 to 60 kilobytes in size.

To discover which IP addresses were actually used on the Internet, the Carna Botnet conducted ICMP ping scans of the IPv4 address space available for public use. This scan used a modified version of the fping [2] tool. The dataset contains reverse DNS records acquired by running reverse DNS queries for every IP address. Port scan data was acquired by running a version of Nmap [3] for the 100 most common ports. To check if a host is alive, Nmap issues hostprobe requests, which can then be used to roughly determine the host operating system type. The dataset also contains service probe results: by sending a certain type of packet to a port (a service probe) and matching the response against a known database, the type of the service can be determined. Smaller vulnerable devices, which could not be infected with a bot, were used to run random traceroute scans. All the scans were conducted from March to December of 2012 and included 420 million IP addresses.

The dataset was released for download over BitTorrent and consists of 568 gigabytes of zpaq archives, segmented by IP address ranges. Zpaq compression was chosen as it offers the highest compression ratio. There are a total of eight different scan types: ICMP ping, reverse DNS, service probes, host probes, syn scan (open port scan), TCP/IP fingerprints, IP ID sequences and traceroute scans.

Previous scans of this scale focused on a limited subset of services or TCP/IP features. In 2007, Heidemann et al. [4] conducted a first attempt at measuring the population of visible Internet hosts. They present host population results based solely on ICMP echo replies. Analysis of the Internet Census 2012 dataset by the original author has shown comparable results. Using Zmap [5], researchers have conducted a large scale study of the HTTPS certificate infrastructure. They performed 110 Internet-wide scans over 14 months and analysed trust relationships between certificate authorities [6].

The Internet Census 2012 dataset presents a unique view of the state of the Internet in 2012. Because of its size, a large scale computing platform was required in order for us to conduct a deeper analysis of the dataset. In this paper we propose a storage and computing cluster solution based on the Hadoop platform for analysing the Internet Census 2012 dataset. Hadoop's distributed file system presents a simple way of storing large amounts of data. Apache Pig is an extensible language especially suited for querying machine-generated text files. By
Page 344 of 478


ICIST 2014 - Vol. 2 Poster papers

By extending Pig with custom User Defined Functions (UDFs), we created a platform for analysing this dataset.

In section 2, Hadoop key concepts and the Pig Latin scripting language are introduced. In section 3, the concrete cluster organization is presented. Section 4 describes the dataset organization, and section 5 presents custom UDFs and data processing examples.

2. HADOOP AND PIG

Hadoop [7] is a large-scale computation and data storage framework. It is designed to provide high reliability and availability and to run on commodity hardware. It constitutes a system for storing and processing data in a highly distributed and parallel fashion. Hadoop is comprised of two main components [10]:
• Map/Reduce [11] – a framework for assigning work to the nodes
• HDFS [12] – a distributed file system

Map/Reduce, in a broader sense, represents a programming paradigm for processing large data sets. It consists of two phases: the Map phase performs sorting and clustering of the data, and the Reduce phase performs summation of the data in those groups. In Hadoop, Map/Reduce is a way of writing programs running on Hadoop clusters. The Map phase is responsible for breaking up the dataset into independent pieces which are then processed in the Reduce phase [11].

The Hadoop distributed file system is designed for high input and output speeds. It is used for storing large input and output files for Hadoop processing. It provides high throughput rates by splitting files and scattering them throughout the cluster. Knowing where individual pieces of data reside allows for optimized distribution of the processing [12].

A typical Hadoop Map/Reduce program in Java consists of two classes: a mapper class implementing the Mapper interface and a reducer class implementing the Reducer interface. The mapper class processes the supplied data into key/value pairs which represent intermediate results. The Hadoop framework spawns one mapper task for each distributed part of the data. The reducer class processes the intermediate key/value pair results into a smaller set of values that share a key.
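To make this structure concrete, the following is a minimal word-count sketch written against the newer org.apache.hadoop.mapreduce API (where Mapper and Reducer are base classes rather than interfaces). The class and variable names are ours; this example only illustrates the general pattern and is not part of the platform described in this paper.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emits an intermediate (word, 1) pair for every token in a line.
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            word.set(token);
            context.write(word, ONE);
        }
    }
}

// Reducer: sums all the counts that share the same word.
class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}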

Hadoop programs present a very low-level data processing interface. Apache Pig [8] represents a higher-level platform for creating Map/Reduce programs on top of Hadoop. It consists of a compiler that produces sequences of Map/Reduce programs from its Pig Latin programming language [9]. Pig Latin combines the high-level declarative querying of SQL with low-level procedural programming. Each Pig Latin program consists of multiple steps, each step being a type of query. As opposed to SQL, Pig is designed to support ad-hoc data analysis. Data can be queried directly, without the need to import it into tables. To allow custom data processing, Pig Latin has support for user defined functions (UDFs), which can be used to customize all aspects of data processing. A usual Pig Latin script consists of a LOAD statement that reads data from the HDFS, a series of data transformations, and a STORE statement that writes results to the HDFS.

By leveraging Hadoop's HDFS, the large Internet Census dataset can be distributed and stored on a number of nodes without the need to keep track of individual pieces. As the Internet Census dataset comes in the form of textual log files, Apache Pig lends itself as a natural choice for its processing.

3. CLUSTER ORGANIZATION

Every Hadoop cluster is composed of a number of different logical nodes (see Figure 2):
1) NameNode – stores location information and metadata about the files in HDFS, which is required when retrieving data spread across the cluster.
2) SecondaryNameNode – periodically downloads the NameNode image and creates a checkpoint.
3) JobTracker – takes requests from a client and assigns TaskTrackers with tasks to be performed.
4) TaskTracker – accepts Map and Reduce tasks from the JobTracker.
5) DataNode – stores files in HDFS and manages file blocks within the node.

Figure 2. Hadoop cluster components

Hadoop follows a master/slave design. The simplest cluster consists of one master node and one or more slave nodes. The physical master node is composed of the NameNode, SecondaryNameNode and JobTracker logical nodes. A slave node consists of the DataNode and TaskTracker logical nodes. Each logical node runs in its own Java VM.
The single point of failure for the Hadoop cluster is the NameNode which, in case of a failure, can be restored from the SecondaryNameNode checkpoint. To reduce the impact of a failing NameNode, bigger clusters can have a multi-tier, tree-like structure and can place the NameNode and JobTracker on different physical nodes.

Our cluster consists of one master node, running the NameNode, SecondaryNameNode and JobTracker logical nodes, and 12 slave nodes, running the TaskTracker and DataNode logical nodes. All slave nodes have Intel Core2Duo processors, 2 gigabytes of RAM and 170 gigabytes of hard drive space available, and all are connected in a local 100 Mbps network. In total, there are 24 cores available for parallel data processing and 2 terabytes for HDFS data storage.

4. DATASET ORGANIZATION

The Internet Census dataset is organized in a number of file system directories: hostprobes, synscans, icmp_ping, rdns and serviceprobes. All directories except serviceprobes contain a number of zpaq archives, one for each public class A network. The serviceprobes directory contains gzip archives, one for each service probe type, each in turn containing a number of zpaq archives for each class A network. Traceroute results are stored in a single zpaq archive.

Direct data processing on zpaq archives is impractical due to slow decompression. Our tests have shown that decompressing a single 80-megabyte zpaq archive on a slave node takes up to 40 minutes. The slow decompression is due to the high compression ratio of the dataset archives, which is roughly 18:1.

The estimated size of the plain text data is 9 terabytes. Since our cluster has 2 terabytes of HDFS space available, a time/memory compromise was made by recompressing the plain text data into gzip archives. Gzip was chosen for its decompression speed and the ability to query files without fully decompressing them first. The decompression of the zpaq archives and recompression into gzip archives was executed on all 12 cluster nodes. The master node kept a queue of archives to be processed, distributed them two by two to the slave nodes for recompression and fetched the results. This step was implemented with common Linux shell tools, without reliance on the Hadoop cluster. Recompression of the hostprobes, synscans, icmp_ping and rdns archives took 2 days; recompression of the serviceprobes archives took an additional 4 days. The resulting dataset is 1.6 terabytes in size. It was loaded into HDFS while preserving the same directory structure. Apache Pig has transparent support for loading gzip files.

5. QUERYING THE DATA

Since the Internet Census 2012 dataset is domain specific, as it represents network scan results, some conventions can be used to query the data effectively. Each dataset file is organised as a tab-separated, columnar text file. Every file consists of one column representing the IP address and one or more columns representing the data for that address. For example, Table 1 shows an excerpt from the reverse DNS records. The first column is the IP address, the second is a timestamp in Unix epoch time and the third is the reverse DNS query response, with a number in brackets specifying an error code.

IP              Timestamp    Result
108.0.140.255   1336954500   (3)
108.0.141.0     1336886100   (2)
108.0.141.1     1336914900   L100.LSANCA-VFTTP-165.verizon-gni.net
Table 1. Reverse DNS query sample

As such, data can be loaded and organized by a Pig Latin script in a straightforward way, as shown in Listing 1.

RDNS = LOAD '/rdns/' USING
PigStorage('\t') AS (ip, timestamp,
result);
Listing 1. Loading the data

The LOAD keyword expects the file or directory location from which to load the files, the method for parsing each line, and the names of the columns.

Since we will rarely want to execute queries on the whole IP range, loading only a subset of the needed files is imperative. The usual way of specifying network ranges in network scanning software is CIDR (Classless Inter-Domain Routing) notation, which relies on a network ID and a netmask. The data is sorted into files for each network with netmask 255.0.0.0 (class A network), so only the files falling into the network range we are interested in need to be loaded. For example, to query the data from network 147.91.0.0/16, only the /rdns/147.gz file needs to be loaded, while for network 144.0.0.0/5 the files from 144.gz to 151.gz should be loaded, as all addresses in between belong to the given network. To load a range of files, shell expansion can be used, as shown in Listing 2, to substitute the directory path in the LOAD statement with a Pig variable input.

pig -f script.pig -param
input=/rdns/{144..151}.gz
Listing 2. Shell expansion example

Additionally, IP addresses need to be filtered for networks smaller than those with an 8-bit netmask. The user should be able to specify a network ID and netmask and
filter only the results belonging to that network. A filter in the form of a User Defined Function can be used. Filter UDFs need to extend the FilterFunc class and implement the exec method, which returns a boolean value. Listing 3 shows how to use a filter UDF with parameters. To pass parameters to a UDF, the class constructor is used. Listing 4 shows an excerpt of a UDF for filtering IP addresses by network ID and netmask. The network ID and network mask are set in the constructor before the filter is run. The IP address is first converted from a string to its integer representation, and then the verification is made.

REGISTER internetCensus.jar;
DEFINE IPBelongs
internetCensus.IPBelongs("147.91.175.0","17");
B = FILTER A by IPBelongs(ip);
Listing 3. Filtering IP addresses
import java.io.IOException;
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import org.apache.pig.FilterFunc;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.WrappedIOException;

public class IPBelongs extends FilterFunc {
    private final int netID;
    private final int mask;

    public IPBelongs(String netID, String maskBits) {
        this.netID = ip2int(netID);
        // A netmask with the top maskBits bits set, e.g. "17" -> 0xFFFF8000
        this.mask = -1 << (32 - Integer.parseInt(maskBits));
    }

    // Converts a dotted-quad IPv4 address into its 32-bit integer form.
    private int ip2int(String ip) {
        try {
            Inet4Address ipAddress = (Inet4Address) InetAddress.getByName(ip);
            byte[] b = ipAddress.getAddress();
            return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
                 | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Invalid IP address: " + ip, e);
        }
    }

    @Override
    public Boolean exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            int ip = ip2int((String) input.get(0));
            // The address belongs to the network if the masked bits match.
            return (netID & mask) == (ip & mask);
        } catch (ExecException ee) {
            throw WrappedIOException.wrap(ee);
        }
    }
}

Listing 4. IP address filter

While performing data analysis on the synscan and service probe data, certain ports and port ranges might be of special interest. A filter UDF similar to the IP address one can be used to filter ports. We adopted Nmap's style of specifying port ranges, meaning the user can specify a single port, an array of comma-separated ports, or a port range with a starting and ending port. UDF constructors can only be passed strings as parameters, so the constructor is responsible for generating the list of ports. Filter usage is shown in Listing 5; a sketch of such a constructor is given after the listing.

REGISTER internetCensus.jar;
DEFINE FilterPort
internetCensus.FilterPort("1024-3306");
B = FILTER A by FilterPort(ports);
Listing 5. Filtering results by ports
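The FilterPort UDF itself is not listed in the paper. Purely as an illustration, a minimal sketch of such a filter could look as follows; the parsing logic and the assumption that the filtered field holds a single port number per record are ours.

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.pig.FilterFunc;
import org.apache.pig.data.Tuple;

public class FilterPort extends FilterFunc {
    private final Set<Integer> ports = new HashSet<Integer>();

    // Accepts "53", "22,80,443" or "1024-3306" (Nmap-style ranges).
    public FilterPort(String spec) {
        for (String part : spec.split(",")) {
            if (part.contains("-")) {
                String[] range = part.split("-");
                int from = Integer.parseInt(range[0].trim());
                int to = Integer.parseInt(range[1].trim());
                for (int p = from; p <= to; p++) {
                    ports.add(p);
                }
            } else {
                ports.add(Integer.parseInt(part.trim()));
            }
        }
    }

    @Override
    public Boolean exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }
        // Assumes the field contains a single port number per record.
        String value = String.valueOf(input.get(0)).trim();
        return ports.contains(Integer.parseInt(value));
    }
}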
Service probe and IP fingerprint results contain specific service and operating system fingerprints for each scanned service and IP address, stored in the Nmap fingerprint format. To match these fingerprints against Nmap's database, external programs need to be executed. Nmatch, a tool supplied with the Internet Census dataset, can be used to match service probes. Fingermatch can be used to match IP fingerprints to an OS. To facilitate the execution of external programs, Pig Latin has the notion of streams. The STREAM operator is used to send data through an external script or program. The only limitation is that the external program has to accept input on STDIN and write output on STDOUT. Listing 6 shows an example of streaming service probe fingerprints to Nmatch. Each fingerprint is passed to Nmatch over STDIN; the results are printed on STDOUT and added to C.

A = LOAD 'service_probes/80-GetRequest/'
USING PigStorage('\t')
AS (ip, timestamp, state, result);
B = FOREACH A GENERATE result;
C = STREAM B THROUGH './nmatch GetRequest tcp';
Listing 6. Using STREAM operator
The presented UDFs can be used to easily analyse the dataset and extract meaningful results.

ALL_IPS = LOAD '/synscan/147.gz'
USING PigStorage('\t') AS (ip,
timestamp, state, reason, type,
ports);
REGISTER internetCensus.jar;
DEFINE IPBelongs
internetCensus.IPBelongs
("147.91.175.0","23");
NETMASK_IPS = FILTER ALL_IPS by
IPBelongs(ip);
DEFINE FilterPort
internetCensus.FilterPort("53");
PORT_53 = FILTER NETMASK_IPS by
FilterPort(ports);
UDP_PORT_53 = FILTER PORT_53 by type
== 'udp';
OPEN_UDP_PORT_53 = FILTER UDP_PORT_53
by state == 'open';
STORE OPEN_UDP_PORT_53 INTO '/output'
USING PigStorage('\t');
Listing 7. Complete Pig Latin script

Listing 7 presents a complete Pig Latin script that enumerates the hosts in the 147.91.175.0/23 network that have UDP port 53 open, indicative of a DNS server. The script starts by loading the file containing the data of interest. Each record is then filtered multiple times: first, keeping only the IP addresses we are interested in, using the IPBelongs UDF, then keeping only the records that have port 53 in the results. The remaining records are filtered by the port type (TCP or UDP) and finally by the port state. The results are written back to HDFS using the STORE operator.

6. CONCLUSION

The large amount of data contained in the Internet Census 2012 dataset presents a unique snapshot of the state of the Internet as a whole in 2012. With the rising adoption of IPv6, Internet scans of this scale and completeness will soon be impossible. The dataset was created by an anonymous researcher by exploiting insecure embedded Internet devices. Analyzing over 9 terabytes of raw text files, representing the logs of Internet-wide scans, has proven to be a difficult task.

To speed up and distribute the analysis workload and data storage, we created a Hadoop cluster with 12 slave nodes, totaling 24 CPU cores and over 2 terabytes of storage. In order to get fast processing speeds while still keeping the dataset size at a manageable level, the original zpaq-archived dataset was recompressed into gzip archives. We utilized Pig and extended Pig Latin with custom UDFs to show how this platform can be used to extract meaningful information in a straightforward manner.

The presented platform can be a basis for data mining tools which can be used to perform extensive analysis and uncover more subtle properties of the Internet.

REFERENCES

[1] Port scanning /0 using insecure embedded devices, http://internetcensus2012.bitbucket.org/
[2] Fping, http://fping.sourceforge.net/
[3] Nmap, https://nmap.org
[4] J. Heidemann, Exploring Visible Internet Hosts through Census and Survey, 2007.
[5] Z. Durumeric, ZMap: Fast Internet-Wide Scanning and its Security Applications, 22nd USENIX Security Symposium, August 2013.
[6] Z. Durumeric, Analysis of the HTTPS Certificate Ecosystem, Proceedings of the 13th Internet Measurement Conference, October 2013.
[7] Hadoop, https://hadoop.apache.org/
[8] Apache Pig, https://pig.apache.org/
[9] C. Olston, Pig Latin: A Not-So-Foreign Language for Data Processing, Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 2008.
[10] Hadoop Documentation, http://hadoop.apache.org/docs/current
[11] Hadoop MapReduce – Hadoop Documentation, https://wiki.apache.org/hadoop/MapReduce
[12] Hadoop HDFS – Hadoop Documentation, https://wiki.apache.org/hadoop/HDFS
Interdependencies of communication and electrical infrastructures

Goran Murić*, Dragan Bogojević**, Nataša Gospić*
* University of Belgrade, Faculty of Traffic and Transport Engineering, Belgrade, Serbia
** PE EPS, Belgrade, Serbia
[email protected], [email protected], [email protected]
Abstract— Infrastructure systems are essential for the functioning of modern society, from electricity and gas supply, through communication networks and water supply systems, to transportation networks. Recently, the rapid development of communication and information technology introduced new controlling and monitoring mechanisms that rely on such technologies. All infrastructures have become dependent on reliable and secure communication systems. In this paper, the mutual dependency of communication and electrical infrastructure is described. A general overview of scientific achievements and papers published in this area is given. Finally, the most important recommendations for the Serbian electrical system are presented, in accordance with EU standards and instructions.

I. INTRODUCTION
Societies all over the world are highly dependent on various infrastructures. A major disruption to the environment can lead to severe public safety issues and economic losses. The emerging problem is the increased interdependency between systems, where a failure in one infrastructure can lead to a failure in others and vice versa. In the past decades, the electric power grid experienced several severe failures that affected the power supply to millions of customers. These events highlight the vulnerability of the electric grid infrastructures and their interdependencies. In this paper, the authors deal with two major, highly dependent infrastructures: the electricity infrastructure and the associated information infrastructure. The interdependencies of these two infrastructures are increasing due to a growing connection of the power grid networks to the global and local information infrastructure, as a consequence of market deregulation and opening. These interdependencies increase the risk of failures.
In this paper, the major failures in power supply in recent years are presented, together with a brief explanation of their causes and consequences. The methods for defining dependencies and the importance of critical infrastructures are presented, with special attention to the dependencies between electrical and communication infrastructures. The final chapters are dedicated to the regulatory aspect of this particular problem within the EU and worldwide. Finally, the authors discuss the situation in Serbia and its present regulations and challenges regarding the process of power market liberalization.

II. VULNERABILITY AND DISRUPTION OF POWER DISTRIBUTION SYSTEMS
Modern society is largely dependent on electricity supply, and occasional blackouts that cover vast geographical areas (parts of cities, whole cities and even regions) occur once in a while. One of the most important properties of connected networks is that a failure of a certain element may lead to the failure of other dependent elements in that network, or in other networks attached to it [1]. This may happen in series and can lead to a cascade of failures. In fact, a failure of a very small fraction of nodes in one network may lead to the complete fragmentation of a system of several interdependent networks [2].
Interdependent infrastructures are coupled together, and those connections are usually very complex, so we are not able to predict exactly what the consequences of a system failure will be. That uncertainty in prediction leads to unexpected, even catastrophic events, which affect society as a whole.
One of the most undesirable events that can occur is a power blackout. A dramatic example of such a cascade of failures is the electrical blackout that affected much of Italy on 28 September 2003 [2]. It was a striking example of cascading failures in interconnected networks of different types, because the shutdown of power stations directly led to the failure of nodes in the Internet communication network, which in turn caused further breakdown of power stations.
One of the best examples of cascading failures and their consequences is the power grid in the United States. Over the years, the United States has been subjected to many blackouts with severe consequences. Hines et al. researched the frequency and trends of large blackouts in North America, and their findings show that the frequency of large blackouts in the United States has not decreased over time and that there is a statistically significant increase in blackout frequency during peak hours of the day and seasons of the year. This means that the problem is getting bigger over time. From the data enclosed in the paper, we can conclude that the largest blackouts were not caused by natural disasters like hurricanes and storms, but by the cascading effect [3].
Blackouts can have serious consequences, and the costs of such events can be very high. One example is the blackout that occurred in the Northeastern area of the United States and in the Southeastern area of Canada in 2003. Approximately 50 million people were affected [4], and the economic losses in the United States were in the range between $4 billion and $10 billion [5]. In some parts of the United States, the power was restored only after four days.
On Tuesday, 23 September 2003, Eastern Denmark and Southern Sweden suffered an extensive power failure. In Eastern Denmark, this meant that about 2.4 million people were without electricity from 1237 hours. As a result of the power failure, about 1,850 MW of consumption was disconnected and about 8 GWh of electricity was not supplied as planned in Eastern Denmark [6].
On November 4, 2006, at around 22:10 (local time), the UCTE European interconnected electric grid was affected by a major system disturbance. The tripping of several high-voltage lines, which started in Northern Germany, split the UCTE grid into three separate areas with significant power imbalances in each area. The power imbalance in the Western area induced a severe frequency drop that caused an interruption of supply for more than 15 million European households [7].
In another part of the world, a failure affected the Taiwan electrical network on July 29, 1999 (the 729 blackout). The failure left approximately 82.5% of consumers without power. The blackout was caused by the coincidence of a series of circumstances and failures [8].

III. THE METHODS FOR DEFINING DEPENDENCIES AND IMPORTANCE OF CRITICAL INFRASTRUCTURES
Various types of infrastructures depend on each other, and as the level of their interdependency grows, their mutual influence becomes more complex.
Within the past few decades, modern control and information technologies have been developed in an attempt to improve the safety and robustness of utility systems by exchanging technologies across them. One of the major consequences is the emergence of interdependencies among these critical infrastructure systems. A failure of an individual system becomes more likely to affect the functionality of other interconnected infrastructure systems. In order to mitigate the consequences of undesired events, the complex relations between infrastructures should be understood. Many researchers have worked on modeling interdependencies among various critical infrastructures.
Zhang and Peeta [9] proposed equilibrium models with a systematic analytical approach to analyze various types of interdependencies involving multiple systems within a single modeling framework that is sensitive to real-world data availability. They proposed a framework which combines an engineering approach that leverages the network structure of infrastructure systems with an economic, market-based approach. Furthermore, Peters et al. [10] presented a dynamics model for the spreading of failures in directed networks, which includes several aspects that are important in interacting networked systems at a general level. There are some other authors worth mentioning that dealt with similar problems [11-13].
The authors propose a method for critical infrastructure protection which is based on the identification of the critical elements [1, 14] (critical routes within the network, and the recognition of the most important nodes). In this approach, dependencies within the critical information infrastructures cause the network to behave according to the rules of the percolation phenomenon. Furthermore, additional information on the causes of the importance of certain nodes is given in order to provide the decision maker with more options for decreasing or increasing the importance of nodes, and therefore making the network more resilient.

IV. DEPENDENCIES BETWEEN ELECTRICAL AND COMMUNICATION INFRASTRUCTURES
It is well known that strong dependencies between various types of infrastructures exist. Those very dependencies could be the reason for multiple infrastructure failures, as faults, errors and attacks could easily propagate through such an infrastructural network. That propagation of failures, known as a cascading effect, can cause the multiplicative effects of escalating collapses that can affect multiple sectors.
In this chapter, the authors focus on the mutual dependencies (interdependencies) between two functional entities within electric power systems:
1. Electrical network – responsible for electricity generation, transmission and supply
2. Communication network – mostly consisting of networks and control systems that monitor physical parameters of electrical infrastructures and initiate appropriate actions when necessary
The nature of those networks is different, so the modeling approach for each of them should be tailored to fit the behavior of the network itself and to describe its properties. The differences in the types of networks that are dependent on each other pose an additional difficulty in modeling the mutual dependencies of interconnected systems.

Communication networks
Failures in communication networks can have various causes. Random failures of equipment or software within the network happen very often. They are usually caused by human mistake or by some defect in a physical component. On the other hand, failures could result from an intentional attack with the aim of harming the network. These attacks are very hard to prevent and detect on time, because they are usually performed by a group of intelligent attackers. In the end, the results of all these failures are similar and lead to the unavailability of service for a given period of time in a certain area.
Communication networks have protective mechanisms, and for most random failures they use their routing mechanisms. Various types of routing mechanisms are available and, depending on the type and purpose of the network, the routing protocols should be chosen to fit that purpose.

Electrical networks
The electrical network is responsible for electricity generation, transmission and supply. Nowadays, most efforts are directed to the protection of smart grid systems worldwide. The elements within the electrical grid are highly dependent on each other, and failures can easily propagate through the network. Such propagation may be caused by human actions, by inaction or by the misconfiguration of the protection systems of the electrical network which, while trying to insulate the breakdown, can sometimes support its expansion [15]. This kind of cascade effect is different from the known effects in communication networks when it comes to failures.

Modeling of interdependencies of two networks
If a component fails in one of these infrastructures, it can cause a fault in the others. For example, an electrical outage can cause routers to stop working, and if those routers are part of an internal communication network of the power system, it can cause further collapses back in the electrical system. The interdependencies of these two infrastructures are increasing due to a growing connection of the power grid networks to the global information infrastructure, as a consequence of market deregulation and opening [16].
Many researchers and papers have dealt with modeling the mutual dependencies between these two infrastructures. Their goal is to understand the complex relations between those networks and to develop potential countermeasures that can reduce the power system vulnerability.
Chiaradonna and colleagues [17] proposed a model-based framework for quantitatively analyzing the propagation and impact of malfunctions in electric power systems. The framework is implemented using the stochastic activity network (SAN) formalism and is applied to concrete case studies that support the understanding and assessment of the impact of interdependencies.
Laprie et al. [16] described modeling and analysis of interdependency-related failures between these two infrastructures. They concentrated on cascading, escalating and common-cause failures, which correspond to the main causes of interdependency-related failures. They refer to the work of Rinaldi [18], where three types of failures that are of particular interest when analyzing interdependent infrastructures are addressed.
Beccuti et al. [19] presented high-level models describing the various interdependencies between the electric Power Infrastructure (EI) and the Information Infrastructure (II) supporting management, business, control and maintenance functionality. The possible interdependencies have been investigated by means of models at different abstraction levels.
In the process of modeling interdependencies within those infrastructures, we have to take into consideration the differences between them and try to analyze the mutual dependencies for each of the infrastructures, e.g. to analyze separately the communication infrastructure's dependency on the electrical infrastructure, and the electrical infrastructure's dependency on the communication infrastructure [15]. After that, we can start to analyze more complex relations between those infrastructures and possible failures bouncing back from one infrastructure to another.

V. EU RECOMMENDATIONS AND REGULATIONS
The European Union recognized the importance of Critical Infrastructure protection starting from the Council Directive 2008/114/EC of 8 December 2008 [20]. All sectors covered by the Directive are developing their own standards and recommendations about the protection.
Regarding the communications sector, the EU initiated several frameworks, projects and bodies that deal with this issue. For example, within the European Action Plan on Critical Information Infrastructure Protection, ENISA (the European Union Agency for Network and Information Security) is developing standards and recommendations to achieve a high and effective level of Network and Information Security within the European Union.
On the other hand, the European Committee for Electrotechnical Standardization (CENELEC) is responsible for standardization in the electro-technical engineering field, and thus for electrical power distribution and protection.
The joint initiative from these two sectors for standardization and protection of both infrastructures is formed by three institutions: CEN (the European Committee for Standardization), CENELEC and ETSI (the European Telecommunications Standards Institute). This group is called the Smart Grid Coordination Group, and their work resulted in a set of standards for the Smart Grid within the European Union Member States [21].
It is obvious that electrical networks, as a part of the critical infrastructure, are a concern of many international bodies.

VI. SITUATION IN SERBIA
Occasional blackouts worldwide created an initiative for further extension of the existing standards and regulations. After any major disturbance in the electrical sector, new recommendations are adopted. Serbia should also comply with the current EU regulations, and the responsible authorities should watch for any changes and updates in that sector.
Serbia, which is already behind the EU in terms of electrical energy market liberalization, should overcome the additional challenges that new legal regulations will bring (a new Energy Law is in the adoption phase). Separation of distribution and supply into independent legal entities demands a new communication infrastructure and new connectivity methods, which will be absent in the initial phases and could increase the risk of potential problems in the network.
The liberalization introduces new players on the market with various communication infrastructures, or without communication infrastructure at all (small hydropower plants and small solar power plants are, by definition, built on remote sites without appropriate control systems), which could be a significant problem in managing the power market.
From the above discussion, it is obvious that at the governmental level and at the highest level of electrical sector management, an appropriate strategy related to the CCI for electrical infrastructure protection with regard to the communication network has to be adopted. The following list of recommendations, selected from two important documents published by the U.S.-Canada Power System Outage Task Force [5] and the European body UCTE (Union for the Co-ordination of Transmission of Electricity) [7], should be considered when starting the process:
• Make reliability standards mandatory and enforceable, with penalties for noncompliance.
• Develop and deploy IT management procedures.
• Develop corporate-level IT security governance and strategies.
• Implement controls to manage system health, network monitoring, and incident management.
• Establish clear authority for physical and cyber security.
• Transmission system operators (TSOs) should have control over generation output (changes of schedules, ability to start/stop the units).
• TSOs should receive on-line data on generation connected to the grids of Distribution System Operators (DSOs) (at least 1-minute data).

VII. CONCLUSION
The dependency of the electrical infrastructure on communications is becoming more and more pronounced. Large production systems extend their lifetime through revitalization, mostly by modernization of their controlling and information systems. A TSO should not simply transmit high-voltage electrical power, but exchange information with many other parties in a timely manner and, furthermore, as a market regulator, take care of the balancing of production, consumption and purchasing of electricity. DSOs should distribute the electricity with minimal loss, and therefore they need modern controlling and information systems. Furthermore, DSOs have an obligation to read and transmit information on users' consumption from the territory of Serbia to the supplier with whom the particular user has a contract. The electric energy supplier has an obligation to issue an invoice to all customers in the country with which it has a contract, which creates an additional demand for new billing software. These demands, altogether, require a large investment in the development of communication infrastructures in the energy sector, which on the other hand has a restrictive financing policy.
Although the amount to be invested in information technology is only a small fraction of the total investments in the electrical sector in Serbia, its value as a critical resource is far more important, and a potential malfunction of controlling and information systems could jeopardize the generation and distribution system as a whole.

ACKNOWLEDGMENT
This research activity is a part of the project "Management of Critical Infrastructure for Sustainable Development in the Postal, Communications and Railway sectors of the Republic of Serbia" supported by the Ministry of Education and Science within the framework of scientific research projects 2011-2014 and by Telekom Srbija, Pošta Srbije and Železnica Srbije.

REFERENCES
[1] Murić, G., N. Gospić, and M. Šelmić, Protecting Critical Information Infrastructures by Increasing its Resilience, in International Conference on Applied Internet and Information Technologies, 2013, Zrenjanin, Serbia.
[2] Buldyrev, S.V., et al., Catastrophic cascade of failures in interdependent networks, Nature, 2010, 464.
[3] Hines, P., J. Apt, and S. Talukdar, Large blackouts in North America: Historical trends and policy implications, Energy Policy, 2009, 37: p. 5249-5259.
[4] Henneaux, P., P.-E. Labeau, and J.-C. Maun, A level-1 probabilistic risk assessment to blackout hazard in transmission power systems, Reliability Engineering and System Safety, 2012, 102: p. 41-52.
[5] U.S.-Canada Power System Outage Task Force, Final Report on the August 14, 2003 Blackout in the United States and Canada: Causes and Recommendations, 2004.
[6] Elkraft System, Power failure in Eastern Denmark and Southern Sweden on 23 September 2003 - Final report on the course of events, 2003.
[7] Union for the Co-ordination of Transmission of Electricity, Final Report - System Disturbance on 4 November 2006, 2007, Brussels.
[8] Wong, J.-J., et al., Study on the 729 blackout in the Taiwan power system, Electrical Power and Energy Systems, 2007, 29: p. 589-599.
[9] Zhang, P. and S. Peeta, A generalized modeling framework to analyze interdependencies among infrastructure systems, Transportation Research Part B, 2011, 45: p. 553-579.
[10] Peters, K., L. Buzna, and D. Helbing, Modelling of cascading effects and efficient response to disaster spreading in complex networks, International Journal of Critical Infrastructures, 2008, 4: p. 17.
[11] Leelardcharoen, K., Interdependent Response of Telecommunication and Electric Power Systems to Seismic Hazard, 2011, Georgia Institute of Technology.
[12] Dobson, I., B.A. Carreras, and D.E. Newman, A Loading-Dependent Model of Probabilistic Cascading Failure, Probability in the Engineering and Informational Sciences, 2005, 19: p. 15-32.
[13] Rosato, V., et al., Modelling interdependent infrastructures using interacting dynamical models, International Journal of Critical Infrastructures, 2008, 4: p. 17.
[14] Murić, G., et al., An Approach to Assess Criticality of Elements in the Process of Information Infrastructure Protection, in International Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Services - TELSIKS, 2013, Niš, Serbia.
[15] Delamare, S., A.-A. Diallo, and C. Chaudet, High-level modelling of critical infrastructures' interdependencies, International Journal of Critical Infrastructures, 2009, 5: p. 20.
[16] Laprie, J.-C., K. Kanoun, and M. Kaâniche, Modelling Interdependencies between the Electricity and Information Infrastructures, Computer Safety, Reliability, and Security, 2007, 4680: p. 54-67.
[17] Chiaradonna, S., F.D. Giandomenico, and P. Lollini, Definition, implementation and application of a model-based framework for analyzing interdependencies in electric power systems, International Journal of Critical Infrastructure Protection, 2011, 4: p. 24-40.
[18] Rinaldi, S.M., J.P. Peerenboom, and T.K. Kelly, Identifying, Understanding, and Analyzing Critical Infrastructure Interdependencies, IEEE Control Systems Magazine, 2001.
[19] Beccuti, M., et al., Multi-level dependability modeling of interdependencies between the Electricity and Information Infrastructures, in Critical Information Infrastructure Security, 2009, Springer-Verlag Berlin, p. 48-59.
[20] European Commission, Council Directive 2008/114/EC of 8 December 2008 on the identification and designation of European critical infrastructures and the assessment of the need to improve their protection, Official Journal of the European Union L, 23.12.2008, p. 345-375.
[21] CEN-CENELEC-ETSI, Recommendations for smart grid standardization in Europe - Standards for Smart Grids, 2011.
ANALYSIS PLATFORM FOR THE PRESENTATION OF A SET OF OPEN DATA IN EDUCATION AND PUBLIC ADMINISTRATION

Srdjan Atanasijević 1, Milos Miladinovic 1, Vladimir Nedic 2, Milan Matijevic 3
1 ComTrade Solution Engineering, Belgrade, [email protected], [email protected]
2 FILUM (Department of Philology & Arts), Kragujevac, [email protected]
3 Department of Engineering Science, Kragujevac, [email protected]
Summary: Open data is data that can be freely used, processed and exchanged by all users. The same principle applies to data resulting from processing or changing open data: it can also be used freely. In this paper we present the basic concepts of open data: standardized models of open data, the process of opening databases, application areas and experience with platforms for the presentation of open data. In the second part we analyse the most popular platforms for the presentation of open data and provide guidelines for their use in our practice. In the conclusion, we explain the potential application of open data sets in education and in the Serbian economy.
Key words: open data, linked data sets, CKAN, opening of the public sector

1. CONCEPT OF OPEN DATA

The concept of open data, or more precisely, of opening the data owned by the public sector (state, region, local government, science), is relatively new. The first intense experiences with opening data date back to 2009, with the administrations of the U.S., the United Kingdom, New Zealand and Canada. The basic idea was to answer the question: where and how is the taxpayers' money spent?
Today, after five years, the concept has evolved, become widely accepted and experienced full recognition [1, 2]. IT standards and platforms have been developed that facilitate the opening of data according to the principles that will be presented in this paper.
What do we consider open data, what is its added value, and how can it be used?
Open data is data that can be freely used, processed and exchanged by all users. The same principle applies to data arising from a change of open data: it can still be used freely [3].
The features that determine the degree of openness of data are crucial for its further application. Data that we consider open should meet the following principles:
• availability of data and access to it: the data must be available in its entirety, without any special obstacles to its further use (open formats, not tied to a software license)
• reuse of data and further exchange: the right to use the open data, given by the owner, must allow further exchange, including processing and linking with other data sets. The data should be in a form suitable for computer processing.
• universal application: anyone interested must be able to access, modify and redistribute the data resulting from the processing of open data. There should be no discrimination towards projects that will continue to use the open database, nor towards the group of data users. There should be no restrictions on the type of use, such as "not to be used for business purposes" or "to be used only in the educational process".
The application of open data sets is increasing every day. Figure 1 shows the major fields of use and the types of data that are being opened, which include:
• geo data (GIS, etc.), used to plot maps and display geographically referenced objects in applications such as maps, routes and search by location
• culture, information on events and facts (cultural sites and facilities), references (data collections)
• science, data from scientific experiments and research
• finance, public finance, data on the expenditure of money from the budget, projects
• statistics, the results of the statistical offices, either republican or regional
• weather, forecast data from meteorological stations, weather prediction, analyses
• environment, data on the degree of contamination of a site (gases, particles, ...)
• education, public information on facilities, investment in education projects
• transport, information such as timetables of public transport and so on

Figure 1. Areas of use of open data sets, by source

How can public data be turned into an open data set?
Experience shows that, when opening data owned by the public sector, the commitment and decisions of management are especially important, as are the implementation of the decision itself and a technical implementation well proven in practice. The document titled Manual for a Description of Open Data [3] discusses the legal, social and technical aspects of opening public data. In the remainder of this paper we present the basic idea of the process of opening data, analyse the most popular platforms for the presentation of open data and provide guidelines for their use in our practice [4].
In the process of implementing the decision to open the data, adherence to three fundamental principles is recommended:
• make small, simple and effective steps. Not all data can or should be opened at once. Speed is important, as is the benefit of the data being opened. Of course, the more open data, the greater the effect.
• include as many users as early as possible. The inclusion of users is essential. They should be involved in the selection of the data to be opened, its scope and structure. Users are the engine of the further process. Business users especially will put the data into the function of increasing the volume of business, and the increased revenues will eventually, through taxes, lead to an increase in public finances.
• the principle of openness in communication: identify the key issues and work together to resolve them [5]. Legal, cultural and technical barriers would, if a climate of open communication is not created, lead to the blocking of the process.
In the implementation process, the following four steps are taken:
1. Choosing a dataset that is a candidate for opening.
2. Allowing unlimited use:
• check whether there are legal obstacles to opening the data (security or proprietary rights)
• attach a statement to the data, in a legal form, confirming that the data is safe and follows the principles of open data
• if a candidate for opening does not satisfy any of the restrictions, return to step 1
3. Making the data available to users, in raw form and/or in one of the open forms.
4. Organizing the data in the form of catalogs, with a description of the structure and other information that will make it more applicable in users' practice.

2. PRINCIPLES FOR DEFINING THE FUNCTIONS OF AN OPEN DATA PORTAL

To answer the question of which features IT systems for supporting the publication and presentation of open data should have, the following dilemmas need to be resolved:
• how the open data is planned to be used
• who is planned to be the user of the open data

Ways of accessing data from a set of open data
In the process of linking open data sets, the following characteristic processes can be recognized:
• searching for facts in a set of open data
• deriving useful information from the data set
• presenting data by defining a schema, interfaces and data structures
• accessing data via a software interface
• accessing data using services

1 – search for facts: Users can search the data for specific facts. Facts can be found through the available user displays or Excel tables. (Data → fact: search, display)
2 – from data to information: Creating a static representation and interpretation of one or more data sets. Commonly used for visual presentation, review and reporting. (Data → information: management, static analysis, visualization, reports)
3 – from data to an interface: Interactive access to and search of one or more data sets. A schema is defined that describes the structure of the data and serves as an interface for data usage. (Data → interface: combining data sets, writing code, providing an interface)
4 – from data to data: Availability of data sets in a variety of formats and combining of multiple sets. The assemblies are created through an API, which is defined so that the data can also be accessed programmatically. An interface through which the data can be freely downloaded is also available. (Data → data: conversion, filtering, combining, providing an API, data sets ready for download)
5 – from data to services: The data is stored behind a set of services that access it. Users are offered various services that are based on the open data. (Data → service: integration with existing services, creating new services)
Table 1. Typical methods of using data from a set of open data

These processes are not mutually exclusive; a combination of several processes is the most common case. The current situation is such that there are problems when it comes to mapping data and the life cycle of data sets to the final stage, where they become visible to end users. Such problems are becoming rarer, because the descriptions of the principles of open data are continuously improved and enhanced [6, 7].

Usability of data from a set of open data
In order to utilize the full value of an open data set, it is important that the information and data are placed in a context that will create new knowledge and enable the creation of new services and applications.
Related information ("linked data") facilitates the creation of different types of services and applications that are based on interrelated data [6, 7]. Considering the usefulness of the open data principles, we start from two equally important points of view on related data: the view of the institution that should enable the publication, and the view of the user that consumes open data in its processes and solutions.
The assessment of the usability of open data can be viewed along two main directions (Figure 2):
• the degree of complexity of data access (the technical dimension of the problem) and
• the degree of openness/transparency of the content of the data (expressed in the details of the data)
The first direction technically concerns the accessibility of the data contained in the file format. Information listed in PDF files, for example, is less accessible than data published in structured Excel tables. File formats tied to the licensed application solutions that interpret them are less accessible than open file formats such as CSV.
The clearest gradation of the opening and connecting of data was presented by Tim Berners-Lee [6] in his model with five stars:

* The data is available on the Web (in any format) under an open license. The user can: view, print, store locally, manually enter into another system, modify the form as desired, and share the data with others without restrictions. The owner only needs to publish the data simply and does not have to explain that the data can be used freely.
** The data is additionally available as structured data (an Excel table instead of a scanned image of a table). The user can additionally process the data with software support (aggregation, visualization, calculation over any of the data) and export it to another structured format. The owner publishes simply, but may need modules for export from the commercial format.
*** Unprotected (non-proprietary) formats are used, e.g. CSV instead of Excel. The data format is free: it does not require paying for a license to read the data, and parts of the data can be accessed and displayed. The owner simply publishes from the free format.
**** URI identification is used, so that others can point to individual data. The user can link to the data from any site, save it, use it multiple times and safely combine it with other data. Owners of different data sets can interconnect them and define templates for their data; time needs to be invested in organizing the data for display, and the data should have a schema description.
***** Context is provided by linking different data sets. The user can discover the interconnections of the data and learn about the data directly from the data schemas. For the owner, the value of the data increases, and both the owner and the user profit from the publication and use of the data.
Table 2. The five-star model: the path to open linked data

Linked data refers to data sets described with open metadata standards, such as the Uniform Resource Identifier (URI) and the Resource Description Framework (RDF). Data described in this way is easily readable both for humans and for machines [6].
The second direction of considering open data refers to the types of sources and the details of the published data. The most natural source of information in the public sector is traditional statistics, such as the results of a census of households. Geospatial data is also among the most commonly published public data and is useful for the majority of applications. Public procurement and data on budgetary spending are less transparent data sources.
Linked data is gaining an increasing role in the field of data management. It is independent of the domain and the areas in which it is used, thus proving its advantages over traditional methods of data management. The data format marked with 5 stars is called linked data.

Figure 2. The degree of autonomy of use of open data, according to the file format and contents

Linked data uses semantic web standards, such as the RDF standard, to give a description of the data and to connect it with other data sets, thus providing the appropriate context. These standards are based on technologies such as the HTTP protocol, URIs and RDF.
3. SHOWING PLATFORM FOR THE  Google Fusion Tables, a solution to easily publish
PRESENTATION OF OPEN DATA and integration of open data
 OGDI platform to display data based on open. NET
Today, the market can find dozens of platforms to view technologies
and search abstracts of open data. Commercial solutions
are mostly based on known platform for documentation
     Name                    URL
1    CKAN                    http://ckan.org
2    Ambra                   http://ambraproject.org
3    OGDI                    http://datapublic.org
4    Google Fusion Tables    http://tables.googlelabs.com

Legend:
K1 Application area: G - public sector, N - science, ST - statistical indicators
K2 Number of installations
K3 Supported ways to access data from an open data set:
   F - search for facts
   I - from data to information (reports, visualization)
   C - from data to interface (combining data sets)
   E - from data to information (export to various schemes)
   S - from data to services (integration with existing services and creation of new ones)
K4 5-star model: number of stars - the degree of linked data

Table 3. Comparative overview of platforms for displaying open data sets

CKAN. Nowadays the most widely accepted solution for the publishing of open data. The solution is based on the Python language on the server side and JavaScript on the client side. It uses the Pylons web framework and SQLAlchemy as the ORM, and stores data in a PostgreSQL database. The architecture allows easy and rapid development of new extensions that bring new functionalities and features for working with data. The solution is very well covered by documentation and supports all modern concepts that characterize open data. It is used by over 100 institutions, including ministries and government agencies of G8 countries such as the USA, Russia, UK, Germany, France, Japan and Canada. It is also a favorite of
local governments and cities, including Berlin, Houston and Washington.

Ambra. Ambra is an innovative open source platform for publishing the results of scientific research. It provides opportunities for the publication of articles and annotations, enabling a "living" document around which scientific discussion and the consideration of new scientific discoveries are possible. The platform is under active development by PLoS (Public Library of Science) and licensed under the Apache License, Version 2.0. The platform is developed in Java, using Spring and Struts, with Hibernate in the middle layer. It uses the Apache Tomcat application server and a MySQL database. The key users of the Ambra platform for the presentation of open data are scientific institutions gathered around the PLOS ONE project ( http://www.plosone.org/ ). The project brings together over 50 institutions and over 100 collections of open data. The platform meets the criteria for linking open data of the 4-star class.

Google Fusion Tables. This solution is based on cloud services. It is well connected with all the Google tools. Data can be easily published, and the published data can be managed in an intuitive way. The solution has strong visualization features (maps and charts), the ability to work with large data sets, and integration with other Google services. The ease of use promoted by Google means that this solution does not enforce any strict open data concepts; it certainly meets the 3-star class, while higher levels of data integration are left to application upgrades made by the potential users themselves. The number of active users of this solution is large. The ability to readily publish and present data makes this solution a good extension for connecting with the existing user portals of the public sector and science.

Open Government Data Initiative (OGDI) is an open source cloud solution that promotes the use of open data and helps publish data sets in a faster and easier way [8]. Data can be displayed in several different formats, such as tables, maps and charts. OGDI is based on the Microsoft .NET platform, written in C# using the Windows Azure platform. Many datasets, including geospatial data returned in the KML format, make OGDI compatible with popular desktop and web-based technologies such as Microsoft Bing Maps, Google Maps, Yahoo Maps and Google Earth. The OGDI solution is easily connected to Microsoft portal products, enabling fast opening of public sector data for users who rely on Microsoft technologies. The users of this solution are predominantly in the United States and Canada, primarily local administrations that open their data and are traditional users of the Microsoft SharePoint platform.

4. RECOMMENDATIONS FOR THE SELECTION OF THE PLATFORM

Practices

By mid-2013, all the states of the G8 and the G20, and the EU member states, had opened their data. In numbers, it looks like this: over 250 open data portals under the responsibility of states (including portals of ministries and agencies), regions and local governments. Of the total number of portals, 39 belong to federal states in the U.S., 41 to cities and areas in the United States, 43 are run by national governments around the world, and 160 by cities and regions in the world [9]. By June 2013, more than one million different datasets had been opened. The analysis of open data usage (source: data.gov) reveals a set of dominant areas (data sets) whose data are the most in demand (Figure 3):
• geospatial data
• datasets on transport,
• agriculture,
• energy and energy services,
• various statistical data,
• data relevant to scientific and technological development.

Figure 3 shows the number of data downloads from the portal (monthly average, data.gov portal) and the use of the data sets in practice.

Figure 3. Downloading data from open data sets by users (data.gov)

Practical experience shows that portal solutions for displaying open data based on the CKAN platform dominate when the data sets relate to state and government agencies, while Ambra and Google Fusion Tables dominate in displaying data resulting from projects and scientific research.

Recommendation of a platform suitable for the public sector

Our recommendation is that CKAN be taken into immediate consideration when choosing a platform for displaying open data in the public sector. This platform is still the most mature and complete, with a variety of installations and implementations, and there are multi-year plans for the development of new functionality. It is supported by a large community of users and a large knowledge base of practical experience that can be measured in dozens of man-years. The platform is open for extension and integration with the existing back-end systems of institutions.


The implementation risks of the CKAN platform in practice are reduced to solving the technical problems of integration with the data generated in the existing information systems of institutions.

Recommendation of a platform for science and education

As a recommended platform for the presentation of open data in scientific research institutions, we believe that it is possible to use three of these platforms, depending on the type of institution and the scope of the material to be presented.

At the level of a scientific laboratory or faculty chair, Google Fusion Tables is the best choice. The platform is easy to integrate into the existing portals of laboratories, departments or projects, the data can be easily published, and it has a rich set of tools for visualization [10]. This solution is particularly useful when we want students and key persons to present the most significant achievements of research.

When presenting data from large scientific research projects, Ambra is the right choice. Datasets presented in Ambra solutions are separated by project, and the results are versioned. Bringing together the participants of several related projects and the beneficiaries of the research results (companies) makes it possible for the discussion to reach the quality and detail needed to explain the obtained conclusions. Research sponsors can track the discussions and the impact of actions on investment. The analysis of the discussion flow helps better define the priorities of the next cycle of research projects [11].

If the process of opening up data associated with scientific research is considered at the level of institutions such as the Ministry of Science, a scientific research institute, or the University as a whole, then CKAN is the solution of choice. We suggest this platform because of the types of documents that should be included in the process of opening, which, in addition to the results of research, include institutional status documents, decisions and so on.

5. CONCLUSION

By signing the agreement with the Open Government Partnership organization in March 2012, the Government of Serbia started the process of developing strategies related to open data.

This paper indicates the steps to be followed in the process, which include the identification of the data sets that need to be opened, the legalization of the use of open data, the formats and services through which open data are made available to users, and the choice of application platforms that can be used in the presentation of open data.

The data themselves do not have any particular value. The publication of data is useless if the data are not used to solve real social, scientific and business problems. Empowering the community, citizens and businesses, the process of opening the data means focusing on the use of data and the problems to be solved that way.

The opening of datasets should be done according to the priorities and dynamics dictated by the use of this information in practice. This would allow the following benefits, whose direct beneficiaries would be the citizens and the economy:
• Increased confidence of citizens and businesses in government agencies and other institutions of the state, as well as their services
• Further development of entrepreneurship, especially the service sector, which could efficiently and effectively use open data to improve existing and develop new services
• Increased agility of government organizations and effective connectivity with the market
• Analysis and availability of the results of scientific research funded from the state budget, which would further provide businesses and entrepreneurs opportunities to develop new products and services.

The process of opening the data must not be an activity focused solely on technology, nor focused on the data itself. The process of opening the data has to focus on the goal of improving the overall quality of life: reducing pollution and resource consumption, improved health care, better connectivity of people, an increase in productivity and reduced operating costs, i.e. a total increase in people's living standards.

REFERENCES

[1] Miller, P., Styles, R., & Heath, T. (2008, April). Open Data Commons, a License for Open Data. In LDOW.
[2] Kassen, M. (2013). A promising phenomenon of open data: A case study of the Chicago open data project. Government Information Quarterly, 30(4), 508-513.
[3] Open Data Handbook Documentation. (2012, November). Open Knowledge Foundation. Retrieved from http://opendatahandbook.org/pdf/OpenDataHandbook.pdf
[4] Zuiderwijk, A., & Janssen, M. (2013). Open data policies, their implementation and impact: A framework for comparison. Government Information Quarterly.
[5] Murray-Rust, P. (2008). Open data in science. Serials Review, 34(1), 52-64.
[6] Berners-Lee, T. (2006). Linked Data. International Journal on Semantic Web and Information Systems (IJSWIS), 4(2), 1.
[7] Berners-Lee, T., et al. (2006). Tabulator: Exploring and Analyzing Linked Data on the Semantic Web. Proceedings of the 3rd International Semantic Web User Interaction Workshop (SWUI06).
[8] Miladinović, M. (2013). Otvoreni podaci u javnom sektoru [Open data in the public sector]. Specialist thesis, Visoka tehnička škola, Kragujevac.


[9] Bizer, C., Heath, T., & Berners-Lee, T. (2009). Linked data - the story so far. International Journal on Semantic Web and Information Systems (IJSWIS), 5(3), 1-22.
[10] Stefanovic, M., Matijevic, M., Cvijetkovic, V., & Simic, V. (2010). Web based laboratory for engineering education. Computer Applications in Engineering Education, 18(3), 526-536, doi:10.1002/cae.20222
[11] Hester, J. R. (2013). Closing the data gap: Creating an open data environment. Radiation Physics and Chemistry.


miniC Project for Teaching Compilers Course


Zorica Suvajdzin Rakic*, Predrag Rakic*, Tara Petric*
* University of Novi Sad/Faculty of Technical Sciences, Novi Sad, Serbia
[email protected], [email protected], [email protected]

Abstract—The Compilers course is considered an important part of a computer science education, but it is not always an easy course to teach to undergraduates. The miniC project, developed for teaching an undergraduate Compilers course, is presented. It proved to be a very efficient concept for such a course.

I. INTRODUCTION

A. Evolution of compilers courses

The difference between the early and the contemporary compilers courses is drastic. Early compilers were implemented in only a few thousand lines of code, usually written in a low-level language such as assembler or C. Modern compilers often have up to a few million lines of code, written in different programming languages. Earlier compilers were often made by individuals, while contemporary compilers are big software projects built by teams of programmers.

In the early period of the development of computer science there were only a few programming languages, while nowadays there are thousands of them. In the beginning, there were only a few target machine architectures; today, the CISC and RISC architectures from the past still exist, but there are also vector, VLIW, multi-core and many other architectures.

Alfred Aho thinks that, despite this rather confusing variety of source languages and target machines, it is still possible to offer a compilers course which is both educational and greatly satisfying to students [1].

B. Undergraduate Curriculum

The Compilers course has a special place in the education of computer science students, because it strengthens and binds different theoretical and practical knowledge. As Aho says, compiler construction is a nice example of the connection between theory and practice [1].

Most computer science programs offer a traditional course on compiler construction. Such a course contains a rather big project in which students write a compiler for a small programming language. Projects often serve two purposes: learning language design and the implementation of compilers. An additional effect of such a project is that it provides students with experience in building a larger software system. This project is one of the most complex software engineering tasks which students do within their basic academic studies.

II. RELATED WORK

Since compilers courses are important and it is very demanding to make a project for such a course, it is surprising that there are no standard, widely applied projects. Many teachers create their own projects, repeating a lot of work which had already been done many times before [2].

The next paragraphs describe contributions that address different teaching methodologies and techniques.

Li [3] thinks that teaching and learning could be improved by using some effective approaches such as concept mapping, problem solving, case studies, workshop tutorials and e-learning. He advocates problem-based learning, as a student-centered teaching approach, which enables students to establish a relation between abstract knowledge and the real problems that they are solving.

Pereira et al. [4] advocate the use of domain-specific languages for teaching compilers. They think that students' motivation is highly dependent on the languages used during the course, and advocate the use of specifically tailored, small and simple languages. The programs that students are supposed to develop, instead of being traditional compilers, will be translators or generic processors.

Aiken made the Cool project [2], using a small academic programming language: Classroom Object Oriented Language. Cool is designed to be implemented by individuals or teams of two, using C++ in a UNIX environment, in a single semester. The project is relatively easy to modify, so shorter or longer projects are possible.

Siegfried introduced a small language Jason (Just Another Simple Original Notation) based on ALGOL [5]. It is specifically designed for academic purposes and contains all the important concepts of procedural programming languages. It is designed for recursive descent parsing.

Mernik et al. [6] present the LISA system, with a user-friendly interface, to process attribute grammars and generate compilers. LISA has useful visualizations, available for each compiler phase, that help students easily understand the process and the internal structures.

III. CREATION OF THE COURSE

A. The aims of the course

At the end of a compilers course the students are expected to be ready to work with techniques and tools for the formal specification of a programming language and for the implementation of its compiler. Students should master the main strategies of analysis and language translation, as well as the data structures and algorithms used within such strategies.

Students are supposed to have previous knowledge from the fields of programming languages, data structures, assembly languages, operating systems, computer architecture and discrete mathematics.


The subjects that should be learned are highly abstract. Therefore, the way the matter is presented should be adjusted to the students' prior knowledge. On the other side, the course content must be interesting and applicable, because a lack of students' motivation might originate from a lack of interest in this area.

B. Project

The vast majority of compilers courses are based on one project in which students write a compiler for a programming language.

The development of the project for a compilers course is intensive and long-lasting. In order to make a good project for this course, it is necessary to think carefully and answer some important questions.

Within the academic community there has long been a debate about which programming languages and concepts are the most important to be taught in the basic courses of computer science. Within the context of the compilers course, there are concrete questions: for which language should students write a compiler, and in what language should the compiler be written?

1) The language for which a compiler is written

As for the first question, the majority agrees that the language for which a compiler is written should be small. This significantly reduces the effort invested by the students in completing the realization of the compiler. On the other side, if the language is small and short, the students can comprehend it from all aspects, which is necessary for the implementation of the compiler.

It is on the teachers to decide whether they will design some new language, only for the educational purpose of compiler implementation, or choose a subset of an already existing language.

A good characteristic of a new language is that it can be designed so that it fully supports and highlights the process of education. A new language can be designed to be simple for implementation, not for usage, as is the case with commercial languages. Characteristics of the new language can be taken from different existing languages and adjusted to the needs of the course. Such a language can, on purpose, be different from existing languages in order to force students to consciously think about the meaning of language structures instead of relying on intuition originating from languages they already know [2].

A subset of an already existing language, on the other side, offers the advantage of familiarity. In such a situation the students need much less time and effort to master all of the characteristics of the language. Great attention can be dedicated to the formal specification and the design of the language, but the emphasis is put on the implementation of the language itself. The subset of an existing language should contain only the important characteristics of that programming language, because it is not possible to include (illustrate) all interesting characteristics of the language and, at the same time, keep the project small. Another advantage of choosing a subset of an existing programming language is in broadening students' understanding of that language and of programming itself.

2) The language in which a compiler is written

The choice of the language in which students should write compilers is not obvious. The decision depends on whether the course emphasizes the practical development of software or the principles of the language. If the course is oriented towards building software, the choices are C and C++. For courses with an emphasis on language design, it is much better to present to the students the ideas of modern languages [2].

The choice of a source language and the language in which the compiler is to be implemented are only the first dilemmas in the process of building the project which will be the basis for the compilers course. It is necessary to write some additional software that is going to be used together with the compiler. Each piece of additional software must be designed, implemented, tested and documented. Then, the teaching material must be prepared, containing meaningful tasks for extending the existing compiler. The whole project should be implemented before its real usage in the course, because a complete implementation is the only reliable way to ensure the consistency of the project.

Because of the big investment the authors have made during the project realization, they are inclined to use the project over and over again, even beyond the point of the project getting old.

Although it is very likely that some existing project will not satisfy all the needs and taste of a teacher, this could be overcome if the projects are designed in a way that allows them to be easily modified [2].

IV. MINIC PROJECT

At the Faculty of Technical Sciences, University of Novi Sad, the miniC project was developed as an aid in teaching a compilers course.

A. The source language

Since the source language has to be a small language, but still contain the most important characteristics of a programming language, the authors of the course have chosen a subset of the programming language C, which they called miniC.

miniC has been developed as a selection of characteristics and concepts of the C programming language which are interesting for a course on compiler implementation. Characteristics are taken only to a certain extent, in order to make the implementation of the language easy. The authors refrained from many characteristics of the C language which complicate the implementation of the language and which do not have a significant educational contribution [8]. miniC is a really small language, but sufficient to write simple, meaningful programs.

A miniC program consists of a list of function definitions, out of which one has to be the main() function. Every function definition optionally consists of local variables and statements. The statements supported in the miniC language are: the simple assignment statement, a simplified if-statement, the return statement and the compound statement.

Function calls with only one argument are supported (which does not reduce the educational aspect of function calls and significantly contributes to the simplicity of the compiler implementation). Functions are important in programming, but they introduce unnecessary complications into languages whose only purpose is to illustrate the implementation of a compiler.

And finally, a small educational source language has to have at least two data types in order to demonstrate type-checking. More than two data types complicate the implementation of the language.
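To illustrate the flavor of the language, a short program consistent with this description might look as follows. This is a hypothetical sketch, not taken from the miniC manual [8]; the exact set of types, operators and statement syntax is defined there:

int inc(int x) {
    int y;
    y = x + 1;       /* simple assignment statement */
    return y;        /* return statement */
}

int main() {
    int a;
    a = inc(5);      /* function call with a single argument */
    if (a == 6)      /* simplified if-statement */
        a = 0;
    return a;
}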
The most important reason for the choice of a subset of the C programming language as the source language is the fact that the students have been using this language in different courses during the last few years of studying. The authors believe that it is easier for students to master the course if they implement a language with known characteristics.

B. Sample compiler

The sample compiler has several roles.


First, the sample compiler is implemented in many incremental versions, so that every next version implements the next compiling phase, thereby supporting modular assignments.

Then, out of every version of the sample compiler, frameworks for the assignments were made.

Besides the sample compiler, a reference compiler for each assignment is implemented. The reference compiler serves as a solution which the students can use to check their own. The students have an opportunity to compare the solution of their assignment with a correct compiler, on different examples, which is of great help to them. One of the most important functions of a reference compiler lies in the fact that an implementation is the only safe way of checking the behavior of the compiler.

During the implementation of the miniC compiler, and even later during its advancement (which lasted a few years), different aspects of the language which could be simplified without reducing the educational value were detected.

The miniC compiler uses the scanner and parser generators flex and bison [7, 9]. This means that the compiler implementation language is the C programming language. This language was chosen because the students are familiar with it. Even in this case the students are provided with support and help to reduce errors in coding. Too many errors during the work on the assignment can exhaust students' time and enthusiasm.
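To give an impression of these tools, the fragments below sketch what flex and bison specifications generally look like; they are illustrative only and are not taken from the actual miniC compiler:

/* scanner fragment (flex): regular expressions with C actions */
%%
"if"                      { return IF; }
"return"                  { return RETURN; }
[0-9]+                    { return INT_CONST; }
[a-zA-Z_][a-zA-Z0-9_]*    { return ID; }
[ \t\n]+                  ;   /* skip whitespace */
%%

/* parser fragment (bison): grammar rules in a BNF-like notation;
   the token names above would be declared with %token */
statement
    : ID '=' expression ';'               /* assignment */
    | IF '(' expression ')' statement     /* simplified if */
    | RETURN expression ';'               /* return */
    ;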
C. Additional software

Many standard data structures (hash tables, stacks, ...) are necessary for the implementation of a compiler. Although the students have probably had courses on data structures before the compilers course, it would be unreasonable to ask them to implement those structures during this course. The authors are of the opinion that it is more efficient to give students implemented data structures, with detailed explanations and code documentation.

D. Assignments

The development of compilers is usually divided into four tasks: lexical analysis, parsing, semantic analysis and code generation. Those tasks are closely related to each other. For example, a student who does badly in lexical analysis will not be able to thoroughly test the parser, because the parser will not operate due to lexical errors. This problem is later compounded in further tasks, by introducing semantic checks and code generation.

The miniC project was designed and implemented in four incremental versions. The first contains only the implementation of lexical analysis; the next contains the implementation of syntax analysis, then the implementation of semantic analysis, and the last one contains the implementation of code generation. In that way, all assignments for the students are classified into these four phases. Each of these phases can be compiled separately and run for the needs of the reference solution.

Students start solving the tasks, every time, from an appropriate framework variant. Their task is to extend the compiler with new syntax constructions (new statements, expressions, ...) and to implement checks related to the given phase of compiling.

The initial miniC language took only the basic features from the C programming language. The student assignments are designed in such a way that students extend, for each compilation phase, the sample miniC compiler with a new language concept. This new concept originates from the C programming language, but not necessarily. In this way, the students are exposed to the features of different programming languages (which could be fitted into the sample miniC compiler).

E. Documentation

As for the documentation, there is a manual [8] for the programming language miniC which contains a formal and an informal description of the language. The formal specification presents the lexical and syntactic structure of a miniC program, while the informal description shows valid and invalid syntax constructions, as well as a description of the semantics. It turned out that the compilers course is an excellent tool for getting the students to know formal specifications.

Apart from the manual for the miniC programming language, there is also documentation which describes in detail the implementation of the sample compiler [8]. There is also material for every assignment and documentation for the tools used during the course (flex, bison [9]).

The miniC project operates on a Unix system with standard GNU software tools (make, gcc, flex and bison).

V. CONCLUSION

In order to develop and finally shape the miniC compiler, significant effort and a few years of work were invested. The miniC project contains a sample compiler for a subset of the C programming language. Students get to extend this sample compiler with new language structures in respect of lexical, syntax and semantic analyses and code generation.

This concept of a project for a compilers course has shown itself, in the authors' practice, as the easiest and the most efficient way for students to master the very abstract content of the course.

ACKNOWLEDGMENT

Research presented in this paper was supported by the Republic of Serbia Ministry of Science and Technology grant III-44010, "Intelligent Systems for Software Product Development and Business Support based on Models".

REFERENCES

[1] Alfred V. Aho, Teaching the compilers course, SIGCSE Bull. 40, 4 (November 2008), 6-8. DOI=10.1145/1473195.1473196, http://doi.acm.org/10.1145/1473195.1473196
[2] Alexander Aiken, Cool: A Portable Project for Teaching Compiler Construction, ACM SIGPLAN Notices, vol. 31, 1996.
[3] ZhaoHui Li, Exploring effective approaches in teaching principles of compiler, The China Papers, 2006.
[4] Maria Joao Varanda Pereira, Nuno Oliveira, Daniela da Cruz and Pedro Rangel Henriques, Choosing Grammars to Support Language Processing Courses, 2nd Symposium on Languages, Applications and Technologies, pages 155-168, vol. 29, 2013, http://dx.doi.org/10.4230/OASIcs.SLATE.2013.155
[5] Robert M. Siegfried, The Jason programming language, an aid in teaching compiler construction, ESCCC, 1998.
[6] Marjan Mernik and V. Zumer, An educational tool for teaching compiler construction, IEEE Transactions on Education, 46(1):61-68, 2003.
[7] J. Levine, flex & bison, O'Reilly Series, O'Reilly Media, 2009.
[8] Zorica Suvajdzin Rakic, Miroslav Hajdukovic, Programski jezik miniC, specifikacija i kompajler [The miniC programming language: specification and compiler], FTN, in press.
[9] Zorica Suvajdzin Rakic, Predrag Rakic, flex & bison, FTN, in press.


Using Syntax Diagrams for Teaching


Programming Language Grammar
Zorica Suvajdzin Rakic*, Srdjan Popov*, Tara Petric*
* Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
[email protected] , [email protected] , [email protected]

Abstract—Understanding the syntax is an important activity in the process of learning a new programming language. One of the main factors which influence the acceptability and clarity of the syntax is the way in which the syntax is described. There is no consensus about which meta-language is best for describing syntax. The traditional way of learning syntax involves using the Backus-Naur form (BNF), which is a bit confusing and not very popular among the students. Using syntax diagrams and tools for syntax visualization is a better way of understanding and learning the grammar of a programming language.

I. INTRODUCTION

Most people are taught their first programming language by example. This is probably unavoidable, since learning the language is often carried out in parallel with the more fundamental process of learning to develop algorithms. But the technique has shortcomings, because the tuition is incomplete: after being shown only a limited number of examples, one is inevitably left with many unanswered questions.

A programming language is a formal language used to convey algorithms from developers to developers and programs to machines. To know and use a programming language, one needs to understand it from three perspectives:
• Syntax – the set of rules for writing the language constructions
• Semantics – determines the meaning of particular language constructions and of the program as a whole
• Pragmatics – the use of language constructions.

In other words, the issues of program correctness are a subject of syntax, the question of the meaning of the program is a subject of semantics, while pragmatics deals with issues of the expressiveness of programming languages.

The linguistics of natural language covers phonetics, morphology and syntax, and further, semantics and lexicography. In the area of programming languages there is no need for phonetics and morphology, and their grammar is reduced only to the syntax [1]. In order to define or describe the syntax of a language, a special language to describe it is needed.

Such a language is called a meta-language. Typical meta-languages are:
1. Backus-Naur form (BNF)
2. Extended Backus-Naur form (EBNF)
3. Syntax diagrams.

II. TEACHING NEW PROGRAMMING LANGUAGES

The selection of an appropriate meta-language for teaching the syntax of a new programming language is important. The introduction of syntax in traditional Programming Languages courses starts by learning one of these meta-languages, very often the Backus-Naur form (BNF).

The BNF notation is a meta-language developed in 1960, in the process of defining the programming language ALGOL 60. ALGOL 60 originated from an international committee, in which an essential contribution was made by J. W. Backus and P. Naur. Since then, almost every author of a book on a new programming language uses BNF to specify the syntax rules of the language. The difficulty with BNF is its readability. Understanding the rules written in BNF can be difficult, especially for beginners.

The meta-symbols of BNF are:

::=  means "is defined as"
|    means "or"
< >  brackets are used around non-terminal symbols.

BNF had the problem that options and repetitions could not be expressed directly. Instead, auxiliary productions were needed, defined to derive either nothing (the empty string), the optional part, or a further repeated derivation.

The readability issue is partially solved by introducing the EBNF notation, or Extended Backus-Naur Form. This notation is based on the fact that, in most real programming languages, terminal symbols are listed in quotation marks or apostrophes.

If this is also adopted in the meta-language, then there is no need to write symbol names between the brackets < > (to distinguish them). They can be written directly, without brackets. In addition, there is no need to use the multi-character sign ::= , since the simple one-character sign = suffices. In this way, some expressions in the meta-language are significantly simplified, and the efficiency of learning programming language syntax is increased.

The meta-symbols of the EBNF notation and their meanings are:

=         equals by definition
|         exclusive OR
.         end mark of a production
X         non-terminal symbol X
"X"       terminal symbol X
{ X }     repetition of symbol X, 0 or more times
[ X ]     optional symbol X (0 or 1 occurrence)
( A | B ) grouping of symbols


To avoid confusion, EBNF is not more expressive than BNF. EBNF has just introduced a few simple extensions of BNF that make grammars more concise, while anything that can be expressed in EBNF can also be expressed in BNF. The BNF and EBNF notations of a number are shown in the following examples (1a and 1b):

S  := '-' FN | FN
FN := DL | DL '.' DL
DL := D | D DL
D  := '0' | '1' | '2' | '3' | '4' | '5'
    | '6' | '7' | '8' | '9'

Example 1a. BNF production of a number

S := '-'? D+ ('.' D+)?
D := '0' | '1' | '2' | '3' | '4' | '5'
   | '6' | '7' | '8' | '9'

Example 1b. EBNF production of a number

As can be concluded from the previous example, EBNF is not more powerful than BNF in terms of what can be defined, but in terms of convenience. Each EBNF production can be translated into an equivalent BNF production.
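The translation is mechanical. For instance, the option and repetition operators can be rewritten with auxiliary recursive productions (a sketch, where ε denotes the empty string):

A := X?     becomes     A := ε | X
A := X*     becomes     A := ε | X A
A := X+     becomes     A := X | X A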
A. Syntax diagrams

Regardless of the simplification brought by EBNF, understanding complex syntax structures can be difficult, especially when it is the first programming language one learns. The textual form of the syntax can be quite unreadable in the case of complex meta-language productions.

Therefore, it is natural to resort to writing the language syntax in a graphic (visual) form, because the visual form can clearly show all alternatives that exist in some language construction.

Unlike BNF, this kind of notation does not seem to have a commonly agreed-on name. Syntax diagrams are also known as Railway Tracks, Railroad Diagrams, Syntax Charts or Syntax Graphs. They do not allow us to write anything that can't be written in BNF; they just make the grammar easier to understand.

A syntax diagram is a directed graph that has one entry point and one exit point and is used for the description of the valid syntax constructions of a programming language. Each syntactically valid language construction is defined as one of the paths through the syntax diagram, from the input to the output.

The graphical representation of the components of a syntax diagram makes the syntax of the programming language more readable, even intuitive. This is shown in the following examples (Example 2a shows the K&R notation, Example 2b the EBNF notation, and Fig. 1 the syntax diagram for the C unary expression):

unary-expression:
  postfix-expression
  ++ unary-expression
  -- unary-expression
  unary-operator cast-expression
  sizeof unary-expression
  sizeof ( type-name )

Example 2a. K&R notation for the C unary-expression production

unary_exp :
  ( id
  | int_const | char_const | float_const
  | enumeration_const
  | string
  | '(' exp ')'
  )
  ( '[' exp ']'
  | '(' argument_exp_list? ')'
  | '.' id
  | '->' id
  | '++'
  | '--'
  )*
| '++' unary_exp
| '--' unary_exp
| unary_operator cast_exp
| 'sizeof' unary_exp
| 'sizeof' '(' type_name ')'

Example 2b. EBNF notation for the C unary-expression production

Figure 1. Syntax diagram for the C unary-expression

III. TEACHING COMPILERS

A formal programming language has to be described by using another language of description, called the meta-language.

The formal study of syntax was applied to programming languages with the publication of the Algol 60 report by Naur (1960, 1963). It used a simple notation known as Backus-Naur-Form (sometimes called Backus-Normal-Form). Notations for describing semantics have not been so simple, and many semantic features of languages are still described informally, or by example.

BNF notation describes context-free grammars, so it is adequate for describing programming languages. Therefore, parsing technology is based on context-free grammars. BNF notation is used in tools for generating parsers, such as yacc and bison [2]. Bison's input file contains the grammar of a (programming) language in a modified BNF notation.

In order for students to comprehend compiler construction techniques, they must first comprehend the theory of formal languages.

A. Parsing

Parsing takes the grammar and an input string and answers these two questions: Is that string in the language


of the grammar? What is the structure of that string relative to the grammar?

Not all strings (of tokens) are valid programs, and a parser must distinguish between valid and invalid programs (strings of tokens) and give error messages for the invalid ones. So, there is a need for a language for describing valid strings of tokens, and for an algorithm that distinguishes valid from invalid strings of tokens.

Programming languages have a recursive structure, which means that one symbol might be defined in terms of itself. Context-free grammars are a natural notation for this recursive structure. A context-free grammar (CFG) consists of:
• a set of terminals
• a set of non-terminals
• a start symbol (one of the non-terminals)
• a set of productions.

Productions have the following form:

X → Y1 … Yn

where X (on the left-hand side) must be a non-terminal symbol, and every Yi (on the right-hand side) can be either a terminal or a non-terminal symbol, or the special symbol epsilon that denotes the empty string.

The language of a context-free grammar can be defined as the set of strings of symbols that can be derived using its productions.
constrained by rules that are specific to it. upper line. The empty line shows that stmt_list can
follow empty line (and be an empty string). This
B. miniC project distinction makes recursion more readable and easier to
understand for students.
Authors of the Compilers course at the Faculty of
Technical sciences at the University of Novi Sad, The result of this modification in the miniC manual,
developed the miniC project as a teaching aid for the was instantly noticeable improvement in the
course. It contains compiler for the small programming understanding of the grammar productions and
language called miniC. This language is a subset of main interconnections between the non-terminal symbols. The
language characteristics of the C programming language. students have started asking more questions, which
To describe the language, authors have firstly used the indicates their grater engaging in the learning process.
BNF, as the usual notation for describing context-free Consequently, the understanding of the parsing process
grammars. was significantly better.
Understanding the recursive structure of the language
was quite a challenge for the most of the students. The IV. CONCLUSION
explanation was given at many different times, at different Comprehension and learning the grammar of the new
subjects. This misunderstanding slows down their programming language is a complex process. Enabling
comprehension of the parsing process. One of the things that it takes place in a controlled, interactive environment,
that are hard for the students to understand, is how to helps a great deal. Allowing students to modify EBNF
make a difference between two slightly different BNF production and display it as a syntax diagram is a very
rules, given in example 3a and 3b. good didactic tool for teaching grammars.

function_list ::= function V. ACKNOWLEDGMENT
                | function_list function Research presented in this paper was supported by
Example 3a. BNF production for function_list Republic of Serbia Ministry of Science and Technology
grants: III–44010“Intelligent Systems for Software
Product Development and Business Support based on
stmt_list ::=  Models”
            | stmt_list stmt
Example 3b BNF production for stmt_list REFERENCES
[1] Dr Jozo J. Dujmovic, “Programski jezici i metode programiranja”
Naucna knjiga, Beograd, 1990.
The first rule describes the list of functions as one or [2] J. Levine, flex & bison, O'Reilly Series, O'Reilly Media, 2009.
more functions, and the second rule describes the list of
statements as zero or more statements in the statement list. [3] Zorica Suvajdzin Rakic, Miroslav Hajdukovic, Programski jezik
The difference shows what is the minimum number of the miniC, specifikacija i kompajler, FTN, in press.
items in the list: zero or one.
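In EBNF, the same distinction can be expressed without recursion, using the repetition meta-symbol introduced in Section II (one or more functions versus zero or more statements):

function_list = function { function } .
stmt_list     = { stmt } .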
In collaboration with the authors of the Programming Languages course, the definition of the miniC programming language in the miniC manual [3] was extended: syntax diagrams were added. So, each production of the miniC grammar was described by its syntax diagram and, separately in the grammar, using BNF notation. For the productions in Example 3, two syntax diagrams were drawn, shown in Figure 2:

a) [syntax diagram for function_list]

b) [syntax diagram for stmt_list]

Figure 2. Syntax diagrams for the miniC function_list and stmt_list

When the two given productions were presented to the students graphically, by the syntax diagrams, the difference between these two constructions became obvious. Diagram a) shows function on the upper line, while diagram b) shows statement on the lower line, with an empty upper line. The empty line shows that stmt_list can follow the empty path (and be an empty string). This distinction makes the recursion more readable and easier for students to understand.

The result of this modification of the miniC manual was an instantly noticeable improvement in the understanding of the grammar productions and of the interconnections between the non-terminal symbols. The students started asking more questions, which indicates their greater engagement in the learning process. Consequently, the understanding of the parsing process was significantly better.

IV. CONCLUSION

Comprehending and learning the grammar of a new programming language is a complex process. Enabling it to take place in a controlled, interactive environment helps a great deal. Allowing students to modify an EBNF production and display it as a syntax diagram is a very good didactic tool for teaching grammars.

V. ACKNOWLEDGMENT

Research presented in this paper was supported by the Republic of Serbia Ministry of Science and Technology grant III-44010, "Intelligent Systems for Software Product Development and Business Support based on Models".

REFERENCES

[1] Jozo J. Dujmovic, Programski jezici i metode programiranja [Programming languages and programming methods], Naucna knjiga, Beograd, 1990.
[2] J. Levine, flex & bison, O'Reilly Series, O'Reilly Media, 2009.
[3] Zorica Suvajdzin Rakic, Miroslav Hajdukovic, Programski jezik miniC, specifikacija i kompajler [The miniC programming language: specification and compiler], FTN, in press.


Migration from Sakai to Canvas


Nikola Nikolić, Goran Savić, Milan Segedinac, Zora Konjović
University of Novi Sad, Faculty of Technical Sciences, Serbia
{nikola.nikolic, savicg, milansegedinac, ftn_zora}@uns.ac.rs

Abstract—The paper presents a software tool for the migration of e-courses from the Sakai to the Canvas LMS. Both LMSs are described, as well as their import/export course formats. The tool facilitates the migration between these two LMSs by supporting the automatic conversion of a Sakai course into the Canvas-compliant course format. The verification is conducted on the courses held at the Faculty of Technical Sciences (FTN) at the University of Novi Sad, Serbia.

I. INTRODUCTION

The advancement of ICT has affected educational settings as well as other aspects of social life. The traditional educational environment has been superimposed with flexible and adaptable teaching approaches fostered by Learning Management Systems (LMS). LMSs facilitate various aspects of the learning process, such as: delivering and managing instructional content, identifying and assessing individual and organizational learning or training goals, tracking the progress towards meeting those goals, and collecting and presenting data for supervising the learning process of an organization [1].

Nowadays, more and more educational institutions, especially in developing countries, choose free and open source LMSs. In addition to cost minimization, which is the main benefit of using such solutions, free and open source LMSs can be customized to fit the needs of a particular educational institution. Therefore, the focus of our research is put on free and open source LMSs, namely Sakai and Canvas.

Regardless of the chosen e-learning solution, institutions are faced every day with new demands caused by ongoing changes in the learning environment and technologies [2, 3]. This results in migrations to other LMSs that better fit the new circumstances. Moving from one LMS to another is not an easy task, since LMSs differ in data formats and features. Each LMS has its own data model, so migration from one LMS to another typically implies appropriate data alignment. This alignment can be performed either manually or automatically. Since manual alignment may be time consuming and prone to human errors, in this paper we propose a migration tool that utilizes the automatic approach when migrating from the Sakai to the Canvas LMS. The migration includes transferring course content from one LMS to another, and it is verified on a case study of the courses held at the Faculty of Technical Sciences (hereinafter FTN) at the University of Novi Sad, Serbia.

II. SAKAI

Sakai LMS [4] is a free and open source system developed within the Sakai Foundation. The Foundation consists of numerous institutions gathered around a common goal: to integrate and synchronize their local educational software into a single LMS. The development of the system follows a community-source model that relies on the cooperation among various stakeholders, such as academic institutions, commercial companies and individuals, with the goal to ensure sustainability and improve software development and distribution.

Sakai LMS has been distributed in two versions: Collaboration and Learning Environment (Sakai CLE) and Open Academic Environment (Sakai OAE). Sakai OAE is a new system within the Sakai Foundation promoting a constructivist learning model. It provides collaboration within user communities organized on social networking principles.

Since this paper deals with Sakai CLE, a detailed description follows.

Sakai CLE is an e-learning framework which, in addition to standard e-learning features, provides tools for general computer-supported collaborative work. Sakai enables users to create web pages that contain a specific set of tools. Tools are independent components that interact with users, providing a specific set of functionalities. Sakai tools may be classified into four groups: learning, collaboration, presentation and administration tools.

Sakai learning tools support the administration of educational resources and participants. These tools relate to:
• curriculum – educational programs and learning objectives
• learning material – educational resources and their organization
• assessment – knowledge assessment through online tests with grade statistics

In Sakai, special attention is put on collaborative work. Therefore, various collaboration tools are provided:
• file sharing – DropBox
• creating content collaboratively – wiki pages and dictionaries
• communication – e-mail, chat, forum
• information sharing – notifications, news, blogs
• scheduling – calendar of important events, activities and deadlines

Presentation tools enable users to create a set of personal presentation pages. The presentations may be used to publish information on users, such as their work, prior experience and skills. Available tools are:
• wizards and matrices – customized collections of arbitrary content following a sequential, hierarchical or matrix visual organization


• reports – create, view and export reports containing presentation data
• templates – a predefined visual organization of data.

System configuration is managed by using administration tools aimed at managing the following elements:
• accounts – managing basic information about user accounts
• membership – defining user permissions to access particular content
• course organization – managing courses and arbitrary sets of web pages
• system monitoring – observing technical system parameters

III. CANVAS

Canvas LMS [5] is a cloud-native learning management system developed by Instructure. It provides a wide range of e-learning functionalities relying on web 2.0 technologies.

The standard Canvas distribution supports the following learning tools:
• curriculum – defining learning outcomes and syllabi
• learning material – learning resource files organized into modules
• assessment – assignments, quizzes and polls supported by a specialized grading tool

Regarding functionalities related to collaborative work, Canvas offers:
• file sharing – users have their own file repositories
• creating content collaboratively – wiki pages
• communication – discussions and a messaging system including video messages
• information sharing – announcements (text or video messages with support for RSS feeds)
• scheduling – a calendar tool integrated with other tools such as the syllabus, assignments etc.

Presentation tools are also supported by Canvas. There is a specialized tool for creating users' portfolios and custom web pages.

For system administration, Canvas offers the following sets of tools:
• accounts – Canvas resources are available only to registered users, organized into groups with different permissions
• membership – only enrolled students can access a course
• course organization – course organization and navigation can be customized
• system monitoring – automated tasks such as message exchange can be monitored from Canvas

The tools are implemented using Web 2.0 technology and the Ruby on Rails development platform. In addition to the predefined set of tools, Canvas may be extended by adding third-party tools. So far, there are numerous external applications and plug-ins for Canvas. For example, in [6] 132 Canvas applications are listed, with new applications added every day. The only precondition that has to be met when developing a new Canvas application is compliance with the IMS LTI specification [7].

By comparing the two described LMSs, we can see that both of them provide standard e-learning features. Still, we believe that the Canvas LMS is a better solution than Sakai at this moment, since it is a newer system, more oriented towards modern web technologies and easier to extend using third-party tools.

IV. MIGRATION TOOL

Each LMS has its own format for representing course data. LMSs typically support export functionality which stores the course data in a specific format. The stored data can later be imported back into the system.

The software tool proposed in this paper relies on these import/export functionalities. It takes a course exported from Sakai CLE and converts it into a format which can be imported into the Canvas LMS.

Before any migration can be performed, it is necessary to analyze the export/import data formats of the above mentioned LMSs. Among all course data, the software tool in this paper deals only with the migration of the learning resources in a course. We refer to "learning resources" as any digital content (file) uploaded into the system.

A. Sakai course format

The Sakai course format is illustrated in Figure 1. As shown in Figure 1, Sakai course content is exported into a separate folder containing different files. XML files contain data or metadata about the exported course content. Each tool in a Sakai course is exported in a separate XML file.

Figure 1. Sakai course format

The meaning of the files is as follows:
• announcement.xml – holds the data about all announcements created or generated in the exported course


• assignment.xml – the data about all readings, quizzes, tests and projects in the course
• calendar.xml – the export file for the Calendar Sakai tool, which contains the schedule of the learning activities
• chat.xml – the history of all chat messages is exported in this file
• content.xml – metadata about the course learning resources
• email.xml – an archive of all e-mail messages sent by course participants
• messageforum.xml – messages posted in the course forum are stored in this file
• news.xml – news received in the course from external resources through the RSS feed
• poll.xml – the file contains the data about polls organized within the course implementation
• site.xml – course sections are stored within this XML file
• syllabus.xml – stores a description of the course syllabi
• user.xml – the file contains a list of course participants
• web.xml – stores custom web content defined within the course
• wiki.xml – the file is a backup of the wiki pages administered by course participants

Since the migration tool transfers learning resources from a Sakai course to the Canvas LMS, we are going to analyze the content.xml file that stores the learning resources metadata. The file enumerates all the resources in the course, and its structure is shown in Figure 2.

Figure 2. File containing metadata about the learning resources exported from the Sakai course

The archive tag is the root element in the resources description. Its child tags date, server, source and system describe some global information about the exported course and the Sakai system where the course has been stored. The ContentHostingService tag contains all learning resources in the course. Since resources may be organized into folders, for each folder there is a collection tag that represents its content. Within this tag, an identifier and the name of the folder are defined. Some additional info, such as the visibility of the folder, is given too. The learning resources in the course are described by using the resource tag. As mentioned, the content.xml file stores only metadata about resources. The resource content is stored in a separate file within the exported course folder. Such files have no extension and they are shown at the top of Figure 1. The body-location tag in the content.xml file references the name of the file that contains the resource content. Besides, the resource tag holds data about the length and type of the resource file, the resource name and the resource type. All metadata specified for a resource in the Sakai system are listed in the properties tag shown at the bottom of Figure 2.
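Purely for illustration, a fragment in the spirit of this description is sketched below. The names and values are invented, the global information is rendered here as attributes, and the exact element and attribute layout should be checked against a real Sakai export:

<archive date="..." server="..." source="..." system="...">
  <ContentHostingService>
    <collection id="/group/course-101/" name="course-101">
      <resource id="/group/course-101/intro.pdf"
                name="intro.pdf"
                content-type="application/pdf"
                content-length="52431"
                body-location="body_0001">
        <properties>
          <property name="display-name" value="intro.pdf"/>
        </properties>
      </resource>
    </collection>
  </ContentHostingService>
</archive>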
B. Canvas course format

Exported Canvas courses are stored as zip archives with the .imscc extension. The structure of this archive is illustrated in Figure 3.

Figure 3. Canvas course format

The archive contains XML files, whereby imsmanifest.xml is the main file. The file describes the course settings and organization. Figure 4 shows the structure of the Canvas imsmanifest.xml file.

The first two children of the manifest tag specify global course information: its identifier and name. Metadata describing the course and global information about the imscc file are given within the metadata tag. The resources tag stores metadata about all learning resources in the course. It can be noticed that each resource is represented by the resource tag. Within this tag, an identifier and the type of the resource are specified. The file tag and its child tag href reference a physical file on the disk where the resource has been stored.
Other XML files and folders in the Canvas imscc file
contain the metadata on the dynamic aspects of the course,
such as announcements. The general course settings are
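As an illustration, the following minimal Java sketch reads such a file with the standard DOM API. The tag names (resource, body-location) come from the description above; everything else (the file path, the fallback to an attribute form of body-location) is an assumption made for the example, not the tool's actual code.

    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class SakaiContentReader {
        public static void main(String[] args) throws Exception {
            // Parse the exported Sakai content.xml (path assumed).
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse("content.xml");

            // Each learning resource is described by a <resource> tag.
            NodeList resources = doc.getElementsByTagName("resource");
            for (int i = 0; i < resources.getLength(); i++) {
                Element resource = (Element) resources.item(i);
                // body-location names the extensionless file holding the
                // resource content; it is read here as a child element,
                // falling back to an attribute, since the exact export
                // form may vary (assumption).
                NodeList loc = resource.getElementsByTagName("body-location");
                String bodyLocation = loc.getLength() > 0
                        ? loc.item(0).getTextContent()
                        : resource.getAttribute("body-location");
                System.out.println("resource content file: " + bodyLocation);
            }
        }
    }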
B. Canvas course format
Exported Canvas courses are stored as zip archives with the .imscc extension. The structure of this archive is illustrated in Figure 3.

Figure 3. Canvas course format

The archive contains XML files, whereby imsmanifest.xml is the main file. The file describes the course settings and organization. Figure 4 shows the structure of the Canvas imsmanifest.xml file.
The first two children of the manifest tag specify global course information – its identifier and name. Metadata describing the course and global information about the imscc file are given within the metadata tag. The resources tag stores metadata about all learning resources in the course. Each resource is represented using the resource tag. Within this tag, an identifier and the type of the resource are specified. Tag file and its child tag href reference the physical file on disk where the resource has been stored.
Other XML files and folders in the Canvas imscc file contain the metadata on the dynamic aspects of the course, such as announcements. The general course settings are defined within the course_settings folder. Files and folders with names that start with the letter "i" are related to announcements created within the course. In addition to the announcements, these files contain notifications sent or received during the course. All the course resources (files) are placed within the web_resources folder. In contrast to the Sakai course format, the files in this folder have names that correspond to the original file names. The wiki_content folder contains the wiki pages of the course.
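A manifest skeleton with the tags named above can be produced programmatically, as in the following hedged Java sketch. The identifiers, the resource type and the file name are invented placeholders, and the attribute form of href is an assumption (the prose above describes href under file); a real Canvas manifest carries additional required attributes and namespaces that are omitted here.

    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    public class ManifestWriter {
        public static void main(String[] args) throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();

            // Root <manifest> with a course identifier (placeholder value).
            Element manifest = doc.createElement("manifest");
            manifest.setAttribute("identifier", "course-1");
            doc.appendChild(manifest);

            // Global course and imscc metadata (contents omitted for brevity).
            manifest.appendChild(doc.createElement("metadata"));

            // One <resource> per learning resource; the file reference
            // points into the web_resources folder described above.
            Element resources = doc.createElement("resources");
            Element resource = doc.createElement("resource");
            resource.setAttribute("identifier", "res-1");
            resource.setAttribute("type", "webcontent");
            Element file = doc.createElement("file");
            file.setAttribute("href", "web_resources/lecture1.pdf");
            resource.appendChild(file);
            resources.appendChild(resource);
            manifest.appendChild(resources);

            // Serialize the DOM tree to imsmanifest.xml.
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc),
                               new StreamResult(new File("imsmanifest.xml")));
        }
    }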
C. Implementation
The tool is implemented in the Java programming language. Tool components are shown in the UML component diagram in Figure 5.
The tool is an intermediary between the Sakai and Canvas LMSs. The input of the tool is an archived Sakai course. Starting from this input, the tool produces a new Canvas course.
The migration process, represented in the form of a UML activity diagram, is shown in Figure 6. The process starts with exporting the Sakai course to the file system. After that, a user enters the file path of the course and the destination path where the Canvas course is going to be stored. Then, the tool parses the content.xml file in the archived Sakai course. The next step is to create a folder for storing the Canvas course. Using the data loaded from the Sakai course, the imsmanifest.xml file of the Canvas course is programmatically generated. Then, all resource files from the Sakai course are copied to the Canvas course folder, where they are renamed according to the Canvas course format. After that, all content in the destination folder is zipped to an archive with the .imscc extension. Finally, the user can import the generated Canvas course.

Figure 5. Component diagram of the migration tool

Figure 6. UML activity diagram of the migration process
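The final packaging step can be expressed compactly with the JDK's java.util.zip, as in the sketch below; the folder and archive names are assumptions, and the earlier steps (manifest generation, copying and renaming of resources) are presumed to have already filled the destination folder.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.stream.Stream;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    public class ImsccPackager {
        public static void main(String[] args) throws IOException {
            Path courseDir = Paths.get("canvas-course");      // assumed destination folder
            Path archive = Paths.get("canvas-course.imscc");  // extension expected by Canvas

            try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(archive));
                 Stream<Path> files = Files.walk(courseDir)) {
                for (Path file : (Iterable<Path>) files.filter(Files::isRegularFile)::iterator) {
                    // Store each entry relative to the course folder root.
                    String entry = courseDir.relativize(file).toString().replace('\\', '/');
                    zip.putNextEntry(new ZipEntry(entry));
                    Files.copy(file, zip);
                    zip.closeEntry();
                }
            }
        }
    }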
D. Verification
The tool has been verified by migrating several courses held at FTN from Sakai to Canvas LMS. Figure 7 shows the "Object-oriented programming platforms" course generated by using the migration tool proposed in this paper. The course content, as well as the resource names, is in Serbian.

Figure 7. Result of a tested Sakai course migrated to Canvas

V. CONCLUSION
The paper presents a software tool that facilitates migration from Sakai to Canvas LMS by providing an automatic conversion of exported Sakai courses into the Canvas-compliant course format. The tool relies on the internal export/import formats of the mentioned LMSs and has been implemented as a stand-alone Java application. The main limitation of the proposed solution is that the tool only provides the migration of learning resources in the course. Announcements, messages, wiki pages and other course content are not currently supported for export. Future work will be aimed at overcoming this limitation by extending the tool. Another direction of the following research will be to support bidirectional migration among multiple LMSs, such as Moodle, Desire2Learn, Blackboard, etc.

ACKNOWLEDGMENT
Results presented in this paper are part of the research conducted within Grant No. III-47003, Ministry of Education, Science and Technological Development of the Republic of Serbia.
REFERENCES
[1] M. Szabo, K. Flesher, "CMI Theory and Practice: Historical Roots of Learning Management Systems," 2002.
[2] R. Atkinson, C. McBeath, D. Jonas-Dwyer & R. Phillips (Eds.), Beyond the Comfort Zone: Proceedings of the 21st ASCILITE Conference, Perth, Western Australia, 5-8 December 2004, ASCILITE. http://www.ascilite.org.au/conferences/perth04/procs/contents.html
[3] S. Kolowich, "Cracking Up the LMS," Inside Higher Ed, January 2012.
[4] Sakai Project Home Page, http://www.sakaiproject.org
[5] Canvas by Instructure - Guides, http://www.instructure.com
[6] EduApps, https://www.edu-apps.org/index.html
[7] IMS Learning Tools Interoperability Specification, http://www.imsglobal.org/toolsinteroperability2.cfm
Implementing an Effective Public Administration Information System: State of PAIS in the Czech Republic and its potential application in the Republic of Serbia

Martin Štufi*, Nataša Veljković**, Sanja Bogdanović-Dinić**, Leonid Stoimenov**
* Solutia, s.r.o., Prague, Czech Republic
** University of Niš, Faculty of Electronic Engineering, Niš, Serbia
[email protected], {natasa.veljkovic, sanja.bogdanovic.dinic, leonid.stoimenov}@elfak.ni.ac.rs
Abstract—As part of the Smart Administration strategy of the Czech Republic, the Czech government has adopted a unique solution to centralize, and to keep up to date, the most common and widely used information. The basic registers are the central information source for the information systems of public authorities in the Czech Republic. Along with the Czech Point, as the interface to all administrative requests, and the Data box, as the electronic mail box, the three projects represent a cornerstone of effective public administration in the Czech Republic. This paper describes the current status of Public Administration (PA) in the Czech Republic and gives the characteristics of an optimal PA Information System (PAIS) and guidelines for its possible implementation in the Republic of Serbia.

I. INTRODUCTION
The Czech Republic, as a unitary state with a parliamentary system and a decentralized and de-concentrated government, has three basic levels of public administration – central, regional and municipal. Fourteen regions and more than 6200 municipalities represent the self-government units [1]. This puts a stress on the capacities of the central government to coordinate and evaluate the practices of municipalities and to provide state administration services at a uniform and proficient level across the state territory.
The Information Strategy is the principal document governing the implementation of information and communication technologies (ICT) in Government, and thus it encourages the development of electronic Government. The aim of the Information Strategy is to create a solid conceptual foundation for the adoption of strategic documents and to become a management tool for the development of Information Systems (IS) and for managing data across the organization. Implementing new technologies in legacy systems does not always bring a positive effect, i.e. many investments in these areas may be unprofitable. One of the main reasons why this happens is an unclear or poorly defined Information Strategy, which is an integral part of the strategic objectives of the Government. Even in cases where the Information Strategy is designed correctly, it is necessary to plan its implementation in a way that achieves optimal functioning of the systems and the expected positive effect.
When designing the Public Administration Information System (PAIS), there should be two guiding principles:
1. Guiding principle #1 – A citizen should no longer have to circulate among authorities
2. Guiding principle #2 – Circulate data, not people
When a person requests some document from a PA body, or wants to renew a passport or register a vehicle, he often has to submit some other document confirming his identity, or a document related to the task he wants to accomplish. Instead of circulating among the authorities in order to finish a task, a person should be able to accomplish the task in one place, because the data he needs is usually already in some PA body's information system. Having this in mind, the first principle of building a successful Information Strategy should be that citizens should not circulate among authorities. This principle arises from the historical performance of the public administration, when large paper piles were kept by individual PA bodies and people were circling among them in order to obtain all the necessary documents. Nowadays, even though the data are held in electronic form, the PA information systems are inflexible and inefficient, and they do not exchange data.
Public Administration bodies hold large amounts of data about people, companies, registrations, licenses, vehicles, etc. The common problem that occurs is that most of these data are not connected. This raises the issue of data redundancy. For example, a person's name, residence and unique identification number can be found in pupils' records, in pension and health insurance records, or in the records of real-estate registrations. Mostly, there is no central register or database from which PA bodies can acquire data. The second principle puts in the foreground the creation of unique data registers that would connect the fragmented information and allow all PA bodies to access the otherwise redundant data. This will dramatically reduce the number of situations where people need to submit documents to the authorities, because the authorities will have the ability to access data from the centralized registers.

II. E-GOVERNMENT IN THE CZECH REPUBLIC
The development of an ICT-enabled information system for the Public Administration in the Czech Republic began in early 2007. Prior to 2007, the Czech Government was
facing many challenges, among which we can highlight the following:
• lack of an integrated communications infrastructure;
• lack of interconnectedness between individual registers;
• low digital literacy and competence levels among public sector workers;
• fragmentation and multiplicity of data in key public administration databases;
• inability to share data held in current registries;
• lack of reliable mechanisms for accessing and utilizing public authorities' data;
• redundancy of citizens' personal data.
As a response to these challenges, in early 2007 the Czech government approved the establishment of basic goals formalized through the "Effective Public Administration and Friendly Public Services" strategy (also known as the Smart Administration Strategy) for the period 2007-2015 [2]. The overall purpose of this policy instrument was for the Public Administration to achieve effectiveness comparable to that of the European Union. Another important document leading the development of e-Government in the Czech Republic is the "Strategy for the development of Information Society services for the period 2008-2012", published in March 2008 by the Government Council for the Information Society. This document sets out a vision for the Czech Republic to become one of the top five EU countries in terms of e-Government development. One of the tasks outlined in the Strategy is the Public Administration reform, in such a way as to provide modern and simplified public services and to assure comfortable, secure and reliable electronic communication across all levels of Government.
The Smart Administration strategy presupposes extensive ICT use in the Czech public sector, which can be achieved through the goals stated in the strategy:
a) establishment of functional Basic registers;
b) improvement of the practice of interoperable government by relying on standardized management of public administration information systems;
c) guaranteeing the possibility to use e-government channels in service delivery;
d) promotion of e-communication channels within public administration;
e) promotion of civil servants' education.
Some of the aims of the e-government policy have already been supported by e-government legislation, such as the Act on Free Access to Information; the Act on Information Systems of Public Administration, which currently prescribes not only the new accessibility requirements but also general duties of "long-term management" of information systems, within which the certification of information systems and related information strategies is also required; the new Administrative Procedure Act; the Act on Electronic Transactions and Authorized Conversion of Documents; and the Act on Basic Registers.
As part of the successful PA Information Strategy implementation in the Czech Republic, we can highlight the following outcomes:

A. Basic Registers
Basic registers are one of the cornerstones of the Czech e-government. The basic objective is to make it easier for citizens, companies and other entities to establish contact with the public administration, and to minimize the number of visits to the offices, by utilizing the opportunities and technologies of the 21st century. At the same time, public authorities have to ensure a safe, efficient and transparent exchange of accurate and up-to-date data. The basic registries eliminate fragmentation, disunity and multiple data occurrences in the critical databases of public administration.
There are four types of basic registries:
1. Registry of Inhabitants – stores data on Czech citizens, foreigners who reside in the Czech Republic, and other physical persons related to the Czech Republic.
2. Registry of Persons – stores data on legal persons and individual entrepreneurs.
3. Registry of Territorial Identification and Real Estates – based on the data of the Land Register and on the data of territorial units created for statistical, administrative or local authority purposes.
4. Registry of Rights and Obligations – information for controlling access to the data of other basic registries, an overview of agendas carried out by public authorities, and information about decisions that have made changes to the data.
Each type of basic registry represents a safe database that provides relevant and unquestionable data, the so-called reference data, stored by public authorities. The introduction of basic registries has led to increasing efficiency of the state administration. The PA officials do not have to find out which data are relevant and updated, since the basic registries hold only valid and relevant data. There is also an evident acceleration in processing citizens' applications and thus a decreased bureaucratic burden.

B. Czech Point
The Czech Point project was launched in 2007 with the goal of delivering contact points, as focal points of public services, that would be situated in PA institutions. Through the contact points, citizens would be able to request public administration documents that would be produced from the basic registers [3]. In 2008 the Czech Point became fully operational. The project resulted in a guaranteed service that can be used by citizens and businesses in order to communicate with the state via a single contact place. Through the contact point they can: obtain and authenticate data from public and non-public information systems, authenticate documents, convert paper documents into authorized electronic form and vice versa, obtain information about the progress of administrative proceedings, and initiate administrative proceedings [4].
By visiting any of the contact points in the Czech POINT network, any person can quickly and comfortably get certified extracts from a number of public and non-public registers of the public administration without having to visit several different state administration authorities.
C. Data Box
Data boxes were introduced to unify communication and to increase efficiency in public administration. They represent an electronic storage for the delivery of official documents and for communication with public authority bodies [5]. They are intended for individuals, self-employed individuals, legal entities and public power bodies. The public power bodies are obligated by law to use the data box, while citizens and private individuals who carry out business activities do not need to have a data box account.

III. ENSURING AN EFFECTIVE PAIS: POTENTIAL APPLICATION IN THE REPUBLIC OF SERBIA
A successful Information Strategy ought to be fully in line with the strategic objectives of the state and local governments. It needs to define steps for developing and implementing strategic targets and goals. It has to define the means for the effective use of information and documents, as they are the intangible assets of every organization's business processes. The Information Strategy is not a one-off, unilaterally created document, but a living structure of knowledge and information that is adequately compiled and regularly reviewed in accordance with the development of superior strategic objectives, the development environment, or the development of legislation.
The Information Strategy brings significant value since it provides guidelines to avoid inconsistencies and errors in information systems and in the usage of information and communication technologies (ICT). This is achieved by setting strict standards and quality requirements for new tasks and technical solutions in terms of documentation, stability and integration into the existing organization's infrastructure. The Information Strategy should be consulted in moments of decision, when deploying new or modifying existing business processes, or when introducing a new or changing an existing Information System in Public Administration. The potential application of the Czech PAIS to the e-Government of the Republic of Serbia can be achieved by adopting the following quality management and security management goals, with the assumption that a proper hardware and software infrastructure exists and that there are human resources capable of installing and maintaining the PA Information System. There must also be a compliant Information Strategy that sets out goals similar to those given in the Czech Republic's Smart Administration Strategy.
To ensure the quality of the PAIS, the Serbian Government must set long-term goals in the following areas of quality management:
• ensuring the quality of data that are processed in the Information System (QoT01-QoT05);
• ensuring the quality of services that are provided by the Information System (QoT06-QoT10);
• quality assurance of hardware and software (QoT11).
These goals can be further disseminated as given in Table I. For each goal, a set of targets that need to be accomplished can be defined.
A good understanding of what is already functioning effectively and being practiced, and of what procedures are required with immediate effect, underpins the development of a successful security management strategy. The Serbian Government must pay attention to PAIS security management and assure the long-term objectives of the Information System's security concerning four aspects:
• security of the Information System in general (ST01);
• security of data processed in the Information System (ST02-ST05);
• security of services provided by the Information System (ST06-ST10);
• security of hardware and software components of the Information System (ST11).
The aforementioned aspects are elaborated in more detail in Table II. For each aspect, a number of targets is defined in order to track the implementation of the corresponding security attribute.
TABLE I.
QUALITY REQUIREMENTS FOR PUBLIC ADMINISTRATION INFORMATION SYSTEM

Quality target | Target name | Description | Quality attribute
QoT01 | Timely update of data | Ensure that all data has been updated in a timely manner; new data should appear in the information system with a minimum of delay. | data timeliness
QoT02 | Data control against the primary data registers | Perform control of data in information systems against data in the primary data registries. | data correctness
QoT03 | Control of data content | Perform internal control of stored data in all information systems. | data correctness
QoT04 | Data integrity check | Check the integrity of data on all information system levels. | data integrity
QoT05 | Record changes in data | Store audit records of data changes. | responsibility for data
QoT06 | Guarantee functionality of provided services | Ensure the services' functionality by performing testing based on the requirements defined for the services. | services' functionality
QoT07 | Increase clarity of services | Create uniform rules for the user interface and ensure their practical implementation in all information systems. | services' clarity
QoT08 | Improve clarity of services | Propose improvements in the logic of services and propagate their implementation in individual information systems. | services' clarity
QoT09 | Increase services efficiency | Search for services' weaknesses and improve services' efficiency through optimized implementation. | services' efficiency
QoT10 | Increase interoperability of services | Gradually transfer services to the open platform to achieve maximum usage with minimum requirements. | services' interoperability
QoT11 | Test purchased components | Develop a uniform methodology for purchasing third-party information systems' components and apply the methodology when acquiring these components. | technical tool
TABLE II.
SECURITY REQUIREMENTS FOR PUBLIC ADMINISTRATION INFORMATION SYSTEM

Security target | Target name | Description | Security attribute
ST01 | Risk analysis | Implement organizational measures to ensure regular updating of the risk analysis for major changes in the Information System. | analysis
ST02 | Data availability | Implement mechanisms to minimize the risk of data loss; provide regular data backup. | availability
ST03 | Data confidentiality | Implement mechanisms to protect data from unauthorized access. | confidence
ST04 | Data integrity | Implement mechanisms to ensure data integrity in terms of safety. | integrity
ST05 | Data provisioning | Implement mechanisms to ensure data provisioning. | provisioning
ST06 | Availability of services | Implement mechanisms that ensure the required service availability. | availability
ST07 | Reduce unavailability of services | Ensure automatic and continuous monitoring of services' availability and define actions in the event of unavailability. | availability
ST08 | Ensure confidentiality of services | Implement mechanisms that minimize the risk of a breach of confidentiality. | confidence
ST09 | Ensure billing of services | Ensure billing for services requiring non-repudiation. | billing
ST10 | Availability of resources | Ensure the availability of hardware and software components depending on the availability of the stored data and services. | availability
ST11 | Confidentiality of resources | Ensure confidentiality of hardware and software according to the required confidentiality of the stored data and services. | confidence
ST12 | Integrity of resources | Ensure the necessary level of hardware and software integrity. | integrity
According to the Strategy for the Public Administration Reform (PAR) in the Republic of Serbia, adopted in November 2004, the Serbian Government planned to achieve the following objectives through the PAR: the creation of a democratic state based on the rule of law, and the creation of a public administration directed towards the citizens. The Strategy defines general guidelines for ICT usage in the accomplishment of the public administration reform. The PAR Strategy marks the first official use of the term "e-Government" in a governmental document. This document identifies e-government as one of the main instruments for the increase in efficiency and accuracy of public administration, as well as an instrument for the rationalization of public administration measures. Four years later, in 2009, the first dedicated e-Government strategy was introduced. Since then, the Government of the Republic of Serbia has made noticeable efforts to bridge the gap in e-Government development compared to the rest of Europe [6].

IV. CONCLUSION
An Information Strategy, as an approach to information management that best supports the goals and strategies of the organization, is necessary to implement in public administration bodies. The Information Strategy needs to be aligned with the business strategy and imperatives of the organization, and it needs to offer a solution for a coherent information infrastructure and information governance. A good Information Strategy is a prerequisite for the implementation of a new, or for the modification of an existing, Information System of Public Administration. The Smart Administration Strategy adopted by the Czech Government has resulted in a number of projects that affected the modernization of the PAIS and its transformation into an effective information system. As an outcome of the successful Information Strategy, the Czech Republic has promoted the Basic Registries, which have led to a better use of information and hence a reduction of administrative procedures. The introduction of basic registers is an important and positive milestone for both the citizens and the PA officials. An official who works with data received from the registers is sure of the data validity, so he does not need any other proof of data validity from citizens. In this way he can provide faster and better services in a shorter period of time. The former rules and concepts of public administration are being substituted with more user-centered and user-friendly PA services. In this way, citizens are building their trust in PA agencies.
Another building stone of the electronic PA reform in the Czech Republic is the development of data boxes, which are intended to unify communication and increase efficiency in the public administration. By using data boxes, citizens or legal entities can apply for documents, track their submissions and get the requested documents in electronic form.
The Serbian Government is on a path of transformation and modernization. It is mainly focused on establishing e-Government according to the European Union standards. A series of measures has been taken for the introduction of a more proficient e-government, which will increase efficiency in public administration and improve communication with citizens and businesses. Although there are efforts to unify data registries on the regional and on the state level, the process is taking off slowly. Some registries have been raised to the state level, i.e. they are being centralized. For example, there exists a centralized registry of people and a registry of business entities. Citizens who can renew a vehicle registration in one place, or taxpayers who can submit forms electronically and track the status of their submissions, are sensing the impact of the reformed e-government. Still, there is a lot of work remaining so that the Serbian Government can conform to the EU practices. The Czech government sets an example of a successful e-government reform and information system modernization. Following the guidelines for the Information System's quality and security management, the Government of the Republic of Serbia can estimate the efforts needed for the transformation of the current state of the public authorities' information systems into more effective, transparent, cost-resilient and more user-centered information systems.
ACKNOWLEDGMENT
Research presented in this paper was partly funded by the Ministry of Science of the Republic of Serbia, within the project "Technology Enhanced Learning", No. III47003.

REFERENCES
[1] D. Špaček, Study on e-government in the Czech Republic, Coordination practice: E-GOVERNMENT IN THE CZECH REPUBLIC, 2013.
[2] Czech government, "Effective Public Administration and Friendly Public Services", 2007, http://www.mvcr.cz/soubor/modernizace-dokumenty-strategie-pdf.aspx
[3] D. Špaček, I. Malý, "Government Evaluation and its Practice in the Czech Republic: Challenges of Synergies", The NISPAcee Journal of Public Administration and Policy, Bratislava: NISPAcee, vol. III, no. 1, 2010, pp. 93-124.
[4] D. Špaček, "CZECH POINTS – THE CZECH SHOWCASE?", State and Administration in a Changing World, Bratislava: NISPAcee, 2009.
[5] J. Jarolimek, J. Vanek, E. Cervenkova, V. Smiskova, "Evaluation of data box introduction process in the Czech Republic", AGRIS on-line Papers in Economics and Informatics, vol. 2, iss. 2, 2010.
[6] L. Stoimenov, N. Veljković, S. Bogdanović-Dinić, "E-Government Development in Serbia - Trends and Challenges as Results of Usage of New Technologies", E-Society Journal: research and applications, University of Novi Sad, Technical Faculty Mihajlo Pupin, vol. 1, no. 2, 2010, pp. 77-85.
eGovernment interoperability in the context of the European Interoperability Framework (EIF)

M.Sc.E.E. Vojkan Nikolić, MOI RS, Ministry of Interior Affairs of the Republic of Serbia
Ph.D. Jelica Protić, University of Belgrade, School of Electrical Engineering
Ph.D. Predrag Đikanović, MOI RS, Ministry of Interior Affairs of the Republic of Serbia
[email protected], [email protected], [email protected]
Abstract: The European Commission recognized interoperability as a key factor in the development of national eGovernment in the member states of the European Union (EU) and of a unified eGovernment at the level of the EU. The European Interoperability Strategy (EIS) and the European Interoperability Framework (EIF), which are used for European public services, represent two key elements in the Digital Agenda for Europe, together promoting interoperability among public administrations. Since the National Interoperability Framework (NIF) of the Republic of Serbia was adopted only on January 10th, 2014, until then the development of eGovernment in the Republic of Serbia had been in accordance with the EIF, the EIF having been the only document and guideline relating to eGovernment interoperability in the Republic of Serbia. This study presents eGovernment interoperability in the context of the EIF and a proposed solution for connecting the eGovernment portal and the Ministry of Interior Affairs (MOI) of the Republic of Serbia through an Enterprise Service Bus (ESB).

Keywords: eGovernment interoperability, European Interoperability Framework (EIF), National Interoperability Frameworks (NIF), SOA, web services, GSB, ESB

I. INTRODUCTION
eGovernment is based on digital interactions between a government and citizens (G2C), government and businesses/commerce (G2B), government and employees (G2E), as well as between government and governments/agencies (G2G). The European Commission made a definition by which eGovernment means the usage of a combination of information technology, organisational changes and new skills in public administration, so that public services could be improved and strengthened democratic processes could support public policies [1].
According to the European Commission, "if eGovernment services want to support the single market and its allies in fight for freedom, not only interoperability both within and across organisational and administrative boundaries is needed, but also across national boundaries with public administrations in other Member States. This will result in evolving interoperability with the enterprise sector" [1].
Interoperability has a specific importance if eGovernment is to achieve its full potential. The importance of interoperability is most clearly seen in the situation where it does not exist: its deficiency results in an inability to exchange documents and to make connections, and in a loss of valuable time and possibilities [2]. Interoperability enables seamless eGovernment in the EU.
According to the European Commission, "the interoperability of ICT systems, share and re-usage of information and join up of administrative processes first in public sector organisations within themselves then between themselves is crucial for the provision of high quality, innovative, seamless and customer-centric eGovernment services" [2].
The EIF is defined by sets of legal acts, standards and recommendations which describe the way in which member states in European public services have agreed, or the way in which they should agree, in order to establish cooperation. The European Interoperability Framework [3] has the following purposes:
• the promotion/popularization of, and support to, the delivery of European Public Services by fostering cross-border and cross-sectoral interoperability;
• guidance for public administrations' efforts to provide European public services to businesses and citizens;
• to make compatible and bind together the various National Interoperability Frameworks (NIF) in a European dimension.
A set of recommendations and guidelines for eGovernment services is defined by the European Interoperability Framework, so that public administrations, enterprises and citizens can interact across borders in a pan-European context.
The EIF comprises the following interoperability layers:
• Political,
• Legal,
• Organisational,
• Semantic and
• Technical.
Three basic interaction scenarios are represented in the EIF, which show the ways of interaction in European Public Services between member states. The EIF promotes a public sector conceptual model which consists of three layers: basic public functions, secure data exchange and aggregate public services, detailed in the following sections.
The premise that each Member State has, or is in the process of developing, its national Government Interoperability Framework (GIF) is the base of the EIF. As a result, the EIF concentrates more on supplementing than on
replacing the National Interoperability Frameworks, to which it gives a pan-European dimension [3].
The EIF had a special importance for the development of eGovernment in Serbia, because a NIF had not been adopted in the Republic of Serbia for a long time (until January 10th, 2014). The development of eGovernment in the Republic of Serbia was based on the Strategy and action plan for the development of eGovernment until 2013, while the EIF was the only document relating to eGovernment interoperability.
According to these two documents, the concept of eGovernment in the Republic of Serbia started being realized with the eGovernment portal as the access point for citizens and business users. Business processes which support eGovernment services are performed at the eGovernment portal and exchange the necessary data and information with other public authorities directly through web services. Since a significant degree of interoperability in the development of eGovernment of the Republic of Serbia has not been achieved until now, the exchange of data and information directly through web services has satisfied the needs. In order to provide a simpler and safer, but more complex, data exchange between the eGovernment portal of the Republic of Serbia and other public authorities, the implementation of a Government Services Bus (GSB) is necessary.
The information system (IS) of the MOI of the Republic of Serbia is based on a Service Oriented Architecture (SOA), which is based on web services. An Enterprise Service Bus (ESB) is implemented in the SOA architecture as an independent component which enables web services to communicate among themselves.
In this study, a solution for connecting the portals of the eGovernment and the MOI through ESB, that is, for connecting the GSB of the Republic of Serbia and the ESB of the MOI of the Republic of Serbia, is proposed and described.

II. EGOVERNMENT INTEROPERABILITY
The European Commission made a definition by which eGovernment means the usage of a combination of information technology, organisational changes and new skills in public administration, so that public services could be improved and strengthened democratic processes could support public policies [1]. The improvement of the quality of public services, the encouragement of democratic processes and the support of community objectives are the goal.
As defined by the Commission's initiative, eGovernment is:
• open and transparent: public administration is able to comprehend the expectations of the citizens, and is accountable and open to democratic participation;
• excluding no one: a user-centred public administration must reach everyone with personalized services;
• an effective public administration: it operates efficiently, saving time and the taxpayers' money.
There are various definitions of interoperability. We quote four definitions given by the IEEE [4][5]:
• "The ability of two or more systems or elements to exchange information and to use the information that has been exchanged";
• "The capability for units of equipment to work efficiently together to provide useful functions";
• "The capability—promoted but not guaranteed—achieved through joint conformance with a given set of standards, that enables heterogeneous equipments, generally built by various vendors, to work together in a network environment";
• "The ability of two or more systems or components to exchange and use the exchanged information in a heterogeneous network".
According to the "European Interoperability Framework for pan-European eGovernment services" report, "Interoperability means the ability of information and communication technology (ICT) systems and of the business processes they support to exchange data and to enable sharing of information and knowledge." [6]
The eGovernment Working Group of the European Public Administration Network (EPAN) report proposes: "Interoperability is the ability of a system or process to use information and/or functionality of another system or process by adhering to common standards."
"The Role of eGovernment for Europe's Future" report suggests that interoperability is "... the means by which this inter-linking of systems, information and ways of working will occur: within or between administrations, nationally or across Europe, or with the enterprise sector" [7].
e-Government interoperability, in its broad sense, is the ability of government constituencies to work together. At a technical level, it is the ability of two or more diverse government information and communications technology (ICT) systems or components to meaningfully and seamlessly exchange information and use the information that has been exchanged [8].
eGovernment interoperability is very important for the enhancement of governments and for effectiveness in the delivery of basic public services to all citizens and business users. eGovernment interoperability provides better decisions and better governance within public sectors. This kind of governance enables citizens and business users an easier and faster access to government information and services. Most EU governments have accepted the design of national eGovernment strategies and are implementing priority programs. eGovernment interoperability is important both for the governments of the member states of the European Union and for the states which are to become part of the European Union.
eGovernment interoperability is realized by the adoption of standards and architecture. Standards are provided by a Government Interoperability Framework (GIF), which represents a set of standards and policies that a government uses to specify the way in which public sectors, citizens, and partners interact with each other. The GIF includes the technical specifications that all public sectors involved in the eGovernment implementation should adopt. These standards relate to:
• Business process or organisational interoperability;
• Information or semantic interoperability; and
• Technical interoperability.
eGovernment interoperability architectures are provided by Enterprise Architecture (EA) and SOA. IEEE defines architecture as "the fundamental organisation of a system, embodied by its components and their relationships to each other and to the environment and by the principles guiding its design and activity." [9]
An Enterprise Architecture (EA) is a strategic planning framework that relates and aligns ICT with the governmental functions that it supports [9]. The Danish government has defined EA as a "common framework that ensures general coherence between public sector IT systems at the same time as the systems are optimized in terms of local needs." [11]
A SOA is an "enterprise-wide IT architecture that promotes loose coupling, reuse, and interoperability between systems" [12]. A service orientation defines the needs and outcomes of eGovernment in terms of services, independent from the technology (the hardware platform, operating system, and programming language) that implements them. What distinguishes SOA is its implementation of "a service platform consisting of many services that signify elements of business processes that can be combined and recombined into different solutions and scenarios, as determined by the business needs" [12]. This capability to integrate and recombine services is what gives a service-oriented enterprise the agility needed to respond quickly and effectively to new situations and requirements.

III. EUROPEAN INTEROPERABILITY FRAMEWORK (EIF)
The European Interoperability Framework is defined as follows: "an interoperability framework is an agreed approach to interoperability for organisations that wish to work together towards the joint delivery of public services. Within its scope of applicability, it specifies a set of common elements: vocabulary, concepts, principles, policies, guidelines, recommendations, and practices" [3].
The European Union's interoperability strategy, whose goal is to provide guidelines and give priority to the activities necessary for the improvement of interaction, exchange and cooperation between European public services, across borders and between sectors, in order to provide public services in Europe, is based on the European Interoperability Framework, which is responsible for the provision of user-centred eGovernment services and for facilitating a pan-European level of interoperability of services and systems between public administrations, as well as between administrations and the public (citizens, businesses). It promotes a common, uniform approach to interoperability, with the agreed vision that by 2015 interoperability will enable a high degree of adoption of public services provision in Europe through:
• adequate organisation and processes of government in accordance with the policies and goals of the European Union;
• a safe exchange of information enabled through commonly adopted, cohesive and coordinated interoperability initiatives when creating the legal environment, creating the interoperability framework and reaching agreements on interoperability standards and rules.
The strategy emphasizes that activities should be coordinated at the level of the European Union and the member states, and that interoperability governance should be established at the level of the European Union. Interoperability is considered to be a key segment for the efficient and effective provision of public services in Europe in all policy frameworks of the European Union.
The EIF is based on the premise that each member state of the European Union has, or is in the procedure of creating, a national framework of government interoperability. Having that in mind, the EIF is focused more on the process of adding to the national frameworks of interoperability, giving them a pan-European dimension, than on the process of replacing them. The EIF mostly deals with the pan-European dimension of interoperability; besides this, it has significance at the national level. Figure 1 represents the European interoperability framework.

Figure 1. European interoperability framework [10]

Interoperability is both a prerequisite for and a facilitator of the efficient delivery of European Public Services. Interoperability relates to:
• cooperation between public administrations aiming at the establishment of public services;
• exchanging information between public administrations to fulfil legal requirements or political commitments;
• sharing and reusing information among public administrations to increase administrative efficiency and reduce the administrative burden on citizens and businesses;
leading to:
• improving public service delivery to citizens and business by facilitating the one-stop-shop delivery of public services;
• reducing costs for public administrations, businesses and citizens through the efficient and effective delivery of public services [3].
The EIF defines a conceptual model which describes an organising principle underlying the construction and operation of European Public Services and emphasizes a building-block approach to the construction of European Public Services, allowing for the interconnection and reusability of components when building new services.
The EIF represents 12 important principles:
• The first principle sets the frame for community action in the area of European Public Services (1. Subsidiarity and Proportionality);
• The second principle sets generic user needs and expectations (2. User Centricity, 3. Inclusion and Accessibility, 4. Security and Privacy, 5. Multilingualism, 6. Administrative Simplification, 7. Transparency and 8. Preservation of Information);
• The third principle sets a foundation for collaboration between public administrations (9. Openness, 10. Reusability, 11. Technological Neutrality and Adaptability and 12. Effectiveness and Efficiency).

IV. CONCEPTUAL MODEL
EIF v2.0 promotes a conceptual model which provides for the reuse of information, concepts, patterns, solutions, and standards in Member States and at the European level for European Public Services. European public services are based on data and information in different locations and at different administration levels in different member states. Besides this, they combine basic services constructed independently by public administrations in different member states.
The conceptual model starts from the fact that it is necessary to provide modularity and loosely coupled service components which are interconnected through the necessary infrastructure and which all work together, so that they can provide the delivery of European public services. According to the conceptual model, service orientation in system conception and service development is in the foreground. Service orientation is a specific style of creating and using business processes. A business process is realized as a set of services. Figure 2 represents the conceptual model of the European interoperability framework.

Figure 2. Conceptual model

The special recommendations relating to the conceptual model are recommendations number 8 and number 9:
• "Public administrations should develop a component based service model, allowing the establishment of European Public Services by reusing, as much as possible, existing service components."
• "Public administrations should agree on a common scheme to interconnect loosely-coupled components and put in place the necessary infrastructure when establishing European Public Services." [3]

V. INTEROPERABILITY LEVELS
The EIF considers the following interoperability levels:
• Political interoperability;
• Legal interoperability;
• Organisational interoperability;
• Semantic interoperability;
• Technical interoperability.

Political interoperability
Political interoperability relates to cooperating partners with compatible visions, aligned priorities, and focused objectives.
Recommendation 13: Public administrations should obtain political support for their interoperability efforts required for the establishment of European Public Services [3].

Legal interoperability
Legal interoperability relates to aligned legislation, so that exchanged data are accorded proper legal weight.
Recommendation 14: Public administrations should carefully consider all relevant legislation linked to the information exchange, including data protection legislation, when envisaging the establishment of a European public service [3].

Organisational interoperability
Organisational interoperability relates to coordinated processes in which different organisations achieve a previously agreed and mutually beneficial goal. Organisational interoperability defines business goals, aligns and coordinates business processes, and brings collaboration capabilities to organisations that want to exchange information and may have different internal structures and processes. The aim of organisational interoperability is to satisfy the requirements of citizens and business users by making services available, easily identifiable, accessible and user-centric. It is the ability of organisations to provide services to each other, as well as to users or customers, or to the wider public in the case of administrative organisations.
Common organisational problems to be solved in enterprise networking at an organisational level include, but are not limited to: different human and organisational behaviors, different organisational structures, different business process organisations and management approaches, different senses of value creation networks, different business goals, different legal bases, legislations, cultures or methods of work, and different decision-making approaches [10].
In order to achieve organisational interoperability, it is necessary to coordinate the business processes of the cooperating administrative entities, define synchronization steps and messages, and define coordination and collaboration mechanisms for inter-organisational processes. Business Process Management (BPM) is realized by BPM tools and methods required for the modeling and control of these business processes, workflow engines for the coordination of the execution of the process steps defined as business services, and collaborative tools and enterprise portals that provide user-friendly access to business services and to information pages made available to end-users.
A standard language for modeling and analyzing business processes at the business level is BPMN (Business Process Model and Notation) version 2.0.

Semantic interoperability
Semantic interoperability relates to the precise meaning of exchanged information, which is preserved and understood by all parties. Semantic interoperability is defined as the ability to share, aggregate or synchronize data and information across heterogeneous information systems. It deals with data and information integration and consistency issues to support cooperation and collaboration, and especially the sharing of knowledge and information. It is necessary to ensure that two collaborating systems safely interpret common or shared data and information in a consistent way.
Semantic barriers and problems which need to be solved are: syntactic and semantic heterogeneity of information, the semantic gap, database schema integration with naming problems (e.g. homonyms and synonyms), structural logical inconsistencies, etc.
It is necessary to provide systems which interpret the meaning of data, information and knowledge. The simplest solution is to build shared metadata repositories. Metadata repositories describe the contents and intent of data stored in the various information systems used in the enterprise or by other administrative entities, for example, LDAP for users and IT resources metadata, and UDDI repositories for Web service registries and thesauri [10].
Ontological models are expressed in RDF (Resource Description Framework) and OWL-S (Web Ontology Language for Services) according to W3C recommendations. "The ontology is used as a pivotal language to map concepts used in one system with concepts of another system and to resolve the semantic impedance mismatch." [10]
Recommendation 18: Public administrations should support the establishment of both sector-specific and cross-sectoral communities aimed at facilitating semantic interoperability, and should encourage the sharing of results produced by such communities through national and European platforms [3].

Technical interoperability
Technical interoperability relates to the planning of technical issues involved in linking computer systems and services. Technical (syntactical) interoperability provides the technical foundations. The goal is facilitating communication and interchange in terms of communication protocols, data exchange and message passing among application systems. This aspect of interoperability is developing very fast thanks to the development of ICT.
The aim is to provide for the building of loosely coupled systems in which the applications supporting administration business processes are made of web services that exchange messages (in synchronous or asynchronous modes) using neutral formats (XML or XML-based) and simple transfer protocols (e.g. HTTP/HTTPS, SMTP, MIME, JMS or SOAP over TCP/IP) [10].
"State of the art" techniques for building integrated or interoperable enterprise systems are SOA based on web services. SOA provides for the use of existing systems as services, where those services wrap legacy systems and expose their functions. On the other hand, SOA provides for building new systems as a composition of web services executed on different, remotely located servers that communicate over the internet.
SOA has brought about the replacement of the long-used solutions, Object Request Broker (ORB) and Enterprise Application Integration (EAI), with the new Enterprise Service Bus solution. Due to the ESB, the usage of new technologies based on new languages and standards is enabled, namely HTTP, SMTP or JMS (Java Messaging System) over TCP/IP at the data transport level, SOAP (Simple Object Access Protocol) or RosettaNet at the messaging level, WSDL (Web Service Description Language) at the service description level, UDDI (Universal Description, Discovery and Integration) repositories at the service publication and discovery level, and BPEL (Business Process Execution Language) at the service composition level [13].
Recommendation 19: Public administrations should agree on the standards and specifications to be used to ensure technical interoperability when establishing European Public Services [3].
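To give the layered stack above a concrete shape, the following minimal Java sketch publishes a SOAP service whose WSDL contract is generated automatically, using the standard JAX-WS API that ships with the JDK. The service name, namespace, operation and URL are illustrative assumptions made for this example, not part of any actual MOI or GSB interface.

    import javax.jws.WebMethod;
    import javax.jws.WebService;
    import javax.xml.ws.Endpoint;

    // Hypothetical registry service: one lookup operation exposed over SOAP.
    @WebService(targetNamespace = "http://example.gov.rs/registry")
    public class VehicleRegistryService {

        // Published as a SOAP operation; JAX-WS derives the WSDL contract
        // from this method and serves it at <endpoint URL>?wsdl.
        @WebMethod
        public String getRegistrationStatus(String plateNumber) {
            // Placeholder logic - a real service would query the registry.
            return "REGISTERED:" + plateNumber;
        }

        public static void main(String[] args) {
            // Publish the endpoint with the JDK's built-in lightweight HTTP server.
            Endpoint.publish("http://localhost:8080/registry",
                    new VehicleRegistryService());
        }
    }

A bus such as a GSB or ESB can then mediate calls against the generated WSDL contract without the consumer needing to know the physical location of the service.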
VI. REALIZATION OF INTEROPERABILITY BY GSB IN THE EGOVERNMENT OF THE REPUBLIC OF SERBIA
The development strategy of eGovernment for the period from 2009 until 2013, and the action plan which follows this strategy, is based on the possibility of applying ICT in the public sector of the Republic of Serbia. In accordance with this strategy, the development of eGovernment in the Republic of Serbia is conceived as a prevailingly decentralized model of eGovernment with a single access point to eGovernment services. The eGovernment portal represents the point where citizens and business entities access eGovernment services in order to do business with the state, whereas each public authority of the Republic of Serbia is in charge of a provided service and keeps the overall responsibility for the quality of the services and data it is in charge of.
Most public authorities in the Republic of Serbia have their own information system (IS). Some of the IS are within the network of public authorities (NPA), whereas others are not. IS which are not within the NPA and have their own network must be connected to the NPA in order to support the eGovernment business processes at the government portal. For example, the IS of the MOI is not within the NPA, and in order to connect it with the NPA and the eGovernment portal, the EXTRANET of the MOI of the Republic of Serbia was developed [14][15].
In order to provide a higher level of interoperability and a safer exchange of data, information and knowledge, it is necessary to implement a Government Services Bus (GSB) in the eGovernment of the Republic of Serbia. An ESB in the EXTRANET zone has already been implemented in the information system of the MOI in order to
provide simpler and safer connection with other public effectiveness for the delivery of basic public services to
authorities. all citizens and business users.
Figure 3 represents a proposal of implementation of Service oriented architectures based on web services
interoperability in the government of the Republic of provide architectures for eGovernment interoperability in
Serbia. The connection between eGovernment portal and accordance with EIF.
information system of MOI through ESB (GSB and ESB) GSB provides secure, aligned and controlled exchange
is presented. Besides this, the examples of connection of of data and use of electronic services.
eGovernment service at eGovernment portal with services The connection of IS MOI Republic of Serbia and
and records stored in MOI are given. eGovernment portal of the Republic of Serbia through
ESB would provide the solution which would support a
simple, safer and faster development of eGovernment of
the Republic of Serbia.

REFERENCES
[1] Commission Staff Working Document (2003) „Linking up Europe:
the importance of interoperability for eGovernment services―
Brussels, 2003.
[2] MITRE (2004) „Information Interoperability― , THE EDGE,
MITRE’s Advanced Technologiy Newsleter, Vol.8,num.1
[3] European Commission, European Interoperability Framework or
European Public Services (EIF), Version 2.0, 2004.
[4] Radatz, J., Geraci, A., Katki, F., 1990. IEEE Standard Glossary of
Software Engineering Terminology. IEEE Standard, pp.1-84.
[doi:10.1109/IEEESTD.1990.101064]
[5] Breitfelder, K., Messina, D., 2000. IEEE 100: the Authoritative
Dictionary of IEEE Standards Terms (7th Ed.). IEEE Press.
[doi:10.1109/IEEESTD.2000.322230].
[6] European Commission, European Interoperability Framework for
pan-European eGovernment services, version 1.0, 2004.
Figure 3. Interoperability GSB - ESB
[7] European Commission, The Role of eGovernment for Europe’s
Future, 2003.
The realization of GSB would provide: [8] United Nations Development Programme, eGovernment
 a platform for a high level of interoperability of interoperability: overview, 2007.
information systems of public authorities of the [9] Bloomberg and Schmelzer, Service Orient or Be Doomed.Hoboken,
Republic of Serbia; NJ: Wiley & Sons, 2006, p. 118.
 a platform for a standardized integration of [10] Franc¸ois B. Vernadat, Technical, semantic and organisational
issues of enterprise interoperability and networking, Annual
public authorities of the Republic of Serbia; Reviews in Control 34 (2010) 139–144.
 a safe data exchange between public authorities [11] (Denmark) Ministry of Science,Technology and Innovation, White
of the Republic of Serbia; Paper on Enterprise Architecture,
 a simple registration of the services at http://www.oio.dk/files/whitepaper.pdf
[12] Norbert Biebertein, Sanjay Bose,Marc Fiammente, Keith Jones, and
eGovernment portal; Rawn Shah. Service Oriented Architecture Compass: Business
 a tight coupling with a module for generating of Value, Planning, and Enterprise Roadmap.Upper Saddle, NJ: IBM
electronic services at eGovernement portal; Press, 2006, p. 4.
[13] Chappell, D. A. (2004). Enterprise Service Bus. USA: O’Reilly
Media Inc.
CONCLUSION
[14] V.Nikolić, P. Đikanović, D. Batoćanin, eGovernment Republic of
European interoperability framework supports the Serbia: The registration of motor vehicles and trailers., YU INFO
strategy of European Union in sense of provision of 2013.
electronic government services which is adapted to users [15] V.Nikolić, J. Protić, P. Đikanović, G2G INTEGRATION MOI OF
enabled at pan-European level, in sense of provision of REPUBLIC OF SERBIA WITH EGOVERNMENT PORTAL,
interoperability services and systems between public ETRAN 2013.
sector as well as between public sector and public
(citizens and business entities).
EIF represents a conceptual model as a model which
provides a reuse of information, concepts, patterns,
solutions, and standards in Member States and at
European level for European Public Services and
considers few levels of interoperability: political
interoperability, legal interoperability, organisational
interoperability, semantic interoperability, and technical
interoperability. Besides this, EIF promotes few
interaction scenarios.
e-Government interoperability, as the ability of
government constituencies to work together, is very
important for enhancement of governments and

Integrating processing in RAM memory and its application to high speed FFT computation
Danijela Efnusheva, Aristotel Tentov
Computer Science and Engineering Department
Faculty of Electrical Engineering and Information Technologies
Skopje, Rep. of Macedonia
{danijela, toto}@feit.ukim.edu.mk

Abstract—The growing rate of microprocessor performance significantly overcomes that of DRAM memory. This problem of the ever-increasing gap between CPU and memory speeds has been explored by many researchers. As a result, diverse specialized memory chips that integrate processing have been proposed. This paper discusses the architecture and implementation of several "smart" memories (computational RAM - CRAM, processing in memory - PIM chips and intelligent RAM - IRAM), including their strengths, weaknesses and opportunities for high speed processing. Considering that these integrated chips allow for a high on-chip memory bandwidth and low latency, we examine IRAM's application for performing memory intensive operations, such as the complex calculations of the fast Fourier transform (FFT).

I. INTRODUCTION

The extraordinary increase of microprocessor speed has placed significant demands on the memory system, requiring an immediate response to the CPU requests. Considering that memory price, capacity and speed are in direct opposition, an ideal memory system cannot be implemented in practice, [1]. Therefore, today's modern computer systems are characterized by hierarchical memory, organized in multiple levels, each of them having a smaller, faster and more expensive memory compared to the previous level.

The hierarchical approach to memory organization is based on the principle of temporal and spatial locality, [1], [2], and the rule "smaller is faster", which states that smaller pieces of hardware are generally faster and hence more expensive than larger ones. Based on the previous considerations, a usual computer memory system is composed of small fast buffer registers, multi-level cache memory, main memory (usually DRAM) and virtual memory, [2], [3].

The main reason for the growing disparity of memory and processor speeds is the division of the semiconductor industry into separate microprocessor and memory fields. The former is intended to develop fast logic that will accelerate communication, while the latter is purposed to increase the capacity for storing data, [4]. Considering that the fabrication lines are tailored to the device requirements, separate packages are developed for each of the chips. Microprocessors use expensive packages that dissipate high power (5 to 50 watts) and provide hundreds of pins for external memory connections, while DRAMs employ inexpensive packages which dissipate low power (1 watt) and use a smaller number of pins, [2].

The main disadvantage of the industry split is the unequal improvement rate of processor and memory speed, which approaches around 60%/year and less than 10%/year, respectively. It seems that the CPU speeds and memory sizes have grown at a much more rapid rate than the throughput between them, resulting in a constantly increasing processor-memory bottleneck, [5]. Therefore, several approaches have been proposed to help alleviate this bottleneck, including branch predictor algorithms, techniques for speculative and re-ordered instruction execution, wider and faster memory connections and multi-level cache memory. These techniques can serve as a temporary solution, but are still not able to completely solve the problem of increased memory latency, [1], [6].

Contrary to the standard model of processor-centric architecture, some researchers have proposed a novel computer organization which suggests integrating the processor on the same die as the main memory. This resulted in the creation of a variety of memories that include processing capabilities, known as computational RAM, processing in memory chips, intelligent RAM, [3], [7], etc. These smart chips usually integrate the processing elements in the DRAM memory, instead of extending the SRAM processor memory, basically because DRAM memory is characterized by higher density and lower price compared to SRAM memory.

The merged memory/logic chips have on-chip DRAM memory which allows high internal bandwidth, low latency and high power efficiency, eliminating the need for expensive, high speed interchip interconnects, [2]. Considering that the logic and storage elements are close to each other, smart memory chips are applicable for performing computations which require high memory bandwidth and strided memory accesses, such as the fast Fourier transform, [8]. The FFT processing complexity depends on the organization of the hardware module (ASIC, GPP, DSP) which executes the multiplication and addition operations of the transform.

The aim of this paper is to discuss the various integrated memory/logic chips, purposed as a possible solution to the problem of the continuous memory-processor performance gap increase. These memory-centric chips provide extended on-chip memory bandwidth, which is very suitable for performing FFT computations. Therefore, the paper also focuses on measuring the IRAM performance while running the FFT calculations.

This paper is organized in five sections. Section two presents the problem of the unequal increase of processor and memory speeds. Section three discusses several smart memory chips (CRAM, PIM and IRAM), purposed to provide a reduction in memory latency and an increase in memory bandwidth. Section four estimates the performance of the high bandwidth IRAM memory while performing the radix-2 Cooley-Tukey FFT algorithm. The paper ends with a conclusion, stated in section five.

II. THE GAP BETWEEN PROCESSOR AND MEMORY SPEEDS

The performance of a computer system is expressed in terms of processor execution time and average memory access time, [1], given with the following equations:

CPU time = Number of instructions * Cycles per instruction * Clock period (1)

Average memory access time = Hit time + Miss rate * Miss penalty (2)

Generally, average memory access time is much greater than the processor execution time. This dissimilarity can cause many wasted processor cycles while performing memory read/write operations. However, an ideal memory system with zero latency and an infinite bandwidth cannot be realized in practice, [2]. The latency is a parameter which measures the time between the initiation of a memory request by the processor and its completion, while the bandwidth is the rate at which information can be transferred to or from the memory system. These two parameters define the performance of the processor-memory interface.
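To make equations (1) and (2) concrete, a minimal Python sketch is given below; all numeric values in it are illustrative assumptions chosen for the example, not measurements from the paper.

# Illustrative values only (assumed for the example, not taken from the text)
hit_time_ns = 1.0        # cache hit time
miss_rate = 0.05         # 5% of accesses miss the cache
miss_penalty_ns = 60.0   # DRAM access time paid on a miss

# Equation (2): AMAT = Hit time + Miss rate * Miss penalty
amat_ns = hit_time_ns + miss_rate * miss_penalty_ns
print(f"Average memory access time: {amat_ns:.1f} ns")   # 4.0 ns

# Equation (1): CPU time = Instructions * CPI * Clock period
instructions = 1_000_000
cycles_per_instruction = 1.5
clock_period_ns = 0.5    # a 2 GHz clock
cpu_time_ms = instructions * cycles_per_instruction * clock_period_ns / 1e6
print(f"CPU execution time: {cpu_time_ms:.2f} ms")        # 0.75 ms

Even this modest miss rate quadruples the effective access time relative to the hit time, which is exactly the dissimilarity discussed in the following paragraphs.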
The disparity between processor and memory speed leads to long-latency memory operations during the execution of the processor's instructions. Considering the close relationship between the memory bandwidth and the latency, it is a very demanding task to clarify whether the memory-related processor degradation is caused by the long memory latency or by insufficient bandwidth. However, the improvement of the memory latency degrades the bandwidth, causing increased memory traffic, [3]. Given that a simultaneous improvement of both parameters (latency and bandwidth) cannot be achieved by the standard processor-centric computer organization, the use of "smart memories" has been proposed, as an approach which suggests merging logic with storage elements on the same chip die, [2].

The diverse development of the DRAM memory and microprocessor industries is the major cause of the ever-increasing memory-processor performance gap. As a result of the different directions of these industries' growth, each of them has achieved significant results, but solely for itself, not for the entire computer system, [1]-[4]. The increase of the processor and memory speeds during the last thirty years is illustrated in figure 1. The figure shows that each of the separate industries is improving exponentially, in such a way that the exponent for the microprocessor is significantly larger than that for the DRAMs. The difference between the diverging exponentials also grows exponentially.

Figure 1. Processor-memory performance gap

Generally, microprocessor manufacturers develop their own architecture standards (instruction sets) in order to provide software compatibility with prior generations, but are free to invent novel interfaces with different packages, pins, etc., [6]. On the other hand, DRAM manufacturers standardize at the package level and innovate in the size of the memory cell and in the efficiency of the manufacturing process. Considering that DRAM manufacturing is tightly tied to the fabrication process, it is a common practice to develop a novel fabrication process when designing a new memory cell. This allows production of dies that will operate at minimum refresh rates and will supply data with acceptable bit error rates. Furthermore, the manufacturers can shrink the die in order to increase the number of chips per wafer and hence lower the costs, [6]. In contrast, the microprocessor industry tends to have a design cost target, expressed as die size, and then build the fastest chip that can fit that size. However, when it comes to embedded microprocessors, characterized by lower cost targets and power budgets, the performance can be sacrificed in favor of price and/or power.

Table I presents the improvement rate of several characteristics of the DRAM and microprocessor industries, per year. As can be seen from Table I, microprocessor performance has been improving at a rate of 60 percent per year, while the DRAM access time has been improving very slowly (7%/year), [2]. Actually, the DRAM memory industry has shown a significant increase of the storage capacity per chip, by fourfold per three years (60%/year). The provided memory capacity increase is caused by reducing the memory cell size by a factor of 2.5 and increasing the die size by a factor of 1.5, [6]. The increasing die size is a major reason that the cost per bit changes more slowly than the capacity per chip.

TABLE I. IMPROVEMENT RATE OF SEVERAL CHARACTERISTICS OF THE DRAM AND MICROPROCESSOR INDUSTRIES

Industry         Characteristic   Improvement rate (yearly)
DRAM memory      Capacity         60%
                 Cost/bit         25%
                 Bandwidth        20%
                 Latency          7%
Microprocessor   Performance      60%
                 Price            Little change

III. OVERVIEW OF LOGIC IN MEMORY CHIPS

The idea of merging logic and memory on a single chip was introduced to solve the problem of the growing performance gap between the processor and DRAM memory. Although a number of solutions, such as more efficient multi-level on-chip caches, wider and faster connections to memory and pipelined multi-threaded processors, [2], have been used to correct this gap, these approaches didn't succeed in improving the complete system performance. Assuming that the largest computational bottleneck occurs at the interface between processor and memory, some researchers proposed a processor-in-memory architecture which integrates processing power in the standard memory cells. This approach makes highly effective use of the internal memory bandwidth, and thus improves system performance and power consumption, and in some cases reduces the system cost. The following subsections give a brief overview of several intelligent memory models, including CRAM, PIM and IRAM, [9]-[11], discussing their advantages and shortcomings.

A. Computational RAM

Computational RAM is one of the pioneering intelligent memory models, developed at the University of Toronto in 1989, [9]. The CRAM chip was designed as a Single Instruction Multiple Data (SIMD) - memory hybrid, which consists of an array of 1-bit processing elements (PEs), integrated at the sense amplifiers of a standard RAM memory, [12]. Each PE is matched to a small number of memory columns (1, 2, 4, 8 or 16), allowing simultaneous access to the memory by applying a common row address. In other words, CRAM is a SIMD processor, [13], with distributed, non-shared, uniformly addressed memory.

Computational memory can replace standard memory in a variety of applications, because of its property of adding processing power to the memory. Actually, all PEs in the CRAM execute the same common instruction and optionally access the same memory offset within their private memory partitions, without the need to read the data out of the RAM chip and transmit it over long high-capacitance buses to the processor, [14]-[16]. This mode of operation can significantly speed up the execution of massively-parallel applications.

CRAM memories are organized in a row-column logical architecture (shown in figure 2), which is roughly square because of the similar numbers of address bits that are allocated to row and column. The data from the CRAM memory are selected through row and column decoders, and then applied to the PEs. Each PE performs the operation specified by the content of the control input. The communication between PEs is realized through a common interconnection bus, which is also used for performing combinational operations. Having such an architectural organization, the CRAM chip highly utilizes the internal memory bandwidth and therefore provides performance gains and low power consumption, [12].

The CRAM logic-in-memory chip can operate either as a standard memory or as a SIMD processor. When used as memory, it is competitive with conventional DRAM, while when used as a SIMD processor it is capable of running parallel applications thousands of times faster than a CPU, [15]. Figure 2 illustrates that CRAM memory is realized as a simple extension of a standard RAM memory.

Figure 2. Standard and computational RAM memory architecture

CRAM integrates single-bit PEs at the sense amplifiers of a standard RAM, which adds about 5-10% to the area of the RAM, [12]. The PEs are small, with a simple architecture, allowing a big number of them to be integrated in a RAM chip and thus increasing the degree of computational parallelism. In this way, the highest performance per silicon area is achieved.

Each PE consists of three 1-bit registers (X, Y and W), a single Arithmetic and Logic Unit (ALU), and a bus-tie circuit (global OR). The ALU is an 8-to-1 multiplexer, which can execute any boolean function of three inputs, according to the instruction opcode, [14]-[16]; a minimal sketch of such an ALU is given after figure 3. The three selection inputs of the ALU are provided by two source registers, together with a third line from the local memory (via a sense amplifier). After the ALU operation is executed, the result is written to one of the local registers, a neighboring register or the local memory. Actually, each PE operates on a single element of a vector, one bit at a time. The inter-processor communication (IPC) between PEs is realized via a closest-neighbor shift-left/shift-right network, and a broadcast bus which implements a global-OR function of all the outputs of the PEs, [12]. The discussed architecture of the CRAM PE requires only 75 transistors, and is illustrated in figure 3.

Figure 3. CRAM processing element architecture
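The behaviour of the PE's 8-to-1 multiplexer ALU can be illustrated with a short Python sketch. The opcode encoding used here (one truth-table bit per select combination) is an assumed convention for illustration, not the exact CRAM encoding.

def pe_alu(opcode, x, y, m):
    """8-to-1 multiplexer: the three select inputs (here the X and Y
    registers and the memory bit m) pick one bit of the 8-bit opcode,
    so any boolean function of three inputs can be realized."""
    index = (x << 2) | (y << 1) | m
    return (opcode >> index) & 1

# Example: an opcode whose truth table implements XOR of the three inputs.
XOR3 = 0b10010110  # bit at position (x<<2 | y<<1 | m) equals x ^ y ^ m
for x in (0, 1):
    for y in (0, 1):
        for m in (0, 1):
            assert pe_alu(XOR3, x, y, m) == x ^ y ^ m
print("all 8 input combinations verified")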

B. Processing in memory chips

Processing-In-Memory technology supports the design of memory-oriented computing elements, whose architecture is usually more complex than that of CRAM. PIM chips can provide high speed SIMD processing that can be further improved if multiple PIM nodes are placed on the same chip, which allows elimination of the complex inter-CPU communication paths. Therefore, in the continuation we discuss several PIM implementations, including SRC, Terasys, Shamrock, DIVA and MIND, [10], [17]-[22].

Researchers at the Supercomputing Research Center designed, in 1995, an SRC PIM chip that integrates 64 PEs with 2 Kb of local memory. The single-bit PE consists of three local registers and an ALU that generates three logic functions. Each PE executes the same operation every 70 ns, [12]. The PEs are connected in a linear array, and can communicate through a global-OR, a partitioned OR and a parallel prefix network.

The Terasys workstation consists of a Sun Sparc-2 workstation, an SBus interface card residing in the Sparc cabinet, a Terasys interface board, and 8 PIM array units, [10]. Each PIM array unit consists of 64 PIM chips (4K PEs), spread over two boards. The interface board is used as a controller for the PIM array and the memory. The Sparc processor executes instructions sequentially, such that a read or write operation to PIM memory is sent to the interface board, which decodes whether it is a normal memory operation or a PIM array command. When the instruction issue rate is 100 ns, the Terasys system can achieve 3.2 x 10^11 peak bit operations per second (see the short check below).
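The quoted Terasys peak rate follows directly from the PE count and the instruction issue rate; a quick check in Python, with the array sizes as stated in the text:

pim_array_units = 8
pes_per_unit = 64 * 64          # 64 PIM chips per unit, 64 PEs per chip (4K PEs)
total_pes = pim_array_units * pes_per_unit   # 32768 single-bit PEs
issue_rate_hz = 1 / 100e-9                   # one instruction every 100 ns
print(f"{total_pes * issue_rate_hz:.2e} bit operations per second")  # 3.28e+11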
Shamrock [17], [18] is a PIM chip which includes repeating patterns of logic and memory areas, organized in a mesh topology. The basic building block of the chip consists of logic for a CPU and data routing in the center, and four arrays of memory next to it, two on the top and two on the bottom. Each memory array is made up of multiple sub-modules, whose sense amplifiers face the CPU logic directly. This allows all the bits read from a memory in one cycle to be made available at the same time to the CPU logic.

The Data-Intensive Architecture (DIVA) system integrates PIM chips as smart-memory coprocessors to a conventional microprocessor, [19], [20]. The DIVA PIM chips are connected to a single host processor through conventional memory control logic. The communication between the PIM chips is realized through a separate PIM-to-PIM interconnect, without involving the host processor. The access to non-local data, and the operations that include synchronizing activities and gathering results, are accomplished via parcels. A parcel is similar to an active message: it includes a reference to a function that will be invoked when the parcel is received, [20]. The parcels are exchanged between the PIM chips through the separate PIM-to-PIM interconnect. Each PIM chip includes a scalar datapath that performs sequential operations on 32-bit operands, and a WideWord datapath that performs fine-grain parallel operations on 256-bit operands, along with register file and ALU blocks. The datapaths are executed from a single instruction stream under the control of a single 5-stage DLX-like pipeline. This results in significant speedups while performing parallel calculations with large working sets.

The MIND (Memory, Intelligence, and Network Device) system consists of MIND components (modules) and possibly other devices integrated by one or more interconnection networks. It can operate either in standalone arrays of like devices or in conjunction with other conventional microprocessors (shown in figure 4). Each MIND component includes multiple memory-processor nodes that perform message-driven (via parcels) multithreaded execution, [21]. The MIND architecture is the first general-purpose PIM device that supports virtual tasks and data in a distributed shared memory environment, [22], and thus allows achieving high performance-to-cost and power efficiency.

Figure 4. MIND system organization: Sea of PIMs (left) or Heterogeneous system with PIMs

C. Intelligent RAM

Intelligent RAM [7], [11] is another merged DRAM-logic processor, designed at the University of California, Berkeley, by a small team of students led by Patterson. The chip was designed to serve as a multimedia processor on embedded devices. As a result of the design studies of the Berkeley research group, it was shown that the most applicable architecture for multimedia processing is vector architecture, rather than MIMD, VLIW and other ILP organizations, [23]-[25]. This is basically because of the vector processor's ability to deliver high performance for data-parallel task execution, and to provide low energy consumption, simplicity of implementation and scalability with modern CMOS technology, [26].

The resulting IRAM chip is called Vector IRAM (VIRAM). VIRAM is a processor that couples on-chip DRAM, for high bandwidth, with vector processing, to express fine-grained data parallelism [27], [28]. The VIRAM architecture (shown in figure 5) consists of a MIPS GPP core, attached to a vector register file that is connected to an external I/O network and also to a 12MB DRAM memory organized in 8 banks. The vector register file includes 32 vector registers, each containing up to 32 64-bit values, thus allowing execution of vector operations over all the elements of a vector register in parallel. The VIRAM ISA, [28], consists of the MIPS instruction set, extended with special vector instructions for performing floating-point add, multiply and divide, integer operations, and load/store memory operations. The VIRAM architecture is divided into four 64-bit lanes (pipelines) that can be further subdivided, resulting in eight 32-bit lanes. Every lane has two integer functional units, one of which can be used as a floating-point functional unit, [29]. Each of the functional units supports a multiply-add instruction that can complete in one cycle. Considering that the processor operates at a frequency of 200 MHz, the resulting peak performance of IRAM is 3.2 GFLOPS for 32-bit floating-point operations and 6.4 GOPS for 32-bit integer operations (a short accounting of these figures is sketched after figure 5).

Figure 5. Architecture of IRAM chip
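The peak figures quoted above can be checked from the clock rate and the lane organization. The accounting below is one plausible reading of the text (a fused multiply-add counted as two operations per functional unit per cycle); it is a sketch, not an official VIRAM datasheet derivation.

clock_hz = 200e6      # VIRAM clock frequency stated in the text
lanes_32bit = 8       # four 64-bit lanes subdivided into eight 32-bit lanes

# One FP-capable unit per lane, each multiply-add counted as 2 FLOPs:
peak_fp_gflops = clock_hz * lanes_32bit * 2 / 1e9
print(peak_fp_gflops)   # 3.2 GFLOPS for 32-bit floating point

# Two integer functional units per lane, again 2 ops per multiply-add:
peak_int_gops = clock_hz * lanes_32bit * 2 * 2 / 1e9
print(peak_int_gops)    # 6.4 GOPS for 32-bit integers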

IV. IRAM'S APPLICATION TO FFT PROCESSING

FFT computations are difficult to realize on conventional GPP architectures, given that they require high memory bandwidth and strided memory accesses. Considering the importance of the FFT function in the domain of signal processing, several floating point DSPs and specialized FFT chips have been designed to provide hardware support for computing FFTs [30]. On the other hand, some researchers, [31], have shown that the performance of computing floating-point FFTs on a VIRAM processor can be competitive with that of existing floating point DSPs and specialized fixed-point FFT chips. Therefore, in this paper we discuss the performance of Cooley-Tukey radix-2 FFT algorithm execution on the VIRAM memory-processor hybrid.

A. Cooley-Tukey radix-2 FFT algorithm implementation on vector IRAM

FFT is an efficient and fast algorithm for converting signals from the time domain to the frequency domain, [30]. This function is critical for many signal processing problems, including multimedia applications that involve images, speech, audio or video. FFT is defined with the computations given in equation (3), which produce an N-element FFT vector from an N-element input vector:

X(k) = Σ_{n=0}^{N−1} x(n)·W_N^{nk}, k = 0, 1, ..., N−1, where W_N^{nk} = e^{−j2πnk/N} (3)

Among the most frequently used FFT algorithms is the Cooley-Tukey radix-2 algorithm, whose complexity amounts to O(N·log2(N)). This algorithm is based on the "divide and conquer" paradigm, and thus performs the FFT computation of the even-indexed and odd-indexed inputs separately and then combines those two results to produce the FFT of the whole sequence. This approach allows recursive execution of the algorithm in log2(N) phases, whereas all phases perform parallel computations organized in butterfly processes. Each butterfly process executes one operation of multiplication (4 multiplies, 1 add and 1 subtract on VIRAM) and two operations of addition (2 adds/subs on VIRAM) with complex numbers, [30]. Therefore, the complexity of performing a butterfly process, referred to as a "basic operation" on IRAM, is 10 arithmetic operations in total. The flow graph of the C-T radix-2 algorithm executing a 16-point FFT computation on VIRAM is given in figure 6, [31].

Figure 6. Flow graph of 16-point C-T radix-2 FFT algorithm

As shown in figure 6, VR1 and VR2 are used for loading the real parts of elements 0-7 and elements 8-15 of the first phase of the algorithm, respectively. VR3 and VR4 are a second pair of vector registers, used for holding the corresponding imaginary parts of elements 0-7 and elements 8-15. The first phase has a vector length of 8, which is exactly N/2, while in each successive phase the vector length halves and the number of butterfly groups doubles, [31]. After the last phase, a bit-reversing operation over the elements is performed (a compact scalar sketch of the algorithm is given below).
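For reference, the recursive structure of the radix-2 algorithm described above can be sketched in a few lines of Python. This scalar sketch only illustrates the butterfly computation of equation (3); the VIRAM implementation of [31] is iterative and vectorized.

import cmath

def fft_radix2(x):
    """Recursive Cooley-Tukey radix-2 FFT of equation (3);
    len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])   # FFT of the even-indexed inputs
    odd = fft_radix2(x[1::2])    # FFT of the odd-indexed inputs
    result = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_N^k
        t = w * odd[k]                         # 1 complex multiplication
        result[k] = even[k] + t                # butterfly: 2 complex adds/subs
        result[k + n // 2] = even[k] - t
    return result

# 16-point example matching the flow graph of figure 6:
x = [complex(i, 0) for i in range(16)]
X = fft_radix2(x)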

B. Performance estimation

The theoretical maximal performance that can be achieved on VIRAM, having 32MB of memory divided into 16 banks, while computing the FFT is limited to 2 GFLOP/s (counting only arithmetical operations), [29]. This is obtained in such a way that each basic computation is represented as 2 multiply-adds, 2 multiplies and 4 adds/subs, resulting in a total of 8 operations. Furthermore, it is considered that the theoretical limit for performing only multiply-adds is 3.2 GFLOP/s, and for other single precision floating-point instructions 1.6 GFLOP/s. Finally, the theoretical maximal performance is computed as:

2 GFLOP/s = (2 multiply-adds / 8) * 3.2 GFLOP/s + (6 non-multiply-adds / 8) * 1.6 GFLOP/s (4)

Figure 7 shows the performance of each stage of an N-point FFT using the C-T radix-2 algorithm on VIRAM. The figure demonstrates that the performance degrades as the vector length falls below 8, which causes incomplete utilization of the ALUs in the 8 single precision virtual lanes, [31]. Table II shows that the percentages of total time and total work that the C-T radix-2 FFT algorithm spends in phases whose vector length (VL) is less than the maximum vector length (MVL) decrease as the number of points in the FFT increases; a small sketch reproducing the "Total work" row is given after the table. However, the utilization of the available vector resources is still not good, so the authors of [31] suggest several techniques (auto-increment addressing, and memory and register transposes) which allow every vector instruction to operate with VL = MVL. These optimizations make VIRAM competitive with FFT-specialized DSPs.

[Figure 7 plots MFLOPS per stage (stages 1-10) for 128-, 256-, 512- and 1024-point FFTs against the IRAM peak performance of 2000 MFLOPS; performance drops in the stages where VL falls below 8, the number of lanes.]

Figure 7. Performance of each stage of C-T radix-2 FFT algorithm

TABLE II. TOTAL TIME AND TOTAL WORK THAT THE FFT ALGORITHM SPENDS IN STAGES WHOSE VL IS LESS THAN MVL

Number of points   32    64    128   256   512   1024
Total time (%)     100   100   97    96    95    94
Total work (%)     100   100   86    75    67    60
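The "Total work" row of Table II can be reproduced by simply counting the stages whose vector length falls below the maximum vector length, as the short sketch below does; MVL = 64 elements for single precision is an assumption, chosen because it is consistent with the table.

def stage_vector_lengths(n):
    """VL per stage of an n-point radix-2 FFT: the first stage works on
    vectors of n/2 elements, and each later stage halves the length."""
    vls = []
    vl = n // 2
    while vl >= 1:
        vls.append(vl)
        vl //= 2
    return vls

MVL = 64  # assumed single-precision maximum vector length
for n in (32, 64, 128, 256, 512, 1024):
    vls = stage_vector_lengths(n)
    short = sum(1 for vl in vls if vl < MVL)
    print(f"{n:5d} points: {short}/{len(vls)} short stages "
          f"({100 * short / len(vls):.0f}% of the stages)")

Running it yields 100, 100, 86, 75, 67 and 60 percent, matching the "Total work" row of Table II.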

V. CONCLUSION

The idea of merging logic into DRAM memory has shown many opportunities in performance, energy efficiency and cost, allowing a 5 to 10 times reduction in latency, a 50 to 100 times bandwidth increase, a factor of 2 to 4 advantage in energy efficiency, and cost savings by reducing board area. In this paper we have discussed several intelligent memory models, including CRAM, PIM and IRAM. These chips are characterized by some form of parallel processing (SIMD, VLIW or vector) implemented in their chip architecture. Basically, the smart memories operate in conjunction with other GPPs, but some of them can be implemented as standalone arrays of like devices. The intelligent memories can make highly effective use of the internal memory bandwidth while performing memory intensive computations, such as FFT operations. This was confirmed with the Vector IRAM chip, which has shown performance similar to some other DSPs and specialized FFT chips.

The novel technology of merged logic-memory chips presents an opportunity for changing the nature of the semiconductor industry. The great challenge would be to merge two technologically different fabrication lines into one, and to accept the development of such chips not only for embedded systems but also for general purpose computers. This means that intelligent memories have the potential to create a new generation of computers with increased portability, reduced size and power consumption, without compromising performance and efficiency.

ACKNOWLEDGMENT

This work was partially supported by the ERC Starting Independent Researcher Grant VISION (Contract n. 240555).

REFERENCES
[1] John L. Hennessy, David A. Patterson, Computer Architecture: A Quantitative Approach, 4th ed., California: Morgan Kaufmann Publishers, 2007.
[2] Nihar R. Mahapatra, Balakrishna Venkatrao, "The processor-memory bottleneck: problems and solutions", in ACM Crossroads, 1999.
[3] Carlos Carvalho, "The gap between processor and memory speeds", in proc. of ICCA 2002, Braga, Portugal, 2002.
[4] Christianto C. Liu, Ilya Ganusov, Martin Burtscher, Sandip Tiwari, "Bridging the processor-memory performance gap with 3D IC technology", in IEEE Design & Test of Computers, vol. 22, no. 6, pp. 556-564, 2005.
[5] Damian Miller, "Reconfigurable systems: a potential solution to the Von Neumann bottleneck", Senior Thesis, Liberty University, 2011.
[6] João Paulo Portela Araújo, "Intelligent RAM: a radical solution?", in proc. of the 3rd Internal Conference on Computer Architecture, Universidade do Minho, 2002.
[7] David Patterson, Krste Asanovic, et al., "Intelligent RAM (IRAM): the industrial setting, applications, and architectures", in proc. of the International Conference on Computer Design: VLSI in Computers & Processors (ICCD), University of California, Berkeley, USA, 1997.
[8] Richard E. Blahut, Fast Algorithms for Signal Processing, United Kingdom: Cambridge University Press, 2010.
[9] Christian Cojocaru, "Computational RAM: implementation and bit-parallel architecture", Master Thesis, Carleton University, Ottawa, 1995.
[10] Maya Gokhale, Bill Holmes, Ken Iobst, "Processing in memory: the Terasys massively parallel PIM array", IEEE Computer, 1995.
[11] David Patterson, Thomas Anderson, et al., "A case for intelligent RAM: IRAM", IEEE Micro, April 1997.
[12] Peter M. Nyasulu, "System design for a computational-RAM logic-in-memory parallel-processing machine", PhD Thesis, Carleton University, Ottawa, 1999.
[13] Ravikanth Ganesan, Kannan Govindarajan, Min-You Wu, "Comparing SIMD and MIMD programming modes", Journal of Parallel and Distributed Computing, 1996.
[14] Duncan Elliott, Michael Stumm, Martin Snelgrove, "Computational RAM: the case for SIMD computing in memory", in proc. of ISCA '97, June 1997.
[15] Duncan G. Elliott, Michael Stumm, W. Martin Snelgrove, Christian Cojocaru, Robert McKenzie, "Computational RAM: implementing processors in memory", IEEE Design & Test, vol. 16, issue 1, January 1999.
[16] Duncan G. Elliott, W. Martin Snelgrove, Michael Stumm, "Computational RAM: a memory-SIMD hybrid and its application to DSP", in proc. of the Integrated Circuits Conference, 1992.
[17] Peter M. Kogge, Jay B. Brockman, Thomas Sterling, Guang Gao, "Processing in memory: chips to petaflops", Technical report, International Symposium on Computer Architecture, June 1997.
[18] Ovidiu Daescu, Peter M. Kogge, Danny Chen, "Parallel content-based image analysis on PIM processors", in proc. of the IEEE Workshop on Content-Based Access to Image and Video Databases, June 1998.
[19] Jeff Draper, Jacqueline Chame, et al., "The architecture of the DIVA processing-in-memory chip", in proc. of the 16th International Conference on Supercomputing ICS '02, USA, 2002.
[20] Jeffrey Draper, Jeff Sondeen, Chang Woo Kang, "Implementation of a 256-bit WideWord processor for the data-intensive architecture (DIVA) processing-in-memory (PIM) chip", in proc. of the 28th European Solid-State Circuits Conference, September 2002.
[21] Thomas L. Sterling, Hans P. Zima, "Gilgamesh: a multithreaded processor-in-memory architecture for petaflops computing", in proc. of SC 2002, 2002.
[22] Thomas Sterling, Maciej Brodowicz, "The 'MIND' scalable PIM architecture", in proc. of the High Performance Computing Workshop, 2004.
[23] Christoforos Kozyrakis, David Patterson, "Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks", in proc. of the 35th International Symposium on Microarchitecture, Istanbul, Turkey, November 2002.
[24] Christoforos Kozyrakis, "Scalable vector media-processors for embedded systems", PhD Thesis, University of California, Berkeley, 2002.
[25] Brian R. Gaeke, Parry Husbands, et al., "Memory-intensive benchmarks: IRAM vs. cache-based machines", in proc. of the International Parallel and Distributed Processing Symposium (IPDPS), April 2002.
[26] David Patterson, Thomas Anderson, et al., "Intelligent RAM (IRAM): chips that remember and compute", in proc. of the Solid-State Circuits Conference, 1997.
[27] Joseph Gebis, Sam Williams, David Patterson, Christos Kozyrakis, "VIRAM1: a media-oriented vector processor with embedded DRAM", 41st Design Automation Student Design Contest, San Diego, CA, June 2004.
[28] David Martin, "Vector extensions to the MIPS-IV instruction set architecture, the V-IRAM architecture manual", Technical paper, March 4, 2000.
[29] Randi Thomas, "An architectural performance study of the fast Fourier transform on vector IRAM", Technical report, University of California, Berkeley, 2000.
[30] Kamaru Adzha Bin Kadiran, "Design and implementation of OFDM transmitter and receiver on FPGA hardware", Master Thesis, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 2005.
[31] Randi Thomas, Katherine Yelick, "Efficient FFTs on IRAM", in proc. of the Workshop on Media Processors and DSPs, University of California, Berkeley, 1999.

CoAP communication with the mobile phone sensors over the IPv6

Tomislav Dimcic*, Dejan Drajic*, Srdjan Krco*
* Ericsson d.o.o., Belgrade, Serbia
[email protected], [email protected], [email protected]

Abstract—In this paper a few implementations are presented in order to show the importance of IPv6 usage for IoT and the possibilities that IPv6 offers. This is done in order to overcome the addressing and routing limitations of IPv4-based mobile networks and WiFi. Two set-ups are presented. In the first one, a smartphone is used as an end point that can be accessed directly through its IPv6 address. The second set-up shows how a smartphone can be used as a half-gateway for non-IP devices, and provides access to devices that use Bluetooth or Infrared communication with the smartphone.

I. INTRODUCTION

Every smartphone has a number of embedded sensors, like GPS, microphone, speaker, camera, light, etc. If the data from the phone's sensors could be accessed from the internet, they could be combined and used for forming a bigger picture of the environment. For example, data gathered from many different sound sensors on phones could provide information about the noise level in different parts of the city, to form a noise level map. Having in view that there are so many smartphones in the world, the potential is huge. A problem occurs when trying to get the data from the phone. The phone, while on the IPv4 mobile network, does not have a static IP address, and every time the phone is switched off and on, it obtains a new IP address from the network. On the other hand, if the phone is on WiFi, through the IPv4 network, port forwarding on the local router must be provided. These problems are the reason why it is neither practical nor, in general, possible to implement a server on the phone that could be accessed from the internet. Google developed and offered a service for a notification system; in order to enable this function, a constant communication channel with their service needs to be kept open, which has a large influence on the phone's battery consumption [1]. A promising solution for this problem is the usage of the IPv6 network. The IPv6 addressing system enables every IoT (Internet of Things) device to have a unique IP address, which facilitates implementation by avoiding port forwarding.

In this paper some practical implementations are presented in order to show the importance of IPv6 for IoT and the possibilities that IPv6 offers. In the observed cases a mobile device can have its own (embedded) sensors, or different sensors can be connected wirelessly, for example via Bluetooth, when the mobile phone acts as a half-gateway for sensors from devices that do not support the IPv6 protocol, allowing them to be accessible via the IPv6 network. Communication with the mobile phones is done over the CoAP protocol, while the Digcovery system is used for service discovery.

The work presented in this paper is done in the scope of the EU FP7 IoT6 research project, which aims at researching and exploiting the potential of IPv6 to develop a service oriented architecture overcoming the current IoT fragmentation [2]. Within the project, various ways to integrate an IPv6-based IoT into mobile phone networks are explored, enabling mobile devices to provide access to smart objects as well as to be used as sensors/actuators.

The paper is organized as follows: after the introduction, in Section II the CoAP protocol and the Digcovery system are explained. Section III covers the test case with mobile phones with their own sensors, while in Section IV the half-gateway implementation is presented. In Section V conclusions and future work are given.

II. COAP PROTOCOL AND DIGCOVERY

CoAP (Constrained Application Protocol) is proposed by the CoRE (Constrained RESTful Environments) working group in order to optimize the use of the RESTful web service architecture in constrained nodes and networks [3]. CoAP is an application layer protocol designed to lower the complexity for constrained networks but, also, to enable communication over the existing internet infrastructure. Existing application-layer protocols that operate in the request-response model are not a good match for low-power, resource constrained devices. CoAP is a light-weight application protocol based on UDP that supports multicast requests, caching and REST web services between the end-points, and is seen as a future protocol for IoT. The lack of reliability of UDP is compensated by other methods that provide confirmation of received messages. It is designed for low-powered devices and it fulfills the IoT requirements [4]. CoAP is still a work in progress of the IETF CoRE Working Group [5]. It is intended to be used as the HTTP replacement for smart objects that are connected to the Internet [6]. One of the requirements for the CoAP design is resource, service and end point discovery. As suggested in the CoRE Resource Directory draft [7], a good solution is the usage of a Resource Directory for storing descriptions and discovering resources. There are also other solutions: IETF considers DNS-SD (Domain Name System Service Discovery) [8] as the service discovery mechanism, and protocols like ZigBee [9] and SLP (Service Location Protocol) [10] are also working on service discovery systems and offer their own solutions.

Digcovery is a global discovery platform. This platform is used to locate the different domains and the widely deployed directories with the different resources. It is based on DNS (the dig command in Linux OS). Digcovery is public and accessible from anyplace through digcovery.net. Digcovery allows the delegation of each domain to the end-user and/or service provider through digrectories [11]. Digcovery is introduced as a service discovery system in the IoT6 project [12]. It has a built-in CoAP interface in order to enable communication with the constrained devices on the network edge. On low power devices it is too complicated or impossible to implement the DNS protocol, and the usage of CoAP for discovery enables the development of more distributed systems. It allows end devices to discover the services they need.

In this paper we study the access to the mobile phone sensors over the internet, through the IPv6 network. A smartphone application is responsible for registering the phone's sensors into a Digcovery directory. Another device (a laptop in this test case) searches the directory for the required service (see Section III). After receiving the required description, a client application on the laptop communicates with the phone and collects measurements from the sensors on the phone. In addition, access and communication to an external device connected with the phone via Bluetooth is presented (Section IV).

III. MOBILE PHONE WITH ITS OWN SENSORS

In this test setup an Android based smartphone was used for the implementation of the IPv6 CoAP server, a Raspberry Pi [13] was acting as an IPv6 border router, and finally a laptop as an IPv6 client (Fig. 1.).

Figure 1. Test set-up, IPv6 communication between laptop and CoAP Server on the Android

In order to be able to route communication in a proper way, the ISP (Internet Service Provider) needs to have support for IPv6. Because ISPs in Serbia don't support the IPv6 addressing system, a tunneling system was used, provided by Gogo6 [14] with its Freenet6 Tunnel Broker [15]. A tunneling mechanism enables an IPv6 network even if there is no support at the ISP. Within this mechanism the IPv6 traffic is carried by the IPv4 network to the Freenet6 server, where the communication translates completely to the IPv6 network. On Freenet6, a single, permanent IPv6 address and a DNS name are assigned to each user, making their device reachable from anywhere on the IPv6 internet. The Raspberry Pi was set up with a Freenet6 client, which enables IPv6 communication. The client keeps the IPv4 tunnel constantly open with the Freenet6 server. Besides that, the client is in charge of getting the static IPv6 address from the Freenet6 server. Freenet6 has a unique IPv6 address for every account created on its site.

The Raspberry Pi is basically a Linux machine and therefore it can be set up to be a router for the local network, enabling internet access to the local devices. Since the Raspberry Pi already has a tunneling mechanism that provides IPv6, it is converted into a Wi-Fi hot spot for the IPv6 network. In this way IPv6-enabled devices can get an IPv6 address through the Raspberry. A full /56 prefix is assigned to the Raspberry, enabling the distribution of IPv6 connectivity to an entire network. A static IPv6 address, accessible from the web, is assigned to the Raspberry Pi. The Raspberry Pi acts as a border router for the IPv6 network and enables the smartphone to get its IPv6 address. To enable that, a DHCP server is built on the Raspberry which assigns a unique IPv6 address to every device that tries to connect with it.

The IP address assigned to the smartphone is the following:

2001:5c0:1504:b800:50d0:f101:83cc:b8b4

Since this is a REST CoAP server, there are several available interfaces: /sensor, /sensor/gps, /sensor/light, etc. After boot, in order to enable service discovery (Fig. 2), the smartphone's CoAP server registers its available resources and services on the Digcovery, by sending the following PUT requests (a client-side sketch of such a registration is given after figure 2):

coap://[2001:720:1710:10::1000]:5683/dig?ep=sensors&proto=coap&port=5683&d=example.com&lat=25&long=45&z=Belgrade

and

coap://[2001:720:1710:10::1000]:5683/dig?ep=gps&proto=coap&port=5683&d=example.com&lat=25&long=45&z=Belgrade

Figure 2. Communication flow – getting the list of available sensors from the Android phone
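For illustration, a registration like the ones above can be issued from Python. The sketch below uses the third-party aiocoap library (our choice for the example; the paper does not prescribe a client implementation) and sends a PUT with an empty body, which is an assumption, since the message payload is not shown in the text.

import asyncio
from aiocoap import Context, Message, PUT

# Registration URI taken from the text; the payload is left empty here
# (an assumption -- the paper does not show the message body).
DIGCOVERY_URI = ("coap://[2001:720:1710:10::1000]:5683/dig"
                 "?ep=sensors&proto=coap&port=5683&d=example.com"
                 "&lat=25&long=45&z=Belgrade")

async def register():
    context = await Context.create_client_context()
    request = Message(code=PUT, uri=DIGCOVERY_URI)
    response = await context.request(request).response
    print("Digcovery answered:", response.code)

asyncio.run(register())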
The other end of the communication link is embodied in an IPv6 client installed on a laptop. Similarly to the Raspberry Pi, the laptop obtained an IPv6 address through the Freenet6 Tunnel Broker and is able to make requests over IPv6. In order to read the available services on the phone, the IPv6 client sends a GET request to the Digcovery server:

coap://[2001:720:1710:10::1000]:5683/dig?qt=2&ep=sensors

Digcovery responds with a description of those services:

[{"name":"coap.sensors","port":5683,"addr":"2001:5c0:1504:b800:50d0:f101:83cc:b8b4","values":[{"value":"25@45","nameField":"geo"},{"value":"Belgrade","nameField":"gps"}],"gps":"Belgrade","loc":[45.0,25.0],"domainName":"example.com"},
{"name":"coap.gps","port":5683,"addr":"2001:5c0:1504:b800:50d0:f101:83cc:b8b4","values":[{"value":"25@45","nameField":"geo"},{"value":"Belgrade","nameField":"gps"}],"gps":"Belgrade","loc":[45.0,25.0],"domainName":"example.com"}]
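A client can extract the end point address and port from this JSON answer with a few lines of Python. The snippet below parses an abridged copy of the response shown above (the "values" array is omitted for brevity) and rebuilds the CoAP URI that is queried next in the text.

import json

# Abridged copy of the Digcovery response shown above
raw = ('[{"name":"coap.sensors","port":5683,'
       '"addr":"2001:5c0:1504:b800:50d0:f101:83cc:b8b4",'
       '"gps":"Belgrade","loc":[45.0,25.0],"domainName":"example.com"}]')

for service in json.loads(raw):
    # IPv6 literals must be bracketed inside a CoAP URI
    uri = f"coap://[{service['addr']}]:{service['port']}/sensors/"
    print(service["name"], "->", uri)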
The descriptions of both services are sent because they are registered from one domain. After receiving this message, the IPv6 CoAP client installed on the laptop sends a GET request to the following address:

coap://[2001:5c0:1504:b800:50d0:f101:83cc:b8b4]:5683/sensors/

The request is tunneled through the Freenet6 over the Internet to the Raspberry Pi, which then transmits the request to its final destination, the smartphone. Because the smartphone listens on port 5683, it gets the request, processes it and sends back a response to the laptop through the same communication channel (Fig. 2.). The response holds information about the available sensors on the phone (Table I):

TABLE I. LIST OF AVAILABLE SENSORS

"This is a list of available sensors:
KR3DM 3-axis Accelerometer
AK8973 3-axis Magnetic field sensor
GP2A Light sensor
GP2A Proximity sensor
K3G Gyroscope sensor
Rotation Vector Sensor
Gravity Sensor
Linear Acceleration Sensor
Orientation Sensor
Corrected Gyroscope Sensor
GPS"

For example, if the request is sent to:

coap://[2001:5c0:1504:b800:50d0:f101:83cc:b8b4]:5683/sensor/gps

the response is the current location of the phone. Similarly, the measurements from all sensors can be read.

IV. HALF GATEWAY IMPLEMENTATION

In this test case the mobile phone acts as a half-gateway for sensors from devices that don't support the IPv6 protocol [10]. These devices are connected to the phone via Bluetooth, Infrared, etc. Since IoT means devices connected via the Internet, it is crucial to show how these devices can have internet access over the IPv6 network. The role of the half-gateway is to communicate through IPv6 but still be able to connect to a device via Bluetooth or Infrared. The mobile phone performs the registration of these devices in Digcovery or a Resource Directory, thus allowing their discovery and the obtaining of measurements. In a full gateway implementation, protocol adaptations and security and privacy aspects should additionally be supported, which is not investigated in this paper.

In this setup an Android phone, with a CoAP server implemented, is used as an IPv6 half-gateway for the Bluetooth enabled device MindWave [16]. The MindWave device is able to read brain wave activity and to send raw measurements to the smartphone. As in the first test case, the Raspberry Pi is set as the border router for IPv6 and got its IPv6 address with the Freenet6 client, which enables the IPv6 communication. On Freenet6, a single, permanent IPv6 address and a DNS name are assigned to each user, making their device (PC, laptop) or any other device reachable from anywhere on the IPv6 internet. The IP address on the phone is the following:

2001:5c0:1504:b800:71bc:54c0:eee3:3b83

A connection between the MindWave and the smartphone is established using Bluetooth. An application installed on the phone communicates over the IPv6 network, reads and processes the EEG (Electro Encephalograph) data from the MindWave (interpreting it as the level of attention and meditation), and has a CoAP server that waits for requests from the internet. The server listens on port 5683. This setup enables access to the Bluetooth device via the IPv6 network.

Figure 3. IPv6 communication between laptop and MindWave device using Android phone as a gateway

After running the IPv6 EEG CoAP Server application on the phone (Fig. 3.), it sends the registration data to the Digcovery:

coap://[2001:5c0:1504:b800:71bc:54c0:eee3:3b83]:5683/dig?ep=attention&proto=coap&port=5683&d=example.com&lat=25&long=45&z=Belgrade

and

coap://[2001:5c0:1504:b800:71bc:54c0:eee3:3b83]:5683/dig?ep=meditation&proto=coap&port=5683&d=example.com&lat=25&long=45&z=Belgrade

In order to read the data from the MindWave, the IPv6 CoAP client on the laptop sends a GET request to the Android application, which can handle the CoAP request and is connected to the MindWave via Bluetooth:

GET on coap://[2001:5c0:1504:b800:71bc:54c0:eee3:3b83]:5683/attention/

or

GET on coap://[2001:5c0:1504:b800:71bc:54c0:eee3:3b83]:5683/meditation/

The request is tunneled through the Freenet6 over the Internet to the Raspberry, which then transmits the request to its final destination, the Android phone. The phone listens on port 5683, gets the request, processes it and sends back the response to the laptop through the same communication channel (Fig. 4.). The response holds information about the current data gathered from the MindWave.

Figure 4. Communication flow – getting the attention or meditation level from the MindWave

If the request was sent to the attention interface, the response holds information about the current attention level, and if the request was sent to the meditation interface, the response holds information about the current meditation level. These levels are relative values between 0 and 100.

V. CONCLUSIONS AND FUTURE WORK

In IPv4 based mobile networks, a mobile phone does not have a static IP address, and every time the phone is switched off and on, it obtains a new IP address from the network, while if the phone is on WiFi, through the IPv4 network, port forwarding on the local router must be provided. This makes it a problem to get sensor data from smartphones. To avoid these problems, IPv6 networks, which enable every IoT device to have a unique address, could be used.

In this paper some implementations were presented in order to show the importance of IPv6 for IoT and the possibilities that IPv6 offers. Since ISPs in Serbia don't have support for IPv6, a tunneling mechanism is used to demonstrate the proof of concept. The Freenet6 Tunnel Broker enables carrying IPv6 traffic through an IPv4 channel, which gives local networks IPv6 connectivity. To avoid having a tunneling mechanism for every device, a Raspberry Pi device was set up to work as a border router for IPv6. In this way any device can have IPv6 connectivity using WiFi through the Raspberry Pi. Also, every device connected through this system has a unique IPv6 address.

Two set-ups are presented. In the first one, the smartphone is used as an end point that can be accessed directly through its IPv6 address. The smartphone has a number of sensors and, if asked, it can respond with readings from those sensors. A REST CoAP server is used, so every sensor can be accessed independently. The second set-up shows how a smartphone can be used as a half-gateway for non-IP devices. In that way, access to devices that use Bluetooth or Infrared is provided.

In future work we plan to compare the IPv6 set-up that uses the Digcovery system presented here with the same set-up where, instead of Digcovery, an IPv4-based Resource Directory is used [7]. Preliminary results show that there is a major influence of the IPv6 tunneling mechanism on the communication speed, so it is necessary to use an IPv6 access point directly, in order to avoid tunneling delays and to perform real comparisons of the IPv6 Digcovery system and the IPv4-based Resource Directory.

ACKNOWLEDGMENT

The main part of the research presented in this paper was undertaken in the context of the European FP7 IoT6 project (STREP). The project has received funding from the European Community's Seventh Framework Programme under grant agreement n° 288445.

REFERENCES
[1] http://www.androidhive.info/2012/10/android-push-notifications-using-google-cloud-messaging-gcm-php-and-mysql/
[2] IoT6 D1.3, Updated version of IoT6 architecture & SOA specifications.
[3] Tomislav Dimčić, Srđan Krčo, Nenad Gligorić, "CoAP (Constrained Application Protocol) implementation in M2M Environmental Monitoring System", 2nd International Conference on Information Society Technology, 2012.
[4] Nenad Gligorić et al., "CoAP over SMS Performance Evaluation for Machine to Machine Communication", 20th Telecommunications Forum (TELFOR), Special Session, 2012.
[5] http://datatracker.ietf.org/wg/core/charter/, IETF CoRE Working Group [Last accessed April 2013].
[6] Bergmann, O., Hillmann, K.T., Gerdes, S., "A CoAP-gateway for smart homes", Computing, Networking and Communications (ICNC), 2012 International Conference on, pp. 446-450, Jan. 30 2012-Feb. 2 2012.
[7] http://tools.ietf.org/html/draft-ietf-core-resource-directory-01
[8] S. Cheshire and M. Krochmal, DNS-Based Service Discovery, RFC 6763, ISSN: 2070-1721, Internet Engineering Task Force, February 2013.
[9] ZigBee Specification, ZigBee Document 053474r17, January 2008.
[10] E. Guttman, C. Perkins, J. Veizades and M. Day, Service Location Protocol, Version 2, IETF RFC 2165, June 1999.
[11] IoT6 D6.2, Ubiquitous access and mobile phone network interactions report.
[12] http://www.iot6.eu/
[13] http://www.raspberrypi.org/
[14] http://www.gogo6.com
[15] http://www.gogo6.com/freenet6/tunnelbroker
[16] http://www.neurosky.com

Page 392 of 478


ICIST 2014 - Vol. 2 Poster papers

Communication Networks 2-terminal Reliability


and Availability Estimation by Simulation
Radomir Janković*, Slavko Pokorni**, Momčilo Milinović***
*Union University School of Computing, Belgrade, Serbia
ITS Information Technology School, Belgrade, Serbia
**
***
Faculty of Mechanical Engineering, Belgrade, Serbia
[email protected] [email protected] [email protected]

Abstract— An approach to two-terminal reliability (2TR)


and availability (2TA) estimation of communication Destination
networks consisting of repairable nodes and links by means N1 L13
of the discrete events simulation has been presented in the N3
paper. The necessary definitions have been given, as well as Source L12 L23
the simulation model elements, the algorithms of the
realized GPSS World program-simulators for two-terminal N2
network reliability and availability estimation and an L14 L35
example with a brief analysis of the executed experiments L25
results.
L45
N4 N5
I. INTRODUCTION
Communication networks (Fig.1) are complex systems, Figure 1. Communication network example: 5 nodes, 7 links
consisting of many components which are subjects to
occasional failures by nature. With the increase of the
number of network elements (nodes and links), the difficult and often even impossible. This is the reason why
disruption probability of the connection between two users simulation can be used for the solution of such problems
who interchange information across the network also [2, 3, and 4].
increases. Mitigating factor is the possibility of We applied 2 simulation methods [5] for complex
reestablishment of the connection by choosing some communication network reliability estimation, consisting
alternative path in the network, which is available at the of nodes and links characterized by their mean times
time of the network element failure. between failures (MTBF). However, in further research
For any technical system, reliability and availability [6], we concluded that availability would be better quality
characterize its best capability to persist in functioning. indicator for users of services supported by
For communication networks in particular, it is necessary communication network consisting of repairable elements.
to determine which kind of network reliability and In this article, we present our approach to complex
availability are to be estimated. They can be [1]: communication networks 2-terminal reliability and
• Network 2-terminal reliability (2TR) and availability availability estimation by applying the discrete events
(2TA): probabilities that there is at least one simulation method.
operational path between 2 terminal node, Ns (source)
and Nd (destination) in time t = T, or in any time t, II. DEFINITIONS
respectively; Communication network is a set of network elements,
• Network k-terminal reliability (kTR) and availability interconnected for the sake of information transfer. In this
(kTA): probabilities that there is at least one research, complex communication network is a system
operational path between at least k defined nodes in consisting of at least 2 terminal nodes, connected by
time t = T, or in any time t, respectively; means of links and/or other nodes by at least 2 different
• Network total reliability (ATR) and availability ways.
(ATA): probabilities that all nodes (any two) are Network elements are nodes and links.
connected in time t = T, or in any time t, respectively. Node Ni is a network element for sending, receiving
There are analytical methods for reliability and and routing information through communication network.
availability calculation or at least assessment, but only a A node can be terminal (source or destination of
few of them can be applied if the network is complex, communication), or intermediate (router).
especially if its elements are repairable, which is usually All nodes in this research are capable of sending,
the case. In essence, these methods are based on counting receiving and routing the traffic over the links they are
the states in which system is operational, and by summing connected to.
the probabilities for the system being in these states. But
the number of these states can be great. Such a great Link Lij is a network element intended for connecting
number of states make the achievement of analytical of adjacent nodes Ni i Nj. In this research all links are
solutions for reliability and availability of a network bidirectional, i.e. Lij ≡ Lji.

Page 393 of 478


ICIST 2014 - Vol. 2 Poster papers

Connection is a communication in a network, B. Simulation Model for 2-terminal Communication


established between two terminal nodes (source and Networks Reliability Estimation
destination).
Path is a serial connection of network elements Model for communication networks two-terminal
(terminal nodes, intermediate nodes and corresponding reliability is in the discrete events simulators class. The
links). It consists of at least 3 elements: 2 terminal nodes simulator prototype has been implemented by means of
(source and destination) and 1 link. A path can have more the GPSS World simulation language [7].
additional elements, intermediate nodes for routing and
corresponding links. The simulator algorithm has been depicted in Fig. 2.
Basic modules of the simulator are network path
In this research, complexity of a path Si has been generator, access trial generator, and network operation
introduced, as a measure of path acceptance order for simulator.
connection establishment. It is defined by the expression:

Si = N i + Li = 2 N i − 1 (1) START
Where Ni and Li are the numbers of nodes and links in a
path i, respectively. Initialization
Ns, Nd, SV, T
On the occasion of establishing a connection between 2
terminal nodes, network management considers all
available paths, in ascending order of their complexity. Path generator:
Network traffic consists of all requirements for P1.....PK
establishing paths and busy times of those path elements
during connection life time.
Path
Failure of network element (node or link) occurs matrix
when an element stops functioning due to its failure, and
lasts until it is repaired. Failures of network elements are
defined by mean time, MTBF [h], and exponential
Trial
distribution of the time between failures. generator:
Path failure occurs when any of its constituent new trial
elements fails, because all of them are serially connected
in a path.
Repair of a network element is a process of putting the
failed element back into operational state. Network i=1
element repair time, trep, is defined by its mean, MTTR
[h], and its exponential distribution.
Yes
i>K?
III. THE SIMULATION APPROACH TO NETWORK TWO-
TERMINAL RELIABILITY ESTIMATION No
No Pi opera-
i=i+1 tional?
A. Reliability
Reliability R(t) of a technical system is a probability Yes
of the system being operational to perform the given Cs = Cs +1 Cf = Cf + 1
function for the period of time from zero to t. We supposed
that exponential distribution of time to failure of the
elements can be applied, so the reliability of the element of
the system is given by: SV = SV -1

R (t ) = e − λt (2) No SV = 0 ?
Where λ is failure rate of the element. Yes
Mean time between failures (MTBF) of nodes and links R(T)=
are basic input data for the reliability of a complex Cs/(Cs+Cf)
communication network calculation. If exponential
distribution of time to failure can be applied, the relation
between MTBF and λ is MTBF=1/ λ. This is adequate if STOP
the assumed network consists of electronic elements.
We assume that network fails if its elements (nodes and
links) fail in such way that there is no at least one
communication path between two nodes. Figure 2. Network two-terminal reliability simulator algorithm

Page 394 of 478


ICIST 2014 - Vol. 2 Poster papers

The algorithm is based on the fact that 2TR reliability is TABLE I.


IMPACT OF NETWORK ELEMENTS FAILURES TO COMMUNICATIO PATHS
the probability that a system dispose of at least one
operational path between source and destination node in Network Path
the time instant t = T. elementa P1 P2 P3 P4 P5
On simulator initialization, source node Ns, destination N1 0 0 0 0 0
node Nd, time instant t = T in which the network 2TR is to
be estimated, and the sample value (SV) of trials of N2 X 0 0 X 0
communication establishing are selected.
N3 0 0 0 0 0
Based on the network configuration and its operational
elements, module path generator forms the path matrix, N4 X X X 0 0
containing all available paths Pi, i = 1, … , K, leading N5 X X 0 0 0
from Ns to Nd. The matrix is sorted according to paths’
complexities Si, calculated by the expression (1). L12 X 0 0 X X
The trial generator generates SV sample of trials to L13 0 X X X X
communicate through the network. At each trial, the
simulator goes through all possible paths between Ns and L14 X X X 0 0
Nd, from the simplest to the most complex one. Each path L23 X 0 X X 0
element is examined, by sampling its probability of
correct functioning, calculated for the time instant t = T by L25 X X 0 X X
the expression (2). If any path element fails, the simulator L45 X X X 0 0
tries the next possible path, excluding every path that
includes the failed element. L35 X X 0 0 X
If at the end of the trial a fully operational path is found, a. 0 means that network element has failed, which disrupts every part containing it; X means
that network element has no impact to path operation
the communication between source and destination node
can be established and the network success counter Cs is The impact of every network’s element failure to every
incremented by 1. If not, the network failure Cf counter is possible path between is shown in TABLE I.
incremented by 1. After each completed trial, sample SV The results of the simulation (Fig.3) show that 2TR in
is decremented by 1. At the end of the simulation, when the first 10 hours of the system functioning is 100 %, with
SV = 0, two-terminal reliability of the network in time or without the communication network. Between 10 and
instant t = T is calculated by the expression: 100 hours, 2TR decreases relatively slowly, and from 100
hours more rapidly until 5000 hours, when it practically
Cs reaches 0.
R (T ) = (3)
Cs + C f In that interval, the network shows slightly better
performance than simple monolink connection, but not as
In order to dynamically estimate network two-terminal spectacular as one could have expected. It can be
reliability, one can execute several simulation experiments concluded that this is a consequence of the fact that
for different time points of interest, Ti, and put the results MTBF of the proposed nodes and links are of the same
together in the common time diagram, as it has been done order of magnitude, so the source and destination nodes
in Fig. 3. Those are the results of the simulation of the have predominant impact to 2-terminal reliability, and the
example network (Fig. 1), consisting of elements with the network helps only in cases when both N1 and N3 are
following characteristics: operational, and the direct link N13 has failed. In such
• Nodes: MTTF = 1/λ = 3000h, exponential distribution; situation, in order to maintain an acceptable level of 2TR,
• Links: MTTF = 1/λ = 4000h, exponential distribution. one should focus on increasing the source and destination
node reliabilities.
Two-terminal reliability has been estimated between
nodes N1 and N3 , in the following time points: The simulator itself has been realised in GPSS World
language. It is relatively simple to implement and can be
Ti(h)∈{1, 10, 100, 1000, 5000, 10000} useful in situations when it is necessary to estimate
For every of time points a sample of SV= 10000000 performance of communication networks consisting of
access trials has been generated. non-repairable elements.
It is assumed that in the time instant t = 0, all network SIMULATION RESULTS
elements are operational.
Possible paths between nodes N1 and N3 (Fig.1), sorted Without network With network

by ascending complexity are: 1


0.9
• P1 N1-L13-N3
Two-terminal reliability

0.8

• P2 N1-L12-N2-L23-N3 0.7
0.6
• P3 N1-L12-N2-L25-N5-L53-N3 0.5

• P4 N1-L14-N4-L45-N5-L53-N3
0.4
0.3
• P5 N1-L14-N4-L45-N5-L52-N2-L23-N3 0.2

The experiments have been executed in order to 0.1

estimate two-terminal reliability in the following cases:


0
1 10 100 1000 10000

• N1 and N3 are connected by means of single link N13 Time (h)

• N1 and N3 are connected by means of the network. Figure 3. Network two-terminal reliability simulation results

Page 395 of 478


ICIST 2014 - Vol. 2 Poster papers

It is much more difficult to estimate how useful a


Failure
communication network is when it consists of repairable START generator
elements, because when failure of one or more nodes
and/or links occurs, the network can still try to continue
functioning, by choosing some of possible alternative Initialization
paths across operational nodes and links. Failure of a Ns, Nd, to, T
network as a whole occurs only if at the time of network
service request there is no available path enabling that
service. Path generator:
P1.....PK
In such situations, it is convenient to calculate or at
least estimate the communication network availability.
Path Change in
IV. THE SIMULATION APPROACH TO NETWORK matrix path matrix
AVAILABILITY ESTIMATION

A. Availability Traffic
generator:
Availability (A) of a technical system is a probability new traffic
that the system, when used under given conditions, will unit
satisfactorily operate in any time [8], which is defined by
the expression:
tuse i=1
A= (4)
tuse + t f
Where tuse is system usage time and tf is system fail Yes
i>K?
time.
Our approach to communication networks availability No
estimation consists of formulation and programming of No Pi
i=i+1 available?
simulation model for network 2-teminal availability
estimation, simulation of network operation and Yes
corresponding events in chosen time interval T, gathering
data of times tuse and tf during the simulation experiment Seize Pi
and network availability calculation according to the
expression (4).
B. Simulation Model for 2-terminal Communication t = t +t0
Networks Availability Estimation
The simulator algorithm has been depicted in Fig. 4.
Basic modules of the simulator are traffic generator, Release Pi
network path generator, failure generator and network
operation simulator.
In spite of some common modules in the realised tuse = tuse + t0 tf = tf + t0
simulators, our approach to network availability
estimation by simulation is rather different to that of
network reliability estimation. No
In the network reliability simulator for each point in t≥T?
time of interest we generate a sample SV of trials Yes
(attempts to establish a connection between Ns and Nd
along one of the operational paths) and than put the results Asd=tuse / T
of the experiments together to dynamically estimate 2TR.
On the other hand, in network availability simulator we
generate traffic units at each basic time interval t0, STOP
simulate the network operation and all important discrete
events in a long simulated time period T, estimate 2TA in
the time points of interest, and observe how 2TA
dynamically changes during the whole time period T. Figure 4. Network two-terminal availability simulator algorithm
On simulator initialization, source node Ns, destination leading from Ns to Nd, sorted according to paths’
node Nd, basic time interval t0 and total time T of complexities Si.
simulated network operation are selected.
The traffic in the communication network is simulated
Again, based on the network configuration and its by flow of traffic units, represented by means of GPSS
operational elements, module path generator forms the transactions [7].
path matrix, containing all available paths Pi, i = 1,…,K,

Page 396 of 478


ICIST 2014 - Vol. 2 Poster papers

In the model, we consider the worst case, when


continuous network service (information traffic between SIMULATION RESULTS
source node Ns and destination node Nd) is requested. In Without network With network
the program-simulator it is achieved by generating new 1.001
traffic unit at every basic interval t0 by the traffic generator 1

Two-terminal availability
module. On the occurrence of every new traffic unit, path 0.999

matrix content is examined, in ascending order of path 0.998

complexities. 0.997
0.996
If there is an available path Pi from Ns to Nd, the traffic 0.995
unit seizes it, and keeps it busy for 1 interval t0, then 0.994
releases it, increments use time tuse counter by 1 interval t0 0.993
and finally leaves the simulation. If there is no available 0.992

path, the traffic unit increments fail time tf counter by 1 0.991

interval t0 and leaves simulation. 100 1000 10000


Time (h)
100000 1000000

Independently of traffic generation in the network, the


Figure 5. Network two-terminal availability simulation results
failure generator module generates element failures, also
represented by means of GPSS transactions.
(with network and with simple monolink connection), in
There are as much independent failure generators as the considered time interval of T = 1000000 hours the 2-
there are network elements, one for each. Failure terminal availability is 99,22 % ≤ 2TA ≤ 100 %.
generator for i-th network element (node or link)
generates new GPSS transaction which simulates that In the first 1000 hours of the system functioning 2TA in
element failure, according to its MTBF-i and exponential both cases is 100 %, with or without the communication
distribution of the time between failures. network.
When i-th network element fails, it passes into failure With network, between 1000 and 2000 hours, 2TA is
state, the consequences of which are change in the path still 100 %, and then decreases, with slight oscillations,
matrix and the beginning of that element repair. The but never below 99,6%, in the whole simulated period.
change in the path matrix consists of putting all the paths Without network, in the first 1000 hours, 2TA is 100 %.
containing the failed network element into state of Between 1000 and 10000 hours, it decreases, with some
unavailability for traffic in the network. Failed network oscillations, and than increases in the whole remaining
element repair is simulated by incrementing network fail period of simulated time T, but never over 99,56 %.
time tf counter by the repair time trep which is calculated
according to that element mean time to repair (MTTR-i) V. CONCLUSIONS
and exponential distribution of the repair time. Our approach to communication networks two-terminal
After finishing the repair, transaction representing the reliability (2TR) and availability (2TA) estimation
failure updates the path matrix, by putting that element in consists of formulation and programming of network
operational state in all paths containing it. simulation model, simulation of network operation in
In all time points of interest, the realised program- chosen simulated time interval, gathering data of relevant
simulator calculates and returns the so far reached events and processes during that operation and calculation
network 2-terminal availability, according to the the network reliability and availability based on obtained
expression (4). experimental data.
The simulation stopping criterion is the expiration of After developing of a set of basic modules, two
predetermined network simulated operation time T. When different simulators have been realised.
time T expires, simulation is terminated and the finally The network 2TR simulator is simpler and suitable for
reached value of 2-terminal availability is calculated and estimation of communication networks consisting of non-
returned. repairable elements. In each time point of interest, it
Fig. 5 represents the results of the simulation, executed generates a sample of attempts to establish communication
in order to estimate 2-terminal availability (2TA) of our between source and destination nodes, and calculates 2TR
example network (Fig.1). based on success and failure counters values at the end of
The initial assumptions are the same as for the 2TR the simulation. One can dynamically estimate network
simulator in the Section III, with the exception that our two-terminal reliability over the given time period by
example network now consists of repairable elements, putting the results of simulations for every time point of
with the following characteristics: interest.
The more complex network 2TA simulator is
• Nodes: mean time to repair MTTR = 5h, exponential
particularly useful for estimation of networks consisting of
distributions;
repairable elements. It generates traffic units in each basic
• Links: MTTR = 3h, exponential distributions. time interval, independently generates failures of network
It is assumed that the repair of failed network element elements, simulates network operation and all relevant
starts on failure occurrence. events in simulated time period, and returns 2TA values in
The total simulated time period is T = 1000000 hours. all time points of interest, enabling users to dynamically
estimate network two-terminal availability.
The basic time period is t0 = 1 minute.
The realized simulators can be useful in communication
As it can be seen in Fig. 5, the introduction of
networks early conception phases, as well as in their
repairable elements in our example communication
exploitation and maintenance planning.
network results in much better performance: in both cases

Page 397 of 478


ICIST 2014 - Vol. 2 Poster papers

ACKNOWLEDGMENT [4] Y. Jiang et al. “Monte-Carlo Methods for Estimating System


Reliability“, proceedings of ICQRMS 2012 - International
This work has been done within the projects III-40027 Conference on Quality, Reliability, Risk, Maintenance, and Safety
and TR-35026, supported partly by the Ministry of Engineering, 2012.
Education and Science of the Republic of Serbia. [5] S. Pokorni and R. Janković, “Reliability Estimation of a Complex
Communication Network by Simulation”, proceedings of 19th
REFERENCES Telecommunications Forum TELFOR 2011, Belgrade, 2011.
[6] R. Janković and S. Pokorni, “Communication Networks
[1] M. Čabarkapa, Đ. Mijatović and N. Krajnović, “Network Availability Estimation by Simulation”, proceeding of 4th DQM
Topology Availability Analysis”, Telfor Journal, Vol. 3, No. 1, pp. Intenational Conference on Information Technology ICDQM-
23-27, 2011. 2013, June 27-28, Belgrade, 2013.
[2] H. K. Ping, “Network Reliability Estimation”, PhD thesis, [7] Minuteman Software, “GPSS World Reference Manual”,
University of Adelaide, 2005. www.minutemansoftware.com
[3] M. Luby, “Monte-Carlo Methods for Estimating System [8] N. Vujanović, “Teorija pouzdanosti tehničkih sistema”, VINC,
Reliability“, University of California Berkeley, 1984. Beograd, 1990.

Page 398 of 478


ICIST 2014 - Vol. 2 Poster papers

Analysis of Monitoring Dipole and Monopole


Antennas Influence on Shielding Effectiveness of
Enclosure with Apertures
Vesna Milutinović*, Tatjana Cvetković*, Nebojša Dončov**, Bratislav Milovanović**
Republic Agency for Electronic Communications, Belgrade, Serbia
*

University of Niš / Faculty of Electronic Engineering, Niš, Serbia


**

[email protected], [email protected], [email protected],


[email protected]

Abstract— In this paper, numerical model of metal methods of moments (MoM) in [5] and the transmission
enclosure with apertures and monitoring antenna is line matrix (TLM) method in [6]. Conventional approach
considered for the purpose of accurate shielding based on fine mesh description of fine features such as
effectiveness calculation. TLM method incorporating wire slots and apertures, was used in [4] and [6]. Various
node model is used to account for the presence of factors, such as aperture patterns, their dimensions,
monitoring dipole or monopole antennas used in practice to number and orientation with the respect of enclosure
detect the level of electromagnetic field at selected points walls or plane wave propagation direction, were analyzed
inside the enclosure. Numerical model is first verified in terms of their influence on the SE of enclosure by
through comparison with experimental results and then authors of this paper. The results of this analysis have
used to study an impact of physical parameters of been presented in [7,8]. In addition, an impact of plane
considered antennas on shielding properties of rectangular wave excitation parameters on shielding properties of
enclosure and shift of its resonance frequencies. In addition, enclosure with multiple apertures has been considered by
both antennas are mutually compared in terms of their the authors in [9,10]. A conventional TLM method was
influence on detected level of electromagnetic field inside also used as in [5] to numerically study these various
enclosure. effects at high frequencies in [7-10].
I. INTRODUCTION Aim of this paper is to investigate an impact of a small
monopole/dipole antenna on the SE of enclosure, often
Performances of electronic systems enclosed in used in an experimental setup to measure the level of EM
enclosures, dominantly depend on the nature and field at some characteristic points in the enclosure.
existence of interconnecting paths, from the viewpoint of Antenna of finite dimensions could significantly affects
electromagnetic compatibility (EMC). In eliminating or the EM field distribution in closed environment as
reducing these interconnecting paths which cause already experimentally shown in [12] for rectangular
coupling between electromagnetic (EM) energy sources cavity. In a metal enclosure, this effect on the results of
and sensitive electronic systems, shielded enclosures have SE is numerically illustrated in [11]. The existing
a major role. Wired and dielectric structures within the measurements presented in [2] and [3] is improved in this
electronic system have also EM radiation excitation paper to include monopole and dipole antenna presence,
characteristics. The enclosures made from various respectively. This mentioned model did not take into
material protect system, but they have a number of account the presence of receiving antenna as one of
apertures with different size and patterns, used for airing, possible causes for some differences between model and
heat dissipation, control panels, outgoing or incoming experimental SE results. The TLM method, incorporating
cable penetration or other purposes, that can compromise compact wire model presented in [13], is also used here
its shield role. Therefore, it is very important to estimate as in [12] to create a numerical model capable to take into
shielding properties of enclosure in the presence of account the antenna presence and its parameters such as
apertures. length and radius and their impact on the SE of enclosure.
The performances of a shielding enclosure is
quantified by shielding effectiveness (SE) defined as the
ratio of field strength in the presence and absence of
enclosure. There are several methods already developed II. COMPACT TLM WIRE MODEL
for the calculation of SE of metal enclosures with
apertures on their walls, such as analytical formulations The TLM method [14] is a numerical modelling
like in [1,2], while in addition the solution in [2] was technique based on temporal and spatial sampling of EM
enhanced in [3] to allow for considering oblique fields. In TLM method, a three-dimensional (3D) EM
incidence and polarization of incident plane wave and field distribution is modelled by filling the space with a
arbitrary location of apertures in relation to plane wave network of transmission link lines and exciting a
propagation. In solving many EMC problems in a wide particular field component in the mesh. EM properties of
frequency range, differential numerical techniques in the a medium are modelled by using a network of
time domain found their application such as the finite- interconnected nodes. A typical node structure is the
difference time domain (FDTD) method in [4], the symmetrical condensed node (SCN), which is shown in

Page 399 of 478


ICIST 2014 - Vol. 2 Poster papers

Fig. 1. Additional stubs can be attached to SCN to model The single column of TLM nodes, through which wire
inhomogeneous and lossy materials. conductor passes, can be used to approximately form the
fictitious cylinder which represents capacitance and
inductance of wire per unit length. Its effective diameter,
different for capacitance and inductance, can be
expressed as a product of factors empirically obtained by
using known characteristics of TLM network and the
mean dimensions of the node cross-section in the wire
running direction [14].

III. NUMERICAL ANALYSIS

A. Results for the presence of receiving dipole antenna

For numerical calculation of the SE for various radii of


receiving dipole antenna, rectangular metal enclosure,
with dimensions (300x400x200) mm (Fig. 4a) is
considered in this paper first. One rectangular aperture of
Figure 1. Symmetrical condensed node. dimensions lx2s = 50 mm x 30 mm exists on the front
enclosure wall of thickness 2 mm in zy-plane (Fig. 4b).
Aperture is symmetrically placed around the centre of the
Compact TLM wire model, which allows for accurate front wall. A plane wave of normal incidence to the
modelling of wires with a considerably smaller diameter frontal panel and with vertical electric polarization is
than the node size, has been introduced in [14]. It use used as an excitation. Dipole antenna of length 100 mm,
special wire network formed by using additional link and oriented along z-axis, is used to measure the level of EM
stub lines (Fig. 2) whose characteristic impedance field inside the enclosure. Choice of geometry and
parameters, Zw and Zws, are chosen to model the dimensions of enclosure and aperture, type of excitation,
capacitance and inductance increased by the wire location of antenna (5 mm off the enclosure center in x-
presence, while at the same time maintaining synchronism direction) and its length were governed by experimental
with the rest of the transmission line network. This wire arrangements presented in [3]. It should be noted that in
network is embedded within the TLM nodes to model [3] the radius of the receiving antenna used in the
signal propagation along the wires, while allowing for measurements is not specified as well as characteristics of
interaction with the EM field (Fig. 3). Coupling between balun placed between antenna and cable to enhance the
the additional link and stub lines with the rest of TLM antenna efficiency.
node is achieved through points A and B.

Figure 2. Wire network for a wire running in i-direction.


a) b)
Figure 4. a) Rectangular metal enclosure with receiving dipole
antenna, b) frontal panel with one rectangular aperture.

Numerical model, incorporating the compact TLM


wire description of dipole antenna, is used to calculate the
SE for various radii of receiving antenna of considered
enclosure. The receiving antenna is represented as z-
directed 100 mm long wire having the radius of 0.08 mm,
0.4 mm and 1.6 mm. SE results, obtained by numerical
TLM models without and with antenna, for one aperture
(50 x 30) mm on the front wall, are shown in Fig. 5
together with measurement results [3]. It can be seen that
numerical results follow well the shape of the SE curve
obtained by measurements, however there is a difference
regarding the level of SE which depends of the wire
Figure 3. Wire network embedded within the TLM nodes. radius.

Page 400 of 478


ICIST 2014 - Vol. 2 Poster papers

Figure 5. SEz of metal enclosure (300x400x200) mm with


Figure 6. SEz of metal enclosure (300x400x200) mm with
rectangular aperture (50 x 30) mm on the front wall -
rectangular aperture (50 x 30) mm on the front wall -
numerical TLM model for various radii of dipole
numerical TLM model for various length of dipole
antenna.
antenna.

Figure 5 shows that the SE with the receiving antenna,


placed in enclosure, regardless of wire radius value, has a It can be seen (Fig. 6) that when length of antenna is
constant lower value across the considered frequency increasing, the level of SE obtained by TLM model is
range than the curve for the SE obtained without decreasing in considered frequency range. Resonant
receiving antenna. This drop in the SE level can be frequencies shift towards lower frequencies when length
explained due to nature of numerical model to account for of antenna is increasing. The results for the SE shows that
two-ways interactions between antenna and EM field. i.e. the SE curves remains the same with a constant lower
induced wire current causes that wire behaves as a second value for the SE from the 0 to 1 GHz. In the rest of
emitter and it has a return influence on EM field inside frequency range the curve for the length of antenna 6cm,
the enclosure. When radius of antenna is increasing which is smaller than the half wavelength, has a bigger
resonant frequencies shift towards lower frequencies. drop in the SE level, resonant frequencies and additional
Numerical model is capable to explicitly account for peaks.
dependence of wire capacitance and inductance per unit The level of SE and the first resonant frequency
length on radius [14,18]. obtained by numerical model with antenna with various
The level of SE and the first resonant frequency lengths and radius 0.08mm for receiving dipole antenna
obtained by numerical model without antenna and with are shown in Table 2.
antenna with various radius for receiving dipole antenna
are shown in Table 1. TABLE II.
SE AT THE FIRST RESONANT FREQUENCY FOR VARIOUS LENGTH OF
RECEIVING DIPOLE ANENNA
TABLE I.
SE AT THE FIRST RESONANT FREQUENCY FOR VARIOUS RADII OF Length (cm) f (MHz) SE (dB)
RECEIVING DIPOLE ANENNA
6 626.3 -3
r (mm) f (MHz) SE (dB) 10 625.1 -4.9
1.6 623 -9.5 14 622.7 -6.9
0.4 624.5 -6.6 18 617.8 -8.6
0.08 625.1 -4.9

without antenna 626.6 -0.4 B. Results for the presence of receiving monopole antenna

Numerical model, incorporating the compact TLM For numerical calculation of the SE for monopole
wire description of dipole antenna, is also used to receiving antenna, rectangular metal enclosure, with
calculate the SE for various length of receiving antenna dimensions (300x300x120) mm (Fig. 7a) is considered
of considered enclosure. The receiving dipole antenna is also. One rectangular aperture of dimensions lx2s = 100
represented as z-directed long wire having the length 6 mm x 5 mm and lx2s = 200 mm x 30 mm exists on the
cm, 10 cm, 14 cm and 18 cm and radius 0.08mm. SE front enclosure wall of thickness 3 mm in zy-plane (Fig.
results, obtained for various length of receiving antenna 7b). Aperture is symmetrically placed around the centre
and one aperture (50 x 30) mm on the front wall, are
shown in Fig. 6. of the front wall. A plane wave of normal incidence to the
frontal panel and with vertical electric polarization is
used as an excitation. Choice of geometry and dimensions

Page 401 of 478


ICIST 2014 - Vol. 2 Poster papers

of enclosure and aperture, type of excitation and location level of SE and the first resonant frequency obtained by
of antenna (center of the enclosure) were governed by numerical model without antenna and with antenna with
experimental arrangements presented in [2]. As in the various radius for receiving monopole antenna are shown
case of a dipole antenna, in [2] is not specified the radius in Table 3.
and the length of the receiving monopole antenna used in
the measurements.

a) b)

Figure 7. a) Rectangular metal enclosure with receiving monopole


antenna, b) frontal panel with one rectangular aperture.

Numerical model, incorporating the compact TLM


Figure 9. Numerical model and measurements for SE z of metal
wire description of monopole antenna, is used to calculate
enclosure (300x300x120) mm with rectangular aperture
the SE for various radii of receiving antenna of (200 x 30) mm on the front wall with and without
considered enclosure. Monopole antenna of length 60 monopole antenna.
mm, oriented along z-axis, is used to measure the level of
EM field inside the enclosure, having the radius of 0.08
TABLE III.
mm and 1.6 mm. SE results, obtained by numerical TLM SE AT THE FIRST RESONANT FREQUENCY FOR VARIOUS RADII OF
models without and with antenna, for one aperture (100 x RECEIVING MONOPOLE ANENNA
5) mm and for one aperture (200 x 30) mm on the front
wall, are shown in Fig. 8 and Fig. 9 together with r (mm) f (MHz) SE (dB)
measurement results [2]. As expected, the level of SE is
higher in the case of aperture (100 x5) mm as surface of 1.6 677 -22.5
this aperture is smaller than the surface of aperture (200 x 0.08 690.3 -18.8
30) mm.
without antenna 703.5 -13

FOR APERTURE (100 X5) MM ON THE FRONT WALL

r (mm) f (MHz) SE (dB)

1.6 612.4 -26.5

0.08 618.4 -22.5

without antenna 623.7 -17.4

FOR APERTURE (200 X30) MM ON THE FRONT WALL

Impact of antenna presence on the SE in comparison


with the case when antenna is excluded from numerical
model can be also observed. The difference between the
measured level of SE (SEmeas) and the level of SE
Figure 8. Numerical model and measurements for SE z of metal obtained by TLM model without (SE1) and with (SE2)
enclosure (300x300x120) mm with rectangular aperture antenna of radius 0.08mm in considered frequency range
(100 x5) mm on the front wall with and without for both aperture patterns is given in Table 4.
monopole antenna.
As it can be seen, the antenna presence has a
considerable impact on EM field level inside the
The SE with the receiving antenna placed in enclosure enclosure. Results for SE obtained using antenna with
has a constant lower value across the considered radius 1.6 mm are lower than results obtained using
frequency range than the curve for the SE obtained antenna with radius 0.08 mm. The average value of
without receiving antenna. It can be seen that numerical difference between measured and numerical model
TLM model without and with antenna provides the results given in Table 4 is around 3.2 dB in the
results that follow the experimental results curve. The considered frequency range.

Page 402 of 478


ICIST 2014 - Vol. 2 Poster papers

TABLE IV.
COMPARISON BETWEEN MEASURED LEVEL OF SE (SEMEAS.) AND THE
LEVEL OF SE OBTAINED FROM TLM MODEL WHEN ANTENNA IS
EXCLUDED (SE1 ) AND INCLUDED (SE2)WITH RADIUS 0.08 MM

f SEmeas. SE1 SE2 |SEmeas- |SEmeas-


(MHz) (dB) (dB) (dB) SE1 | (dB) SE2 | (dB)

200 44.11 51 46.7 6.89 2.59

300 45 45.4 40.9 0.40 4.10

400 36.8 40.4 35.3 3.60 1.50

500 32.65 33.9 28.4 1.25 4.25

600 23.48 24.5 18.1 1.02 5.38

700 -3 -10.9 -3.8 7.90 0.80

800 16.4 18 12.8 1.60 3.60 Figure 10. SEz of metal enclosure (300x300x120) mm with
aperture (100 x 5) mm on the front wall - numerical
900 16.6 21.2 14.2 4.60 2.40 TLM model for various length of monopole antenna.
1000 16.1 21 12.2 4.90 3.90

FOR APERTURE (100 X5) MM ON THE FRONT WALL

f SEmeas. SE1 SE2 |SEmeas- |SEmeas-


(MHz) (dB) (dB) (dB) SE1 | (dB) SE2 | (dB)

200 29.8 28.8 25 1.00 4.80

300 21.9 22.4 18.5 0.50 3.40

400 16.3 15.7 11.6 0.60 4.70

500 5.5 6.5 1.8 1.00 3.70

600 -13.8 -11.9 -18.2 1.90 4.40

700 -9.8 -9.7 -14.5 0.10 4.70

800 -9.3 -13.4 -17.9 4.10 8.60

900 -2.4 -0.9 -6.2 1.50 3.80


Figure 11. SE z of metal enclosure (300x300x120) mm with
1000 -0.8 7.5 0.5 8.30 1.30 rectangular aperture (100 x 5) mm on the front wall -
numerical TLM model for monopole antenna
connected to the down and top plate.
FOR APERTURE (200 X30) MM ON THE FRONT WALL

It can be seen that when length of antenna is


increasing, the level of SE obtained by TLM model
Numerical model, incorporating the compact TLM decreasing in considered frequency range. Resonant
wire description of monopole antenna, is also used to frequencies shift towards lower frequencies when length
calculate the SE for various length of receiving antenna of antenna is increasing. This drop in the SE level is
of considered enclosure. It is represented as z-directed bigger for length of antenna 6cm than one obtained for
long wire having the length 3 cm, 4 cm, 5 cm and 6 cm other given lengths, as well as for drop of resonant
and radius 0.08mm. SE results, obtained for various frequency like in Table 5.
length of receiving monopole antenna and one aperture
(100 x 5) mm on the front wall, are shown in Fig. 10.
The results for the SE with various lengths (Fig. 10) TABLE V.
were obtained for the case when monopole antenna is SE AT THE FIRST RESONANT FREQUENCY FOR VARIOUS LENGTH OF
connected to the top plate. If monopole antenna is RECEIVING MONOPOLE ANENNA

connected to the down plate, the results would be the


same like it is shown in Fig.11 for radius 0.08 mm and Length (cm) f (MHz) SE (dB)
1.6 mm with rectangular aperture (100 x 5) mm on the 3 702.2 -13.2
front wall. The similar conclusions can be derived for
aperture pattern (lx2s = 200 mm x 30 mm on the frontal 4 700.2 -14.4
wall) considered in [2]. 5 696.5 -15.1

6 677.1 -22.5

Page 403 of 478


ICIST 2014 - Vol. 2 Poster papers

Finally, numerical results for enclosure, with REFERENCES


dimensions (300x400x200) with aperture (50 x 30) mm [1] H. A. Mendez, “Shielding theory of enclosures with
on the front wall and with monopole and dipole antenna apertures”, IEEE Transactions on Electromagnetic Compatibility,
with radius 0.08mm and lenght 10cm are compared in 1978, vol. 20, no. 2, p. 296–305.
terms of their ability to exactly account SE. The results [2] M.P. Robinson, T. M. Benson, C. Christopoulos, J. F.
for the SE obtained by both antennas (Fig. 12) shows that Dawson, M. D. Ganley, A. C. Marvin, S. J. Porter, D. W. P.
from the 0 to 1 GHz the SE curve for the case when Thomas, “Analytical formulation for the shielding effectiveness of
antenna is dipole, including the values of resonant enclosures with apertures” IEEE Trans. Electromagn. Compat.,
vol. 40, no. 3, pp. 240–248, Aug. 1998.
frequencies, follow the curve for TLM model without
[3] J. Shim, D.G. Kam, J.H. Kwon, J. Kim, ”Circuital Modelling
antenna better than curve for TLM model with monopole and Measurement of Shielding Effectiveness against Oblique
antenna. In the rest of frequency range they are mostly Incident Plane Wave on Apertures in Multiple Sides of
overlapping, including the values of resonant frequencies. Rectangular Enclosure”, IEEE Trans. Electromagn. Compat., vol.
52, no. 3, pp. 566–577, Aug. 2010.
[4] L. J. Nuebel, J. L. Drewniak, R. E. Dubroff, T. H. Hubing,
T. P. Van Doren, “EMI from Cavity Modes of Shielding
Enclosures – FDTD Modeling and Measurements”, IEEE
Transactions on Electromagnetic Compatibility, 2000, vol. 42, no.
1, p. 29–38.
[5] S.Ali, D. S. Weile, T.Clupper, “Effect of Near Field
Radiators on the Radiation Leakage Through Perforated Shields”,
IEEE Trans. Electromagn. Compat., Vol. 47, No. 2, pp. 367–373,
May 2005.
[6] B.L. Nie, P.A. Du, Y.T. Yu, Z. Shi “Study of the Shielding
Properties of Enclosures With Apertures at Higher Frequencies
Using the Transmission-Line Modeling Method” , IEEE Trans.
Electromagn. Compat., vol. 53, no. 1, pp. 73–81, Feb. 2011.
[7] B. Milovanović, N. Dončov, V. Milutinović, T. Cvetković,
“Numerical characterization of EM coupling through the apertures
in the shielding enclosure from the viewpoint of electromagnetic
compatibility”, Telecommunications - Scientific journal published
by the Republic Agency for Telecommunications – RATEL, no.6,
pp.73-82, 2010.
Figure 12. SEz of metal enclosure (300x400x200) mm with [8] V. Milutinović, T. Cvetković, N. Dončov, B. Milovanović,
rectangular aperture (50 x 30) mm on the front wall - “Analysis of the shielding effectiveness of enclosure with multiple
numerical TLM model for dipole antenna, monopole circular apertures on adjacent walls”, Proc. Int. Conf. on
antenna and without antenna. Information, Communication and Energy Systems and
Technologies – ICEST 2011, Niš, Serbia, vol. 3, pp.685-688, 2011.
[9] Cvetković, V. Milutinović, N. Dončov, B. Milovanović,
IV. CONCLUSIONS “Analysis of the influence of polarization and direction of
propagation of a incident plane wave on the effectiveness of
An impact of parameters like radius and length of rectangular enclosures with apertures”, Proc. Int. Scientific-
small dipole or monopole antenna, used to measure the Professional Symp. INFOTEH, Jahorina, vol. 10, Ref.B-I-6, pp.90-
94, 2011.
level of EM field at some characteristic points in the
[10] V. Milutinovic, T. Cvetkovic, N. Doncov, B. Milovanovic,
enclosure, on the SE of enclosure is investigated in this
“Analysis of enclosure shielding properties dependence on
paper. Model based on TLM method with compact wire, aperture spacing and excitation parameters”, Proc. of the IEEE
have been used to accurate estimate the SE of enclosure TELSIKS Conference, Niš (Serbia). 2011, vol. 2, p. 521-524.
with apertures. Obtained results shows that the coupling [11] V. Milutinovic, T. Cvetkovic, N. Doncov, B. Milovanovic,
due to receiving antenna presence, inevitable in the „Shielding Effectiveness of Rectangular Enclosure with Apertures
measurement process, can be very significant especially and Real Receiving Antenna”, Proceedings of the INFOTEH
regarding the resonant frequencies locations and level of Conference, Jahorina (Bosnia and Herzegovina), 2012. vol. 11,
Ref.KST-4-9, p. 440-444.
SE in considered frequency range. Therefore, this impact
has to be taken into account during the experimental [12] V. Milutinovic, T. Cvetkovic, N. Doncov, B. Milovanovic,
„Circuital and Numerical Models for Calculation of Shielding
characterization in order to correctly estimate the SE of Effectiveness of Enclosure with Apertures and Monitoring Dipole
metal enclosure. Numerical TLM model is capable to Antenna Inside”, Radioengineering journal, Brno University of
accurately account for not just passive but also active Technology - Faculty of Electrical Engineering and
presence of dipole antenna inside the enclosure. Communication, Vol.22, No. 4, pp 1249-1257, ISSN:1210-2512 –
SCI journal.
[13] A.J. Wlodarczyk, V. Trenkic, R. Scaramuzza, C.
Christopoulos, ” A Fully Integrated Multiconductor Model For
ACKNOWLEDGMENT TLM”, IEEE Transactions on Microwave Theory and Techniques,
This work has been partially supported by the Ministry 1998, vol. 46, no. 12, p. 2431-2437.
for Education, Science and Technological Development [14] C. Christopoulos, “The Transmission-Line Modelling (TLM)
Method”, IEEE Press in association with Oxford University Press,
of Serbia, project number TR32052. Piscataway, NJ, 1995.

Page 404 of 478


ICIST 2014 - Vol. 2 Poster papers

The Cross Layer Model for Wireless


Networks Energy Efficiency
Borislav Odadžić*, Dalibor Dobrilović*, Željko Stojanov*, Dragan Odadžić**
* University of Novi Sad / Technical Faculty "Mihajlo Pupin", Zrenjanin, Serbia
** Javno preduzeće Transnafta, Novi Sad, Serbia

[email protected], [email protected], [email protected], [email protected]

Abstract— This paper discusses fundamentals of energy In contrast, the improvement in battery technology is
efficiency and spectral efficiency. The main focus is on a much slower, increasing by 10% every two years, leading
system based approaches towards energy efficient to an exponentially increasing gap between the demand
transmission and resource management across frequency for energy and the battery capacity offered [1].
and spatial domains. As energy-efficient wireless networks Power management is one of the most challenging
design requires a cross‒layer approach because power problems in wireless communication, and recent
consumption is affected by all components of system, researches have addressed this topic. Wireless devices
ranging from physical to application layers, this article have maximum utility when they can be used anywhere at
presents a details related to recent advances in cross-layer any time. However, one of the greatest limitations to that
model design for energy-efficient wireless networks. goal is finite power supplies. Since batteries provide
limited power, a general constraint of wireless
communication is the short continuous operation time of
I. INTRODUCTION mobile terminals. Additional, power savings may obtain
As an important part of the Information and by incorporating low power strategies into the design of
Communications Technology (ICT), wireless network protocols used for data communication.
communications and networks have become an essential Energy consumption is affected by all layers of wireless
part of the modern life and are responsible for energy system, from electronic circuit to applications. The layer
saving. The future success of wireless networks hinges on by layer approach leads to independent design of different
the ability to overcome the mismatch between the layers and results in high design margins. Cross layer
requested Quality of Service (QoS) and limited network approaches can significantly improve energy efficiency
resources. During the past decades, much effort has been considering interactions between different layers and as
made to enhance wireless network throughput. However, well as adaptability to service, traffic, and environment
high wireless network throughput usually implies large dynamics. Hence, a system approach is required for
energy consumption. A very important task is how to energy efficient wireless communications. This paper
reduce energy consumption while meeting throughput and addresses the tasks of energy efficiency at all layers of the
quality-of-service requirements in such networks and protocol stack for wireless networks.
devices.
Radio Frequency (RF) spectrum is a natural limited II. BACKGROUND OF WIRELESS NETWORK DESIGN
resource and therefore must be used efficiently, that is AND ARCHITECTURES
where the significance of Spectral Efficiency (SE) lies. On
the other hand, EE (Energy Efficiency) is also becoming There are two wireless network architectures available,
increasingly important for mobile and wireless devices, as namely ad‒hoc (client to client) an d infrastructure
battery technology has not kept up with the growing (centrally coordinated) wireless networks.
requirements stemming from novel multimedia Ad‒hoc networks are multi‒hop wireless networks in
applications. which a set of wireless devices cooperatively maintain
Three key problems in wireless networks are: network connectivity [2]. By using ad‒hoc mode, all
devices in the wireless network are directly
 to operate with limited energy resources, communicating with each other in peer to peer
 the need to maintain quality of service (throughput, communication mode. No access point (routers or
delay, bit error rate, etc.) over time varying radio switches) is required for communication between devices.
channels, An example of an ad‒hoc topology is shown in figure 1.
 to operate in a dynamic and heterogeneous wireless Ad hoc networks are characterized by dynamic random,
environment multi-hop topologies with typically no infrastructure
support. For setting up ad‒hoc mode, the wireless
The demand for high data rate multimedia wireless adaptors of all devices to be at ad‒hoc mode and all
communications has been growing rapidly and requires adaptors must use the same channel name and same
higher capacity wireless links to meet increasing demands. Service Set Identifier (SSID) for making the connection
At the same time mobile and wireless devices power active. Ad‒hoc mode is most suitable for small group of
consumption is also increasing, while silicon processor devices and it is helpful in situations in which temporary
power consumption increasing by 150% every two years. network connectivity is required.

Page 405 of 478


ICIST 2014 - Vol. 2 Poster papers

 5 GHz Frequency band


 Wide radio channel bandwidth
 New Modulation and Coding Scheme (MCS)
 Compatibility with existing WLANs
 Coexistence with existing networks
 Support for up to eight spatial streams
 Beam forming and Multi-User MIMO
 Energy Efficiency
Figure 1. Ad - hoc wireless network topology
In Table 1 we compared technical specifications of
The infrastructure wireless network architecture is IEEE 802.11ac and IEEE 802.11n.
shown in figure 2. Wireless networks often extend wired
networks, and are referred to as infrastructure networks TABLE 1
COMPARISON BETWEEN TECHNICAL SPECIFICATIONS
All devices are connected to wireless network with the IEEE 802.11AC AND IEEE 802.11N
help of Access Point (AP). Wireless APs are usually Technical specification IEEE 802.11n IEEE 802.11ac
routers or switches which are connected to Internet by Frequency 2.4, 5GHz 5GHz
broadband modem. A hierarchy of wide area and local Modulation OFDM OFDM
area wired networks is used as the backbone network.
Channel bandwidth 20, 40MHz 20, 40, 80,
160MHz
Aggregate data rate Up to 600Mb/s Up to 3.47Gb/s
Spectral Efficiency 400b/s/Hz 200b/s/Hz
Energy efficiency 750bits/μJ 1400bits/μJ
EIRP 22‒36dBm 22‒29dBm
Range 12‒70m 12‒35m
Through Walls Yes Yes
Non LOS Yes Yes

The IEEE 802.11ac provides two times higher energy


efficiency with respect to the existing IEEE 802.11n.

Figure 2. Infrastructure wireless network topology


III. FUNDAMENTALS OF ENERGY EFFICIENCY AND
The Access Point are responsible for coordinating SPECTRAL EFFICIENCY
access to one or more transmission channels for wireless In general, EE means using less energy to accomplish
devices located within the radio signal coverage area. the same task. Energy efficiency is commonly defined as
Therefore, within infrastructure networks, wireless access information bits per unit transmit energy: bits/Joule. For
to and from the wired host occurs in the last hop between an additive white Gaussian noise (AWGN) radio channel
AP and wireless terminal devices that share the bandwidth for a given transmit power P, and system bandwidth B, the
of the wireless channel. channel capacity is given by equation 1:
The IEEE 802 Standards Committee formed in 2008 a
new task group with the goal of creating a new
amendment to the IEEE 802.11-2007 standard. The new R = ½ log 2 (1+P/N0B) bits (1)
amendment, known as IEEE 802.11ac, includes
mechanisms to improve the data throughput of the Where N0 is the noise power spectral density.
existing Wireless Local Area Networks (WLAN), According to the sampling theory Δt = 1/2B and the
enabling wireless networks to offer wired network channel capacity is C = 2BR bits/s. Consequently, EE is
performance. The IEEE 802.11ac employs more spatial given by equation 2:
streams through 8x8 Multiple-Input Multiple-Output
(MIMO), offers wider radio channel bandwidths (up to
η EE = C/P = 2R/N0 (22R – 1) (2)
80MHz) and use of radio channel aggregation, for up to
160MHz of total system bandwidth. Furthermore, IEEE
802.11ac has been building on the existing IEEE 802.11n The EE derived from equations (2) based on the
standard. In this way compatibility is ensured with the information theoretic analysis might not be achieved in
existing WLAN networks and applications which use practical systems due to performance loss of capacity,
IEEE 802.11n or previous IEEE 802.11 standards. The approaching channel codes, insufficiently knowledge of
IEEE 802.11ac standard improve significantly throughput CSI (Channel State Information’s), cost of
for existing application areas with data rate over 1Gb/s synchronization, and transmission associated electronic
and many several new features. This new standard is also circuit energy consumption. We use common channel
known as Very High Throughput (VHT), 802.11ac and state information (CSI) to dynamically assign wireless
achieves this purpose by building on the existing 802.11n resources to users to improve spectral and energy
technology. The key characteristics of IEEE 802.11ac efficiency [3, 4].
technology are:

Page 406 of 478


ICIST 2014 - Vol. 2 Poster papers

Power efficiency means transmission with a specific bit/symbol error probability at the smallest received power level. The received power level is usually measured in terms of the Signal to Noise Ratio (SNR), expressed as the ratio Eb/N0 between the received energy per bit and the noise power spectral density. The power efficiency can be expressed as the Eb/N0 required to achieve a given bit error rate Pb.

Bandwidth efficiency or Spectral Efficiency (SE) means the ability to transmit a specific amount of data per second within the smallest bandwidth. The bandwidth efficiency η_SE is usually defined as the ratio between the data rate Rb and the bandwidth B required to transmit the data [4].

Since EE and SE are two important system performance indicators, the tradeoff between EE and SE for general networks must also be taken into account in the design of wireless systems.

IV. CROSS PROTOCOL LAYER APPROACHES TO THE ENERGY EFFICIENCY

Efficient utilization of limited resources, such as radio related resources (bandwidth, power, time) and user device related resources (battery, buffer capacity), is of great importance for wireless networks. Spectral and energy efficiency are affected by all protocol layers of the system design, ranging from electronic circuits to applications. The traditional layer-wise approach leads to independent design of different protocol layers and results in high design margins. Cross-layer approaches exploit interactions between different protocol layers and can significantly improve system performance as well as adaptability to service, traffic, and wireless environment dynamics. Recent efforts have also been directed at cross-layer optimization of throughput and energy consumption at all layers of the protocol stack of wireless communication systems, from architectures to algorithms [3]. Many mechanisms have been proposed that aim to reduce energy consumption during both active communication and idle periods. Following a hybrid protocol architecture based on the Internet and the IEEE 802.11 architectures, we can list some major energy saving algorithms located at different protocol layers [5].

A. Transceiver Power Amplifier Efficiency

Understanding the power characteristics of the radio used in wireless devices is important for the efficient design of communication protocols. A typical radio applied in wireless systems can have three states of operation: transmit, receive and idle mode. Power consumption is highest in the transmit mode and lowest in the idle mode. The goal of protocol development for given power resources is to optimize the transceiver power consumption.

The function of the Power Amplifier (PA) in a wireless communication system is to increase the power level of the transmit signal so that the corresponding received signal provides at least a minimum level of quality. The energy consumption of PAs is a large part of the overall power consumption of wireless communication systems, therefore high efficiency PAs are of great importance. However, high efficiency and high linearity are conflicting requirements in a PA, because electronic devices cannot provide constant linearity if they are powered by a limited power supply [6, 7].

Modulation techniques such as MFSK and MPSK are less sensitive to PA nonlinearities than QAM modulation; however, they are less spectrally efficient. The solution to this problem is to use a combination of amplitude and phase modulation.

B. Physical layer

In wireless communication systems the physical layer consists of radio frequency (RF) circuits, modulation devices, and channel coding systems. From an energy efficiency perspective, considerable attention has already been given to the design of this layer. At the physical layer energy can be saved if the system is able to adapt its modulation techniques and basic error correction schemes to radio channel conditions and application requirements. Many approaches to dynamically changing the transmission power in wireless networks have been proposed. However, few of them were designed with consideration for the battery lifetime of wireless units. Most of them aim to guarantee limits on SNIR (Signal to Noise and Interference Ratio) or to maximize the area of radio signal coverage [8].

From an energy efficiency point of view it is important to use modulation schemes that are insensitive to nonlinearities, which allows using more efficient power amplifiers.

1) Energy-efficient transmission in frequency domain

The Orthogonal Frequency Division Multiple Access (OFDMA) modulation technique has been extensively studied for next generation wireless communication systems. Different types of fading occur in different frequency bands, which is why Orthogonal Frequency Division Multiplexing (OFDM) has become a key modulation scheme for next generation broadband wireless standards. Energy efficient OFDM systems, a special case of OFDMA, were first addressed with consideration of circuit consumption for frequency selective fading channels [1]. While OFDMA can provide high throughput and SE, its energy consumption is sometimes large. OFDM is the most important example of a spectrally efficient transmission scheme, but on the other hand its power efficiency is low because of the high Peak to Average Power Ratio (PAPR) level. In contrast to the traditional spectrally efficient scheme that maximizes throughput under a fixed overall transmit power constraint, the new scheme maximizes the overall EE by adjusting both the total transmit power and its distribution among subcarriers. Since EE and SE are two important system performance indicators, the tradeoff between EE and SE for general OFDMA networks should be exploited to guide system design.
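The PAPR problem mentioned above is easy to demonstrate: the numpy sketch below builds one random QPSK-loaded OFDM symbol (64 subcarriers, an assumed figure) and measures its peak-to-average power ratio. Peaks on the order of 8-12 dB above the mean power are typical and force the PA to back off well below its most efficient operating point.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64                                      # number of subcarriers (assumed)
qpsk = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], size=N)
x = np.fft.ifft(qpsk) * np.sqrt(N)          # one time-domain OFDM symbol
papr = np.max(np.abs(x) ** 2) / np.mean(np.abs(x) ** 2)
print(f"PAPR = {10 * np.log10(papr):.1f} dB")
```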
2) Energy-efficient transmission in spatial domain

Multiple-input and multiple-output (MIMO) techniques have been shown to be effective in improving wireless system capacity and spectral efficiency. There are two major categories of MIMO: spatial diversity, in which the same data is transmitted over each of the multiple paths, and spatial multiplexing, in which each of the paths carries different data. In 2x2 MIMO with spatial diversity, for example, each of the two antennas is essentially transmitting and receiving the same data, although the data is coded differently. This mode is primarily used to improve signal quality, or to increase the coverage area. In 2x2 MIMO with spatial multiplexing, two different streams of data are transmitted over the two channels, which theoretically doubles the system throughput. Spatial multiplexing is the mode that really takes advantage of the capacity improvement capabilities of MIMO. The system throughput can be increased linearly with the number of transmit antennas without using any additional spectrum resources.

Although MIMO techniques have been shown to be effective in improving the capacity and SE of wireless systems, energy consumption also increases. First of all, more circuit energy is consumed due to the duplicated number of transmit or receive antennas. The exploitation of multiple antennas requires more active circuit components, which increases both transmit power and circuit power. Depending on the ratio of the extra capacity improvement to the extra energy consumption, the EE of a multiple antenna system may be lower than that of a single antenna system. Moreover, more time or frequency resources are spent on the signaling overhead for MIMO transmission. For example, in most MIMO schemes, CSI is required at the receiver, or at both the transmitter and the receiver, to obtain good performance [4].

C. Data link layer

The data link layer is composed of a Medium Access Control (MAC) sub layer and a Logical Link Control (LLC) sub layer. This layer is responsible for wireless link error control, security, encryption and decryption, mapping network layer packets into frames, and packet retransmission.

1) Energy-efficient Medium Access Control protocol design

The MAC sub layer of the data link layer is responsible for providing reliability to upper layers for the point to point connection established by the physical layer. The MAC sub layer interfaces with the physical layer and is represented by protocols that define how the shared wireless channels are to be allocated among a number of wireless subscribers.

The MAC protocol can be used to define in advance when each wireless device may receive or transmit data. Each device is allowed to use power saving operational modes when it is certain that no data will arrive for it. Power management protocols manage the trade-off between energy and performance by determining when to switch between power save mode and active mode. Depending on the chosen MAC protocol, some energy may be consumed due to channel access or contention resolution. For example, in IEEE 802.11 networks, the sender transmits a Ready To Send (RTS) message to inform the receiver of the sender's intentions. The receiver replies with a Clear To Send (CTS) message to inform the sender that the channel is available at the receiver. The energy consumed for contention resolution includes the transmission and reception of these two messages. Additionally, the nodes may spend some time waiting until the RTS can be sent and so consume energy listening to the channel [4]. The objective of an energy efficient MAC protocol design is to maximize the performance by minimizing the energy consumption of the wireless devices. These requirements often conflict and a tradeoff has to be made. There are several mechanisms for an energy-efficient MAC protocol design:
• Minimizing unsuccessful actions
• Minimizing the number of transitions
• Power management
• Synchronization between the wireless devices and the APs
There are many ways in which these principles can be implemented, and several researchers have compared the various mechanisms.

2) Energy-efficient Logical Link Control protocol design

The reliability of a single wireless link is provided by the Logical Link Control (LLC) sub layer by using techniques for error control. Due to the dynamic nature of wireless networks, adaptive error control gives a significant improvement in SE and EE. The errors that occur on the wireless channel are caused by phenomena such as fading, interference and wireless user mobility. In characterizing the performance of a wireless channel, there are two variables of importance:
• Bit Error Rate (BER) as a function of Signal to Noise Ratio (SNR) at the receiver
• Burstiness of the errors on the wireless channel.
Error control is applied to handle these errors. Note that even a single uncorrected bit error inside a packet will result in the loss of that packet.

The two most common techniques used for error control are Automatic Repeat reQuest (ARQ) and Forward Error Correction (FEC). Hybrids of these two also exist. Adaptive error control adapts the applied error control to the observed wireless channel conditions. Both ARQ and FEC error control methods waste network bandwidth and consume power resources, due to the retransmission of data packets and the greater overhead necessary for error correction. Care must be exercised while adopting these techniques over a wireless channel where the error rates are high due to noise, fading signals, and disconnections caused by mobility [4].

In such a dynamic wireless environment it is likely that no single one of the above techniques is optimal in terms of energy efficiency all the time.

D. Network layer

The main functions of the network layer are routing packets and congestion control. Energy efficiency aspects in this layer are mainly studied in the context of ad-hoc networks. In ad-hoc networks the data possibly has to pass multiple hops before it reaches its final destination. This leads to a waste of bandwidth as well as an increased risk of data corruption, and thus potentially higher energy consumption due to the required error control mechanism. In wireless mobile networks, the network layer has the added functionality of routing under mobility constraints and mobility management, including user location, update, etc. Energy efficient routing is a wide field of research for ad hoc networks.
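As a toy illustration of energy-efficient routing, the sketch below runs Dijkstra's algorithm over assumed per-hop transmit energies instead of hop counts (all node names and energy values are invented); the three-hop route wins because its total transmit energy is lowest.

```python
import heapq

def min_energy_path(links, src, dst):
    """Dijkstra over per-hop transmit energies (J) instead of hop count."""
    best = {src: (0.0, None)}          # node -> (total energy, predecessor)
    heap = [(0.0, src)]
    while heap:
        e, u = heapq.heappop(heap)
        if u == dst:
            break
        if e > best[u][0]:
            continue                    # stale heap entry
        for v, cost in links.get(u, []):
            if v not in best or e + cost < best[v][0]:
                best[v] = (e + cost, u)
                heapq.heappush(heap, (e + cost, v))
    path = [dst]
    while path[-1] != src:
        path.append(best[path[-1]][1])
    return path[::-1], best[dst][0]

# Hypothetical 4-node ad-hoc topology with invented per-link energies.
links = {"A": [("B", 1.0), ("C", 3.5)],
         "B": [("C", 1.2), ("D", 4.0)],
         "C": [("D", 1.1)]}
print(min_energy_path(links, "A", "D"))   # (['A', 'B', 'C', 'D'], ~3.3 J)
```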
E. Transport layer

The transport layer provides reliable data transport, independent of the physical networks used. The Transport Control Protocol (TCP) is a reliable, end-to-end transport protocol that is widely used to support various applications. However, due to wireless channel properties, the performance of traditional transport protocols degrades over a wireless channel. TCP and similar transport protocols resort to a larger number of retransmissions and frequently invoke congestion control measures, mistaking wireless channel errors and loss due to handoff for channel congestion. This can significantly reduce throughput and introduce unacceptable delays. These measures result in an unnecessary reduction in bandwidth utilization and an increase in energy consumption, because they lead to a longer transfer time [9].

F. Operating system and application layer

At the operating system and application layers various mechanisms can be used to improve energy efficiency. Energy efficiency at the application layer is becoming an important area of research. Techniques at the application level can be used to reduce the amount of data to send and thus the amount of energy consumed. Furthermore, multimedia applications require considerable power for processing and transmission, as well as network bandwidth. In general, the majority of the techniques used in the design of today's applications to conserve bandwidth also conserve battery life [9].

G. Cross Layer Approach to the energy efficiency

As the wireless channel is a shared medium, wireless device energy efficiency is affected not only by the layers composing the point-to-point communication link, but also by the interaction between the links in the entire network. Accordingly, a systematic approach, including both transmission and multiuser resource management, is required for energy-efficient wireless communications. Hence, energy efficiency is an issue involving all layers of the wireless system, from the physical layer to the application layer and the entire network.

In the cross-layer approach each protocol layer is optimized taking into account the knowledge of features and parameters of other protocol layers, not necessarily located at the bordering upper or lower levels. The cross-layer design approach provides better resource utilization and trade-off solutions with respect to a layered approach, but this can be achieved at the expense of a more complex design which requires adaptability to system changes, by propagating a modification of one protocol layer to all the others. The more general view of a cross-layer approach to network design leads to the achievement of a global optimization of system performance.

The cross-layer approach can be categorized into explicit and implicit. In the case of implicit cross-layer optimization, interactions between different protocol layers are taken into account, but there is no exchange of information between protocol layers during runtime. As a result, the implicit optimization is not adaptive. In the case of explicit cross-layer optimization, the exchange of information regarding protocol parameters, user requirements or wireless channel state is required, with the aim of maintaining performance optimization and a high level of efficiency even if the communication parameters change. The explicit approach is adaptive to the channel conditions and to the application and user requirements. The explicit cross-layer optimization relies on a bottom-up approach when the information flows from lower to upper layers, while it relies on a top-down approach when the information flows from upper to lower layers. In a full cross-layer optimization, where information is exchanged between all the protocol layers, middle level protocols (e.g. network protocols) can be optimized by simultaneously using top-down and bottom-up feedback [2].

When applying cross layer approaches to energy efficiency, the resulting increase in the complexity of the system, and therefore the increase in required computational power, must be considered. While implicit cross layer approaches only increase the complexity of the design, explicit cross layer approaches also increase the complexity of the system, resulting in a higher computational energy consumption. As a result, in order to increase the overall energy efficiency of the system, the additional computational power required for enabling an explicit cross layer approach must be lower than the amount of transmission power saved through the cross layer algorithm.
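The break-even condition stated above amounts to a simple energy budget; the sketch below (all figures hypothetical) computes the net gain of explicit cross-layer adaptation over one session: transmit energy saved, minus the extra computation and the signaling overhead.

```python
def net_energy_gain(session_s, p_tx_saved_w, p_cpu_extra_w, e_signaling_j):
    """Net energy gain (J) of explicit cross-layer adaptation over a session."""
    return session_s * (p_tx_saved_w - p_cpu_extra_w) - e_signaling_j

# Illustrative assumptions: a 60 s session in which adaptation saves 30 mW of
# transmit power, costs 12 mW of extra CPU power and 0.2 J of signaling.
print(net_energy_gain(60, 0.030, 0.012, 0.2))   # ~0.88 J in favor
```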
V. CONCLUSION

Energy efficiency is an issue involving all layers of the system. In this paper, several techniques applied in wireless networks have been analyzed from the energy efficiency point of view. Much of the work in this area has used a single layer view; however, the cross layer approach allows the performance of wireless networks to be fully optimized by considering network parameters from several layers of the protocol stack as a whole.

The cross-layer approach allows better resource and power utilization with respect to the traditional individual layered approach. However, this is achieved at the expense of a more complex system which requires adaptability to the system conditions.

As for the information-theoretic aspect, most of the literature about EE has focused on point-to-point scenarios, and the impact of practical issues on EE is not fully exploited. For the advanced techniques used today, such as OFDMA, MIMO and efficient MAC and routing protocols, existing research has proved that larger EE can be achieved through energy efficient design.

ACKNOWLEDGMENT

This work was supported in part by the Ministry of Education, Science and Technological Development of the Republic of Serbia under Project MPN TR32044, "The development of software tools to analyze and improve business processes".

REFERENCES

[1] G. Miao, N. Himayat, G. Y. Li, and D. Bormann, Energy-efficient design in wireless OFDMA, in Proc. IEEE Int. Conf. Commun. (ICC'08), Beijing, China, May 2008.
[2] C. E. Jones, K. Sivalingam, P. Agrawal, and J. C. Chen, A Survey of Energy Efficient Network Protocols for Wireless Networks, Wireless Networks Journal, Volume 7, Issue 4, doi 10.1023/A:1016627727877, pp. 343-358, Kluwer Academic Publishers, Hingham, MA, USA, 08/01/2001.
[3] ETSI, Environmental Engineering (EE), Measurement Method for Energy Efficiency of Wireless Access Network Equipment, TS 102 706 V1.2.1 (2011-10).
[4] G. Y. Li, Zhikun Xu, Cong Xiong, and Chenyang Yang, Energy-efficient wireless communications: tutorial, survey, and open issues, IEEE Wireless Communications (Volume 18, Issue 6), ISSN: 1536-1284, pp. 28-35, December 2011.
[5] G. Miao, N. Himayat, G. Y. Li, and A. Swami, Cross-layer optimization for energy-efficient wireless communications: a survey, Wiley J. Wireless Commun. Mobile Comput., vol. 9, no. 4, pp. 529-542, April 2009.
[6] DCKTN Wireless Technology & Spectrum Working Group, Energy Efficient Wireless Communications, Russell Square House 10-12, London, March 2011.
[7] B. Panajotovic and B. Odadzic, Design and "Intelligent" Control of Hybrid Power System in Telecommunication, 15th IEEE Mediterranean Electrotechnical Conference, MELECON 2010, IEEE Proceedings, pp. 1453-1458, Valletta, Malta, April 25-28, 2010.
[8] B. Odadzic and D. Odadzic, Comparison of Approaches to Energy Efficient Wireless Networks, International Conference on Applied Internet and Information Technologies AIIT 2013, Proceedings, ISBN 978-7672-211-2, COBISS.SR-ID 281228551, pp. 7-13, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia, Zrenjanin, October 25, 2013.
[9] S. Khan, Y. Peng, E. Steinbach, M. Sgroi, and W. Kellerer, Application-driven cross-layer optimization for video streaming over wireless networks, IEEE Communications Magazine, 44 (1), pp. 122-130, 2006.
OLAP ANALYTICAL SOLUTION FOR HUMAN RESOURCE MANAGEMENT PERFORMANCE MEASUREMENT AND EVALUATION: FROM THEORETICAL CONCEPTS TO APPLICATION

Ružica Debeljački*, Olivera Grljević*
* University of Novi Sad / Faculty of Economics / Department of Business Informatics, Subotica, Serbia
{ruzica, oliverag}@ef.uns.ac.rs
Abstract — Human capital provides organizations with the ability and readiness to adapt to turbulent changes in the business environment. Human resources are the only active and creative part of the work process. For this reason, the strategy of human resources development is one of the most important segments of the strategy of organization development. Organizations can improve their business performance by developing analytical systems. Using analytical systems, organizations are able to continuously measure and assess their performance, take corrective actions and secure a stable position in today's competitive environment. The system should enable easy and efficient monitoring and analysis of data. By comparing planned values with the actual achievements of management, organizations can identify critical points in terms of achieving the goals. This paper presents a theoretical and methodological foundation of human resource management, and describes a developed OLAP analytical solution for human resource management performance measurement and evaluation.

I. INTRODUCTION

The emergence of information systems as a key factor in business operations analysis and as a tool for the reduction of business operating costs has significantly changed the approach to business management. Companies are forced to constantly measure and evaluate their performance and undertake corrective actions in order to retain their position in the competitive environment. Since in the majority of companies employee earnings account for almost half of business operating costs, it is possible to conclude that one of the most vital factors in gaining competitive advantage is exactly the contribution of the labour force to a company's performance. The use of a functional and powerful system for human resource management increases the possibilities of a company's potential growth and development. The application of analytical systems enables companies to manage business operations more efficiently and hence improve business performance, increase profitability through more efficient cost control and optimize the number of employees. The use of analytical systems to track labour force performance allows managers to gain an insight into employee contributions to the achievement of company goals, and shows which employees possess the suitable competencies that are required to build competitive advantage.

II. THEORETICAL AND METHODOLOGICAL FOUNDATIONS OF THE DEVELOPMENT OF A SYSTEM FOR MONITORING THE PERFORMANCE OF THE PROCESS OF HUMAN RESOURCE MANAGEMENT

A. The Purpose and Nature of Human Resource Management

The majority of authors agree that a company cannot function without human resource support. With their knowledge and skills, the employees greatly impact the company's performance and its position in the market. Human Resource Management (HRM) represents a strategic, integrative and coherent approach to the recruitment and development of employees so that they can contribute to the achievement of the company's goals [1]. HRM comprises a variety of processes and activities, such as human resource planning and organizing, recruitment, education and training of employees, assessing the effectiveness of HRM strategies and undertaking corrective actions. The scope of HRM covers five main activities:
• increasing the effectiveness of the process of recruitment
• optimising the process of employee training
• controlling labour costs
• managing employee attrition
• measuring and assessing employee performance
Unlike other company resources, human resources are the only active and creative component of a work process. Therefore, a strategy for employee development represents one of the most important segments of company development.

B. Measuring and Assessing Human Resource Management Performances

Performance can be viewed as a company's ability to achieve its goals [4]. However, a more accurate definition of performance would be: a company's success in achieving its established goals and accomplishing its mission.
Business Performance Management (BPM) refers to a continuous cycle of setting goals, planning, measuring and analysing, providing feedback, and improving performance [4]. In order to ensure the success of BPM, the company has to fully understand its own business operations and activities in the process of reaching its strategic goals. Besides that, it is necessary to define duties and responsibilities correctly in order to facilitate tracking progress in achieving the established goals. Hence, BPM requires the existence of key performance indicators (KPI) which should be available at the right time and at the right level of decision-making. Experts from this field are of critical importance in identifying relevant indicators of business operations. Without good indicators of business operations, the effects of measuring and assessing performance will definitely not be satisfying, and therefore they will not represent a reliable source of information for making high-quality decisions. The process of managing business performance could comprise the following activities: (i) Performance planning: (1) defining the expected performance (outcome/behaviour), which is connected with strategic company goals, (2) identifying critical factors of success, (3) determining performance indicators, (4) establishing performance standards, (5) identifying and analyzing performance risk; (ii) Monitoring, measuring and assessing performance: (1) observing performance, (2) measuring and assessing performance, (3) comparing achieved goals with those established and identifying discrepancies, (4) providing feedback about performance; (iii) Continuous performance improvement: (1) performance improvement planning, (2) implementing the performance improvement plan, (3) tracking the progress of performance improvement [5].

Managing human resource performances leads to sustainable company success through employee performance improvement. Regular performance control provides feedback about employee results and detects the need to undertake corrective actions in order to improve performance, by offering training or other activities which can help to solve the identified problems [3]. The impact of the HRM process on company performance is difficult to measure. Therefore, it is necessary to design innovative systems for assessing and measuring human resource performance which will help to present the impact of this process on the company's general success [6]. The system for measuring and assessing human resource performance should contain the following activities:
• setting goals for tracking and assessing performance
• determining components for assessing employee work performance (quality and quantity of work accomplished, attitude towards work, relationship with other employees...)
• defining key activities for the process of assessing work performance
• selecting methods which will be applied to measure and assess performance
• appointing a leader for the above mentioned process
• detecting problems and errors in the process of measuring and assessing performance
• undertaking corrective actions in order to improve the above mentioned process

C. The Role of IT in Human Resource Management

Efficient business operations management is not possible without suitable IT/IS support. The IT/IS platform enables companies to increase productivity and profitability, and at the same time leads to cost reduction. Business operations data are usually recorded in Enterprise Resource Planning (ERP) systems. These systems belong to the category of transactional systems. They represent modular solutions, i.e. they are made up of applications independent of each other, and offer an integrated platform for all business processes in a company. The greatest contribution of ERP systems is that they enable the integration of data from different business processes and provide a clear overview of the transactions that have occurred between them. ERP systems are limited in terms of the data analysis which is necessary for the decision-making process, because they were not developed to serve that purpose. A thorough investigation of information requires the introduction of new software solutions whose main purpose is data analysis and which help to obtain credible and timely information. Reporting systems play an important role in this field. Their main task is to enable careful business analysis by offering credible and timely information, which increases business process efficacy. By using data obtained from transactional systems, reporting systems can create reports in all forms: textual, tabular or graphic.

Since data availability in the business environment has noticeably increased recently, companies need to be ready to respond to this newly created challenge. From the point of view of system performance, transactional systems are not suitable for storing large amounts of data. Data warehouses (DW) have been designed to perform that task. A DW is an analytical database which is convenient for storing multi-annual data. It represents a central repository for both internal and external data, and is particularly suitable as a basis for business operations analysis. Since a company's business operations are highly complex and comprise several business processes, a DW can be divided into different data marts, where one data mart usually corresponds to one business process, and in that way they enable in-depth analyses. Unlike the DW, data marts can, in addition to detailed information, contain aggregate information as well, and thus ensure faster data analysis. Speaking of systems that are suitable for working with data warehouses, reporting systems are not as good as solutions created in an OLAP (On-Line Analytical Processing) environment. Software designed in an OLAP environment focuses on the analysis of business data, i.e. detecting trends through (available) aggregate and detailed data. The core of an OLAP system is a multidimensional cube, which enables analysts to interactively manipulate a large amount of detailed and consolidated data, and to investigate it from various perspectives. Since the multidimensional view is hierarchical, the analyst can observe data from a hierarchical perspective. Such a structure allows data segmentation within the database, which entails data slicing according to a criterion given in the query, dicing, and aggregation and disaggregation of data along the analytical hierarchy, and the like. When compared with reporting systems, which enable static reporting through the presentation of data formatted and organized according to specific business requirements, OLAP software allows dynamic data analysis by creating advanced reports such as scorecards and dashboards.
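The slice, dice and roll-up operations just described can be imitated in a few lines of pandas over a hypothetical HR fact table (all data invented for illustration):

```python
import pandas as pd

# Hypothetical HR fact table: one row per employee per quarter.
facts = pd.DataFrame({
    "year":     [2013, 2013, 2013, 2013, 2014, 2014],
    "quarter":  ["Q3", "Q4", "Q3", "Q4", "Q1", "Q1"],
    "org_unit": ["Sales", "Sales", "IT", "IT", "Sales", "IT"],
    "salary":   [950, 1010, 1200, 1150, 980, 1230],
})

# Slice: fix one dimension member (year = 2013) to obtain a sub-cube.
cube_2013 = facts[facts["year"] == 2013]

# Roll-up: aggregate the salary measure along the org_unit x quarter view.
print(pd.pivot_table(cube_2013, values="salary",
                     index="org_unit", columns="quarter", aggfunc="mean"))
```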
Scorecards are used to measure accomplishments compared to established goals. They are created with the aim of showing KPIs, where each indicator represents one aspect of organizational performance. Taken together, these KPIs offer a view of the company's performance at a specific point in time. A dashboard represents an aggregated view of different types of reports, including scorecards. It can contain reports of different business processes in order to give an insight into the business as a whole.

III. OLAP ANALYTICAL SOLUTION FOR MEASURING AND ASSESSING HUMAN RESOURCE MANAGEMENT PERFORMANCE

A. The Purpose of the Analytical System for Measuring and Assessing Human Resource Management Performance

The analytical system for measuring and assessing HRM performance is used as an information platform whose aim is to improve the given process. The system should enable easy and efficient tracking and analysis of data related to the established and achieved goals of HRM, in order to identify critical points in meeting objectives [6]. Questions concerning HRM that are frequently asked are as follows:
• What are the current and future HR needs in a company?
• How to attract, retain and motivate inventive employees?
• How to help employees give their best in terms of productivity?
• Are salaries and perks in accordance with the responsibilities related to a position?
• Does employee training meet company needs?
• What is the superordinate-subordinate relationship between employees? How does that relationship change over time?
• Is there a suitable tool for assessing employee performance, and is it used efficiently?
Seeking answers to the above listed questions entails devising and undertaking a number of activities which will lead to the achievement of objectives. It is necessary to investigate how successfully these activities are accomplished so that discrepancies can be detected on time, which allows for a timely response. This requires defining the framework for activity accomplishment, which contains deadlines and planned outcomes. The aim of the analytical system which measures and assesses the performance of human resources is to enable the monitoring of these activities, through measuring and assessing the accomplishment of planned activities in HRM. This system should be able to provide the following types of information:
• employee attrition analysis
• analysis of earnings compared to positions held
• analysis of the costs of working overtime
• analysis of employee training costs
• analysis of employee qualifications and their suitability for the position
• analysis of employee productivity
These information requirements have a strong analytical feature, so it comes naturally to seek responses within the analytical system based on the integrated analytical data repository (DW).

B. Architecture of the System for Measuring and Assessing Human Resource Management Process Performance

The architecture of the system for measuring and assessing the performance of HRM comprises data sources, ETL (Extraction, Transformation and Loading) processes, a DW and software designed in an OLAP environment. Dimensional modeling has been applied for designing this analytical system. It represents a compilation of techniques which are used to create OLAP analytical solutions and analytical database schemes. The ETL processes extract data from source systems, transform them and load them into the analytical database. A layer of data between the source (transactional) system and the analytical database contains normalized data, which enables the division of an ETL process into two parts. The first part is the responsibility of the company where the system has been implemented, and is related to extracting data from the source systems and their formatting. The responsibility for the latter falls on the software suppliers and comprises cleaning, mapping, restructuring and loading of data. The application for final users is designed in the Microstrategy OLAP environment and allows the transformation of data into attractive and easy to understand control tables and reports.

Figure 1. The architecture of the system for measuring and assessing HRM process performance
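The two-stage ETL flow just described can be sketched in a few lines; below is a minimal, hypothetical illustration (invented table, field and data names; a real deployment would rely on the ETL tooling of the BI suite rather than hand-written code), with extraction on the customer side and cleaning, mapping and loading on the supplier side.

```python
import sqlite3

def extract(source_rows):
    """First ETL stage (the customer's side): pull raw rows from the source."""
    return list(source_rows)

def transform(rows):
    """Second stage (the supplier's side): clean, map and restructure."""
    return [(r["emp_id"], r["name"].strip().title(), round(r["salary"], 2))
            for r in rows if r.get("salary") is not None]

def load(rows, db):
    db.executemany("INSERT INTO employee_fact VALUES (?, ?, ?)", rows)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee_fact (emp_id INTEGER, name TEXT, salary REAL)")
source = [{"emp_id": 1, "name": " ana petrov ", "salary": 1000.456},
          {"emp_id": 2, "name": "ivan ilic", "salary": None}]   # dirty input
load(transform(extract(source)), db)
print(db.execute("SELECT * FROM employee_fact").fetchall())
# [(1, 'Ana Petrov', 1000.46)] -- the row with a missing salary was dropped
```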
C. Functionality of the System for Measuring and Monitoring the HRM Performance

The aim of the designed system is to enable efficient performance analysis which will contribute to high-quality HRM. The analysis of information requirements has led to the identification of subsystems of reports. Some of the subsystems are as follows:
• Attrition Analysis – The aim of this analysis is to maintain the attrition rate at an acceptable level and to minimise the loss of key employees (loss of talent).
• Compensation Analysis – This group of reports is related to company compensation costs. Performing these analyses enables companies to follow trends in cost movement and to detect significant discrepancies from standard values on time.
• Recruitment Analysis – This analysis optimises the recruitment process through the identification of employee profiles, company demand for employees, and employment trends.
• Workforce Development – This group of analyses helps to match the skills required to perform certain activities with employee qualifications.
• Productivity Analysis – This analysis refers to employee performance analysis in terms of achieving individual goals, and rewarding employees based on their accomplishments.

The analysis of information requirements also contributes to the identification of the KPIs of certain HRM segments. Out of several dozen indicators, the following KPIs best illustrate the effectiveness of specific groups of analyses:
• employee attrition
• average employees' salary
• hiring rate
• average training costs
• work productivity

A variety of KPI monitors are used to illustrate the KPIs. KPI monitors are aimed at the owners of HRM processes and provide an instantaneous visual view of the state of a company in terms of achieved results compared to those planned. KPI monitors offer guidelines for decision-making related to future company business operations.

Figure 2. KPI monitor of the system for measuring and assessing the performance of HRM

Fig. 2 shows a KPI monitor (scorecard) which illustrates the KPIs of different subsystems of reports. The subsystem of reports that deals with employee attrition analysis is represented by the employee attrition indicator, which is calculated by comparing the number of employees who have left the company during the past year with the total number of employees. All values up to 1.5% are considered acceptable. In the compensation analysis, the average employees' salary is compared to average market earnings. The higher the indicator value, the better the company's position in the market. Also, the more developed a company is, the more it invests in its employees, which means that the employee training analysis has to show higher values. The work productivity indicator shows the importance of the human factor in a company, and is calculated by comparing employee performance with the number of working hours. The hiring rate indicator is calculated by comparing the number of newly hired employees with the total number of active employees.

In this example, the acceptable values are marked green. The red zone represents unacceptable values and indicates that there has been a problem in reaching the established goals. The yellow zone also shows unacceptable values which, however, do not pose a threat to the overall business operations of the company. If there are any discrepancies (yellow and red zones), users can obtain detailed reports which help to perform an analysis at the level of an organizational unit, of employees, etc. Such reports contain a much larger number of performance indicators, whose analysis can reveal the causes of those discrepancies. In order to carry out these analyses, users can employ OLAP objects such as metrics, attributes, hierarchies, filters and so on. Within the system, KPIs are manifested as OLAP metrics which contain the formulae to calculate their values. The perspectives from which KPIs can be observed in the system are represented as dimensions of business operations with their attributes. By choosing an OLAP attribute, users choose the perspectives from which they want to get an insight into the results of the company's operations. Hierarchies enable users to observe the same data at different levels of complexity (in terms of details). By listing data filtering conditions, users determine a subset of data which will represent the basis for obtaining results. In addition to attributes, metrics, filters and other OLAP objects which allow users to perform ad-hoc analyses, the applications for final users comprise predefined reports. These reports help to carry out more thorough analyses so as to discover the causes of the detected problems, and they represent the basis for undertaking corrective actions. If the predefined reports are not sufficient for identifying the causes of problems, users can always do an ad-hoc analysis and create new reports.

Figure 3. Ratio of workers' salaries to the market standards

Fig. 3 illustrates a predefined report which represents the ratio between employees' salaries and the average salaries prescribed by standards for different positions. This analysis shows whether the amount given as a salary is suitable, which can easily tell us something about employee compensation satisfaction. A potential bonus shows the level of compensation in cases when employees do not reach the established goals. The level of the potential bonus is defined within the strategic company goals. It needs to be mentioned that the illustrations do not use data from practice but from the test base, which means that they do not necessarily show the usual values in practice. By using the large number of components of the analytical solution, such as tabular and graphic reports, scorecards and dashboards, users can easily use the delivered OLAP solution.
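The attrition KPI and its traffic-light zones described above translate directly into a small calculation; in the sketch below the 1.5% green bound comes from the text, while the yellow bound and the headcount figures are invented for illustration.

```python
def attrition_rate(leavers_past_year, total_employees):
    """Employees who left during the past year, relative to headcount (%)."""
    return 100.0 * leavers_past_year / total_employees

def zone(value, green_max, yellow_max):
    """Traffic-light classification as shown on the KPI monitor."""
    if value <= green_max:
        return "green"
    return "yellow" if value <= yellow_max else "red"

rate = attrition_rate(leavers_past_year=6, total_employees=450)
print(f"attrition = {rate:.2f}% -> {zone(rate, green_max=1.5, yellow_max=3.0)}")
# attrition = 1.33% -> green
```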
With technological advancements, analytical solutions offer powerful information distribution. There is the possibility to automatically generate reports and send them via e-mail, mobile phones and other mobile devices, or via personalized control tables.

IV. CONCLUSION

The challenge that companies face today is how to develop agile, highly competent employees while maintaining cost efficiency at the same time. Therefore, companies need analytical solutions which, together with other solutions, offer proactive analyses and timely observation of events which are important for a company to function successfully. Research has shown that more than 20% of users use BI proactively [2]. Analytical solutions help companies to improve management efficiency and enhance performance, by increasing profitability through more efficient labour cost control. Companies who see their employees as a developmental potential and the basis for competitive advantage, and consider HR management a part of their strategy and policy, will succeed not only in adapting to changes more rapidly, but in becoming proactive as well, i.e. they will be bringing about changes and their competitors will have to adapt.

This paper has offered the theoretical and methodological foundations of the development of a system for measuring and assessing HRM performance. It has also presented the main functions of the designed software and shown its technical and technological architecture. At this stage of development, the system supports only measuring and assessing performance. Our plan is to add more functions, so that future versions of the system can support all activities that are related to HRM performance.

REFERENCES

[1] M. Armstrong, Armstrong's Handbook of Human Resource Management Practice (11th edition), London and Philadelphia: Kogan Page, 2009.
[2] Gartner Group, Key Issues for Business Intelligence and Performance Management Initiatives. Retrieved August 25, 2013 from http://www.gartner.com/it/content/660400/660408/key_issues_bi_research.pdf, 2010.
[3] A. Nagpal, K. Kewal, Business Performance Management: Next in Business Intelligence. 2nd National Conference on Challenges & Opportunities in Information Technology, Punjab, 2008.
[4] L. Stainer, Performance Management and Corporate Social Responsibility: The Strategic Connection. Strategic Change, 15, 2006, pp. 253-264.
[5] N. Balaban, Ž. Ristić, L. Šereš, K. Belić, N. Grubač, Sistem podrške upravljanju performansom u kontekstu strategijskog upravljanja. Zbornik radova XIV Internacionalni naučni skup SM, Palić, 2009.
[6] B. Bogićević, Menadžment ljudskih resursa, Ekonomski fakultet, Beograd, 2008, pp. 22.
[7] L. Šereš, R. Debeljački, Educational Process Performance Measurement and Evaluation System for Higher Education Institutions – Architecture and Functionality. Proceedings of the International Conference on Applied Internet and Information Technologies – AIIT, Zrenjanin, Srbija, 2012, pp. 271-276.
Approach to Multidimensional Data Modeling in BI Technology

Jelena Lukić
JP “Elektromreža Srbije”, Belgrade, Serbia
Abstract: This paper focuses on the data modeling part of a Business Intelligence system. The proposed approach to multidimensional data modeling aims to provide background on the techniques used to design InfoCubes, the multi-dimensional structure within Business Intelligence, and to introduce a common scenario of the integrated platform. Within a Business Intelligence initiative, monitoring is possible by attaching the key performance indicators to the Online Analytical Processing cube. Therefore, a minimal set of key performance indicators is proposed. The proposed approach is used to develop the pilot project of a Business Intelligence solution in the Public Enterprise “Elektromreža Srbije”. Results have shown that this approach can offer several benefits, the most important of which is that no data other than keys is stored in dimension tables. Finally, we conclude the paper with some suggestions for future work.

Keywords: Business Intelligence, Data Warehouse, multidimensional modeling, Key Performance Indicators

I. INTRODUCTION

While the increased capacity and availability of data gathering and storage systems have allowed enterprises to store more information than ever before, most organizations still lack the ability to effectively consolidate, arrange and analyze this vast amount of data. Analysis-oriented information systems are frequently based on a data warehouse in which relevant data is collected, formatted and made available consistently.

A data warehouse (DW) is the core element of a business intelligence platform and can be defined as [1]: a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of management's decisions. The data warehouse contains granular corporate data.

The term Business Intelligence (BI) became rather popular in the early 1990s, and it has been defined by The Data Warehousing Institute (TDWI) as [2]: the processes, technologies, and tools needed to turn data into information, information into knowledge, and knowledge into plans that drive profitable business action. Business Intelligence encompasses data warehousing, business analytics and knowledge management.

Key Performance Indicators (KPI) are an integral part of a BI solution, as they contribute to a successful execution of the BI and the overall enterprise strategy. Business scenarios are always changing, and managers are now required to ensure that all processes are effective by constantly measuring the processes' performances through KPIs and scorecards. The strategic management analyzes medium and long term trends through OLAP tools and checks the effectiveness of the strategy pursued in the short term through KPIs and dashboards, whereas tactical and operational decision makers use other KPIs and dashboards to tune their actions to the company's strategy [3].

The purpose of this work is to explain how to support Online Analytical Processing (OLAP) in BI. The basic objective of this paper is to present a method for multidimensional modeling using the extended star schema. In order to achieve these goals, we set out to make a comprehensive presentation of the theoretical aspects concerning the previously introduced concepts, methodologies and specific data modeling techniques. This research is divided into two parts. In the first part, a literature review of business intelligence, key performance indicators, data warehouses and multidimensional modeling is presented. The second part gives a detailed explanation of the BI data model, as well as the modeling aspects derived directly from the BI data model. Although this study deals only with BusinessObjects software in integration with SAP Netweaver Business Warehouse (BW) as the core data management application with external interfaces, the theoretical concepts discussed are also valid for any other external application that may be used for reporting on data from the native SAP data warehouse.

II. LITERATURE REVIEW

In recent years Business Intelligence systems have been rated as one of the highest priorities of Information Systems (IS) and business leaders [4]. BI aims at providing a closed-loop support that interlinks strategy formulation, process design and execution with business intelligence [5]. According to Ranjan [6]: BI is the conscious, methodical transformation of data from any and all data sources into new forms to provide information that is business-driven and results-oriented. Current BI approaches are subordinated to performance management [7]. Most BI platforms are deployed as systems of performance measurement, whereas they are not used for decision support [8]. For performance measurement modeling, the business objective is translated into a KPI that enables the organization to measure some aspect of the process against a target that they define. KPIs have been found to be useful in performance measurement, and the use of KPIs has been widely adopted in organizations [9].

In some studies, BI is concerned with the integration and consolidation of raw data into KPIs. KPIs represent an essential basis for business decisions in the context of process execution. Therefore, operational processes provide the context for data analysis, information interpretation, and the appropriate action to be taken [4]. BI includes a set of concepts, methods and processes to improve business decisions, using information from multiple sources and applying past experience to develop an exact understanding of business dynamics [4].
It has emerged as a concept for analyzing collected data with the purpose of helping decision making units to get a better, comprehensive knowledge of an organization's operations, and thereby make better business decisions [10]. Therefore, BI has a broad scope and covers a wide range of tools, among which the most important and best-known are applications such as data warehouses, data mining, OLAP, DSS, the Balanced Scorecard (BSC), etc.

A DW is considered to be one of the most powerful decision support and business intelligence technologies that have emerged in the last decade [11]. The data warehouse (DW) technology has been developed to integrate heterogeneous information sources for analysis purposes [12].

Throughout the years numerous data modeling techniques have been discussed and used for building a data warehouse. Data warehousing methodologies are rapidly evolving, but vary widely because the field itself is not very mature [13]. Data warehousing methodologies share a common set of tasks, including business requirements analysis, data design, architecture design, implementation, and deployment [1, 13, 14]. One of the problems related to data warehouse design is the lack of procedures necessary to select the appropriate schema [15].

Data warehouses store huge amounts of information from multiple data sources, which is used for query and analysis. Therefore, data is stored in a multidimensional structure. A multidimensional model stores information in facts and dimensions. A fact contains interesting concepts or measures (fact attributes) of a business process, whereas a dimension represents a perspective or view for analyzing a fact using hierarchically organized dimension attributes. Multidimensional modeling requires specialized design techniques that resemble the traditional database design methods [16]. The most well-known dimensional model is the Star model, while other variations are the Snowflake model and the Dimensional Fact Model [17].

Multidimensional data models have three important application areas within data analysis. First, multidimensional models are used in data warehousing. Second, multidimensional models lie at the core of On-Line Analytical Processing (OLAP) systems. Third, multidimensional data are becoming the basis for data mining, whose aim is to (semi-)automatically discover unknown knowledge in large databases.

The aim of this research is to integrate and extend the findings of previous DW/BI research by developing, testing and refining a data model for a specific purpose, application and technology of BI. In the proposed approach, the extended star schema model helps to achieve clear abstract modeling by using the multidimensional cube view; the paper also gives a brief overview of related work in data warehouse design. This helps users to easily model and develop analysis systems using the data warehousing framework.

III. APPROACH TO MULTI-DIMENSIONAL DATA MODELING IN BI TECHNOLOGY

The core of every BI platform is the data model. The overarching goals of multi-dimensional models are [18]:
• To present information to business analysts in a way that corresponds to their normal understanding of the business, i.e. to show the KPIs, key figures or facts from many significant perspectives. In other words, to deliver structured information so that a business analyst can easily navigate by using any possible combination of business terms to illustrate the behavior of the KPIs.
• To offer the basis for a physical implementation that the software recognizes (the OLAP engine), thus allowing a program to easily access the data required.

This section presents an insight into the multidimensional design methodology that we have selected for this survey. The proposed methodology supports multidimensional data marts through an extended star schema design.

The most popular physical implementation of multi-dimensional models on relational database system-based data warehouses is the Star schema implementation. BI uses the Star schema approach and extends it to support integration within the data warehouse, to offer easy handling and to allow high performance solutions [18].

The data model is defined by the collection of InfoProviders that hold business data. InfoProviders include objects that physically store data (data targets), such as InfoCubes, operational data store objects (ODS objects), and InfoObjects (characteristics with attributes or texts).

Physical InfoProviders, also called InfoCubes, are the central objects of a multi-dimensional model. An InfoCube is possibly the most commonly used InfoProvider for reporting purposes. It is built from multiple InfoObjects which are grouped together into Dimensions and hold the master data.

This structure is referred to as the Star Schema. The transaction data is stored in a fact table, to which the dimensions are linked. InfoCubes are either standard or real-time; in this paper the term always refers to standard InfoCubes. The SAP implementation of the Star Schema is called the Extended Star Schema.

What is most significant about this approach is that data is no longer stored in the Dimension Tables. Instead, Master Data and the attributes of Master Data are stored in external tables and linked to the Dimension tables via SIDs (Surrogate IDentification numbers), and the Dimensions are linked to the Fact Table via DIMIDs (Dimension ID numbers). Each of these tables creates a SID, which is a number present in both the external Master Data Table and the Dimension Table. This process is replicated for each of the other Dimensions. In turn, each Dimension Table has a DIMID which is linked to the central Fact Table of the InfoCube. Figure 4 visualises the general structure of the Star Schema in BI [19].
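The SID/DIMID linkage described above can be mimicked with a relational toy model. The sqlite sketch below is a deliberately simplified assumption (all table and column names are invented and it is not SAP BW code), but it demonstrates the property highlighted in the abstract: the dimension table stores nothing but keys.

```python
import sqlite3

# Master data lives outside the cube and is joined via SIDs; the dimension
# table carries only a DIMID and SIDs; the fact table references the DIMID.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE sid_material (sid INTEGER PRIMARY KEY, material_key TEXT);
CREATE TABLE md_material  (sid INTEGER REFERENCES sid_material, color TEXT);
CREATE TABLE dim_product  (dimid INTEGER PRIMARY KEY,
                           sid_material INTEGER REFERENCES sid_material);
CREATE TABLE fact_sales   (dimid_product INTEGER REFERENCES dim_product,
                           quantity INTEGER, revenue REAL);
INSERT INTO sid_material VALUES (1, 'MAT-001');
INSERT INTO md_material  VALUES (1, 'blue');
INSERT INTO dim_product  VALUES (10, 1);
INSERT INTO fact_sales   VALUES (10, 5, 49.5);
""")

# Reporting navigates fact -> dimension (DIMID) -> master data (SID).
for row in db.execute("""
    SELECT s.material_key, m.color, SUM(f.quantity), SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product  d ON d.dimid = f.dimid_product
    JOIN sid_material s ON s.sid   = d.sid_material
    JOIN md_material  m ON m.sid   = s.sid
    GROUP BY s.material_key, m.color"""):
    print(row)   # ('MAT-001', 'blue', 5, 49.5)
```

Because the master data table is referenced only through SIDs, it can be shared by several cubes, which is the design motivation discussed in the next paragraph.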
Figure 4. A Star Schema

The idea behind this model is to improve the Star Schema by moving the attributes from the dimension tables to master data tables that can be shared by several cubes. This, however, is not something that is done automatically. The developers must decide whether the attributes are to be stored in the dimensions, the master data table, or both. Based on experience with the Star schema, the BI data model (InfoCube) uses a more sophisticated approach to guarantee consistency in the data warehouse and to offer data model-based functionality to cover the business analyst's reporting needs. The following table shows the differences in terminology [18]:

TABLE I.
TERMINOLOGY

Star Schema            BI Data Model (InfoCube)
Fact                   Key Figure
Fact Table             Fact Table
(Dimension) Attribute  Characteristic, Navigational Attribute, Display Attribute, External Hierarchy Node
Dimension (Table)      Dimension Table, Master Data Table, Text Table, External Hierarchy Table, (SID Table)

A multi-dimensional data model in BI consists of the following tables [18, 20]:
1. The center of an InfoCube forms the fact table containing the key figures.
2. The fact table is surrounded by several dimensions.
3. Dimensions consist of different table types:
   • Dimension Table - in BI the attributes of the dimension tables are called characteristics. The metadata object in BI which describes characteristics and also key figures (facts) is called InfoObject.
   • InfoObject Tables (i.e. Master Tables):
     o Master Data Table - dependent attributes of a characteristic can be stored in a separate table called the Master Data Table for that characteristic.
     o Text Tables - textual descriptions of a characteristic are stored in a separate text table.
     o External Hierarchy Tables - hierarchies of characteristics or attributes may be stored in separate hierarchy tables.

SID tables play an important role in linking the data warehouse information structures to the InfoCubes and DataStore Objects. To speed up access to InfoCubes and DataStore Objects and to allow independent master data layers, each characteristic and attribute is assigned a SID column and their values are encoded into 4-byte integer values.

A typical InfoCube consists of at least one InfoObject of the type "key figure", at least one InfoObject of the type "time characteristic" and many InfoObjects of the type "characteristic". Key figures represent the actual values that are to be evaluated in the InfoCube. Time characteristics are usually chosen depending on the granularity of the data and the requirements for its filtering on different date formats [21].

InfoObjects of the type "characteristic" are the most important elements in the data model. These InfoObjects define the data itself. Characteristics may then be used in other characteristics as attributes and, if required, marked as navigation attributes. Navigation attributes make the data model more dynamic and easier to understand, which makes them a widely used feature in data modeling [21].

The procedure for building the model includes the following steps [22]:
1. Creating the BI data model with InfoObjects (characteristics, key figures) and an InfoCube for storing data in the BI system, in the following steps:
   • Creating Key Figures - the key figures provide the transaction data to be analyzed.
   • Creating Characteristics - the characteristics are the reference objects for the key figures.
   • Creating InfoCubes - the "container" for data. It consists of key figures and characteristics.
2. Mapping the source structure of the data in the BI system and defining the transformation of the data from the source structure to the target format.

This is the way to define the data flow in the BI system. The InfoPackage (IP) and the Data Transfer Process (DTP) are the objects used to execute the data transfer through the transformation process. The PSA (Persistent Staging Area) is the physical storage level that holds a set of data in the exact way they are held in the source system, and it can therefore be used for a variety of tasks, such as consistency checks against the data in the source system table to validate the accuracy of uploaded data [23].

The structure and properties of the source data are represented in the BI system with DataSources. In this case, DataSources are used to copy master data for the characteristics as well as data from the relevant database tables to the entry layer of the BI system. The transformations define which fields of the DataSource are assigned to which InfoObjects in the target and how the data is transformed during the load process. The necessary objects for defining the data flow are created in the following steps [22]:
   • Creating DataSources for Master Data of Characteristics.
   • Creating DataSources for Transaction Data.
   • Creating Transformations for Master Data from Characteristics.
   • Creating Transformations for the InfoCube.
3. Loading the data - loading processes are executed using InfoPackages and data transfer processes (DTP).

The InfoPackages load the data from the relevant file into the DataSource, and the data transfer processes load the master data from the DataSource into the characteristics or the transaction data into the InfoCube. When the data transfer process is executed, the data is subject to the corresponding transformation. The necessary objects for loading data are created through the following steps [18, 20]:
   • Creating Master Data Directly in the System.
   • Loading Master Data for a Characteristic.
   • Loading Transaction Data.
4. Defining a query.
5. The last step is creating the front-end analysis applications such as reports and dashboards.

Tracking, reporting and managing the results is an important part of any analysis-oriented system. The best way to do this is with easy to read dashboard reporting.

IV. IMPLEMENTATION AND RESULTS

Public Enterprise "Elektromreža Srbije" (PE EMS) is the Serbian Transmission System and Market Operator (TSMO) and a member of the European Network of Transmission System Operators for Electricity (ENTSO-E). Some of the core activities of the company are power transmission, system operation and the organization of the electricity market.

The proposed approach is used to explore the possibilities for development of a BI solution in PE EMS. Future work will be on the implementation of a BI solution with the use of the proposed approach, which would allow for efficient decision making and global analysis.

The data modeling in the BI system is based on recommendations from SAP, featuring a multilayered structure, where the first DataStoreObject contains the transaction data in a granular form. Since the data model of the integrated platform is fully defined in the SAP BW system, it is based on the BW concepts [20].

In 2007 the BusinessObjects company was purchased by SAP. This increased the number of projects based on the integrated platform of the companies' products. The integrated platform combines two separate platforms that have been designed to function independently from each other, so the communication can only be established with the usage of specific interfaces on both systems [21].

This technology platform offers a comprehensive foundation for BI tools through its architectural components: it supports the data acquisition and staging process designed to ensure quality and integration of enterprise-wide data; it enables the definition of a data warehouse layer, which stores granular, integrated data resulting from the staging processes; and it supports multidimensional data marts through an extended star schema design. The particular methodology of the SAP BW system, although based on general concepts, is highly technical and complex, thereby providing a powerful tool for data analysis.

Table II provides a list of some characteristics and key figures that were chosen in the model.

TABLE II.
CHARACTERISTICS AND KEY FIGURES

Technical name                       Description
Characteristics
bw_year                              Year
bw_typesOfIndicators                 Types of indicators
bw_countryBorder                     Country Border
bw_direction                         Direction
Key Figures
min_bid_price                        Min Bid price
max_bid_price                        Max Bid price
min_number_of_auction_participants   Min Number of Auction Participants
max_number_of_auction_participants   Max Number of Auction Participants
ATC                                  Available Transmission Capacity
total_requested_capacity             Total Requested Capacity
total_allocated_capacity             Total Allocated Capacity
number_of_bids                       Number of Bids
number_of_congestions                Number of Congestions
number_of_days_with_zero_capacity    Number of Days With Zero Capacity

For the implementation of the data model based on the concepts offered in the paper, one data model was built: the cube for the electricity market (technical name CUBE_TEE). Figure 5 illustrates the star schema structure of the InfoCube. Shown below is a typical Star Schema arrangement, in this case for Yearly and Monthly Auctions for the allocation of transmission capacities at the border of control areas of PE EMS, which is not standard SAP BW Business Content but will serve to illustrate how to design efficient dimensions.

Figure 5: Star Schema for the InfoCube (the fact table Auction_Fact linked to the dimension tables BW_YEAR, BW_QUARTAL, BW_MONTH, BW_DAY, BW_DIRECTION, BW_KPITYPE, BW_PLANNED_REALIZED, BW_BORDER, BW_CAPACITYTYPE, BW_AUCTIONTYPE, BW_UNIT, BW_GROUP and BW_USER)
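To make Figure 5 and Table II concrete, the sketch below mirrors one fact-table row of the electricity-market cube in plain Java. The key-figure names follow Table II, while the class itself is only our illustration and not part of the SAP BW model:

// Illustrative fact row for the electricity-market cube (CUBE_TEE).
// Dimension links are surrogate keys; key figures follow Table II.
public class AuctionFactRow {
    // links to the dimension tables (DIMIDs)
    int yearDimId;
    int typesOfIndicatorsDimId;
    int countryBorderDimId;
    int directionDimId;
    // key figures from Table II
    double minBidPrice;
    double maxBidPrice;
    int minNumberOfAuctionParticipants;
    int maxNumberOfAuctionParticipants;
    double availableTransmissionCapacity; // ATC
    double totalRequestedCapacity;
    double totalAllocatedCapacity;
    int numberOfBids;
    int numberOfCongestions;
    int numberOfDaysWithZeroCapacity;
}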

An InfoCube allows 16 dimensions. In SAP BI, an InfoCube always has three technical dimensions (Unit, Time and Package), which are fixed for each InfoCube, in addition to the user-definable dimensions. The unit dimension contains the units or currencies of the key figures, the time characteristics are included in the time dimension, and the package dimension contains technical characteristics relevant to each data load.

Figure 7: InfoCube with Dimensions and Key Figures

The InfoCube is built of 10 dimensions (the package and unit dimensions are not shown here), most of which remained unused in this research. The InfoCube is modeled in such a way that related business objects are located in one dimension.

The data flow starts in the source system from where the data is extracted. The transaction data is initially unmodified and stored in a write-optimised DataStoreObject and then transferred to the subsequent data targets. Master data is directly stored in the master data tables of the InfoObjects (bw_year, bw_kpitype, bw_direction, bw_border...).

The transformation, where the InfoCube is the target InfoProvider, uses standard functionality to read the master data table of a characteristic to fill characteristic fields in the InfoCube. Transformation rules map any number of source fields to at least one field in the target.

Figure 9: Transformations in SAP BW

It is often the case in the standard content that the data modeling takes place in transformation rules. This means reading the master data of objects, aggregating the values, and applying other business rules and requirements. In this case, the transformations are kept simple and do not contain any complex rules. The assignment is direct: the fields of the source are copied to the InfoObjects of the target one-to-one.
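Such a direct transformation amounts to copying each source field to the target InfoObject it is mapped to. The following Java sketch shows the idea under our own, non-SAP names:

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative direct transformation: every source field is copied
// one-to-one to the target InfoObject it is mapped to, with no
// aggregation or business rules applied.
public class DirectTransformation {
    private final Map<String, String> targetBySourceField = new LinkedHashMap<>();

    public void map(String sourceField, String targetInfoObject) {
        targetBySourceField.put(sourceField, targetInfoObject);
    }

    // Applies the mapping to one source record.
    public Map<String, Object> apply(Map<String, Object> sourceRecord) {
        Map<String, Object> target = new LinkedHashMap<>();
        targetBySourceField.forEach((src, dst) -> target.put(dst, sourceRecord.get(src)));
        return target;
    }
}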

BEx queries are designed by using the BEx Query Designer, which is a software tool within the SAP Business Explorer Suite. The reporting components of the SAP BW software are often called BEx components, which stands for Business Explorer. The Business Explorer Suite is a set of software tools that have been designed for creating queries and reporting on the SAP BW system directly [23]. The traditional and most recommended reporting scenario is the one with the usage of a BEx query built on the relevant BW InfoProvider. In this scenario, a BEx query plays the role of the filtering mechanism on the BW side, since data filters can be set by constructing a BEx query, while the filtering can be made dynamic by using BEx variables.

The selection of KPIs is ultimately a company-specific decision. KPIs for the case company are based on best Transmission System Operator (TSO) practices and the company's practices. This KPI project, which is in progress, should cover the full scope of the case company's activities, i.e. the technical, commercial, economic, human resources and financial fields. The minimal set of key performance indicators is proposed.

The next table presents the information required, the owner/place to find them and the information calculation for one of the core indicators set up for the case company [24]:

TABLE III.
INDICATORS FOR PE EMS

Indicator: AUCTIONS FOR TRANSMISSION CAPACITY ALLOCATION
Definition/Calculation: Allocated Capacity - the Promise of Capacity notified to Auction Participants in yearly or monthly Auctions becomes Allocated Capacity when fulfilling the payment conditions. Auction - the mechanism used to Allocate Capacity via explicit Yearly Auctions and/or Monthly Auctions and/or Daily Auctions.
Owner: Market Division, Transmission System Control Division
Cube/Dimensions: Cube for electricity market (technical name CUBE_AUKC)

Indicator: TRANSMISSION LOSSES
Definition/Calculation: Percentage of the transmission losses in relation to the total of injected energy on the Serbian transmission system. GUB = (Total energy injection - Total energy withdrawal) / Total energy injection
Owner: Section for Analysis and Planning
Cube/Dimensions: Cube for energy balance (technical name EB_CUBE)

CUBE       DIM1 (Year)  DIM2 (Types of indicators)  DIM3 (Country Border)  DIM4 (Direction)
EB_CUBE    X            X
CUBE_TEE   X            X                           X                      X
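The TRANSMISSION LOSSES indicator in Table III is a simple ratio, so its calculation can be written down directly. The following method is our sketch, not code from the described solution:

// Illustrative calculation of the TRANSMISSION LOSSES indicator:
// GUB = (total energy injection - total energy withdrawal) / total energy injection
public final class TransmissionLosses {
    private TransmissionLosses() { }

    // Returns the share of losses; multiply by 100 for a percentage.
    public static double gub(double totalInjection, double totalWithdrawal) {
        if (totalInjection == 0.0) {
            throw new IllegalArgumentException("total energy injection must be non-zero");
        }
        return (totalInjection - totalWithdrawal) / totalInjection;
    }
}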
Creating a valid multi-dimensional data model in BI implies being constantly aware of the overall enterprise data warehouse requirements and the solution-specific analysis and reporting needs. The verification process involves checking that the schema is the correct model of the data warehouse. The attributes of the identified dimensions are meticulously cross-checked for conformity to the requirements, and for conformity to the two InfoCubes that were selected for analysis during modeling. The master data can be used by multiple InfoCubes at the same time.

The data needs to be presented in a highly visualized manner, in a single dashboard built with the SAP BusinessObjects Dashboards 4.0 tool. Charting is one element of the data visualization capabilities.

The dashboard is based on the EMS company's data from the year 2010 until the year 2012. The data was collected in an Excel workbook and then imported into an Oracle 10g database. ETL processes were used to populate the BW. The query was used as a basis for the dashboard report. Now, the management can use the dashboard that graphically shows indicators related to Yearly and Monthly Auctions for the allocation of transmission capacities at the border of the control areas of PE EMS.

The dashboard will have to be planned properly to ensure its functionality and usability.

Some of the important and useful features of the solution are:
• Collecting data from different EMS locations to produce reports with different indicators.
• An intuitive and simple overview of KPIs, targets and results in real time.
• Mastering the strategic management cycle in order to have a clear view of the future of EMS, as well as being able to compare the present performances with past ones and to benchmark EMS against other similar TSOs.

Finding an optimal multidimensional model is a delicate balancing exercise between the size and number of dimension tables. Dimensional modeling is defined by certain characteristics, such as a smaller number of entities, intuitive presentation of data, analysis- and querying-optimized models, etc., which make it a good candidate for the analytical environment. Characteristics that logically belong together are grouped together in a dimension. Dimensions are to a large extent independent of each other, and dimension tables remain small with regard to data volume. This is beneficial in terms of performance as it decouples the master data from any specific InfoCube. The master data can be used by multiple InfoCubes at the same time (Table III). This InfoCube structure is optimized for data analysis.

V. CONCLUSION AND FUTURE WORK

BI systems create an opportunity for effective management of an enterprise. The proposed approach for a multidimensional data model in BI technology offers several benefits, the most important of which is that no data other than keys is stored in dimension tables. One of the primary goals of this approach was to show the different modeling aspects that result in a different location of an attribute in a dimension of a multi-dimensional BI data model. The reasons for this will be further analyzed in our research.

We have studied only one system, and the main drawback of the paper is the lack of information that would offer an estimation of the benefits that EMS would achieve through the results of the project. The analysis of the project's results will be one of the main objectives in our future work. The question arises whether the strong link between the logical multidimensional model and the database implementation as a star schema can last in times of modern in-memory databases. However, the good findings provide useful avenues for future research.

REFERENCES
[1] W. H. Inmon, "Building the Data Warehouse", Third Edition, Wiley Computer Publishing, John Wiley and Sons, Inc., New York, USA, 2002, ISBN: 0471081302.
[2] David Loshin, "Business Intelligence: The Savvy Manager's Guide", Addison Wesley, 2003, TDWI Business Intelligence Fundamentals.
[3] Sanjeev Khatiwada, "Architectural Issues in Real-time Business Intelligence", Master's Thesis, Faculty of Science and Technology, 2012.
[4] Vahid Farrokhi, László Pokorádi, "The necessities for building a model to evaluate Business Intelligence projects - Literature Review", International Journal of Computer Science & Engineering Survey (IJCSES), Vol. 3, No. 2, April 2012.
[5] Andreas Seufert, Josef Schiefer, "Enhanced Business Intelligence - Supporting Business Processes with Real-Time Business Analytics".
[6] J. Ranjan, "Business justification with business intelligence", The Journal of Information and Knowledge Management Systems, Vol. 38, No. 4, pp. 461-475, 2008.
[7] M. Kathryn Brohman, "The Business Intelligence Value Chain: Data-Driven Decision Support in a Data Warehouse Environment: An Exploratory Study", Proceedings of the 33rd Hawaii International Conference on System Sciences, 2000.
[8] John Hagerty, Rita L. Sallam, James Richardson, "Magic Quadrant for Business Intelligence Platforms", Gartner Inc., 2012.
[9] Veerawat Masayna, Andy Koronios, Jing Gao, Michael Gendron, "Data Quality and KPIs: A Link to be Established", The 2nd World Congress on Engineering Asset Management (EAM) and The 4th International Conference on Condition Monitoring.
[10] Rafi Ahmad Khan, S. M. K. Quadri, "Business Intelligence: An Integrated Approach", Business Intelligence Journal, Vol. 5, No. 1, 2012.
[11] Eiad Basher Alhyasat, "Data Warehouse Success and Strategic Oriented Business Intelligence: A Theoretical Framework", Journal of Management Research, ISSN 1941-899X, Vol. 5, No. 3, 2013.
[12] Wided Oueslati, Jalel Akaichi, "A Survey on Data Warehouse Evolution", International Journal of Database Management Systems (IJDMS), Vol. 2, No. 4, November 2010.
[13] Arun Sen, Atish P. Sinha, "A Comparison of Data Warehousing Methodologies", Communications of the ACM, Vol. 48, No. 3, 2005.
[14] R. Kimball, L. Reeves, M. Ross, and W. Thornthwaite, "The Data Warehouse Lifecycle Toolkit", Wiley, New York, 1998.
[15] M. H. Peyravi, "A Schema Selection Framework for Data Warehouse Design", International Journal of Machine Learning and Computing, Vol. 2, No. 3, June 2012.
[16] Rajni Jindal, Shweta Taneja, "Comparative Study of Data Warehouse Design Approaches: A Survey", International Journal of Database Management Systems (IJDMS), Vol. 4, No. 1, February 2012.
[17] Cassandra Phipps, Karen C. Davis, "Automating Data Warehouse Conceptual Schema Design and Evaluation".
[18] "Multi-Dimensional Modeling with BI - A background to the techniques used to create BI InfoCubes", Version 1.0, May 16, 2006.
[19] Lonnie Ayers, "SAP BW Data Modeling".
[20] SAP Education BW310: BW - Enterprise Data Warehousing.
[21] Maxim Kulakov, "Access control in Business Intelligence integrated platforms based on the example of SAP BW and SAP BusinessObjects", Master's thesis, Vienna, 2011.
[22] "Step-by-Step: From the Data Model to the BI Application in the Web", www.help.sap.com.
[23] SAP Business Information Warehouse Reporting.
[24] http://www.ems.rs/electricity-market/market-code/?lang=en

An Educational Application Comprising Speech Technologies for Serbian Adapted to Visually Impaired Children - anMasterMind
Stevan Ostrogonac*, Nataša Vujnović-Sedlar*, Branislav Popović*, Milan Sečujski*, Darko Pekar**
* University of Novi Sad, Faculty of Technical Sciences, Novi Sad, Serbia
** AlfaNum Ltd, Novi Sad, Serbia

[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract—This paper presents a computer application based on speech technologies for the Serbian language, which has been adapted to persons with disabilities and is especially intended for the purpose of education of blind or partially sighted children. There are many different approaches to education of visually impaired pupils. However, language-dependent educational tools are still very rare for most languages. The computer application anMasterMind represents the first educational game adapted to visually impaired persons which comprises both automatic speech recognition (ASR) and text-to-speech (TTS) technology for Serbian. Therefore, it can be used without any physical contact with the computer. This feature makes the application useful for people affected by other types of disability in addition to sight impairment. This paper describes anMasterMind in detail and gives a report on the first research conducted with blind children with regard to their acceptance of this type of educational approach. The first results are very promising and serve as motivation for further research and development of similar educational tools.

I. INTRODUCTION

The inclusion of persons with disabilities has always been a very important issue and in the past decades, with the advancements in different areas of science and technology, the number of possibilities for improving the quality of life of the people with different types of disabilities has increased dramatically.

Aids for persons with visual impairment have been developed for centuries and they can be categorized by the types of problems which they address [1]. Most of the research projects dealing with these problems are multidisciplinary [2] and require teams of experts with different background knowledge such as neurologists, ophthalmologists, engineers etc. Education of adults with disabilities usually implies some simulations of everyday situations for which they need to prepare. Some interesting work related to spatial orientation of blind people has been described in [3].

Education of people with disabilities is much easier and more effective if it starts at an early age. For children in general, the main problem is to maintain their interest in the task at hand, and one of the ways to accomplish that is through educational games. Many of these games have been developed for very young children in order to enhance their cognitive capabilities. Some of these games can be found at [4]. For visually impaired children, who represent the main motivation for the research presented in this paper, many educational games have been developed in order to address different aspects of their development. A special subcategory of these games is intended for partially sighted children. Research described in [5] introduces image-based games which are intended for teaching partially sighted children to recognize some basic shapes. Tactile-based games are commonly used for the development of children's capabilities of grasping the concept of 2D geometrical shapes [6][7]. Many audio-based games have been developed and accepted as standard educational tools in schools around the world. Unfortunately, more advanced games usually require more than just a set of previously recorded audio files.

Speech technologies play an important role in the development of advanced educational tools for visually impaired people (and people with other disabilities as well) [8][9]. Since many of the advanced games are language-dependent, their existence is conditioned by the existence of high-quality automatic speech recognition (ASR) and text-to-speech (TTS) systems for a particular language [10]. The speech technologies for Serbian have been developed for more than a decade and have reached very high quality [11][12]. Large speech and textual corpora have been obtained in order to provide enough training data for ASR and TTS systems [13]. These technologies have already been used to create applications for the visually impaired, such as anReader, the Audio library [14] etc. However, very little progress has been made in the education of blind and partially sighted children. The only educational game comprising TTS technology for Serbian, called LUGRAM [15], was developed several years ago and is still being updated with new features inspired by the feedback from the visually impaired children who are included in the testing process. The application anMasterMind represents the first educational game comprising both ASR and TTS technologies for Serbian. It is intended for education of visually impaired children, but it is also useful for children and adults with other types of physical disabilities or with no disabilities, as it comprises a simple graphical user interface (GUI).

The rest of this paper is organized as follows. Section II gives a brief overview of the speech technologies for Serbian. In Section III, the proposed application is described in detail. Section IV presents the results of the acceptance test for the application done by visually impaired pupils. In Section V, a discussion about the educational value of this application is given along with the plans for further improvement and research in this field.

II. SPEECH TECHNOLOGIES FOR SERBIAN

Speech technologies fall into the category of language-dependent research fields and, as such, demand a great effort in order to be ported to new languages. Large training corpora need to be collected in order to obtain high-quality performance. Although most of the training algorithms are universal, they usually need to be adapted to every particular language to ensure high quality of the results.

The following text briefly describes the resources and the basic concepts of the ASR and TTS systems for Serbian.

A. Automatic Speech Recognition

Automatic speech recognition represents the process of obtaining the word sequence from the input speech signal. The ASR system for Serbian is based on hidden Markov models (HMMs) [12]. Emitting probabilities of each HMM state are modeled by Gaussian Mixture Models (GMMs). For each Gaussian distribution, the parameters (its mean and full covariance matrix) are determined by using a Quadratic Bayesian classifier. The most probable word sequence is determined by using the Viterbi algorithm. Beam search and Gaussian selection are used to speed up the search.

The training corpus contains approximately 230 hours of speech (from more than 1500 different male and female speakers), excluding silent and damaged segments. For the testing phase, a small corpus containing 30 minutes of speech from 180 speakers (100 male and 80 female) is used. The transcriptions are at the phone level, and the phoneme boundaries are set automatically, after which manual corrections are done. These corpora are used for Serbian, but also for the Croatian language, because the phonetic inventories of these languages are practically identical and only several minor adjustments are necessary. New data are being added continuously.

Besides the most probable word sequence, which is obtained from the input signal, the output of the ASR system is also a confidence measure, which is important for adapting all ASR-based applications to different surroundings. For noisy environments, a higher confidence measure threshold is needed than for quiet environments in order to avoid frequent recognition errors.

B. Text-to-speech synthesis

The process of synthesizing a speech signal can be divided into high-level and low-level synthesis. High-level synthesis includes text processing in order to obtain a narrow phonetic transcription of the text. This transcription consists of a string of phonemes (or allophones) which are to be produced, along with the corresponding prosody information (f0 movement, energy contour, temporal duration of phonetic segments). Low-level synthesis refers to the actual synthesis of the speech waveform, based on the output of the high-level synthesis module.

For the low-level synthesis, the concatenative approach proved to be the best in terms of intelligibility and the overall quality of the produced speech [11]. This approach is based on the TD-PSOLA synthesis algorithm. Another approach, involving HMM-based synthesis, is currently being developed for Serbian. The initial results have been published in [16] and are very promising, but the concatenative approach still produces higher quality of the synthesized speech. One of the reasons for developing an HMM-based TTS system is the possibility of storing the parameters for speech production in a file which is an order of magnitude smaller than the speech repository used in concatenative synthesis. This makes it convenient for an HMM-based TTS system to be used on platforms with different resource restrictions.

The training corpus for Serbian TTS contains about 10 hours of speech from a single female speaker. About 3 hours are currently annotated, and the synthesis quality is already very good [13].

III. THE ANMASTERMIND APPLICATION

The application anMasterMind represents an implementation of the popular educational game Master Mind, which is a commercial form of an earlier pencil and paper game called Bulls and Cows [17]. It is a code breaking game for two players. One player has the role of the codemaker and the other one is the codebreaker. Within the anMasterMind application, the codemaker is the computer. The code represents an ordered set of four symbols which are randomly chosen from a set of six possible symbols. The symbols used are: "leptir" (butterfly), "kuća" (house), "ključ" (key), "makaze" (scissors), "sunce" (sun) and "kišobran" (umbrella). Each of these symbols can appear multiple times in the code. The object of the game is for the codebreaker to determine the code by iteratively naming the combinations of symbols and using the feedback information to define the combination for each new try. The number of tries is limited to six, with the option of adding another try, which was introduced to meet the needs of younger children. The feedback information for each try consists of the number of symbols contained within the combination given by the codebreaker which are also present in the code. Additionally, the number of symbols for which the right position in the code is "guessed" is given.
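This feedback rule is the classic Bulls and Cows scoring. The sketch below is our own minimal Java version of that computation, not code taken from the application itself:

import java.util.HashMap;
import java.util.Map;

// Illustrative Master Mind / Bulls and Cows feedback: counts how many
// guessed symbols are on the right position and how many are present
// in the code at all (exact matches included in the second count).
public class Feedback {

    // code and guess are equally long arrays of symbol names, e.g. "leptir".
    public static int[] score(String[] code, String[] guess) {
        int rightPosition = 0;
        Map<String, Integer> unmatched = new HashMap<>();
        // first pass: exact matches; remember the unmatched code symbols
        for (int i = 0; i < code.length; i++) {
            if (code[i].equals(guess[i])) {
                rightPosition++;
            } else {
                unmatched.merge(code[i], 1, Integer::sum);
            }
        }
        // second pass: guessed symbols that occur elsewhere in the code
        int presentInCode = rightPosition;
        for (int i = 0; i < code.length; i++) {
            if (!code[i].equals(guess[i])
                    && unmatched.merge(guess[i], -1, Integer::sum) >= 0) {
                presentInCode++;
            }
        }
        return new int[] { presentInCode, rightPosition };
    }
}

For example, for the code (kuća, leptir, leptir, sunce) and the guess (leptir, sunce, kuća, kuća), the method returns {3, 0}: three of the guessed symbols exist in the code, but none of them is on the right position.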

When the application is started, an initial window, shown in Fig. 1, is opened. This window offers several options to the player. These options are introduced in order to make the application useful for all potential users. As was mentioned before, anMasterMind can be used by people with different kinds of disabilities, or with no disabilities at all. The options include separate activation of the ASR and TTS technologies, as well as the activation of the additional, seventh try for breaking the code. The window contains the buttons for exiting the application or starting it with the selected options. Another button has been added to this window for activating the application in the mode for the visually impaired. This mode will be described in detail and discussed in the rest of this paper.

Figure 1. The initial window of the anMasterMind application

By starting the game in the mode for the visually impaired, the main window, which is shown in Fig. 2, is opened. Both ASR and TTS technologies are automatically activated along with special options which will be discussed later. Note that the codebreaker has six chances to find the right combination of symbols. The seventh try should be enabled in the initial window if needed.

Figure 2. The main window of the anMasterMind application - game in progress

For each try, the main window contains four picture boxes which display the chosen symbols, as well as four additional picture boxes which are used to display the feedback information. On the right side of the window there are another four picture boxes which are used to display the solution if the task is not solved. The window also contains buttons for selecting each of the six possible symbols, a button for deleting the last chosen symbol (in case the player or the ASR system makes a mistake), a button used to ask for feedback, and reset, help and exit buttons. Each of the buttons contained within the main window can be activated by an appropriate voice command.

When the player chooses four symbols within one try and asks for feedback, the information is presented to him or her in the form of synthesized speech. As the game progresses, as shown in Fig. 3, the player can ask for the entire history of the game to be read in order to remember some significant details. Note that in Fig. 3 a configuration including the seventh try is presented. The player can also ask for a particular try to be read to him or her. Furthermore, the symbols chosen within the active try can also be read on command. This is useful not only for the player to remember which symbols (s)he has chosen, but also to check if the speech recognition was successful.

Figure 3. The main window of the anMasterMind application

When choosing symbols, the player can name each of the four symbols within a try individually, but if two or more consecutive symbols are the same, the player can issue the command in a more natural way, e.g. "četiri kuće" (four houses) or "dva leptira i dva ključa" (two butterflies and two keys).

Even in the mode for the visually impaired, the application displays the main window. This is useful for the partially sighted, because it helps them learn how to recognize shapes displayed within the window. The visual information is also useful if, for example, a blind pupil is playing the game and a sighted teacher or parent is monitoring the process.

The dialogue between the application and the player is designed to resemble the dialogue between two humans. This is accomplished by using simple grammar rules which allow different word sequences to be used to accomplish the same goal. For example, the following sequences all activate the reading of the game history: "pročitaj" (read), "pročitaj sve" (read all), "pročitaj celu istoriju" (read the entire history), "pročitaj tok igre" (read game steps), "pročitaj od početka" (read from the beginning) etc. The grammar rules are given in the Backus-Naur form. The entire grammar used within the anMasterMind application is given in the following text:

command = KRAJ | IZAĐI | ZATVORI | (NOVA IGRA) | RESET | RESETUJ | POMOĆ;
symbol = LEPTIR | KUĆA | KUĆICA | KLJUČ | MAKAZE | MAKAZICE | SUNCE | KIŠOBRAN | ŠALJI | POŠALJI | BRIŠI | OBRIŠI | IZBRIŠI;
quantity = DVA | DVE | DVOJE | TRI | TROJE | ČETIRI | ČETRI | ČETIR | ČETVORO;
symbol_genitive = LEPTIRA | KUĆE | KUĆICE | KLJUČA | MAKAZE | MAKAZA | MAKAZICE | MAKAZICA | SUNCA | KIŠOBRANA;
command_BPM = PROČITAJ;
try_number = PRVI | DRUGI | TREĆI | ČETVRTI | PETI | ŠESTI | SEDMI | (TOK IGRE) | (CELU ISTORIJU) | SVE | (OD POČETKA);
try_optional = POKUŠAJ;
rule1 = $command;
rule2 = {($quantity $symbol_genitive) | $symbol};
rule3 = $command_BPM [$try_number [$try_optional]];
do = $rule1 | $rule2 | $rule3;
main = [$do];

The grammar consists of several elements. One of them is a set of variables (command, symbol, quantity, symbol_genitive, command_BPM, try_number, try_optional, rule1, rule2, rule3, do, main). The variable main is the only keyword and it represents the set of word sequences which can be the result of the recognition. The rest of the variables can be used in different contexts by using the prefix "$". Another element of the grammar is "|", which represents a choice between two options. The square brackets "[]" represent an optional sequence. Regular brackets "()" mark a mandatory order, and the content between the curly brackets "{}" may appear multiple times within the resulting word sequence.
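As an illustration of how a rule2 utterance is interpreted, the command "dva leptira i dva ključa" matches two quantity-symbol pairs which the application must expand into four individual symbols. The following Java sketch shows such an expansion; the abridged word lists and the method itself are our illustration, not the application's actual code:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative expansion of a rule2-style command such as
// "dva leptira i dva ključa" into a list of individual symbols.
public class QuantityExpander {
    // abridged vocabularies; the full sets are listed in the grammar above
    static final Map<String, Integer> QUANTITY =
            Map.of("dva", 2, "tri", 3, "četiri", 4);
    static final Map<String, String> SYMBOL_GENITIVE =
            Map.of("leptira", "leptir", "ključa", "ključ", "kuće", "kuća");

    public static List<String> expand(String utterance) {
        List<String> symbols = new ArrayList<>();
        int pending = 1; // quantity carried over to the next symbol word
        for (String word : utterance.toLowerCase().split("\\s+")) {
            if (QUANTITY.containsKey(word)) {
                pending = QUANTITY.get(word);
            } else if (SYMBOL_GENITIVE.containsKey(word)) {
                for (int i = 0; i < pending; i++) {
                    symbols.add(SYMBOL_GENITIVE.get(word));
                }
                pending = 1;
            } // connector words such as "i" (and) are simply skipped
        }
        return symbols;
    }

    public static void main(String[] args) {
        // prints: [leptir, leptir, ključ, ključ]
        System.out.println(expand("dva leptira i dva ključa"));
    }
}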

The dialogue process is represented in Fig. 4. Within the mode for the visually impaired, the dialogue is accomplished as follows. The user activates a command by his or her voice. The ASR server is then engaged to obtain a textual transcription of the input speech message. Next, based on the command, the game history and the code, the response to the given command is generated by the dialogue manager. After this, the TTS server generates a voice message. This message is finally played to the user.

Figure 4. The dialogue process within a game
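The loop from Fig. 4 can be summarized in a few lines of Java. The three interfaces below are schematic stand-ins for the components named in the text, not real AlfaNum APIs:

// Schematic dialogue loop from Fig. 4: speech in, speech out.
public class DialogueLoop {
    interface AsrServer { String recognize(byte[] speech); }
    interface DialogueManager { String respond(String command); } // uses game history and code
    interface TtsServer { byte[] synthesize(String message); }

    private final AsrServer asr;
    private final DialogueManager manager;
    private final TtsServer tts;

    DialogueLoop(AsrServer asr, DialogueManager manager, TtsServer tts) {
        this.asr = asr;
        this.manager = manager;
        this.tts = tts;
    }

    // One turn: user speech -> text command -> response text -> voice message.
    byte[] turn(byte[] userSpeech) {
        String command = asr.recognize(userSpeech);  // ASR server
        String response = manager.respond(command);  // dialogue manager
        return tts.synthesize(response);             // TTS server; played to the user
    }
}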
It should be noted that partially sighted persons may use the computer mouse for playing the game rather than giving voice commands. In that case, after selecting each symbol, the name of the symbol is read to the player. This is helpful for partially sighted persons because they learn to recognize the shapes displayed in the window, which over time gives them confidence when using other computer applications.

IV. RESULTS OF THE RESEARCH INCLUDING VISUALLY IMPAIRED CHILDREN

The anMasterMind application was presented to the public within the Science Festival in Novi Sad and the International Fair of Technique and Technical Achievements in Belgrade in May 2013. People of all ages had the opportunity to test the application and they showed great interest in speech technologies in general, but for the children it was particularly interesting to be able to play a game without using a computer mouse or keyboard.

The acceptance of the application by the sighted children, however, did not guarantee a success in introducing anMasterMind to the visually impaired pupils. The application had to be interesting for the children in order to keep their attention, but it was not clear if the task at hand was too difficult without access to visual information. In order to test the application in terms of acceptance, research was conducted in the school for visually impaired pupils "Veljko Ramadanović" in Zemun.

Five blind pupils, three boys and two girls, between ten and twelve years of age, participated in the study. They had all had previous experience with tactile educational games, but none of them was familiar with applications including speech technologies. Each pupil was asked to play several rounds of the game, until it became hard to concentrate on the task. The parameters measured during the testing included (for each round) the number of times the pupils asked for the entire history of the game to be read to them, as well as the number of requests for one of the previous steps to be read. The time needed to finish each round was also measured, along with the information about the number of steps (tries) which the pupils needed to solve the task (that is, if they were successful). The results of the testing are presented in Table I.

TABLE I.
RESULTS OF TESTING THE APPLICATION

Age  Sex  Time  Task complete  Final try  Requests for history  Requests for a single try
10   M    3     Yes            6          0                     0
          6     No             -          0                     1
          5     Yes            6          0                     2
          3     Yes            6          0                     0
          10    No             -          1                     1
          8     No             -          1                     2
          8     Yes            6          1                     1
11   F    8     No             -          2                     1
          5     Yes            6          2                     1
          4     No             -          2                     3
          6     No             -          2                     0
          9     Yes            6          2                     3
          9     No             -          1                     3
          9     Yes            4          2                     1
10   M    5     No             -          1                     2
          7     Yes            6          2                     0
          5     No             -          2                     0
          3     Yes            6          1                     0
          8     Yes            7          3                     1
          9     No             -          3                     3
          15    No             -          2                     2
12   F    9     No             -          1                     1
          10    Yes            7          3                     3
          8     No             -          2                     2
          13    No             -          3                     1
12   M    10    No             -          2                     1
          9     No             -          2                     2

As can be seen from the table, the pupils had seven chances to break the code. Four out of five pupils managed to break the code at least once. This is the most important result of this study, as the object of our research was to determine whether an educational application such as anMasterMind can be useful to visually impaired pupils. Since three of the pupils who participated in the study managed to successfully complete the task at least three times, it is clear that this game is appropriate for their age and that they accept the use of speech technologies for Serbian to adapt the game to their needs. Another important result of the study lies in the number of rounds played by different pupils. It can be observed that the pupils who were less successful in completing the tasks lost their interest in the game quickly. This prompted an idea on how to improve the application, which will be discussed in the next section. As for the temporal durations of all the rounds played, it seems that there are no consistent patterns. The conclusion which can be drawn from this is that the children understood the object and the rules of the game before they played the first round and there was no adjustment period, which would have been indicated by a decrease of the time needed to complete the tasks as more

rounds are played. The temporal durations of the rounds are arbitrary because they depend mostly on the difficulty of the task in each round. The frequency of requests for the information about the entire history of the game or some particular previous step (try) is also arbitrary for all the rounds, but the fact that, for almost all the rounds, there was at least one request for the entire history and several requests for one of the previous steps (in about 85% of the cases the requests were to have the previous step read to the player, which is why there is no information on exactly which step the players requested) indicated the importance of these options for the visually impaired players.

After the testing, the pupils were asked for feedback concerning their interest in further studies such as the one described in this paper. All the children, as well as their teachers, have shown great appreciation of the introduction of the speech technologies into their educational system and have expressed their hope for receiving new applications of this type in the future. The pupils also pointed out some details which they would like to change in the existing application, and gave some suggestions on the new features which could be added to the application. These suggestions are further discussed in the following section.

V. DISCUSSION AND FURTHER RESEARCH

When evaluating an application such as the one proposed in this paper, the main problem is to find enough human subjects to conduct the tests. The people involved in the testing process have to be similar by a lot of criteria (age, education, type and degree of disability etc.). The evaluation of the anMasterMind application described in this paper was conducted with the help of five pupils who meet the similarity criteria. Even though the group of subjects was small, the study resulted in several conclusions about further development of the application. Naturally, further research would bring new insights into the problem addressed by the application. The plans for further evaluation of anMasterMind include forming larger groups of visually impaired pupils and organizing a sequence of testing sessions. Furthermore, evaluation of the application with the help of pupils without visual impairment, who would be denied the visual information, could be a great platform for better understanding of the way blind and partially sighted people perceive the world around them, which would be of great help in adapting their education.

Further development of the proposed application has been defined based on the results of the tests as well as on the feedback from the pupils who were included in the testing process. The main adjustment that needs to be implemented is the possibility of choosing tasks from different categories. Namely, if some pupils have a problem solving the task of cracking the code when six possible symbols are used, they should be able to reduce the number of the symbols used for defining the code, until they are ready for more complex and complicated tasks. This adjustment would help keep their interest in the game. The pupils themselves have expressed the need for two additional features. One of them is a command for listing the symbols which are used. The other one is quite obvious and represents a minor adjustment in the speech recognition grammar which would allow the voice command "pročitaj prethodni pokušaj" (read the previous step). This flaw in the design was spotted when one of the pupils wanted to ask for the information about the previous step of the game, but couldn't remember what the number of that step was.

Maybe the most useful feature which should be added to the application is automatic logging, i.e. storing the data about every round played by a pupil. Each pupil would have to be logged in, and the information about the progress of that pupil over time could be monitored without human effort. This would make more complex research possible.

ACKNOWLEDGMENT

The presented research was performed as a part of the project "Development of Dialogue Systems for Serbian and Other South Slavic Languages" (TR32035), funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia.

The authors would like to thank the pupils, teachers, and the management of the school "Veljko Ramadanović" in Zemun for their interest and help in testing the application.

REFERENCES
[1] Marion A. Hersh et al., Assistive Technology for Visually Impaired and Blind People, Glasgow, Springer-Verlag London Limited, ch. 18, 2008.
[2] Lotfi B. Merabet, Erin C. Connors, Mark A. Halko, and Jaime Sánchez, "Teaching the blind to find their way by playing video games," DOI:10.1371/journal.pone.0044958.
[3] D. Archambault, D. Burger, S. Sablé, "Tactile Interactive Multimedia computer games for blind and visually impaired children," INSERM U483 / INOVA - Université Pierre et Marie Curie, Laboratoire d'Informatique du Havre - Université du Havre, 2001.
[4] http://www.wonderbaby.org/articles/best-accessiblecomputer-games-blind-kids
[5] Y. Eriksson, "Computer Games for Partially Sighted and Blind Children," Department of Art History and Visual Studies, Göteborg University.
[6] S. Rouzier, B. Hennion, T. Pérez Segovia, and D. Chêne, "Touching geometry for visually impaired pupils," Proceedings of EuroHaptics 2004, Munich, Germany, 2004.
[7] R. Raisamo, S. Patomäki, M. Hasu, V. Pasto, "Design and evaluation of a tactile memory game for visually impaired children," Interacting with Computers, Vol. 19, Issue 2, Elsevier Science Inc., New York, NY, USA, March 2007.
[8] Gy. Mester, P. Stanić Molcer and V. Delić, "Educational Games," chapter in the book Business, Technological and Social Dimensions of Computer Games: Multidisciplinary Developments, Ed.: M. M. Cruz-Cunha, V. H. Carvalho and P. Tavares, IGI Global, Pennsylvania, USA.
[9] V. Delić, N. Vujnović Sedlar, "Stereo Presentation and Binaural Localization in a Memory Game for the Visually Impaired," Lecture Notes in Computer Science, LNCS 5967, Heidelberg, Springer, pp. 354-363, ISBN 978-3-642-12396-2, 2010.
[10] T. Gaudy, S. Natkin, D. Archambault, "Pyvox 2: an audio game accessible to visually impaired people playable without visual nor verbal instructions," in 3rd Int. Conf. on E-learning and Games, Nanjing, 2008.
[11] M. Sečujski, R. Obradović, D. Pekar, Lj. Jovanov, V. Delić, "AlfaNum System for Speech Synthesis in Serbian Language," Lecture Notes in Computer Science, LNAI 2448, pp. 237-244, ISSN 0302-9743, 2002.
[12] N. Jakovljević, D. Mišković, M. Janev, D. Pekar, "A Decoder for Large Vocabulary Speech Recognition," Proc. IWSSIP 2011, pp. 1-4, Sarajevo, Bosnia and Herzegovina, 2011.

[13] V. Delić et al., "Speech and Language Resources within Speech Recognition and Synthesis Systems for Serbian and Kindred South Slavic Languages," Proc. of the 15th SPECOM, Plzeň, Czech Republic, 2013.
[14] D. Pekar et al., "Applications of Speech Technologies in Western Balkan Countries," 7th chapter in the book Advances in Speech Recognition, SCIYO, pp. 105-122, ISBN 978-953-307-097-1, 2010.
[15] B. Lučić, N. Vujnović Sedlar, V. Delić, "Computer game LUGRAM - Aid Which Contributes to Education of Blind and Partially Sighted Children Using Speech Technology," 2nd International Conference TAKTONS, Novi Sad, pp. 104-107, ISBN 978-86-7892-555-9, 2013.
[16] E. Pakoci, R. Mak, S. Ostrogonac, "Subjective Assessment of Text-to-Speech Synthesis Systems for the Serbian Language," 20th TELFOR, Belgrade, 2012.
[17] http://en.wikipedia.org/wiki/Mastermind_%28board_game%29

Mining Location in Geographic Information Systems using Neural Network
Željko Jovanović*, Marija Blagojević*, Vlade Urošević*
*Faculty of Technical Sciences Čačak, University of Kragujevac, Serbia
[email protected], [email protected], [email protected]

Abstract—This paper presents an example of an open source GIS client-server system realization and a proposed mining model for extracting useful information.

I. INTRODUCTION

A geographic information system (GIS) has emerged as a need for assistance technology in control and management systems. Its development has been quickly followed by rapid information technology growth. It combines several technologies, and its usage enables better management and rational utilization of all potentials of the environment (fastest route, nearest restaurant, burned calories...). The most common commercial GIS products offer their users a lot of predefined functionalities. Some of them are unnecessary for some users, and some needed ones are not included in the GIS product. Location marking is usually done by map overview and visual location identification. Smartphones offer marking locations at the location itself, due to their built-in GPS sensors, while mobile application development offers the possibilities of custom client GIS application development.

This paper presents an open source GIS client-server system realization that could be used for custom user needs. It consists of a server web application for data management and a client mobile application for location data gathering.

The Android application store or the iOS application store offers a large number of applications to clients. That kind of application diversity brings forth an increase in the popularity of smart phones. Since a large number of clients can use some of the applications, it is possible to record their activity (application usage activity or their location activity).

Thus, a large amount of data can be obtained. The data can be further used in the analysis of user behaviour patterns, for drawing conclusions and making recommendations. Data mining techniques tend to be the right solution for the extraction of useful information from the large amount of data which is available in such GIS systems. This paper shows the review of a GIS system, the data obtained from the system and the suggestion of data mining. The mining model which will be used when the amount of data is sufficient for further research has been presented.

The paper has been organized in the following way: first, the review of the related research which refers to GIS systems and their mining is given; then the characteristics of the designed system and the environment, as well as the purpose, objectives and assignments of the research, are presented; whereupon the suggested model of mining location is shown and the conclusions are set.

II. REVIEW OF RELATED RESEARCH

This chapter provides a look into the review of the related research which refers to the systems based on user activity monitoring and mining.

Since we do not always need whole GIS functionalities, open source technologies allow us to develop a custom GIS client-server architecture. In [1] open source technologies in GIS architecture are discussed for small and medium-sized Web GIS projects.

Mobile devices have become useful for GIS applications. In [2] key mobile GIS technologies are presented for real-time downloading and mobile GIS distribution.

In server-based GIS, marking important locations is usually done by map overview and visual location identification. The authors of [3] present mobile GIS characteristics and explain that mobile GIS development is feasible thanks to smartphone functionalities.

In a mobile GIS application, the main task in the system design is the creation of a distributed system [4] that is able to mark important locations and allow their management using some GIS.

One realization of a distributed GIS is presented in [5], aiming at low cost data gathering using a mobile GIS client and a server-based GIS manipulation system.

Due to the increased expansion of monitoring systems and user activity tracking, a great amount of recordings has become available. Data mining techniques tend to be an appropriate approach in the analysis of the recordings and detection of interesting information from data collections. As stated in [6], data mining obtained from such systems can be used for various purposes; nevertheless, finding similarities among different individuals relating their movements presents an important issue. The authors state in [6] that the results obtained using data mining techniques applied to such systems offer significant information for the individual, business and entire society. The paper also suggests the environment for movement modeling for each individual and measuring similarities among users.

Similarly, in [7], trajectory mining was conducted with the aim of monitoring visits to interesting places containing cultural and historical sights. Furthermore, the research follows and analyzes visits to shopping malls and restaurants. Besides making recommendations for most visited sites, such research can be significantly beneficial for users to understand local places. Mining of such

system suggests that the interest for specific places does


not depend only on the number of visits but on the user’s
experiences during the visits and travels. In addition to
this, it is shown that the user’s previous experiences are
closely linked to the locations that will be interested for
the user.
Having been used in the previous research, user
trajectory mining is also used in [8] with the aim of
obtaining recommendations for discovering interesting
locations. The recommendations refer to the
recommendation of the places to be visited if the user
visits some of the interesting locations.

III. FEAMEWORK AND DESCRIPTION OF CREATED


SYSTEM
The design of the system is presented in Figure 1. It consists of a server-side Java Web application and a client Android application.

Figure 1. System design [9]

The Java Web application is called Andromaps. It is created using the Java programming language, following the model view controller (MVC) software architecture. The model is stored in Java classes and a database which is accessible through data access objects (DAO). The view part is realized using Java Server Pages (JSP), and the controller is realized with Java Servlets.
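As a rough illustration of this MVC split (a sketch under assumed names, not code from the paper; the servlet, the Marking class and the JSP path are hypothetical), a controller servlet might look as follows:

import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/markings")
public class MarkingsServlet extends HttpServlet {

    /** Hypothetical model class for one saved marking. */
    public static class Marking {
        public final String name;
        public final double longitude, latitude;
        public Marking(String name, double lon, double lat) {
            this.name = name; this.longitude = lon; this.latitude = lat;
        }
    }

    /** Hypothetical DAO stand-in; a real one would query the database. */
    private List<Marking> findMarkings(String filter) {
        return Arrays.asList(new Marking("Room No. 1", 45.123, 30.124));
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String filter = req.getParameter("filter"); // e.g. "today" or "byDate"
        // Controller fetches the model and forwards it to the JSP view.
        req.setAttribute("markings", findMarkings(filter));
        req.getRequestDispatcher("/WEB-INF/markings.jsp").forward(req, resp);
    }
}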
The main view functionalities and data processing are done in Andromaps. The main page panel of Andromaps is shown in Figure 2. The main part contains a Google map, while the options are organized on the left and at the bottom. The application has two types of users (regular and admin) and view, edit and delete functionalities. A regular user can manage only his own account, while an admin can manage every created account. After login, the user can select all markings, markings by date, or today's markings.

Figure 2. Andromaps main page view

When selected, markings are shown as red markers containing the marker ID. After clicking on a marker, an info window is shown with the location description and the longitude and latitude data created on the Androgram clients.
The client application, called Androgram, provides data about important locations. After a successful login, a list of unfinished previous marking jobs is shown, with options to finish them or to continue marking. It is also possible to start a new marking job by assigning its name. These functionalities are shown in Figure 3.

Figure 3. Androgram startup functionalities

After choosing the desired option, the user is able to save an important location by filling in the description field and clicking the Save button. This is shown in Figure 4. This screen is shown while the application is running. The Back and Close options are available through the menu.


Figure 4. Save location screen

IV. PURPOSE, OBJECTIVES AND ASSIGNMENTS OF THE RESEARCH

Systems based on the recording of users' locations are widely used in contemporary society for tracking individual and group movements. The aim of the research is to analyze the created system. This paper presents an early stage of the analysis, which implies the creation of a mining model, in the given case a neural network. The objective of the research is to set the appropriate inputs and outputs of the neural network, create the neural network architecture and select the techniques for evaluating and testing the network, which will also be the subject of future research.
Research assignments:
- Description of the system
- Selection of inputs and outputs of the neural network
- Creation of the network architecture
- Selection of the evaluation methodology and network testing

V. LOCATION DATA MINING

Bearing in mind that the system enables user activity tracking and monitoring and the review of all measurements, user behaviour patterns can be revealed using data mining techniques. The neural network technique, which was successfully applied to similar research problems [10, 11], has also been used here. The neural network technique is one of the artificial intelligence techniques, or more precisely a machine learning technique. It has been adopted within data mining with the aim of revealing behaviour patterns in the available data. The neural network model is used in this paper for classification, i.e. the object that is being predicted is assigned to one of the existing, predefined classes.
Figure 5 shows the structure of the created network.

Figure 5: Structure of neural networks

The network consists of an input, a hidden and an output layer. The input layer of the neural network consists of: the ID of a user, name, start time, end time, status and action. The output of the network is the "location", i.e. the location that is being predicted. The location contains the latitude and longitude of a place where a user can be positioned. The created network can be classified as a feed-forward network with the back propagation algorithm used for training.
Given the currently small amount of data, the training, testing and evaluation of the model are planned for future work.
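To make the described setup concrete, the following toy Java sketch implements a feed-forward pass and one on-line back-propagation step for a network of this general shape (input layer, one hidden layer, output layer encoding the location class). The layer sizes, the sigmoid activation and the omission of bias terms are our simplifying assumptions, not the authors' actual configuration.

import java.util.Random;

/** Toy feed-forward network with one hidden layer and on-line
 *  back-propagation, mirroring the input/hidden/output structure
 *  described in the paper. Sizes are illustrative assumptions. */
public class LocationNet {
    static final int IN = 6, HIDDEN = 8, OUT = 4; // e.g. 4 location classes
    final double[][] w1 = new double[HIDDEN][IN];
    final double[][] w2 = new double[OUT][HIDDEN];
    final double[] hidden = new double[HIDDEN];
    final double[] output = new double[OUT];

    LocationNet(long seed) {
        Random r = new Random(seed);
        for (double[] row : w1) for (int i = 0; i < IN; i++) row[i] = r.nextGaussian() * 0.1;
        for (double[] row : w2) for (int h = 0; h < HIDDEN; h++) row[h] = r.nextGaussian() * 0.1;
    }

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    /** Forward pass: encoded record attributes in, class scores out. */
    double[] forward(double[] in) {
        for (int h = 0; h < HIDDEN; h++) {
            double s = 0;
            for (int i = 0; i < IN; i++) s += w1[h][i] * in[i];
            hidden[h] = sigmoid(s);
        }
        for (int o = 0; o < OUT; o++) {
            double s = 0;
            for (int h = 0; h < HIDDEN; h++) s += w2[o][h] * hidden[h];
            output[o] = sigmoid(s);
        }
        return output;
    }

    /** One back-propagation step for a single (input, target) pair. */
    void train(double[] in, double[] target, double rate) {
        forward(in);
        double[] dOut = new double[OUT];
        for (int o = 0; o < OUT; o++)
            dOut[o] = (target[o] - output[o]) * output[o] * (1 - output[o]);
        double[] dHid = new double[HIDDEN];
        for (int h = 0; h < HIDDEN; h++) {
            double e = 0;
            for (int o = 0; o < OUT; o++) e += dOut[o] * w2[o][h];
            dHid[h] = e * hidden[h] * (1 - hidden[h]);
        }
        for (int o = 0; o < OUT; o++)
            for (int h = 0; h < HIDDEN; h++) w2[o][h] += rate * dOut[o] * hidden[h];
        for (int h = 0; h < HIDDEN; h++)
            for (int i = 0; i < IN; i++) w1[h][i] += rate * dHid[h] * in[i];
    }
}

A network like this would be trained on the encoded record attributes (user ID, name, start and end time, status, action) with the known location class as the target.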
An example of the records obtained from the system is shown in Table 1.

TABLE I. EXAMPLE OF TWO RECORDS

idM | NAME      | START_TIME     | END_TIME            | STATUS | ACTION | usd
6   | Dormitory | 13.6.2013 9:58 | 0000-00-00 00:00:00 | 1      | V/D    | 17
2   | fax       | 2.7.2013 1:57  | 16.7.2013 10:31     | 0      | V/D    | 21

The explanation of Table 1 follows.
USER_ID (usd): the identification of the user who has created a new marking group
NAME: a textual description of the marking group
START_TIME: shows the time when the marking of the created group has begun
END_TIME: the time when the marking within one group has ended
STATUS: shows whether the marking has been finished or not (0, 1)
ACTION: options that can be performed with the marking. It is possible to delete it or to view all created markings within this marking group.


TABLE II. EXAMPLE OF TWO MARKINGS

ID | DESCRIPTION | LONGITUDE | LATITUDE | IDM | ACTION
8  | Room No. 1  | 45.123    | 30.124   | 6   | V/D
9  | Room No. 2  | 45.452    | 30.452   | 6   | V/D

Table 2 contains the following categories.
ID: the identification of the marking within the IDM marking group
DESCRIPTION: a textual description of a specific marking
LONGITUDE: the longitude of the marked location
LATITUDE: the latitude of the marked location
IDM: the identification of the marking group
ACTION: options which can be performed with the marking. It is possible to delete it or to view all created markings within this marking group.
Future work implies the evaluation and testing of the data mining model. The evaluation will be performed using a lift chart and the mean squared error, whereas the testing will be done using DMX queries.

VI. CONCLUSION

This paper looks at the benefits of using GIS systems and reveals the necessity of mining and detecting user behaviour patterns within GIS systems. The paper also presents a neural network model which provides the basis for data mining and for revealing new information in large data collections. A large amount of data for the analysis will be provided in the following phase, whereupon the testing and evaluation of the mining model will be conducted. In addition, the creation of a system that will allow the users to select types of reports and perform mining of the locations is planned.

ACKNOWLEDGMENT

This study was supported by the Serbian Ministry of Education and Science: Projects III 44006 and TR32043.

REFERENCES
[1] D. Xia, X. Xie and Y. Xu, "Web GIS server solutions using open-source software", Open-source Software for Scientific Computation (OSSC), 18-20 Sept. 2009, E-ISBN: 978-1-4244-4453-3.
[2] T. Gen, "A new real-time downloading and distributed mobile GIS model based on mobile agent", Audio Language and Image Processing (ICALIP), Nov 2010, Shanghai, Print ISBN: 978-1-4244-5856-1.
[3] C. Guoliang, W. Yunjia and W. Jian, "Research on Mobile GIS: Characteristics, Frame and Key Technologies", Internet Technology and Applications, 20-22 Aug. 2010, E-ISBN: 978-1-4244-5143-2.
[4] X. Lu, "An investigation on service-oriented architecture for constructing distributed Web GIS Application", Proc. IEEE International Conference on Services Computing (SCC 05), IEEE Press, July 2005, 0-7695-2408-7/05.
[5] B. He, X. Tong, P. Chen and B. Qu, "Application of Mobile GIS in Real Estate Dynamic Monitoring", Education Technology and Training, 21-22 Dec. 2008, Shanghai, Print ISBN: 978-0-7695-3563-0.
[6] Q. Li, Y. Zheng, X. Xie, Y. Chen and W. Liu, "Mining user similarity based on location history", Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS '08), Article No. 34, ISBN: 978-1-60558-323-5, doi: 10.1145/1463434.1463477.
[7] Y. Zheng, L. Zhang, X. Xie and W.Y. Ma, "Mining interesting locations and travel sequences from GPS trajectories", Proceedings of the 18th International Conference on World Wide Web (WWW '09), ISBN: 978-1-60558-487-4, pp. 791-800.
[8] V.W. Zheng, Y. Zheng, X. Xie and Q. Yang, "Collaborative location and activity recommendations with GPS history data", Proceedings of the 19th International Conference on World Wide Web (WWW '10), ISBN: 978-1-60558-799-8, pp. 1029-1038.
[9] Z. Jovanovic, N. Pantelic, S. Starcevic and S. Randjic, "Web GIS platform with Android mobile GIS client", UNITECH, 22-23 November 2013, Gabrovo, Bulgaria, vol. II, pp. II.113-II.116, ISSN: 1313-230X.
[10] R. Kalra, M.C. Deo, R. Kumar and V.K. Agarwal, "Artificial neural network to translate offshore satellite wave data to coastal locations", Ocean Engineering, vol. 32, no. 16, November 2005, pp. 1917-1932.
[11] G. Li, R. Zeng and L. Lin, "Research on Vehicle License Plate Location Based on Neural Networks", Innovative Computing, Information and Control (ICICIC '06), First International Conference, vol. 3, pp. 174-177, Print ISBN: 0-7695-2616-0.


A SYSTEM FOR TRACKING AND RECORDING LOCATIONS OF ANDROID DEVICES

Milan Lukić1, Goran Sladić2, Stevan Gostojić2, Branko Milosavljević2, Zora Konjović2
[email protected], {sladicg, gostojic, mbranko, ftn_zora}@uns.ac.rs
1 4Expand, Novi Sad
2 Faculty of Technical Sciences, University of Novi Sad

Abstract – A location-based service (LBS) is a platform that provides information services based on the current or known location, supported by a digital map platform. Most smartphones obtain location using the Global Positioning System (GPS), the cellular network or a wireless network. In this paper we present a system for locating and tracking Android devices. It enables one person to track another and to receive a notification when the tracked user leaves a defined area.

Keywords – tracking, location, location-based services, smartphones, Android

1. INTRODUCTION

Initially, mobile phones were used for voice communication only, but nowadays the scenario has changed. Voice communication is just one aspect of a mobile phone. There are other aspects which are a major focus of interest, such as web browsers and location-based services [1].

A location-based service (LBS) can be defined as a service that depends on and is enhanced by the positional information of a mobile device [2]. An LBS is a mobile information service that extends spatial and temporal information processing capability to end users via the Internet and wireless communications [3, 4].

In [5] the authors defined an LBS as a service where:
- the user is able to determine its location,
- the information provided is spatially related to the user's location, and
- the user is offered dynamic or two-way interaction with the location information or content.

Location-based services have attracted considerable attention due to their potential to transform mobile communications and the potential for a range of highly personalized and context-aware services [6]. The potential for location-based services is evident from powerful and ubiquitous wireless devices that are growing in popularity [7]. They are the key enabler for a plethora of applications across different domains, ranging from tracking and navigation systems, through directory, entertainment and emergency services, to various mobile commerce applications [4, 8, 9]. LBS provide the user with contents customized by her current location, such as the nearest restaurants, hotels or clinics, which are retrieved from a spatial database stored remotely on the LBS server. LBS don't serve individual users only, but also play an important role in public safety, transportation, emergency response, and disaster management.

With an increasing number of mobile devices featuring built-in Global Positioning System (GPS) technology, LBS have experienced rapid growth in the past few years [1, 8].

The Android platform is a new generation of smartphone platform launched by Google [10, 11, 12, 13]. Android supports location-based and mapping services, which is of concern to vast numbers of software developers. Until now, development of mobile LBS and mapping applications was complex and difficult, and often required paying high copyright fees to map makers [11, 13, 14]. Android is free and open source, providing an easy-to-use development kit containing flexible location-based services, including map display. It supports three different methods to locate the device position: the Global Positioning System (GPS), the cellular network, and the Wi-Fi network, using different implementations of the location provider.

In this paper we present a tracking system which uses a smartphone with a built-in GPS module. The system enables the collection and storage of a user's locations and the monitoring of the user's movement. GPS was used as the location provider due to its highest precision, although the proposed system can be adjusted to use other location providers with minimal effort.

The application periodically reads the GPS position of the client device and then sends the coordinates to the server, which processes and stores them in a database. Afterwards, the user who monitors the client can see the movements of the tracked user through a web-based or Android application. The user's route is drawn on the Google map [15].

The proposed solution also provides means to define a monitoring area for each user. In this case, if the user leaves the defined area, the monitoring user will receive a notification informing her that the monitored user has left the area. This is especially useful for parents who want to track the movement of their children.
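As a side note, a minimal sketch of how an Android client typically obtains GPS fixes through the platform's LocationManager API is given below; the update interval, minimum distance and listener body are illustrative assumptions, not the paper's code.

import android.app.Activity;
import android.content.Context;
import android.location.Location;
import android.location.LocationListener;
import android.location.LocationManager;
import android.os.Bundle;

public class GpsSampleActivity extends Activity {

    private final LocationListener listener = new LocationListener() {
        @Override
        public void onLocationChanged(Location location) {
            // Illustrative: a real client would queue these coordinates
            // for upload to the tracking server.
            double lat = location.getLatitude();
            double lon = location.getLongitude();
        }
        @Override public void onStatusChanged(String p, int s, Bundle b) { }
        @Override public void onProviderEnabled(String provider) { }
        @Override public void onProviderDisabled(String provider) { }
    };

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        LocationManager mgr =
                (LocationManager) getSystemService(Context.LOCATION_SERVICE);
        // Request a GPS fix at most every 60 s / 50 m (assumed values);
        // requires the ACCESS_FINE_LOCATION permission in the manifest.
        mgr.requestLocationUpdates(
                LocationManager.GPS_PROVIDER, 60000, 50, listener);
    }
}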


The rest of this paper is structured as follows. The next section describes the global system architecture. The third section gives a brief overview of the client module for sending the user's location. The Android application for displaying the route is presented in the fourth section. The fifth section describes the server-side module of the proposed solution. In the conclusion, the strengths and weaknesses of this solution are elaborated and directions of further research are given.

2. THE SYSTEM ARCHITECTURE

The proposed system consists of four modules:
- an Android module (service) that periodically sends GPS coordinates to the server,
- a server module that collects and processes data and provides administration of the system,
- an Android module for displaying the user's route, and
- a web module for displaying the user's route.

Figure 1 shows the modules of the system. The Android service that periodically sends information to the server is based on the AlarmManager Android component [11], which periodically "awakes" the device, reads the GPS coordinates and sends them to the server.
The web service that receives and processes information is the central part of the system. It receives data from Android devices and provides data to the web and Android client applications.
The Android client application performs data visualization. This module also receives notifications from the server when the monitored user leaves the selected area.
The web module provides an administration interface; it is also possible to see the user's route using this application.

Figure 1. Modules of the system

Figure 2 shows the use case diagram of the system. The Administrator performs the administration of users, including their registration, and the administration of the system. Before starting to use the system, the user must be logged in. A Monitoring user can create a monitoring request for a selected user, track the route of the monitored user and define a zone for this user. Before the Monitoring user can start monitoring the user, the Monitored user must approve the request.

Figure 2. Use-case diagram


3. SENDING CLIENT POSITION

Figure 3 shows the class diagram for the Android service that periodically sends the location of the user to the web service. The LocationPollerInit class performs the necessary configuration and initialization of the service. LocationPoller inherits BroadcastReceiver [11] in order to broadcast the device's coordinates [11]. This component is launched by AlarmManager [11]. It passes the work over to LocationPollerService. The LocationPollerService class reads the device's location; it also wakes the device and ensures that the processor stays awake until the job is done. LocationPollerService uses the PollerThread class to look up the current location and to handle timeout scenarios. The LocationReceiver class accepts broadcast messages with location data and sends them to the server through RequestTask. LocationDelegate defines methods which will be executed before and after a message is sent to the server.
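A minimal sketch of the AlarmManager + BroadcastReceiver pattern described above is given below (our illustration, not the authors' source): a repeating alarm fires a broadcast, and the receiver acquires a wake lock before handing the work to the polling service, which is expected to release the lock when done. The one-minute period and the wiring are assumed values.

import android.app.AlarmManager;
import android.app.PendingIntent;
import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;
import android.os.PowerManager;
import android.os.SystemClock;

public class LocationAlarm extends BroadcastReceiver {

    /** Held while polling; the started service releases it when done. */
    static PowerManager.WakeLock lock;

    /** Schedule the receiver to fire roughly every minute (assumed period). */
    public static void schedule(Context ctx) {
        AlarmManager mgr = (AlarmManager) ctx.getSystemService(Context.ALARM_SERVICE);
        PendingIntent pi = PendingIntent.getBroadcast(
                ctx, 0, new Intent(ctx, LocationAlarm.class), 0);
        mgr.setRepeating(AlarmManager.ELAPSED_REALTIME_WAKEUP,
                SystemClock.elapsedRealtime() + 60000, 60000, pi);
    }

    @Override
    public void onReceive(Context ctx, Intent intent) {
        // Keep the CPU awake while the polling service reads the GPS fix
        // and uploads it; the service calls lock.release() when finished.
        PowerManager pm = (PowerManager) ctx.getSystemService(Context.POWER_SERVICE);
        lock = pm.newWakeLock(PowerManager.PARTIAL_WAKE_LOCK, "LocationAlarm");
        lock.acquire();
        ctx.startService(new Intent(ctx, LocationPollerService.class));
    }
}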

Figure 3. Android service for sending client location

4. DISPLAYING ROUTE ON ANDROID DEVICE

The classes presented in Figure 4 are used to display the user's location and route on the Android device. The MonitorActivity class is the main class in the diagram. It inherits the Android Activity class and implements the NetworkDelegate, MonitoredUsersDelegate and RoutesDelegate interfaces. By extending the Activity class, MonitorActivity can interact with the user. The NetworkDelegate class defines methods which will be executed when the device opens and closes an Internet connection. The MonitoredUsersDelegate class defines methods which will be executed when the application receives a message about a monitored user from the server, and the RoutesDelegate class defines similar methods for processing routes received from the server. The MonitoredUsersRequestTask class requests the list of users monitored by the current user from the server. The RoutesRequestTask class requests the routes of the selected user from the server.
Figure 5 shows an example of the route drawn on the Android smartphone and Figure 6 shows an example of the defined monitoring area.


Figure 4. Android application for displaying user's location

Figure 5. User's route

Figure 6. Monitoring area

5. SERVER MODULE

5.1. MONITORING SUBSYSTEM

The part of the server application used to define monitoring and monitored zones is presented in Figure 7. The MonitorController class is used for communication with the client applications (Android and web) using the REST protocol. It uses the MonitorService implementation (MonitorServiceImpl) to manage and read users' monitoring areas. The MonitorServiceImpl class uses MonitorRepositoryImpl to access the database. The Monitor class represents a defined user's monitoring area, since a monitoring area can be defined for each class instance (the area attribute).
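For illustration only, a Monitor entity of the shape shown in Figure 7 might be modelled as below; the JPA annotations and the JTS Polygon type (which would require a spatial persistence extension such as Hibernate Spatial) are our assumptions, since the paper does not show the entity's source.

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import com.vividsolutions.jts.geom.Polygon;

// Illustrative entity: one Monitor row links a monitoring user to a
// monitored user and stores the allowed zone as a polygon, matching
// the "area" attribute in Figure 7.
@Entity
public class Monitor {

    @Id
    @GeneratedValue
    private Long id;

    @ManyToOne
    private User monitoringUser; // the user who watches

    @ManyToOne
    private User monitoredUser;  // the user being watched

    private Polygon area;        // monitoring zone; null = no zone defined

    public Polygon getArea() { return area; }
    public void setArea(Polygon area) { this.area = area; }
    public User getMonitoredUser() { return monitoredUser; }
}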


Figure 7. Monitoring part of the server module

5.2. ROUTE MANAGEMENT SUBSYSTEM

Figure 8 shows the server classes used for managing routes. The RouteController class implements the REST service that communicates with the client applications (Android and web). The RouteService class defines methods for storing and retrieving users' routes. The RouteServiceImpl class implements the RouteService interface. It relies on the appropriate RouteRepository implementation to store and retrieve data from the database. The Route class models a user's route and the User class models a system's user.
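The method signatures in the diagrams suggest a Spring MVC style controller; the following is a sketch of how the two RouteController operations named in Figure 8 might be exposed over REST. The annotations and URL layout are our assumptions; Route, Point and RouteService are the types from Figure 8.

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;

// Illustrative REST endpoints for route management; only the operation
// names follow Figure 8, the rest is assumed.
@Controller
public class RouteRestController {

    @Autowired
    private RouteService routeService; // interface from Figure 8

    /** Returns the most recent route of the given user. */
    @RequestMapping(value = "/users/{userId}/routes/last", method = RequestMethod.GET)
    @ResponseBody
    public Route getLastRoute(@PathVariable("userId") long userId) {
        return routeService.getLastRoute(userId);
    }

    /** Appends one GPS fix, sent by the Android service, to the user's route. */
    @RequestMapping(value = "/users/{userId}/routes", method = RequestMethod.POST)
    @ResponseBody
    public String addToRoute(@PathVariable("userId") long userId,
                             @RequestParam("latitude") double latitude,
                             @RequestParam("longitude") double longitude) {
        routeService.addToLastRoute(new Point(latitude, longitude));
        return "OK";
    }
}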

Figure 8. Route management part of the server module

5.3. NOTIFICATION SUBSYSTEM

As noted above, if the monitored user exits the defined zone, the monitoring user should be notified. Each monitoring user can define a different monitoring zone for each monitored user. Notifications are implemented using the GCM (Google Cloud Messaging) service [16]. GCM is a service which provides messaging from the server to Android applications.
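A minimal server-side sketch of pushing such a notification through GCM's HTTP endpoint (as the service worked at the time of writing) is given below; the API key, registration ID and payload field names are placeholders.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class GcmNotifier {

    private static final String GCM_URL = "https://android.googleapis.com/gcm/send";

    /** Sends a "user left the zone" message to one device via GCM. */
    public static int sendZoneExitNotification(String apiKey,
                                               String registrationId,
                                               long monitoredUserId) throws Exception {
        // JSON payload; the "data" keys are application-defined placeholders.
        String body = "{\"registration_ids\":[\"" + registrationId + "\"],"
                + "\"data\":{\"event\":\"ZONE_EXIT\",\"userId\":\"" + monitoredUserId + "\"}}";

        HttpURLConnection conn = (HttpURLConnection) new URL(GCM_URL).openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Authorization", "key=" + apiKey);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode(); // 200 means GCM accepted the message
    }
}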


Upon receiving a user's coordinates, the server verifies whether monitoring zones are defined for that user and checks whether the received coordinates are outside each zone. If the monitored user is out of the zone, a notification is sent to the monitoring user who defined that zone.
When the Android application receives a notification, it displays it to the user. It is possible to adjust parameters such as the sound of the notification, its icon, etc. Clicking on the notification opens the application and shows the user's location.
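The zone check itself reduces to a point-in-polygon test. Below is a minimal ray-casting sketch of such a test (an assumption about the implementation, since the paper stores the zone as a Polygon without showing the check):

public final class ZoneCheck {

    /**
     * Ray-casting point-in-polygon test: counts how many polygon edges a
     * horizontal ray from (lon, lat) crosses; an odd count means "inside".
     * Vertices are given as parallel arrays of longitudes and latitudes.
     */
    public static boolean isInsideZone(double lon, double lat,
                                       double[] zoneLon, double[] zoneLat) {
        boolean inside = false;
        int n = zoneLon.length;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            boolean crosses = (zoneLat[i] > lat) != (zoneLat[j] > lat)
                    && lon < (zoneLon[j] - zoneLon[i]) * (lat - zoneLat[i])
                             / (zoneLat[j] - zoneLat[i]) + zoneLon[i];
            if (crosses) inside = !inside;
        }
        return inside; // a notification would be sent when this is false
    }
}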
6. CONCLUSION

Location Based Services provide personalized services to the subscribers based on their current location. LBS offer tools for efficient management and continuous control. More and more people use LBS in their day-to-day life, which helps them to achieve their goals more efficiently.

Massive usage of smartphones caused a proliferation of LBS. First and foremost, the smartphone is used as a navigation device, since most smartphones have a built-in GPS receiver and support other location methods. This paper presents a system for tracking users using their smartphones.

The system enables the collection and storage of a user's locations and the monitoring of their movement by other users. The Android client periodically sends the device's location to the server, which processes and stores it in the database. The monitoring user can see the route of the monitored user and receive a notification when the monitored user leaves the monitoring area. Both Android and web clients are available to monitoring users. One of the useful applications of this system is the locating and tracking of children by their parents.

In the current version of the system we did not thoroughly analyze security and privacy issues. We plan to solve those problems in future versions. Also, we plan to investigate how other positioning methods can be used to provide indoor coverage.

REFERENCES
[1] Kumar, S., Qadeer, M. A., and Gupta, A. (2009), "Location Based Services Using Android", In IMSAA'09: Proceedings of the 3rd IEEE International Conference on Internet Multimedia Services Architecture and Applications, IEEE, pp. 335-339.
[2] Hirsch, F., Kemp, J. and Ilkka, J. (2006), "Mobile Web Services: Architecture and Implementation", John Wiley & Sons, ISBN 978-0-470-01596-4.
[3] Hazas, M., Scott, J., and Krumm, J. (2004), "Location-aware Computing Comes of Age", Computer, 37(2), pp. 95-97.
[4] Rao, B., and Minakakis, L. (2003), "Evolution of Mobile Location-based Services", Communications of the ACM, 46(12), pp. 61-65.
[5] Ferraro, R., and Aktihanoglu, M. (2011), "Location-Aware Applications", Manning, ISBN 978-1-935182-33-7.
[6] Dhar, S., and Varshney, U. (2011), "Challenges and Business Models for Mobile Location-based Services and Advertising", Communications of the ACM, 54(5), pp. 121-128.
[7] Junglas, I. A. and Watson, R. T. (2008), "Location-based Services", Communications of the ACM, 51(3), pp. 65-69.
[8] Rani, C. R., Kumar, A. P., Adarsh, D., Mohan, K. K., and Kiran, K. V. (2012), "Location Based Services in Android", International Journal of Advances in Engineering & Technology, 3(1), pp. 209-220.
[9] Mohapatra, D., and Suma, S. B. (2005), "Survey of Location based Wireless Services", In Proceedings of the IEEE International Conference on Personal Wireless Communications (ICPWC 2005), IEEE, pp. 358-362.
[10] Android, http://www.android.com/, accessed January 05, 2014.
[11] Android developer, http://developer.android.com, accessed January 05, 2014.
[12] Meier, R. (2012), "Professional Android 4 Application Development", John Wiley & Sons, ISBN 978-1-118-10227-5.
[13] Annuzzi, J., Darcey, L., and Conder, S. (2013), "Introduction to Android Application Development: Android Essentials (4th Edition) (Developer's Library)", Addison Wesley, ISBN 978-0-321-94026-1.
[14] Shu, X., Du, Z., and Chen, R. (2009), "Research on Mobile Location Service Design Based on Android", Proceedings of the 5th International Conference on Wireless Communications, Networking and Mobile Computing (WiCom'09), IEEE, pp. 1-4.
[15] Google Maps API, https://developers.google.com/maps/, accessed January 05, 2014.
[16] Google Cloud Messaging, http://developer.android.com/google/gcm/index.html, accessed January 05, 2014.


Software provided waste management sustainability assessment

Gordana Stefanovića, Michele Dassistib, Biljana Milutinovićc
a Faculty of Mechanical Engineering, University of Niš, Niš, Serbia, [email protected]
b Politecnico di Bari, DMM, Bari, Italy, [email protected]
c College of Applied Technical Sciences Niš, Niš, Serbia, [email protected]

Abstract— Principles of sustainable development are currently an integral part of any advanced way of thinking, whether in the development of society in general, the development of an economy, or engineering practice. The problem lies in the complexity of such a mindset. In order for a system to be sustainable, it must be at least economically justified, have no negative impact on the environment, and be acceptable to society. In addition, very often it is necessary to consider the political consequences of certain phenomena, technological solutions, and other aspects. The development of software solutions that offer the possibility of including many parameters (indicators) and analyzing them is therefore of crucial importance. This paper presents the principle of a software package application that provides the possibility of comparing the sustainability of certain methods of waste management on the territory of the local community. For this purpose three scenarios were taken into consideration: business as usual, aerobic process and anaerobic process, together with the relevant pretreatment. For the assessment of their sustainability, economic, environmental and social indicators were taken into account. The procedure of multi-criteria analysis using the AHP method and pair-wise criteria, performed with the Expert Choice 11 software, was used. The results obtained by using the AHP method and a sensitivity analysis showed that, according to the selected indicators, it is possible to rank the scenarios and choose the best sustainable waste management scenario (the scenario which involves resource recovery through recycling inorganic waste and composting organic waste).

I. INTRODUCTION

The problem of waste management is not new, but it has become pressing with population growth, economic development and the recognition of the negative effects of waste on the environment. In recent decades, research has been done in the field of waste management, from finding different waste management methods and utilizing energy from waste, to the selection of the optimal method of waste management for specific local requirements.
From the moment that sustainable development became current in different areas, attention has been focused on sustainable waste management and the search for sustainable waste management scenarios. Different research was done [1-3] in order to determine a sustainable decision making model for waste management scenarios.
Sustainability has sometimes been hailed as the revolution of the 21st century, with environmentalism as the precursor of this new wave of thinking. In the vast scientific literature on sustainability, a huge number of different definitions and interpretations of this concept have been proposed, so that it is really difficult to have a unique, clear-cut understanding of it.
Several models of waste management were developed: models based on cost-benefit analysis, models based on life cycle analysis, and models based on the use of multi-criteria analysis [4]. Each of them has different approaches, benefits and limitations: for models based on cost-benefit analysis, all criteria for assessing scenarios translate into a monetary measurement; for models based on life cycle assessment, the scenario analysis is carried out on the basis of the analysis of the environmental impact of all phases of a product that lead to the creation of waste; and for models based on multi-criteria analysis, the assessment and selection of scenarios is carried out on the basis of a number of selected, usually conflicting criteria.
There are different models used in research, depending on the chosen criteria and criteria weights, for the selection of optimal and sustainable waste management scenarios. In the case where economic criteria are the preferred choice, cost-benefit analysis is commonly used [5,6]. If environmental criteria are recognized as a priority, life cycle analysis is used [7,8].
When assessing the sustainability of waste management scenarios according to three types of sustainable development indicators (environmental, economic and social), the best results can be provided by multi-criteria analysis. "For a waste management system to be sustainable, it needs to be environmentally effective, economically affordable and socially acceptable" [4]. The benefit of multi-criteria analysis in assessing the sustainable scenario is that it allows the use of both qualitative and quantitative criteria (sustainable development indicators). It also allows for the participation of different groups of decision-makers, even with opposing goals, in defining indicators and decision-making.
In the literature [9,10] there are a number of multi-criteria methods applied in the assessment of the sustainability of a waste management scenario. Depending on the particular problem, the AHP, ELECTRE or PROMETHEE methods are commonly used [11-13].
In this paper, the principle of a software package application that provides the possibility of comparing the sustainability of certain methods of waste management on the territory of the local community is presented. For this purpose three scenarios are taken into consideration: business as usual, aerobic process and anaerobic digestion, both with relevant pretreatment. For the assessment of their


sustainability, economic, environmental and social indicators were taken into account. The procedure of multi-criteria analysis using the AHP method and pair-wise criteria, performed with the Expert Choice 11 software, was used.

II. THE ANALYTIC HIERARCHY PROCESS (AHP) AND EXPERT CHOICE

The Analytic Hierarchy Process (AHP) is a multi-criteria decision making technique, quite often used to solve complex decision making problems in a variety of disciplines [13-15].
The AHP hierarchical structure allows decision makers to easily comprehend problems in terms of relevant criteria and sub-criteria. Additional criteria can be superimposed on the hierarchical structure. Furthermore, if necessary, it is possible to compare and prioritize criteria and sub-criteria in AHP practice, and one can effectively compare optimal solutions based on this information.
The decision procedure using the Analytic Hierarchy Process (AHP) is made up of four steps:
"1) define the problem and determine the kind of knowledge sought.
2) structure the decision hierarchy according to the goal of the decision - in the following order: the objectives from a broad perspective, through the intermediate levels (criteria on which subsequent elements depend) up to the lowest level (which usually is a set of the alternatives).
3) construct a set of pair-wise comparison matrices. Each element of the matrix in the upper level is used to compare elements in the level immediately below.
4) use the priorities obtained from the comparisons to weigh the priorities in the neighboring level. Do this for every element. For each element in the level below add its weighed values and obtain its overall or global priority. Continue this process of weighing and adding until the final priorities of the alternatives in the bottom most level are obtained." [16]
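As a compact restatement of steps 3 and 4 in our notation (not taken from [16]): for n criteria, the pair-wise comparison matrix A holds the judgements a_ij (how strongly criterion i is preferred to criterion j), with reciprocal entries a_ji = 1/a_ij; the criteria weights w form the normalized principal eigenvector of A, and the deviation of the largest eigenvalue from n measures the inconsistency of the judgements:

A = \begin{pmatrix} 1 & a_{12} & \cdots & a_{1n} \\ 1/a_{12} & 1 & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 1/a_{1n} & 1/a_{2n} & \cdots & 1 \end{pmatrix}, \qquad A w = \lambda_{\max} w, \qquad \sum_{i=1}^{n} w_i = 1, \qquad CI = \frac{\lambda_{\max} - n}{n - 1}.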
Expert Choice, the user-friendly supporting software, has certainly largely contributed to the success of the AHP method. It incorporates intuitive graphical user interfaces, automatic calculation of priorities and inconsistencies, and several ways to process a sensitivity analysis [17].

A. The adopted AHP model
The most important step of these decision-making processes is a correct pair-wise comparison, whose quantification is the most crucial step in multi-criteria methods which use qualitative data. Pair-wise comparisons are quantified by using a scale. Such a scale is a one-to-one mapping between the set of discrete linguistic choices available to the decision maker and a discrete set of numbers which represent the importance, or weight, of the previous linguistic choices.
The pair-wise comparisons in the AHP are determined in line with the scale introduced by Saaty [16]. According to this scale, the available values for the pair-wise comparisons are: {9, 8, 7, 6, 5, 4, 3, 2, 1, 1/2, 1/3, 1/4, 1/5, 1/6, 1/7, 1/8, 1/9}.
Figure 1 shows the hierarchical structure considered in the selection of a sustainable scenario for waste treatment, based on selected indicators of sustainable development, and Figure 2 shows the hierarchical structure presented in the Expert Choice software.

Figure 1. The hierarchical structure for selection and evaluation of a sustainable waste management scenario.

Figure 2. The hierarchical structure presented in Expert Choice software

III. EXPERIMENTAL RESEARCH

A. Scenario
The data on generated waste used in the considered scenarios were taken from the Waste Management Plan of the city of Niš in the period up to 2015 and other previously published papers [18].
In this paper, three scenarios were developed: business as usual, aerobic process and anaerobic digestion. The main variation factors (annual distance driven by trucks, fuel efficiency, energy consumption, energy efficiency) of each scenario are given in order to evaluate the environmental indicators.
Scenario 1 - business as usual: Most of the waste (68,440 t) is disposed of in the landfill; only a small amount of metal and glass (3,560 t) is recycled. The annual distance driven by collection trucks is 118,400 km. Trucks use diesel fuel, with a fuel efficiency of 2.5 km/l. Energy consumed by landfill operation: diesel 0.22 l/t. Energy consumption at the materials recovery facility: electricity 25 kWh/t, natural gas 0.264 m3/t.


Scenario 2 - aerobic process: Inorganic waste (28,840 t) is recycled (plastic, glass, paper and metal). Organic waste (31,790 t) is sent to an in-vessel composting plant. Other waste (11,464 t) is disposed of in the landfill. The annual distance driven by collection trucks is 112,596 km. Energy consumption in the composting process: electricity 21 kWh/t.
Scenario 3 - anaerobic digestion: Recyclable waste, i.e. glass, metal and plastic (17,809 t), is recycled. Other waste (54,291 t) is sent to an anaerobic digestion plant for the purpose of electricity generation. The annual distance driven by co-mingled trucks is 108,150 km. Composition of the biogas produced: CO2 45%, CH4 55%. Energy efficiency in the anaerobic digestion process: 20%. Facility energy consumption: 22% of the energy produced.

B. Evaluation of indicators
Nine sustainable development indicators were selected in order to carry out the research (they will be used to rank the considered scenarios). Environmental indicators: GHG emissions (CO2 Equivalents), acid gases emissions (NOx and SO2), waste volume reduction. Economic indicators: investment costs, operational costs, revenues. Social indicators: job creation, public acceptance.
GHG and acid gases emissions: Amounts of carbon dioxide, nitrogen oxides and sulphur dioxide emitted to the atmosphere were estimated using the data from the previous paper [18], in which the amount of gas emissions is determined based on the composition of waste using the Integrated Waste Management Model [19]. In assessing the emissions, this model takes into account the emissions from the point at which a material is discarded into the waste stream to the point at which it is either converted into a useful material or finally disposed of. The model does not evaluate the energy and emissions associated with the production of infrastructure (e.g. collection vehicles, waste management facilities, etc.). Emissions during the transportation of waste, the consumption of fossil fuels and electricity for the treatment of waste, and emissions during incineration were considered.
Waste volume reduction: The amount of waste that remains after treatment for landfill disposal is also estimated by the Integrated Waste Management Model [19].
Investment and operational costs: The evaluation of investment and operational costs for the defined scenarios was performed on the basis of the data that are valid in the European Union for landfill, composting, anaerobic digestion and incineration [20,21]. For this purpose, the data for Germany were taken, shown in Table 4. The evaluation of investment and operational costs for recycling used the data from literature [22-24]. For evaluating the investment costs, land costs, the costs of design and construction of landfills and waste treatment facilities, and the transportation vehicle costs were all taken into account. To evaluate the operational costs, maintenance costs, labor, energy costs and other operating costs were also taken into account.
Revenues: The evaluation of revenues was made on the basis of the data from literature [20] for waste treatment: composting (taking into account the market price of compost), incineration (taking into account the price of electricity produced) and anaerobic digestion (taking into account the price of electricity produced). The assessment of income from recycling was done on the basis of the market price of recycled materials (paper, metal, glass, compost), where the recovery rate taken into account was based on the data from literature [25].
Job creation: The evaluation of new jobs created was done on the basis of the data from literature [26,27], as shown in Table 5. The number of new jobs in waste management depends on the waste treatment. Regardless of the different information that can be found in the literature related to the level of employment in terms of tonnes of waste per job, it can be concluded that for the more labor-intensive activities, such as hand sorting and collecting waste, the level of employment is less than 500 tonnes of waste per job, while for the less labor-intensive activities, such as dumping, incineration and composting, the level of employment is over 1000 tonnes of waste per job. The level of employment in recycling is between these two extremes, depending on the type of materials that are recycled and the recycling methods [28].
Table 1 shows the evaluation of indicators.

TABLE I. EVALUATION OF INDICATORS

Indicator | Scenario 1 | Scenario 2 | Scenario 3
GHG emission (CO2 Equivalents) (kg/t) | 701.48 | -1184.35 | -966.55
NOx emission (kg/t) | -0.035 | -1.793 | -1.550
SO2 emission (kg/t) | -0.047 | -2.422 | -1.982
Volume reduction (%) | 4.65 | 80.59 | 95.69
Investment costs (€/t) | 8.90 | 16.70 | 24.40
Operational costs (€/t) | 14.20 | 45.90 | 70.20
Revenues (€/t) | 0.60 | 51.50 | 28.20
Job creation | 368 | 450 | 488

Public acceptance: Public acceptance is a qualitative criterion which cannot be measured; therefore, the 9-level scale established in the AHP method [16] (1 - Worst, 9 - Best) was used for the assessment of this criterion.

IV. RESULTS AND DISCUSSION

After making the hierarchical structure and evaluating the indicators, the procedure of multi-criteria analysis using the AHP method's pair-wise comparison of criteria (comparing the relative preference with respect to the goal) was carried out, performed with the Expert Choice 11 software. Figure 3 shows the comparison of the relative preference with respect to the goal.


Figure 3. Comparison of the relative preference with respect to the goal.

Following the pair-wise comparison of the criteria, the criteria weights with respect to the goal are obtained, as shown in Figure 4.

Figure 4. Criteria weight

The normalized performance of the scenarios against the criteria is presented in Table 2.

TABLE II. NORMALIZED PERFORMANCE OF SCENARIOS AGAINST THE CRITERIA (SUSTAINABLE DEVELOPMENT INDICATORS)

Indicator | Scenario 1 | Scenario 2 | Scenario 3
GHG emission (CO2 Equivalents) (kg/t) | 0.003 | 0.031 | 0.019
NOx emission (kg/t) | 0.004 | 0.016 | 0.012
SO2 emission (kg/t) | 0.002 | 0.019 | 0.009
Volume reduction (%) | 0.004 | 0.015 | 0.039
Investment costs (€/t) | 0.07 | 0.042 | 0.028
Operational costs (€/t) | 0.096 | 0.038 | 0.019
Revenues (€/t) | 0.016 | 0.044 | 0.072
Job creation | 0.008 | 0.039 | 0.022
Public acceptance | 0.011 | 0.15 | 0.068

Following the procedure, the software provides a graphical representation of the scenario ranking in several different ways. Figure 5 shows the ranking of the scenarios after criteria (sustainable development indicators) weighting. According to the obtained results, it can be concluded that the best-ranked scenario is Scenario 2, which includes the recycling of waste (plastic, glass, paper and metal) and the composting of organic waste. Scenario 1, which corresponds to the business as usual scenario, ranked last.

Figure 5. Scenario ranking for evaluated indicators weight - performance presentation
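The ranking in Figure 5 follows the standard AHP synthesis step; in our notation (not the paper's), the global priority of scenario S_i is the weighted sum of its normalized performances from Table 2:

P(S_i) = \sum_{j=1}^{9} w_j \, p_{ij}, \qquad i = 1, 2, 3,

where w_j is the weight of criterion j from Figure 4 and p_{ij} is the normalized performance of scenario S_i against criterion j.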
In order to carry out the sensitivity analysis, the software provides a dynamic representation and the possibility to change the criteria weights, as shown in Figure 5. The last step of the decision process is the sensitivity analysis, where the input data are slightly modified in order to observe the impact on the results. If the ranking does not change, the results are said to be robust [17]. The sensitivity analysis is best performed with an interactive graphical interface. Expert Choice allows different sensitivity analyses, where the main difference is in the various graphical representations, as shown in Figure 6.


Figure 6. An example of four possible graphical sensitivity analyses in Expert Choice.

V. CONCLUSION

In this paper, the multi-criteria analysis AHP method is applied to assessing the sustainability of waste treatment scenarios. The selection of indicators of sustainable development was performed as a result of the recognized priorities in environmental, economic and social criteria.
The Expert Choice software, based on the AHP method, is presented and used for forming the hierarchical structure, the pair-wise comparison of criteria, weighting, and scenario ranking according to the goal - the selection of a sustainable waste management scenario.
The results obtained by using the Expert Choice software showed that, according to the selected indicators, the best sustainable waste management scenario is Scenario 2, recycling inorganic waste (plastic, glass, paper and metal) and composting organic waste (yard and food waste). The scenario which proved to be the least sustainable is Scenario 3 (a combination of recycling inorganic waste and incineration of the remaining waste), due to high costs and negative public acceptance.

REFERENCES
[1] Wilson E.J., McDougall F.R., Willmore J., "Euro-trash: searching Europe for a more sustainable approach to waste management", Resources, Conservation and Recycling, vol. 31, pp. 327-346, 2001.
[2] Costi P., Minciardia R., Robba M., Rovatti M., Sacile R., "An environmentally sustainable decision model for urban solid waste management", Waste Management, vol. 24, pp. 277-295, 2004.
[3] Hung M.-L., Ma H.-W., Yang W.-F., "A novel sustainable decision making model for municipal solid waste management", Waste Management, vol. 27, pp. 209-219, 2007.
[4] Morrissey A.J., Browne J., "Waste management models and their application to sustainable waste management", Waste Management, vol. 24, pp. 297-308, 2004.
[5] Beguma R.A., Siwara C., Pereira J.J., Jaafar A.H., "A benefit-cost analysis on the economic feasibility of construction waste minimisation: The case of Malaysia", Resources, Conservation and Recycling, vol. 48 (1), pp. 86-98, 2006.
[6] Jamasb T., Nepal R., "Issues and options in waste management: A social cost-benefit analysis of waste-to-energy in the UK", Resources, Conservation and Recycling, vol. 54 (12), pp. 1341-1352, 2010.
[7] Den Boer J., Den Boer E., Jager J., "LCA-IWM: A decision support tool for sustainability assessment of waste management systems", Waste Management, vol. 27, pp. 1032-1045, 2007.
[8] Ekvall T., Assefa G., Bjorklund A., Eriksson O., Finnveden G., "What life-cycle assessment does and does not do in assessments of waste management", Waste Management, vol. 27, pp. 989-996, 2007.
[9] Chenga S., Chana C.W., Huangb G.H., "Using multiple criteria decision analysis for supporting decisions of solid waste management", Journal of Environmental Science and Health, vol. 37 (6), pp. 975-990, 2002.
[10] Roussat N., Dujet C., Mehu J., "Choosing a sustainable demolition waste management strategy using multi-criteria decision analysis", Waste Management, vol. 29, pp. 12-20, 2009.
[11] Karagiannidis A., Moussiopoulos N., "Application of ELECTRE III for the integrated management of municipal solid wastes in the Greater Athens Area", European Journal of Operational Research, vol. 97, pp. 439-449, 1997.
[12] Karagiannidis A., Papageorgiou A., Perkoulidis G., Sanida G., Samaras P., "A multi-criteria assessment of scenarios on thermal processing of infectious hospital wastes: A case study for Central Macedonia", Waste Management, vol. 30, pp. 251-262, 2010.
[13] Herva M., Roca E., "Ranking municipal solid waste treatment alternatives based on ecological footprint and multi-criteria analysis", Ecological Indicators, vol. 25, pp. 77-84, 2013.
[14] Triantaphyllou E., Mann S. H., "Using the analytic hierarchy process for decision making in engineering applications: some challenges", International Journal of Industrial Engineering: Applications and Practice, vol. 2 (1), pp. 35-44, 1995.
[15] Samah M.A.A., Manaf L.A. and Zukki N.I.M., "Application of AHP Model for Evaluation of Solid Waste Treatment Technology", International Journal Engineering Techniques, vol. 1 (1), pp. 35-40, 2010.
[16] Saaty T.L., "Decision making with the analytic hierarchy process", International Journal Services Sciences, vol. 1 (1), pp. 83-98, 2008.


[17] Ishizaka A., Labib A., "Analytic hierarchy process and expert choice: benefits and limitations", OR Insight, vol. 22, pp. 201-220, 2009.
[18] Stefanović G., Marković D., "Life cycle assessment of municipal solid waste management: case study of Nis, Serbia", The 24th International Conference on Efficiency, Cost, Optimization, Simulation and Environmental Systems, pp. 3930-3937, Novi Sad, Serbia, 4-7 July 2011, ISBN 978-86-6055-015-8.
[19] Integrated Waste Model - Available at: <http://www.iwm-model.uwaterloo.ca/> [accessed 10.1.2013].
[20] Costs for Municipal Waste Management in the EU, Final Report to Directorate General Environment, European Commission - Available at: <http://ec.europa.eu/environment/waste/studies/eucostwaste_management.htm> [accessed 20.1.2013].
[21] Tsilemou K., Panagiotakopoulos D., "Approximate cost functions for solid waste treatment facilities", Waste Management & Research, vol. 24, pp. 310-322, 2006.
[22] Cascadia, Recycling and Economic Development: A Review of Existing Literature on Job Creation, Capital Investment, and Tax Revenues, 2009 - Available at: <http://your.kingcounty.gov/solidwaste/linkup/documents/recycling-economic-development-review.pdf> [accessed 5.2.2013].
[23] The Costs of Recycling and Composting, Waste Prevention, Recycling and Composting Options: Lessons from 30 US Communities - Available at: <http://www.epa.gov/osw/conserve/downloads/recy-com/chap08.pdf> [accessed 7.2.2013].
[24] <http://www.letsrecycle.com/prices/> [accessed 8.2.2013].
[25] U.S. Environmental Protection Agency - Available at: <http://www.epa.gov/epawaste/nonhaz/municipal/index.htm> [accessed 7.2.2013].
[26] Employment Effects of Waste Management Policies, Final Report - January 2001, Risk & Policy Analysts Limited - Available at: <http://ec.europa.eu/environment/enveco/waste/index.htm> [accessed 8.2.2013].
[27] Robin Murray, Creating wealth from waste, 1999 - Available at: <http://www.demos.co.uk/files/Creatingwealthfromwaste.pdf> [accessed 8.2.2013].
[28] Friends of the Earth, Report: More jobs, less waste, 2010 - Available at: <http://www.foe.co.uk/resource/reports/jobs_recycling.pdf> [accessed 8.2.2013].


A Method for Web Content Semantic Analysis: the case of Manufacturing Systems

Goran Grubić*, Miloš Milutinović*, Vanjica Ratković Živanović**, Zorica Bogdanović*, Marijana Despotović-Zrakić*
* Faculty of Organizational Sciences, University of Belgrade, Belgrade, Serbia
** Radio Television of Serbia, Belgrade, Serbia
[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract — This paper discusses a method for web content classification and semantic processing applied in a Web Business Intelligence model for production systems. The goal of the research was to formally describe the method, its procedures, and applicable ontologies. During the evaluation of the related research, a set of appropriate procedures was selected and adapted for the context of this research. Neural networks were trained using expert-enhanced datasets in order to solve a number of classification problems. Domain knowledge was described in a semantic Lexicon, used by the model to extract semantic Profiles. The applied ontologies were formally described and an evaluation framework was proposed based on real-life data.

I. INTRODUCTION

Using the Web as a data source for Business Intelligence was recognized more than a decade ago as having great potential in terms of further business process automation and integration. From development engineers to the top management, decision makers of all possible levels and roles process information from the web on a daily basis. In most cases, this is performed manually, by browsing the Web and typing relevant data into documents or a database.
The goal of our research is to develop a model of Web Business Intelligence based on the Semantic Web core and related technologies. The core feature of the presented method is the semantic mapping of relevant web content after appropriate classification and preprocessing procedures are performed. The main input for the method is the web domain address of a stakeholder. The output is structured and semantically enhanced knowledge about the stakeholder based on the Company Semantic Profile ontology. The development use cases are production systems and supply-chain related companies operating in the metal processing industry. The cases are grouped in three classes (see Table 1) according to their particular business model and respective strategic goal.

Table 1. Use-cases used for the model development

# | Use-case group | Strategic goal
1 | Companies relying on outsourced production | Increase of outsourcing effectiveness
2 | Job-shops providing production services to third parties | Reaching new customers and production collaboration sub-partners
3 | Dealers of metal processing industrial equipment | Market penetration for new metal processing technologies

The goal is to develop a solution capable of supporting each of the strategic goals. The complete effectiveness of the solution is achieved when all theoretically available information required to achieve the strategic goal is extracted from the web and expanded through inference. This research uses the data and domain knowledge of a company from the 3rd use-case group, operating on the market since 2002. The research dataset is extracted from the CRM database: the Main Sample with 340 stakeholders' web sites. The Primary Sample, randomly populated with 50 entries of the Main Sample, is manually analyzed by an expert and extended with moderated meta-data. The results are used to train the neural networks applied for classification tasks.

II. RELATED RESEARCH

The purpose of Business Intelligence is to support effective decision making and strengthen competitive power through better handling and processing of data about the past, present, and anticipated business environment and business operations. It is often seen as a set of applications and technologies for the consolidation, analysis and distribution of vast data loads describing customers, competitors, the organization itself, the overall economic situation, and market trends. The Web is recognized as one of the most important sources of external business environment information; therefore, all modern BI systems contain tools and services for obtaining and processing Web data [1], [2]. The purpose of the Semantic Web is to enable better cooperation between computers and people, offering new ways to directly describe relations between information and its meaning [3]. An important concept in the context of the Semantic Web are ontologies, described as "the categories of things that exist or may exist", enabling the semantic understanding of information [4].
A study on Web Mining Research (2000) anticipated that further advances of Web content mining will be directed towards information integration in the form of Web knowledge bases or Web warehouses [2]. The growth of the Web, in both the quantity and quality of the information contained, led to the development of deeper integration with BIs. Srivastava and Cooley (2003) introduced the concept of Web Business Intelligence in an article published a decade ago. The underlying assumption, which constitutes the concept, was that the knowledge extracted through Web content mining can be made actionable by an individual or an organization [5]. An important starting point and inspiration for our research was the Business Stakeholder Analyzer proposed by a

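The random draw behind the Primary Sample can be sketched in a few lines of Python; this is a minimal illustration under assumed data, where the URL format and the helper name are hypothetical stand-ins, not the actual CRM schema.

import random

# Minimal sketch of the Primary Sample selection; the Main Sample is
# modeled as a plain list of stakeholder site URLs (an assumption).
def draw_primary_sample(main_sample, size=50, seed=42):
    """Randomly pick `size` web sites for manual expert analysis."""
    return random.Random(seed).sample(main_sample, size)

main_sample = [f"http://stakeholder-{i}.example" for i in range(340)]
primary_sample = draw_primary_sample(main_sample)
# The 50 selected sites are then annotated by an expert; the moderated
# meta-data serves as training labels for the classification networks.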
II. RELATED RESEARCH

The purpose of Business Intelligence is to help effective decision making and to strengthen competitive power through better handling and processing of data about the past, present, and anticipated business environment and business operations. It is often seen as a set of applications and technologies for consolidation, analysis and distribution of vast data loads describing customers, competitors, the organization itself, the overall economic situation, and market trends. The Web is recognized as one of the most important sources of external business environment information; therefore, all modern BI systems contain tools and services for obtaining and processing of Web data [1], [2]. The purpose of the Semantic Web is to enable better cooperation between computers and people, offering new ways to directly describe relations between information and its meaning [3]. Important concepts in the context of the Semantic Web are ontologies, described as "the categories of things that exist or may exist", enabling the semantic understanding of information [4].

A study on Web Mining Research (2000) anticipated that further advances of Web content mining would be directed towards information integration in the form of Web knowledge bases or Web warehouses [2]. The growth of the Web, in both quantity and quality of the information contained, led to the development of deeper integration with BIs. Srivastava and Cooley (2003) introduced the concept of Web Business Intelligence in an article published a decade ago. The underlying assumption, which constitutes the concept, was that the knowledge extracted through Web content mining can be made actionable by an individual or an organization [5]. An important starting point and inspiration for our research was the Business Stakeholder Analyzer proposed by a group of researchers (2009). The evaluation of the analyzer confirmed that the web analyzer framework significantly outperforms the baseline method of stakeholders' web site classification, reducing information overload and making the analysis more effective [6].

A. Semantic Business Intelligence for Manufacturing

The introduction of Semantic Web technologies to research areas connected to BI created new room for improvement of data quality and precision of results. Many ontology-based information extraction and retrieval technologies, targeting various and heterogeneous data sources, were developed during the last decade [7][8][9]. A number of studies have confirmed that the expansion of traditional BI knowledge databases through the introduction of semantic attributes provides improved performance of a BI service. For instance, to develop an improved personalized business partner recommendation e-service, a group of authors (2012) introduced semantic attributes to describe additional information about the candidates, resulting in a significant performance increase confirmed by evaluation [10]. Immense research efforts were invested into automation and improvement of the processes of transition and data integration from relational database knowledge warehouses to unified triple-store based knowledge repositories engaged by BI services [11], [12], [13]. As a result of the aforementioned developments, a new concept of BI solution was recognized by authors recently (2012): Semantic Business Intelligence (SBI) [14].

It is an essential requirement for a production system that the information used for design and planning must be used and utilized efficiently and effectively. Typically, IT solutions are applied to speed up the information flow and disseminate it more widely, and networking technologies to enhance decision making and increase connectivity and interoperability among information sources [15, p. 918]. With the advent of networking, customers are able to obtain information more quickly and easily, facing production systems with a challenge to deliver the products at the required variety, lower cost and in shorter lead times [15, p. 919].

Semantic manufacturing knowledge management is a semantics-based knowledge solidification framework running over the Internet for managing and developing manufacturing knowledge [16, p. 115]. It benefits from ontology-based knowledge representation - a powerful scheme to model concepts and relationships that allows the design, sharing, and reuse of knowledge in a disciplined way, for both humans and software [15, p. 919]. The Semantic Web extends current web technology with well-defined meaning that enhances the interoperability of computers and people. It connects meaning to data and applications, used for automatic processing. Therefore, it plays a crucial role in automating the functions performed by different manufacturing applications [15, p. 910].

The Semantic Web has great importance in the realization of the Virtual Enterprise concept for geographically distributed engineering conglomerations. The interoperability is typically maintained by independent software Agents, sharing knowledge through a common ontology. Such an ontology-based, multi-agent collaborative framework may be applied to very demanding manufacturing industries like the Micro Assembly Devices [17, p. 205]. Next-generation manufacturing systems need sets of interconnected data and semantic models to communicate and exchange their knowledge. The current research direction is focused on developing semantic agreements among collaborating partners for fixing standardized meanings and relations of the terminologies used [15, p. 910].

B. Web Template detection and page classification

Template detection has an important role in any kind of Web information retrieval, enabling workload optimization and enhanced semantic awareness. Information contained within the template has a different semantic context than a paragraph found on a web page - it may be considered as highly relevant. Typically, template headers and/or footers contain useful general information like company name, postal address, contacts, slogan, etc.

The template detection approaches based on HTML document structure analysis and common content detection are well researched and evaluated. To support development and testing of the other components of our solution, we adopted a template detection algorithm based on such an approach. The Restricted Bottom-Up Mapping for Template Detection caches the processed web pages as XPath directed graphs and combines the common paths approach (between the pages) with the common root approach (inside a page) to construct one Template, representing the common structure and content. The method allows accurate template detection even with a small set of input pages sharing one common template [18]. The main drawback of this and similar algorithms comes from the semantic blindness of the process.

To overcome the above, we are working on a semantically aware Template Detection: it exploits basic, HTML-immanent semantic meta-data, as well as domain knowledge meta-data discovered using Natural Language Processing and the existing ontologies applied in this research. The algorithm will make use of meta-data produced by the other parts of our solution and by components primarily developed to resolve other tasks. Such Semantic Template Detection should provide greater overall performance in both Template detection and application, compared to the semantically blind algorithms. If the evaluation during the later development phase confirms the expectations, the currently applied algorithm - the Restricted Bottom-Up Mapping - will be substituted.
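The "common paths" intuition behind such detectors can be hinted at with a short Python sketch. This is not the Restricted Bottom-Up Mapping itself, only a simplified stand-in that intersects root-to-node tag paths over a set of pages assumed to share one template; the real algorithm additionally exploits common content and a common-root analysis.

from html.parser import HTMLParser

# Tag paths present in every input page are treated as template candidates.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input", "base", "area"}

class PathCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack, self.paths = [], set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))  # root-to-node path
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

def template_paths(pages_html):
    """Intersect tag paths over all pages sharing one common template."""
    path_sets = []
    for html in pages_html:
        collector = PathCollector()
        collector.feed(html)
        path_sets.append(collector.paths)
    return set.intersection(*path_sets) if path_sets else set()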
There is a wide range of web page classification methods with different features and fields of application. Typical methods use on-page features and/or features of related pages as input. The most popular representations of web page content are bag-of-words and set-of-words, according to a survey on classification methods [19]. This paper applies a similar approach: web page content is represented as a bag-of-concepts - a collection of DictionaryConcept instances. The Concept discovery procedure is strongly inspired by the Improved Automatic Concept Extractor [20]. The procedure applied in our solution is extended with particular preprocessing, upgraded tokenization, and deeper Word-To-Concept mapping using the semantic Lexicon and morphology rules.

To deal with the nondeterministic classification problems of this research, an introduction of Support Vector Machine (SVM) and Multilayer Perceptron (MP) Artificial Intelligence algorithms is required. Both are universal approximators commonly used for pattern classification and nonlinear regression [21, p. 318]. With the error back-propagation algorithm of supervised learning becoming available, MPs attracted a lot of attention and were successfully applied to solve difficult and diverse problems [21, p. 156]. The MP has one or more hidden layers with highly interconnected neurons used to extract meaningful features during an extensive, time-consuming training process [21, p. 202]. The SVM is one of the most popular neural network frameworks for pattern classification and nonlinear regression, based on supervised learning, requiring no expert domain knowledge for successful application. The framework provides linear classification with binary output for input vectors consisting of two or more scalar or binary features.

III. THE SOLUTION

To filter company presentations from other site types, a supervised feed-forward neural network is applied. If the site type criteria are matched, an instance of the Company Semantic Profile is created or updated as the output. The profile is a set of triples describing the gathered business intelligence on the stakeholder. The method has 6 phases, each having its own execution flow, logic and application of sub-procedures, where required (see Figure 1).

Figure 1: The main steps of the WSP method

The first phase handles loading of the main page for the specified domain. Manual analysis of the Primary Sample found that 26% of the cases have an incompatible index page (8% intro pages, 10% language not supported, 6% under construction or permanent access failure). Therefore, just after the initial page load, a set of content features is extracted and passed into two Support Vector Machine units to analyze potential incompatibility of the loaded content. Alternatively, the index page tests could be resolved by a set of linear criteria.

The second phase performs an optimized crawling pattern and basic link analysis to discover the URLs of the web site's most important pages. The purpose of this phase is to optimize content retrieval and knowledge extraction by focusing the process on the most relevant content. Key Page detection starts with an initial random sampling of the main page's internal links; the sample size is limited for performance optimization. Once the content of the sampled pages is loaded, a simple frequency ranking of the internal links is performed. Links sharing the top rank are accepted as the most relevant pages on the site - Key Pages. We will reconsider the top-rank margin size as well as the upper limit of Key Pages to be detected. The classification of the Key Pages helps both workload optimization and knowledge extraction through better detection of the semantic context, i.e. the lines containing a personal name, last name, and cell phone number have a different weight if found on a "Guestbook" page than on a "Contacts" or "Sales" page. After the Key Pages are discovered, a limited sample set is processed by the Content Template Detection algorithm to form a Template used for later extraction of the most relevant content.

The Key Page Type enumeration, with 11+1 possible values, is developed from the Primary Sample analysis (Table 2). Page categories with a frequency of less than 10% within the sample were grouped into the "other type". From 3 to 12 Key Pages were observed per web site within the accessible subset of the Primary Sample (47 cases out of 50). For 51% of the cases, one or more Key Pages could not fit any category declared in the enumeration. However, the enumeration covers more than enough categories required for the Semantic knowledge extraction. Unlike the binary classification applied in the first phase, the Key Page Type detection uses a Multilayer Perceptron unit, described later.

Table 2: The Key Page Types observed in the Primary Sample. The frequency is calculated against the total count of accessible domains (47/50).

# | Key Page Type | % | Related super-predicate *
1 | About us | 98% | Organization
2 | Contacts | 81% | Organization
3 | Products | 66% | Portfolio
4 | References | 43% | -
5 | Specific product or branch | 38% | Portfolio
6 | Quality statements, certificates / standards... | 34% | Organization
7 | Downloads / PDF Catalog | 32% | -
8 | Photo gallery | 26% | -
9 | Services | 23% | Portfolio
10 | Map | 17% | Organization
11 | Partners | 11% | -
- | Pages with other information type | 51% | -
* super-predicate assigned to the ProductionSystem class

The Web Site type test is another SVM application for binary classification, whose purpose is to prevent execution of the Company Semantic Profile creation process against a web site that is out of the targeted stakeholder scope. To feed the SVM unit, a site-level and structure-related feature extraction is performed. The last phase executes the Natural Language Processing procedure: a customized version of a knowledge-based approach proposed back in 2010 [22]. The Template, index page, and Key Pages contents are processed separately. The results of the processing are used to build a triple-set describing the Company Semantic Profile.
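A minimal sketch of the Key Page detection step of the second phase follows, under the stated assumptions: fetch_internal_links is a hypothetical helper that downloads a page and returns the internal URLs it links to, and the sample size and page limit are illustrative placeholders rather than the paper's tuned values.

import random
from collections import Counter

def detect_key_pages(index_links, fetch_internal_links,
                     sample_size=10, max_key_pages=12):
    """Frequency-rank internal links over a small random page sample."""
    sampled = random.sample(index_links, min(sample_size, len(index_links)))
    counts = Counter()
    for url in sampled:
        counts.update(set(fetch_internal_links(url)))  # one vote per page
    if not counts:
        return []
    top_rank = counts.most_common(1)[0][1]
    # Links sharing the top rank are accepted as Key Pages of the site.
    return [url for url, c in counts.items() if c == top_rank][:max_key_pages]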
A. Content Classification with Artificial Intelligence

The classification tasks are resolved by four instances of feed-forward neural networks, based on a supervised learning process. Before running any classification procedure, a proper set of features is extracted, transformed and/or calculated. The current design works with 59 features (10 binary and 49 scalar), divided into three main groups according to their origin and application context:

• Structural metrics of the loaded content: 16 integer scalars describing link structure, textual content structure, and HTML structure metrics. Relationships between the main structural scalars are described by 5 float scalar ratios.
• Key Pages content metrics: normal text content and the content of heading tags are tested against 8 Concept-Bags related to one or more Key Page Types. The resulting 16 scalars are primary content metrics; an additional 8 scalars are obtained by Part of Speech Tagging and Named Entity Recognition procedures.
• General Site metrics: 4 scalars and 10 binary values: 4 showing which Basic Key Pages are detected; another 6 are triggered by the detection of particular concepts inside the content.

Features of the first group are extracted using simple Regex queries against the filtered textual content and XPath queries against the XML representation of the loaded HTML document. Extraction of the second group is performed after the Key Pages are detected and their content loaded and processed by the NLP algorithm. The third feature group is calculated after the Key Page classification. To keep the number of inner (feature) space dimensions optimal, we designed the SVM units with a minimized count of input nodes. The input vector of SVM01 contains a greater number of features (see Figure 2), since it has to deal with three different situations: the index page is not in a supported language; the web site is inaccessible or under construction; the index page is loaded and it is in the proper language. The second unit confirms whether the loaded index page is a real home page or an introduction page / splash screen. Introduction pages typically have short textual content, only a few internal links, and some multimedia tags for displaying a welcome banner, animation etc.

Figure 2: The SVM unit for testing the initially loaded page

The applied MP instance has a typical three-layer architecture with one hidden layer (see Figure 3). The input layer receives the signal from 24 integer scalar sensors. In accordance with general recommendations, the hidden layer contains 12 feature detectors (half of the input neuron count), linked to 12 output signals mapped to each Key Page Type.

Figure 3: The MP neural network resolving Key Page classification

Using the features of the General Web Site metrics group to feed the input layer, the SVM03 instance has to approve or dismiss a web site for further processing and knowledge extraction (see Figure 4). The success rate of SVM03 is highly dependent on the accuracy of the Key Page classification, while both Artificial Intelligence units depend on the quality of the Lexicon.

Figure 4: SVM unit used by the 5th phase - the Web Site type check
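The MP topology described above (24 inputs, 12 hidden feature detectors, 12 outputs) can be written out as a plain forward pass. The random weights below are placeholders for the back-propagation-trained ones, so this sketch only fixes the architecture, not the trained behavior.

import numpy as np

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(24, 12))  # 24 scalar sensors -> 12 detectors
W_output = rng.normal(size=(12, 12))  # 12 detectors -> 12 Key Page Types

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classify_key_page(features):
    """features: the 24 integer scalar metrics of one candidate page."""
    hidden = sigmoid(features @ W_hidden)
    scores = sigmoid(hidden @ W_output)
    return int(np.argmax(scores))     # index into the Key Page Type enum

print(classify_key_page(rng.integers(0, 50, size=24).astype(float)))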
B. Ontologies for Lexicon and Semantic Profile

Besides the typical features of a dictionary, the Lexicon describes the relations between words and abstract concepts. It also contains a sort of "triggers" that describe how and whether a content portion related to a set of abstract concepts should be transformed and mapped to a triplet with a particular predicate. The classes of the ontology may be grouped into three levels:

• Low level - all manifestation forms of the known words and the grammatical contexts related to them.
• Mid level - relations between manifestation forms and their groupings into term instances.
• High level - meta-relationships of the terms used by the Knowledge Extraction procedure.

The class labeling follows the first-capital-letter rule, while the predicate labeling follows the has-prefix rule. Every class is affiliated with an appropriate set of object-type predicates, creating a logical hierarchy between the classes (see Figure 1). To support languages with complex morphologies, the manifestation forms and words (terms) are separately represented by respective OWL classes: DictionaryWordInstance is a distinct manifestation form of a DictionaryWord instance. One distinct manifestation form may be related to more than one grammatical context, so another dedicated class is introduced: DictionaryGrammarContext represents a set of grammatical properties describing a grammatical context of a DictionaryWordInstance. The available grammatical properties (i.e. case, count, countenance, gender, verb form) are related to the type of the word. Grammatical properties are described via data-type predicates with predefined lists of allowed string values.

The ontology has two classes for term description: DictionaryWord, describing a word represented by its default manifestation form (nominative singular / infinitive base), and DictionaryPhrase, managing multi-word structures and abbreviations. The latter contains a set of several DictionaryWords and/or abbreviations described as DictionaryWordInstances. DictionaryConcept is a high-level class acting as a junction point for a number of dictionary terms. The relations with and between DictionaryConcept instances are heritable. It has an important role in the page classification procedure. The purpose of this class is to provide a "hook" or "trigger" connecting the lower dictionary classes with the semantic knowledge extraction classes: Knowledge Fact Criteria and Knowledge Fact. The logical hierarchy of the classes may be demonstrated through an example (see Figure 6): "Samostalna Trgovinska Radnja" (eng: unlimited liability retail shop) is a DictionaryPhrase related to:

• DictionaryWords: (hasPart)
  o "Trgovinsko" (eng: trade-related)
  o "Samostalno" (eng: independent, in this context: unlimited liability)
  o "Radnja" (eng: retail shop, business)
• DictionaryWordInstances: (hasInstance)
  o Abbreviations: "S.T.R", "STR", "s.t.r."

Figure 5: The Lexicon ontology classes and predicates

The Company Semantic Profile contains a collection of semantic triples, formed in accordance with the profiling ontology. The ontology is designed to reflect both the application case (see Table 1) and the anticipated potential reach of knowledge extraction from a company web site's content and other web sources. The ontology is also designed to provide a foundation for interoperability between production network peers. However, a survey (2011) found that there is a persistent quality problem with existing, dedicatedly developed interoperability semantic standards [23]. Having this in mind, we limited the scope of our ontology to the use-case specifics, avoiding the challenge of developing a comprehensive framework with the interoperability domain fully supported as well.

The logical hierarchy of the main profile class has four main branches, containing sub-nodes (see Figure 7). The Technology branch is particularly important for the goals of the first use-case group, since it contains knowledge about:

o Materials, used in production
o Ability, in the technological sense
o Equipment, used in the production

Core business output activities are covered by the Portfolio branch, containing the sub-nodes:

o Products, produced and offered to the market
o Services, for third-party production systems

A number of classes describe Products and Services with an inheritance structure resembling the structure of the NACE 2.0 business activity specifications, defined by EuroStat [24]. The scope is focused on the activities described by Section C, particularly inside divisions C22 to C35. The knowledge about the Organization has its main application in Customer Relationship and typical Marketing activities. The branch contains the following sub-nodes:

o Ability, of the company as a whole
o Events, history of the company, awards, licenses, certificate-type properties
o Human resources, labor demography and a list of the most important persons, coupled with relevant contact information

The knowledge about Finance performance might be viewed as a part of the Organization, but it is separated into a stand-alone branch, more for technical reasons than for reasons of the business logic. While all other branches may populate their content from the web site with reasonable reliability in terms of information quality, Finance needs a data source with greater credibility than content created by the marketing department of a stakeholder. In the great majority of countries worldwide, there is a state-funded business registry with publicly available financial statements and other performance data for every legal business within the country. Therefore, this branch is to be populated by data retrieved from a relevant state agency.

From the Extract-Transform-Load point of view, Portfolio and Technology may be considered complementary knowledge branches, since knowledge about products and services, if combined with expert knowledge about the industry, may help to predict information for the Technology branch - and vice versa. Technology itself has great knowledge inference potential if a proper technological expert utilizes the knowledge applied by the system.

Figure 6: Nodes of Production System profile class
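As a hedged sketch, the DictionaryPhrase example given earlier can be serialized as triples with rdflib. The namespace IRI is invented, and the abbreviations are flattened to literals instead of full DictionaryWordInstance resources, so this only illustrates the hasPart/hasInstance structure described above.

from rdflib import Graph, Literal, Namespace

LEX = Namespace("http://example.org/lexicon#")  # assumed namespace
g = Graph()
g.bind("lex", LEX)

phrase = LEX.SamostalnaTrgovinskaRadnja
for word in (LEX.Trgovinsko, LEX.Samostalno, LEX.Radnja):
    g.add((phrase, LEX.hasPart, word))          # phrase -> DictionaryWords
for abbrev in ("S.T.R", "STR", "s.t.r."):
    g.add((phrase, LEX.hasInstance, Literal(abbrev)))

print(g.serialize(format="turtle"))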
IV. CONCLUSION

The main contribution of the research is the integration and adaptation of various technologies and algorithms into one comprehensive solution to deal with real-world business challenges. The Web Business Intelligence solution proposed in this research is designed to support strategic goals specific to the selected use-case groups of sheet metal manufacturing and technology companies. The solution utilizes algorithms developed by several computing science subfields: web crawling and web content retrieval, content template detection, Artificial Intelligence classification of web sites and web pages, internal structured and non-structured data acquisition with Business Information System integration, semantic Lexicon supported Natural Language Processing, ontology based Knowledge Extraction, and a triplestore based knowledge storage system.

Most of the articles published on semantic knowledge extraction and natural language processing are focused on the English language, benefiting from huge available resources like WordNet and DBpedia. In this research, we are discussing an application of the semantic technologies for a morphologically complex language, supported by very few existing resources. One of the general contributions of our research is the application of the semantic technologies in a very specific and rarely researched scope, bounded by the language and the specific business application environment of the production systems.

Shortcomings of the currently proposed solution are: high dependence on the domain knowledge ontology and Lexicon; missing evaluation of the proposed Artificial Intelligence units design; development of the dedicated Semantic Company Profile ontology instead of adopting an existing ontology with an established user base; the ETL algorithm is still in the planning phase; more complex NLP might decrease the demand on a precise domain knowledge repository. The listed shortcomings are, at the same time, declared goals for future research. After evaluation and design adjustment for greater knowledge quality, an overall optimization will follow to take advantage of the single-platform integration of the various algorithms.

REFERENCES

[1] D. G. Gregg and S. Walczak, “Adaptive web information extraction,” Commun. ACM, vol. 49, no. 5, pp. 78–84, 2006.
[2] R. Kosala and H. Blockeel, “Web Mining Research: A Survey,” ACM SIGKDD Explor. Newsl., vol. 2, no. 1, p. 15, 2000.
[3] E. Miller and R. Swick, “An Overview of W3C Semantic Web Activity,” Bull. Am. Soc. Inf. Sci. Technol., no. April/May, pp. 8–15, 2003.
[4] E. Jacob, “Ontologies and the Semantic Web,” Bull. Am. Soc. Inf. Sci. Technol., vol. 29, no. 4, pp. 19–22, 2003.
[5] J. Srivastava and R. Cooley, “Web Business Intelligence: Mining the Web for actionable knowledge,” INFORMS J. Comput., vol. 15, no. 2, pp. 191–207, 2003.
[6] W. Chung, H. Chen, and E. Reid, “Business stakeholder analyzer: An experiment of classifying stakeholders on the Web,” J. Am. Soc. Inf. Sci. Technol., vol. 60, no. 1, pp. 59–74, 2009.
[7] H. Sun, W. Fan, W. Shen, and T. Xiao, “Ontology-based interoperation model of collaborative product development,” J. Netw. Comput. Appl., vol. 35, no. 1, pp. 132–144, Jan. 2012.
[8] M. Fernandez, I. Cantador, V. López, D. Vallet, P. Castells, and E. Motta, “Semantically enhanced Information Retrieval: an ontology-based approach,” Web Semant. Sci. Serv. Agents World Wide Web, vol. 9, pp. 434–452, Jan. 2011.
[9] E. Oro, M. Ruffolo, and D. Saccà, “Ontology-based information extraction from PDF documents with XOnto,” Int. J. Artif. Intell. Tools, vol. 18, no. 5, pp. 673–695, Oct. 2009.
[10] J. Lu, Q. Shambour, Y. Xu, Q. Lin, and G. Zhang, “A web-based personalized business partner recommendation system using fuzzy semantic techniques,” Comput. Intell., 2012.
[11] M. Leida, A. Gusmini, and J. Davies, “Semantics-aware data integration for heterogeneous data sources,” J. Ambient Intell. Humaniz. Comput., vol. 4, no. 4, pp. 471–491, 2013.
[12] S. Nedjar, R. Cicchetti, and L. Lakhal, “Extracting semantics in OLAP databases using emerging cubes,” Inf. Sci., vol. 181, no. 10, pp. 2036–2059, May 2011.
[13] S. Toivonen and T. Niemi, “Describing Data Sources Semantically for Facilitating Efficient Creation of OLAP Cubes,” in Poster Proceedings of the Third International Semantic Web Conference, 2004.
[14] D. Airinei and D.-A. Berta, “Semantic Business Intelligence - a New Generation of Business Intelligence,” Inform. Econ., vol. 16, no. 2, pp. 72–80, 2012.
[15] N. Khilwani, J. A. Harding, and A. K. Choudhary, “Semantic web in manufacturing,” Proc. Inst. Mech. Eng. Part B J. Eng. Manuf., vol. 223, no. 7, pp. 905–924, 2009.
[16] J. Zhou and R. Dieng-Kuntz, “A semantic knowledge management system for knowledge-intensive manufacturing,” in IADIS International Conference e-Commerce 2004, 2004, pp. 114–122.
[17] G. Narayanasamy, J. Cecil, and T. C. Son, “A Collaborative Framework to Realize Virtual Enterprises Using 3APL,” in DALT’06: Proceedings of the 4th International Conference on Declarative Agent Languages and Technologies, 2006, pp. 191–206.
[18] K. Vieira, A. L. Costa Carvalho, K. Berlt, E. S. Moura, A. S. Silva, and J. Freire, “On Finding Templates on Web Collections,” World Wide Web Internet Web Inf. Syst., vol. 12, no. 2, pp. 171–211, 2009.
[19] X. Qi and B. Davison, “Web Page Classification: Features and Algorithms,” ACM Comput. Surv., vol. 41, no. 2, 2009.
[20] Y. Zhang, R. Mukherjee, and B. Soetarman, “Concept extraction and e-commerce applications,” Electron. Commer. Res. Appl., vol. 12, no. 4, pp. 289–296, Jul. 2013.
[21] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. New Jersey, USA: Prentice Hall, 1999, p. 897.
[22] A. Arya, V. Yaligar, R. D. Prabhu, R. Reddy, and R. Acharaya, “A Knowledge Based Approach for Recognizing Textual Entailment for Natural Language Inference using Data Mining,” Int. J. Comput. Sci. Eng., vol. 1, no. 6, pp. 2133–2140, Nov. 2010.
[23] E. Folmer, P. Oude Luttighuis, and J. Hillegersberg, “Do semantic standards lack quality? A survey among 34 semantic standards,” Electron. Mark., vol. 21, no. 2, pp. 99–111, 2011.
[24] EUROSTAT, NACE Rev. 2 - Statistical classification of economic activities in the European Community. Luxembourg: Office for Official Publications of the European Communities, 2008, pp. 1–369.
An Overview of Selected Visual M2M Transformation Languages

Vladimir Dimitrieski*, Ivan Luković*, Slavica Aleksić*, Milan Čeliković*, Gordana Milosavljević*
* University of Novi Sad / Faculty of Technical Sciences, 21000 Novi Sad, Serbia
{dimitrieski, ivan, slavica, milancel, grist}@uns.ac.rs

Abstract—Although many model transformation languages currently exist, only a small number of them have a visual concrete syntax. As a good visual syntax usually allows better communication of ideas and easier understanding of specifications than a textual syntax, we feel that visual transformation languages are unjustifiably neglected. In this paper we give an overview of the most popular visual model transformation languages: AGG, AToM3, VMTS, Henshin, and QVT.

I. INTRODUCTION

Model transformations are an essential part of model driven software engineering (MDSE) [1]. This is due to the fact that models are in no way static entities. Many operations, such as refactoring, semantic mapping, and translation, are specified in the form of model transformations, thus defining a model's dynamic behavior. As dynamic model behavior represents an important part of MDSE, it has been discussed in many research papers, in which model transformations have been thoroughly defined and classified. However, to the best of our knowledge, only a small number of such papers are focused on the transformation languages' visual syntax.

Whether to use a visual or a textual concrete syntax is an issue present from the beginning of MDSE. In the hands of an expert, textual syntax represents the means to specify models in a fast and precise way. At the same time, it allows easier versioning and usage of existing tools. On the other hand, such a specification may be too verbose and long for humans to easily process. Additionally, such verbose specifications are often difficult to understand for inexperienced users of the modeling language, which can lead to a poor communication of ideas. Therefore, the principles of good visualization could be applied, as a good graphical notation may allow people that are unfamiliar with the textual syntax to easily recognize all the ideas behind the formal specifications [2].

Among the transformation language nonfunctional requirements presented in [3], four requirements are directly related to the languages' concrete syntax. The authors argue that the understandability of a transformation specification is greater for a language that uses a visual syntax than for one based on a textual syntax, since humans as visual beings are able to process images more easily than text. Thus, visualization is another nonfunctional requirement that favors visual syntax. Furthermore, verbosity and conciseness are also in favor of good visual languages. Visual languages that are at the right level of verbosity and conciseness keep a diagram easy to understand and explain.

In a survey of the Italian IT industry [4], only a small number of companies that are applying MDSE principles use model transformations in their projects. In the light of the aforementioned nonfunctional requirements, we feel that currently used tools rely too much on textual syntax. This often slows down the whole modeling process, as only a small number of modelers are proficient in these languages. Furthermore, they cannot easily communicate with their colleagues that are inexperienced in the transformation languages.

A good visual language would allow easier communication and more understandable specifications. Hence, we present an overview of visual languages in order to provide modelers with useful details that may help them to select the most appropriate language for their problem domain. In this paper we present currently used visual transformation languages: AGG, AToM3, VMTS, Henshin, and QVT. In each of these languages, we have implemented a sample transformation from the class to the relational model. Although we have implemented the whole transformation, in this paper we present only the rule that transforms a non-abstract class to a table, by analyzing the following two questions: "how are matching patterns specified?" and "how are constraints specified?" By this, we may draw conclusions about the level of satisfaction of the aforementioned nonfunctional requirements.

The paper is structured as follows. In Section 2, we present related work. In Section 3 we present the notions of graph and hybrid transformation approaches. In Section 4, we give an overview of some of the commonly used visual transformation languages with the focus on their visual syntax. Section 5 gives conclusions.

II. RELATED WORK

To the best of our knowledge, there are few papers that compare visual transformation languages. However, none of them focuses on visual syntax.

Taentzer et al. [5] present a comparison of four different graph-transformation approaches based on a common sample transformation. They are more focused on the expressiveness of these graph transformation languages than on their visual syntax. On the other hand, Lengyel et al. [6] present their graph transformation tool VMTS with the focus on the specification of constraints. They offer a comparison of VMTS to other graph transformation languages from the point of the constraint language's expressiveness. In addition to the previous comparisons, Vara reviewed most of the modern model transformation languages [7]. Similarly, Czarnecki et al. [8] presented an overview of transformation approaches and currently used model transformation languages. In contrast to all these works, we are focused here on the languages' visual syntax.
III. PRELIMINARIES ON MODEL TRANSFORMATION APPROACHES

Model transformations can be classified as Model-to-Model (M2M) or Model-to-Text (M2T) transformations. M2M transformations are the ones responsible for transforming one model into another. M2T transformations transform models into a textual file and are most commonly used in the software industry in the process of code generation [9]. In this paper we focus on M2M transformations only. Currently, a number of M2M transformation paradigms exist. Czarnecki and Helsen [8] distinguish among seven of them: direct-manipulation, structure-driven, operational, template-based, relational, graph-transformation-based, and hybrid approaches.

In this paper we present the graph and hybrid approaches. We consider the graph approach as it is intended to be used in a visual way. Graph-transformation-based approaches usually use a visual concrete syntax to represent the most important transformation concepts. A graph transformation consists of iteratively applying a rule to a graph. Each application of a rule transforms the graph by replacing a part of it with a new graph. For this purpose, each rule contains a left-hand side (LHS) and a right-hand side (RHS). The application of a rule to a graph replaces each occurrence of the LHS in the graph by the RHS. Additionally, the notions of applicability conditions (ACs) and negative applicability conditions (NACs) are used to limit the application of these replacements. Together, they can be considered as means of constraint specification. Therefore, such languages usually have three separate modeling parts in their visual representations: RHS, LHS, and NACs/ACs. In the remainder of the paper, such visual graph-transformation languages are described.

Hybrid approaches may combine all other approaches in a single transformation specification. These approaches can be combined as separate components or, in a more fine-grained fashion, at the level of individual rules. We present them here mainly because they are a de facto standard for specifying model transformations. Of all hybrid approaches, only Query/View/Transformation (QVT) [10] offers a graphical syntax in support of transformation specifications.

IV. OVERVIEW OF VISUAL MODEL TRANSFORMATION APPROACHES

To analyze the various visual transformation approaches, we have specified a transformation of a class diagram to a relational model. For each transformation approach, we show only the rule which specifies the transformation of a non-abstract class to a table. Only non-abstract classes are to be transformed, and the name of the table should be the same as the class name. Our selection of such an example has been motivated mainly by its simplicity, since our goal is to overview just the visual syntax, without going into details about transformation specifications or the underlying transformation mechanics.

As we need class and relational meta-models to participate in the transformation, we have implemented them in each of the presented environments. Figure 1 and Figure 2 depict these meta-models implemented in Ecore, which is a part of the Eclipse Modeling Project [11].

Figure 1 Simplified class meta-model

Figure 2 Simplified relational meta-model

In the following subsections we overview the following graph-based visual transformation languages: AGG, AToM3, VMTS, and Henshin. The only visual hybrid language presented is QVT. In the end, we present a summary of our observations.
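Before turning to the individual tools, the rule mechanics just described can be made concrete with a small Python stand-in for the running class2table example: the LHS matches a non-abstract class, a NAC blocks classes that have already produced a table, and the RHS adds the table. This illustrates the semantics only; it is not code for any of the presented engines, and the data layout is an assumption.

model = {
    "classes": [{"name": "Person", "abstract": False},
                {"name": "Shape", "abstract": True}],
    "tables": [],
    "traced": set(),   # classes for which a table was already created
}

def class2table(model):
    """Apply the rule once per match; return True if anything fired."""
    fired = False
    for cls in model["classes"]:
        # LHS constraint: non-abstract class; NAC: not yet transformed
        if not cls["abstract"] and cls["name"] not in model["traced"]:
            model["tables"].append({"name": cls["name"]})   # RHS effect
            model["traced"].add(cls["name"])
            fired = True
    return fired

while class2table(model):   # reapply until no occurrence of the LHS is left
    pass
print([t["name"] for t in model["tables"]])   # -> ['Person']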
A. AGG

AGG [12] is a tool environment for algebraic graph transformation systems. Graphs in AGG are defined by a type graph on different abstraction levels, and as such they may be attributed by any kind of Java objects. Graph transformations can be equipped with arbitrary computations on these Java objects, described by Java expressions. The AGG environment consists of a graphical user interface comprising several visual editors, an interpreter, and a set of validation tools.

In AGG, two visually separated parts of a rule specification exist: the LHS and RHS patterns. Before specifying any graphs, including the aforementioned patterns, node and edge types must be specified. These are the main building blocks of any graph and may be provided with different colors and shapes. This allows easier visualization and idea communication, as modelers are able to emphasize important parts of their models, i.e. graphs. Node coloring is particularly helpful in case of big diagrams to differentiate between source and target modeling elements. Additionally, labeling elements with numbers allows a modeler to denote whether an LHS element is deleted from a model or not.

Constraints are modeled in the form of ACs. AGG allows a specification of negative, positive, and general application conditions. All three of them are specified in a visual way. For that purpose, a dedicated part of the tool's workspace is designed for the specification of ACs. Nevertheless, the specification is performed in the same way as for patterns. Furthermore, in order to set an execution order of its rules, AGG allows rules to be put in layers that are executed sequentially. Unfortunately, this type of constraint cannot be directly set in the diagram.

Figure 3 AGG class2table rule specification and example model

In Figure 3, an AGG rule specification is depicted. In the bottom part of the AGG editor, an example of a model is presented after an execution of the transformation. The transformation rule itself is presented in the top part of the figure. The rule specification comprises three visually separated compartments in which a modeler may specify the AC, LHS, and RHS, respectively. This visual separation offers great insight into a transformation execution, as a modeler is able to see the rule specification while executing it step-by-step. In order to terminate the transformation from a class to a table, in this example we have specified a NAC equal to the RHS. Thus, if a class is associated with a table by a C2T node, that class is not to be considered again. The C2T node represents a kind of traceability link that helps a modeler to trace which target element is created from which source element. Furthermore, values from source to target elements are passed by variables. In this example, we have passed the value of the name attribute from a class to a table using a variable x. In order to preserve an element in a model, the element should be labeled with the same number on both the LHS and RHS. In the example presented in Figure 3, a class from the LHS would be present in the target model, as there is a class in the RHS with the same label number. If those numbers are different, the original class is to be deleted and a new one is to be created.

B. AToM3

A Tool for Multi-formalism and Meta-Modeling (AToM3) [13] is one of the first tools providing an integrated meta-modeling environment. It allows a definition of meta-models, concrete syntax, and model transformations. Additionally, it allows simulation of user-defined models' transformations in a step-by-step mode. Such simulation is of great help to transformation developers, as it allows them to debug their transformations' execution. The abstract syntax induced by the meta-model provides definitions of all entities, their attributes, possible connections and constraints amongst them. As users of a modeling environment are more often working with the concrete syntax of transformed models, all transformations are defined at the level of that concrete syntax.

All patterns, LHS, and RHS are specified using the concrete syntax of transformed models, unlike in the other presented approaches where generic icons are used. This allows transformation developers to visualize their transformations and communicate their ideas in a more suitable way. The usage of concrete syntax in transformation specifications is unique among all presented tools, since all the others support specifying transformations using a generic abstract syntax instead of a concrete one. The usage of a concrete syntax is enabled by AToM3's ability to import arbitrary formalisms, i.e. meta-models, into its pattern editor. Therefore, it allows a modeler to use the same concepts, buttons, and icons for transformation rule specification as the ones used for model specification. Furthermore, this concrete syntax approach is also very helpful while running simulations on models specified in the AToM3 environment: each step of the transformation is visualized and each change in the model is animated at the level of the concrete syntax.

In AToM3, generalized ACs may be used, as well as pre-conditions and post-conditions to events. Constraints may be defined in either a semantic or a visual way. For each component, a constraint on its attributes may be defined. AToM3 allows a modeler to define whether an LHS pattern element must have a specific value assigned to its property or any value is permitted for that property. Additionally, a global rule pre-condition and post-condition may be specified. They are specified in the Python programming language and may be used to control the execution of the whole rule. For example, a post-condition is often used to specify a rule termination condition based on the evaluation of all previously created elements. Actions may also be specified. They are also specified using Python, and they represent an imperative section of the rule. Whenever the rule is executed, the action code is run, thus allowing more control over the flow of the transformation. All of the aforementioned constraints are not specified directly on a transformation diagram, but through several dialogs. This lowers the ability to see the whole transformation in one diagram.

Figure 4 AToM3 class2table rule specification

Figure 4 depicts a transformation rule which adds one table to a model for each class that is specified. All icons in this rule are previously specified as concrete syntax icons for the Class and Table meta-elements. Similarly to AGG, AToM3 uses layers to define the application order of the transformation rules. Labels with numbers are used to distinguish between elements that are present in both the LHS and RHS. In this example, the class element is present in both patterns and is labeled with number 1 in the top-left corner of the Class element. This allows the environment to decide whether to delete the element from the LHS after the rule execution. The same number in our example means that the class remains in the model while a new table instance is added.
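Since AToM3 states its conditions and actions directly in Python, their flavor can be hinted at with the following stand-alone snippet. It is not actual AToM3 API code, only an illustration of a per-element LHS constraint and a global post-condition used as a termination check, reusing the data layout of the earlier class2table sketch.

def lhs_condition(cls):
    """Per-element constraint: only non-abstract classes may match."""
    return not cls["abstract"]

def post_condition(created_tables, source_classes):
    """Global check: stop once every non-abstract class has its table."""
    expected = {c["name"] for c in source_classes if not c["abstract"]}
    produced = {t["name"] for t in created_tables}
    return expected <= produced

# Example evaluation over the structures of the previous sketch:
classes = [{"name": "Person", "abstract": False}]
tables = [{"name": "Person"}]
assert lhs_condition(classes[0]) and post_condition(tables, classes)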
C. VMTS

The Visual Modeling and Transformation System (VMTS) [14] is a graph-based, domain-specific modeling environment. The system provides a graphical interface for defining and transforming domain specific languages. VMTS allows visual definition of meta-models and generation of domain specific modeling languages. Additionally, it allows a definition of model transformations using either a visual language or a textual, template-based language.

The visual specification comprises the LHS and RHS connected together in a rule, where the connection holds the definition of an operation that is executed. As it is a graph-based language, all patterns are specified using nodes and edges. Nodes are specified by selecting a meta-element that is to be used, as well as an operation over the element. The allowed operations are delete, modify, create, and no process. As the operation is selected, the node's visual representation changes color. Deleted elements are red, modified ones are gray, created ones are green, and not processed ones are white. This behavior is very similar to that of AGG. The only difference is that in AGG an arbitrary color may be assigned to an element, while in VMTS all colors are predefined. Furthermore, VMTS is the only presented environment that visually differentiates between various element relations. Relations of four types may be placed on a transformation diagram. A matched relation represents an edge of the source or target model; it is depicted by a solid arrow. A matched reference represents a reference from one node to another, and a replaced relation is used when a relation needs to be replaced in the target model; these are represented with a gray dashed arrow and a yellow solid arrow, respectively. Similarly, a replaced reference is used when a reference needs to be replaced, and it is represented with a yellow dashed arrow.

After the transformation rules are specified, the transformation's control flow is also modeled. It is represented by a directed graph, the nodes of which are the rule containers. Rule containers represent a rewriting rule that is used for the transformation of a source to a target graph. A special visual syntax is used for the specification of the control flow. Each flow has a start and an end state, and each execution follows an execution flow comprised of rule containers connected with flow edges.

In contrast to AGG and AToM3, which use the Java and Python languages, VMTS uses the Object Constraint Language (OCL) [15] for constraint specifications. OCL offers more declarative constructs than the other programming languages used for the same purpose. Constraints may be specified for each transformation meta-element. Constraints are not placed directly on a transformation diagram; instead, they are specified in the element's property editor. Hence, it is not always easy to comprehend a whole diagram without going into the details of each element.

Figure 5 VMTS class2table rule specification

In Figure 5, a VMTS class to table transformation rule is depicted. The transformation editor is divided into two areas. On the left hand side, the transformation is specified using visual symbols for nodes and edges. In this example, a class is being transformed to a table. As the table is created in the target model, it is represented with a blue color. The right area of the visual editor is a property view. It shows all specified constraints and other element properties. Figure 5 depicts the property view of the selected relation between the class and the table nodes.

Figure 6 VMTS transformation flow specification

Figure 6 depicts the transformation flow for our example. After the execution is initiated, the CreateTable rule is executed. CreateTable is the only rule that is executed in this example; after it is successfully executed, the transformation is terminated.

D. Henshin

Henshin [16] and its predecessor Transformation generation (Tiger) [17] are transformation languages and tools that best fit into the graph-transformation approach. In contrast to the other presented tools, they are developed as Eclipse environment plugins. Their main goal is to support various kinds of EMF model modifications, such as refactoring and the introduction of design patterns and other modeling patterns. Therefore, they represent an in-place transformation approach, operating directly on EMF models. In contrast to Tiger, Henshin offers an extended set of declarative language constructs. Tiger's basic concept of transformation rules is enriched by powerful application conditions and flexible attribute computations based on Java or JavaScript. The Henshin tool environment consists primarily of a fast transformation engine, several visual editors, and a state space analysis tool. Since these transformation concepts are based on graph transformation concepts, it is possible to translate the rules to AGG, where they might be further analyzed.

The visual syntax is somewhat different from the previously presented graph-transformation languages. It is developed in the Eclipse environment and utilizes the Graphical Editing Framework's (GEF) look and feel. Similarly to VMTS, the RHS, LHS, and ACs are all a part of the same rule specification. In order to alleviate some problems regarding the readability of such diagrams, the tool uses colors and annotations to distinguish between different patterns. Elements that are created in the RHS only are annotated with a create label and are colored in green by default. Elements from the LHS that are not deleted in the target model are annotated with a preserve label. Elements for deletion are annotated with a delete label and colored in red. The colors may also be set manually, allowing a modeler to further emphasize some elements.

In addition to a rule specification, a transformation unit is also visually specified. A transformation unit represents a collection of sequentially executed rules with the goal to transform a source to a target model. They are visually represented as linear execution flows of already specified rules. These flows are similar to the ones in the VMTS language. A notable difference is that Henshin's flows are specified in the same diagram as the rest of the rules. This may be particularly useful in perceiving the big picture of a transformation.
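The explicit control flows of VMTS and Henshin's in-diagram transformation units boil down to running rule containers in sequence, each until it no longer matches. A minimal Python sketch of such a flow, reusing the rule convention from the earlier class2table example (rules are callables returning True while they still find a match):

def run_flow(model, flow):
    """Execute a start-to-end flow of rule containers sequentially."""
    for rule in flow:            # e.g. [class2table, attribute2column]
        while rule(model):       # apply the rule as long as it matches
            pass
    return model

# A flow with a single container corresponds to our running example:
# run_flow(model, [class2table])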
Patterns may be annotated with the AC and NAC constructs. These conditions are either logical expressions or ACs which are extensions of the original graph structure. A rule may be applied to a host graph only if all application conditions are fulfilled. All ACs are shown in the same rule diagram with the other transformation pattern elements. Application conditions are colored in blue and distinguished from other elements by annotation labels, such as forbid.

Figure 7 Henshin class2table rule and transformation unit specification

The bottom part of Figure 7 depicts a class2table rule specified in Henshin. A class is transformed into a table and a traceability link is created. The class is preserved in the target model, while the table is created and its name is set to correspond to the class name. The aforementioned node characteristics are specified with annotations above the node names. Passing an argument is done by variables in the same way as in AGG or AToM3. In this example, a variable named className is used for passing the value of a class name to a table name. For each table that is created from a class, a trace is created. It allows a modeler to see which tables are created from which class. This is very important in the case of a very large number of classes in a source model, as it allows a modeler to know which table is created from which class.

E. QVT

Query/View/Transformation (QVT) is a hybrid transformation language supported by the Object Management Group (OMG). QVT is of a hybrid declarative/imperative nature, with the declarative part being split into a two-level architecture: QVT Relational and QVT Core. A user-friendly Relations meta-model and concrete syntax support complex object pattern matching and object template creation. As diagrammatic notations have been a key factor in the success of their Unified Modeling Language (UML), OMG specified a visual syntax of QVT Relational to complement its textual syntax [10]. However, there are currently no QVT tools that allow a specification of transformations in a visual way; all visual representations are generated from textual specifications.

The QVT notation is similar to the notation of UML class diagrams, enriched with additional elements for transformation specification. For each meta-element specified in any pattern, its attributes are shown. Attribute values are also shown in the visual representation. If an attribute is assigned a variable instead of a raw value, the variable may be used in the opposite pattern to pass a value to another attribute.

OCL constraints may be specified by attaching a note to an object or pattern. Such a note is shown within a diagram; however, the OCL expression is shown only after opening the note. Other constraints, like the when and where clauses, are also shown in the visual representation of a rule. Where and when boxes are shown at the bottom of a rule, if their respective clauses exist in its specification.

Figure 8 depicts the class to table transformation specification using the visual syntax of QVT Relational. The transformation is represented by a hexagon in the middle of the diagram. It divides the diagram into two separate parts, i.e. domains. The name of each domain is given on each side of the transformation symbol. In the example, the LHS domain is named uml while the RHS domain is named rdbms. Capital letters shown in the transformation depict whether the transformation in that direction is enforced or check-only.

Figure 8 QVT class2table rule specification

The when clause, which specifies the conditions under which the relationship needs to hold, is presented in the first box below the transformation rule. It specifies that this rule must hold when the package to schema transformation is executed, as a schema must contain all tables that are previously created. The where clause specifies the condition that must be satisfied by the model elements that are being related. In our example, before creating all of the tables, the table's columns must be created from the attributes. The where clause is specified in the second box below the rule specification.
F. Summary

The only presented language that uses the concrete syntax of transformed models is AToM3. The usage of concrete syntax allows a modeler to use the same concepts as the ones used while specifying models. Therefore, using concrete syntax makes the specification easier and transformations more understandable. Additionally, it allows very good visualization of transformation execution simulation. Other tools use generic icons to represent meta-elements participating in a transformation. In order to ease a modeler's task while working with generic icons, AGG, Henshin, and VMTS employ different colors for such elements. In these tools, colors denote different operations and states of the elements. Henshin and AGG allow a modeler to choose arbitrary colors for their elements, while VMTS colors are predefined for each operation executed on an element. Additionally, AGG provides different shapes for elements. Therefore, a modeler may choose not only the color of a specific element but also its shape. This further improves diagram readability.

QVT's syntax is similar to other OMG languages, such as UML. Transformation characteristics are shown in the center of the diagram and are easily readable. The downside of its visual syntax is that QVT transformations may be difficult to read if there are many elements present in a diagram. They do not deploy any differentiation mechanisms, like applying different colors or shapes to elements. Similarly, Henshin's diagrams may be overcrowded, as it encourages so-called multi-transformation rules. Multi-transformation rules allow one rule to represent several different rules in one compact way. This may lead to overly crowded rule specifications. In contrast to Henshin, QVT improves readability by a total separation of LHS and RHS.

Henshin and VMTS allow explicit specifications of the transformation execution flows. As the VMTS tool has a specific editor for this purpose, a modeler must constantly switch editors in order to specify the execution flow and the transformation rules. On the other hand, Henshin allows a specification of both rules and flows in the same editor. This greatly improves the speed of specifying transformations, as well as their understandability.

All languages provide constraints specification over transformation elements. However, only QVT and VMTS use OCL as a language for constraint specifications. This is of great importance, as OCL is a de facto standard supported by a variety of tools for constraint specification. In all presented languages, constraints may be specified on each element, but not all tools provide specifications directly on the diagram. AGG and AToM3 only support labeling of elements in order to differentiate between element instances. The VMTS tool shows a property view of the selected element, but it is overcrowded with various properties, which makes it hard to read. On the other hand, AGG allows an explicit visual modeling of its applicability condition. This is very useful for transformation debugging, as a modeler is able to execute a transformation step-by-step having the whole rule specification visible, including the specified constraints.

V. CONCLUSIONS

In this paper we give a brief overview of five transformation approaches: AGG, AToM3, VMTS, Henshin, and QVT. We focus our attention on their visual syntax, as it directly influences understandability, visualization, verbosity, and conciseness of transformation approaches. We have asked two questions: "how are matching patterns specified?" and "how are constraints specified?" These questions helped us to focus on the two most important aspects when choosing a transformation approach and appropriate tooling. These two aspects are: (i) specification of patterns containing transformed elements and (ii) specification of pattern constraints.

From the literature overview, we conclude that there is a lack of information about the visual syntaxes of transformation approaches. Therefore, a modeler has to try each transformation language before selecting the most suitable one. Our aim is to assist a modeler in this process by providing an overview of common approaches. Based on our analysis, we conclude that Henshin provides the most complete visual syntax of all presented approaches. It is concise and easily understandable. Henshin's editors are the most compact ones, allowing a definition of both rules and execution flows in a single diagram. Additionally, Henshin is an Eclipse plug-in. This is an advantage, as Eclipse is a widely used tool in the modeling community.

In our future work, we plan to perform a larger comparative study that includes other visual transformation languages, such as PROGRESS, GReAT, and VIATRA. Additionally, we have noted a lack of hybrid transformation approaches with a visual syntax, as they often have only a textual syntax to be used by a modeler. This may lead to hardly understandable specifications, especially for people inexperienced in these languages. Therefore, we plan to develop a visual transformation language that would allow modelers to easily specify transformations at a platform independent (PIM) level. Afterwards, such specifications are to be transformed into platform specific specifications in a hybrid transformation language. Therefore, such a visual PIM-level transformation language would provide a single visual syntax for transformation specification. Additionally, the platform specific specification would allow transformations to be executed on a desired platform.

ACKNOWLEDGMENT

The research presented in this paper was supported by the Ministry of Education and Science of Republic of Serbia, Grant III-44010: Intelligent Systems for Software Product Development and Business Support based on Models.

REFERENCES

[1] S. Sendall and W. Kozaczynski, "Model transformation: The heart and soul of model-driven software development," Softw. IEEE, vol. 20, no. 5, pp. 42–45, 2003.
[2] I. Dejanovic, M. Tumbas, G. Milosavljevic, and B. Perisic, "Comparison of Textual and Visual Notations of DOMMLite Domain-Specific Language," in ADBIS (Local Proceedings), 2010, pp. 131–136.
[3] S. Nalchigar, R. Salay, and M. Chechik, "Towards a Catalog of Non-Functional Requirements for Model Transformations," in Proceedings of the Second Workshop on the Analysis of Model Transformations (AMT 2013), Miami, FL, USA, 2013.
[4] F. Tomassetti, M. Torchiano, A. Tiso, F. Ricca, and G. Reggio, "Maturity of software modelling and model driven engineering: a survey in the Italian industry," in Evaluation & Assessment in Software Engineering (EASE 2012), 16th International Conference on, 2012, pp. 91–100.
[5] G. Taentzer, K. Ehrig, E. Guerra, J. De Lara, L. Lengyel, T. Levendovszky, U. Prange, D. Varró, and S. Varró-Gyapay, "Model transformation by graph transformation: A comparative study," in Proc. Workshop Model Transformation in Practice, Montego Bay, Jamaica, 2005.
[6] L. Lengyel, T. Levendovszky, G. Mezei, and H. Charaf, "Model transformation with a visual control flow language," Int. J. Comput. Sci. IJCS, vol. 1, no. 1, pp. 45–53, 2006.
[7] J. M. Vara Mesa, "M2DAT: a Technical Solution for Model-Driven Development of Web Information Systems," 2009.
[8] K. Czarnecki and S. Helsen, "Feature-based survey of model transformation approaches," IBM Syst. J., vol. 45, no. 3, pp. 621–645, 2006.
[9] M. Brambilla, J. Cabot, and M. Wimmer, Model-Driven Software Engineering in Practice. San Rafael: Morgan & Claypool Publishers, 2012.
[10] "Query, View, and Transformation (QVT)." [Online]. Available: http://www.omg.org/spec/QVT/1.1/PDF/. [Accessed: 17-Dec-2013].
[11] "Eclipse Modeling Project (EMP)." [Online]. Available: http://projects.eclipse.org/projects/modeling. [Accessed: 04-Jan-2014].
[12] G. Taentzer, "AGG: A tool environment for algebraic graph transformation," in Applications of Graph Transformations with Industrial Relevance, Springer, 2000, pp. 481–488.
[13] J. de Lara and H. Vangheluwe, "AToM3: A Tool for Multi-formalism and Meta-modelling," in Fundamental Approaches to Software Engineering, R.-D. Kutsche and H. Weber, Eds. Springer Berlin Heidelberg, 2002, pp. 174–188.
[14] T. Levendovszky, L. Lengyel, G. Mezei, and H. Charaf, "A systematic approach to metamodeling environments and model transformation systems in VMTS," Electron. Notes Theor. Comput. Sci., vol. 127, no. 1, pp. 65–75, 2005.
[15] "Object Constraint Language (OCL)." [Online]. Available: http://www.omg.org/spec/OCL/2.3.1/PDF/. [Accessed: 30-Dec-2013].
[16] T. Arendt, E. Biermann, S. Jurack, C. Krause, and G. Taentzer, "Henshin: advanced concepts and tools for in-place EMF model transformations," in Model Driven Engineering Languages and Systems, Springer, 2010, pp. 121–135.
[17] E. Biermann, C. Ermel, L. Lambers, U. Prange, O. Runge, and G. Taentzer, "Introduction to AGG and EMF Tiger by modeling a conference scheduling system," Int. J. Softw. Tools Technol. Transf., vol. 12, no. 3–4, pp. 245–261, 2010.
Methodology for initial connection of enterprises in Digital Business Ecosystems using cost-benefit analysis in collaborative processes planning

Ramona Markoska*, Aleksandar Markoski*
* University St. Kliment Ohridski / Informatics and Computer Sciences Department, Bitola, Macedonia
[email protected], [email protected]

Abstract — This paper presents an authentic approach and pilot model for integrating the theoretic knowledge and practical experiences of Digital Business Ecosystems with management procedures for the analysis, evaluation and redesign of business processes, including issues of collaborative planning. The model integrates a conceptual framework for the interoperability requirements, using comparative e-business charts based on calculations of the e-business factor, and procedures for the management and evaluation of business processes involved in the DBE, including cost-benefit analyses.

I. INTRODUCTION

The e-business transformation of an enterprise consists of the implementation and use of ICT solutions for redesigning classic business processes into electronic processes. Experience has shown that there are synergy and positive effects if more enterprises form virtual partnerships [1], [2]. Similar to natural biological systems, where various life forms cannot survive alone but exist within the natural ecosystem, SMEs and major business systems should also be organized in clusters; such clusters are called Digital Business Ecosystems [3]. A group of enterprises that cooperate with each other in virtual partnerships and digital processes grows into a Digital Business Ecosystem, and there is always a planned operational management of the e-business transformation processes, with a strong emphasis on applied ICT solutions [4]. For the purposes of this work, statistical data for the ICT evaluation of the pre-existing global DBE in the Republic of Macedonia, according to Eurostat, are used. An authentic case study is given, in collaboration with Macedonian SMEs, for the purpose of the creation and simulation of a new digital business ecosystem as a part of the global DBE in the Republic of Macedonia (official statistical data and cost-benefit analyses for the conditions in the Macedonian economy in 2011). In e-business collaboration with other enterprises, each SME should have interoperability, which means the ability to work together. From the point of view of e-business, compatibility of the ICT systems which support the processes of e-business and information exchange should be provided. Further, the company may plan specific adjustments in the field of ICT and system engineering services, within its recalculated expenses. The changes should be compatible with the capabilities of the other enterprises in the DBE. Compatibility is required to enable each enterprise to optimize investments in the ICT domain and the redesign of business processes. The e-business performance of each company should be comparable with those of the other enterprises in the DBE. At the stage of development, a DBE has a strong local component, and local success is a prerequisite for global business success [5]. In order to ensure profit and return on investment, each enterprise should be best adapted to the other SMEs. The level of communication and cooperation between various business entities in e-business is always limited by the e-business performance of the weakest company. Also, the payback time of the investment should be less than the time of use of the ICT systems and technologies used in the e-business processes. Complementarity implies that there is a possibility of exchange of goods or information. The result is the realization of economic benefit in the process of cooperation between various enterprises within the DBE. In certain cases, when a company wants to redesign an existing process or to introduce a new one, it should first check the option of buying the process from another company.

II. DBE DEVELOPMENT AND EVALUATION BASED ON COLLABORATIVE PLANNING OF BUSINESS PROCESSES USING COST-BENEFIT ANALYSIS

In the DBE as a working environment, all enterprises that cooperate with each other may have different levels of e-business transformation. In this cooperation, interoperability plays a very significant part, because it includes many different kinds of requirements that should be met [6]. Much research leads to the conclusion that one should build a customized ICT architecture and use open standards and dynamic P2P solutions [7]. Also, interoperability can be considered and solved during the processes of the exchange of information between enterprises and the practices of enterprise modeling [8]. One of the key features of the DBE is the ability for interactive changes [9]. Those changes influence the enterprises' characteristics by improving the performance of the enterprises that participate in the DBE. Underlying all the changes is an increase in productivity. This is the way to achieve profit and develop criteria for the redesign of processes in the enterprises. The process of evaluation should be done in precisely defined steps. Each step consists of certain empirical calculations and graphical indicators, which allow control of parameters and verification of changes.
III. INITIAL EVALUATION OF THE ICT ENVIRONMENT IN DBE USING COMPARATIVE E-BUSINESS DIAGRAMS

According to the recommendations of the EU Commission and e-Business Watch, it is necessary to make a qualitative and a quantitative analysis for each of the DBE enterprises. The qualitative analysis consists of answering the appropriate questionnaire [10], and the answers are used for the estimation of the most important parameters (A, B, C, D) in 4 different areas:
− ICT infrastructures and basic connectivity,
− Internal business process automation,
− Procurement and supply chain integration,
− Marketing and sales processes.

The quantitative analysis estimates ICT performances via the e-business index, and the data can also be represented visually, using e-business charts. The initial evaluation of the enterprises should be described as a conceptual framework for the interoperability requirements of the collaborative planning process between the enterprises in the DBE [11].

This initial evaluation consists of the following four steps:

1. Process exchange and formation of the DBE

A number of enterprises that will participate in the DBE agree on the processes for interconnection and exchange. The final product or service of one company can be a raw material for another company.

2. Qualitative and quantitative evaluation of the e-business enterprises participating in the DBE

A standardized questionnaire is answered for each of the enterprises, with the aim to assess the performance level of the ICT systems and to determine the e-business index and 2D charts. The working tables from the survey used in this step can be associated with particular parts of the conceptual framework, like components, relationships among requirements of interoperability and data, processes, services, and validation principles [11].

3. Comparative analysis of the 2D e-business charts

According to the recommendations in [10], charts for all enterprises are drawn. The digital business system as a virtual working environment has its performance limit, which is determined by the performance of the enterprises. These are absolute maximum values obtained from the various enterprises (Fig. 1). This step can be associated with a graphical presentation of IR validation principles [11], enriched with cost-benefit analyses.

4. Statistical evaluation and analysis of data

The next step is calculating the statistical average performances Ā, B̄, C̄ and D̄ using (1):

Ā = (ΣA_i)/n, B̄ = (ΣB_i)/n, C̄ = (ΣC_i)/n, D̄ = (ΣD_i)/n, for i = 1, ..., n, (1)

and plotting the charts. This chart shows the average performance statistics of the DBE.

Figure 1. Comparison of the e-business performances of each company with the performances of the DBE

Any company can be evaluated, and it is possible to detect in which area of qualitative characteristics (A, B, C, D) the company has strengths and weaknesses. For each of the n enterprises, a table is generated, weight factors A, B, C, and D are introduced, and then they are compared with Ā, B̄, C̄, D̄. The individual deviation in relation to the average for each of the enterprises is presented. After that, the standard deviations σ_A, σ_B, σ_C, σ_D are calculated using (2), which are a measure of the deviation of the performance of the enterprises in specific categories, according to the survey, compared to the mean value:

σ_A = sqrt(Σ(A_i − Ā)² / (n−1)), σ_B = sqrt(Σ(B_i − B̄)² / (n−1)),
σ_C = sqrt(Σ(C_i − C̄)² / (n−1)), σ_D = sqrt(Σ(D_i − D̄)² / (n−1)). (2)

Each company should evaluate in which qualitative field it has a deviation (A, B, C, D according to the questionnaire), and make appropriate adjustments. In those areas in which the company has performances that are above the average or within the standard deviation, the company should not make major changes and investments.

According to Fig. 1, Company M has a weaker performance than the average of the DBE in the area D, which is associated with the processes of marketing and sales, but in A, B, C it has excellent performance. For the successful operation of company M in the DBE, more important than the absolute level of performance is its compliance with the performance of the enterprises in the DBE. Investment to achieve significantly higher performance than the average is not justified, because the increased performance cannot be used in the cooperation within the DBE. On the other hand, lower performances are the limiting factor in cooperation.
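As an illustration of step 4, the following Python sketch computes the DBE averages from (1) and the standard deviations from (2) over per-enterprise (A, B, C, D) scores, and flags the areas in which a company falls more than one standard deviation below the DBE average. The function names and the sample scores are hypothetical, not part of the methodology's tooling.

# A minimal sketch of the statistical evaluation in step 4; the
# enterprise scores below are invented for illustration.
from statistics import mean, stdev

def evaluate_dbe(scores):
    """scores: a list of (A, B, C, D) tuples, one per enterprise."""
    areas = list(zip(*scores))               # group the values per area
    averages = [mean(a) for a in areas]      # equation (1)
    deviations = [stdev(a) for a in areas]   # equation (2), sample deviation (n-1)
    return averages, deviations

def weak_areas(company, averages, deviations):
    """Areas where the company is more than one sigma below the mean."""
    return [label for label, v, m, s in zip("ABCD", company, averages, deviations)
            if v < m - s]

dbe = [(0.8, 0.7, 0.9, 0.6), (0.9, 0.8, 0.8, 0.7), (0.7, 0.9, 0.7, 0.2)]
avg, dev = evaluate_dbe(dbe)
print(weak_areas(dbe[2], avg, dev))   # ['D'], cf. company M in Fig. 1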

IV. REDESIGNING PROCESSES IN THE DBE – EVALUATION AND ANALYSIS OF THE POINT OF DECISION

At the beginning, to simplify the analysis, we consider the case when each company is connected to the DBE through a single process with a single service or product [12]. For a start, it is best that this is one of the processes of the supply chains [13], [14]. In parallel with the initial assessment of the compatibility of the e-business enterprises, it is necessary to do an analysis of the selected processes. This analysis is done in order to calculate the productivity of the new process and the time in which the investments will pay off, bearing in mind the time of use and replacement of the ICT systems. For this purpose, we should take the following actions:

1. Initial analysis of the process

The company selects the process through which it will be connected to the DBE. Usually an existing process is used and modified, or an entirely new process is introduced for the first time. For the modification of an existing process, the evaluation of the selected process should be done by the calculation of:
• Productivity, as the ratio of the quantity of manufactured products or services (pieces) divided by the necessary time (in hours),
• Multifactorial productivity, as the ratio of the total standard cost of products and services divided by the total value of input resources.

It is necessary to make the analysis: to buy the process or its products or services, or is it better to establish (make) a new process [13]?

2. Evaluation of processes

In this step, calculations for the process evaluation are performed. The options of buying or making are considered. For that purpose, we calculate the total costs of both options (buy or make) using (3) and (4):

Total cost(buy) = Fb + cb·Q, (3)
Total cost(make) = Fm + cm·Q, (4)

where Fb are the fixed costs to buy and Fm are the fixed costs to make for one year, while cb and cm are the variable costs of buying and making per article or service (Fig. 2). The total amount of product or service is marked with Q.

Figure 2. Process evaluation through the point of decision (break-even point)

The point of decision (break-even point), where the processes of buying and making have the same price, is shown in Fig. 2. The quantity of product or service is determined with (5):

Q = (Fm − Fb) / (cb − cm), (5)

from the relation Fb + cb·Q = Fm + cm·Q.

Also, it is necessary to make a pessimistic calculation of the minimum amount of service or product for a fixed period of time, Qp. The minimum time for the analyses is 1 year. We can choose a time longer than 1 year, taking into account that this time should not be greater than the time of use of the existing ICT solutions used in the process. If the process requires fixed and variable costs of more modern ICT solutions (hardware, software, training of staff, reassignment of jobs), they should also be taken into the analysis when determining the quantity of decision. If the same ICT investment is used for another process, then we make appropriate adjustments in the costs. According to the analysis results, we should compare the two points of decision and choose the lower value.

It is necessary to compare the quantity Q and the planned minimal quantity for the chosen time period, Qp. When making the decision, the process with the lower costs should always be chosen. If Q <= Qp for Qp' (the minimal planned quantity for make), the decision should be make. In cases when Q >= Qp for Qp'' (the minimal planned quantity for buy), the decision for the product or service should be buy.

3. Evaluation of the service or product

This step is a solution which modifies the existing process; in this step, we make calculations to determine the point of settlement at which costs and revenues are equal. Above this point, for example Q > Qpr for Qp', we always get profit, as shown in Fig. 3. The total profit TPm is calculated with (6) and the total costs of service or product manufacturing TCm with (7):

TPm = p·Q, (6)
TCm = Fm + cm·Q, (7)

where p is the sale price of a single product or service, Fm the total fixed costs, and cm the variable cost of a single product or service. From the relation p·Qpr = Fm + cm·Qpr we get:

Qpr = Fm / (p − cm). (8)

Similarly to step 2, we make a pessimistic calculation of the minimum quantity of service or product for the chosen period, Qp. According to Fig. 4, for some hypothetical cases the projected value is Qp = Qp'. Here the same remarks apply to Q, the time, and the analysis of fixed and variable costs. According to Fig. 3, the comparison between the quantity of decision and Qpr is done. If Qpr < Qp for Qp', the company gains profit [13].

Figure 3. Product/service evaluation through the point of decision
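The two decision points from (5) and (8) are straightforward to compute. The sketch below does so in Python; all cost figures and names are invented for illustration and are not data from the case study.

# A minimal sketch of the make-or-buy break-even point (equation (5))
# and the profit point (equation (8)); the numbers are invented.
def break_even_make_buy(Fb, cb, Fm, cm):
    """Quantity Q at which Fb + cb*Q equals Fm + cm*Q (equation (5))."""
    return (Fm - Fb) / (cb - cm)

def profit_point(Fm, cm, p):
    """Quantity Qpr at which revenue p*Q covers Fm + cm*Q (equation (8))."""
    return Fm / (p - cm)

Q = break_even_make_buy(Fb=1000, cb=12.0, Fm=5000, cm=4.0)   # 500 pieces
Qpr = profit_point(Fm=5000, cm=4.0, p=9.0)                   # 1000 pieces
Qp = 1200  # pessimistic planned quantity for the chosen period
print("make" if Qp >= Q else "buy")   # making pays off here
print(Qpr < Qp)                       # True: the profit point is reached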

4. Interpretation of results

From the analysis, it can be seen how the correction of the fixed and variable costs affects the point of alignment and shortens the amount of product at the revenue level, and thus the time of return on investment. The assessment can be made before the process changes, so we will be able to see the effects and future improvements in correlation with the planned investment.

V. CORRELATION BETWEEN THE REDESIGN OF PROCESSES AND THE CHARACTERISTICS OF THE DBE

The redesign of processes in the direction of e-business transformation is made under the rules of operative management and the management of ICT systems [13], [14]. The redesign of a process will change the values and parameters of the specific categories relevant to the determination of the e-business index, the process parameters A, B, C, D and the e-business charts. After detecting the changes, it is necessary to re-answer the questionnaire for e-business performance. A measurable improvement is expected in basic connectivity and ICT infrastructure, an increase in the number of employees with an online presence, and the moving of a significant part of the activities into the online domain.

Each company, compared to itself, should show a decrease of the deviation from the average performance of the DBE. As expected, the standard deviation compared to the previously calculated initial mean values of the DBE is reduced, which means the DBE is better adjusted. This opens the opportunity for further connecting and networking and the inclusion of new enterprises.

This new situation should be statistically assessed again. It is expected that the minimal performances are improved; also, the average statistical performances should change to higher values. A new standard deviation of the modified DBE can also be calculated. These parameters should be the basis for a future redesign and improvement of the DBE.

VI. MODEL VERIFICATION

For the purpose of the verification of the methodology explained in this paper, statistical data for the ICT evaluation of the pre-existing global DBE in the Republic of Macedonia, according to Eurostat, are used (Fig. 4). The kind and level of cooperation between the enterprises in the DBE are chosen according to a cost-benefit analysis for one year.

Figure 4. ICT evaluation in RM, according to Eurostat

Furthermore, an authentic case study is given, in collaboration with Macedonian SMEs, for the purpose of the creation and simulation of a new DBE, as a part of the global DBE in the Republic of Macedonia (Fig. 5).

Figure 5. Planned collaboration between enterprises
The relationship between the characteristics of the pre-existing DBE of the national economy of RM and the new simulated DBE obtained from the collaboration between the companies is shown in Fig. 6.

Figure 6. Link between the global and the local DBE

According to Fig. 5 and Fig. 6, the questionnaire was designed to have three states: "Yes", "No", "Partial", evaluated by "1", "0" and "0.5". Using multiple levels in the questionnaire would provide more uniform results, but experience has shown that the surveyed enterprises ensure the accuracy of their responses only if the answers are described with the given levels of "Yes", "No", "Partial". The methodology is based on the fact that the newly formed DBE, within the limits of the national DBE, inherits its features. Furthermore, all the changes in the new DBE have an impact on the features of the national DBE. Thus, the proposed methodology enables continuous improvement and an increase in features, at the same time as the standard deviation decreases (Fig. 7).

Figure 7. Features comparison in the DBE, before and after collaboration
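To make the three-state scoring concrete, the short Python sketch below aggregates "Yes"/"Partial"/"No" questionnaire answers into a per-area score; the sample answers and the grouping into areas are invented for illustration.

# A minimal sketch of the "Yes"/"Partial"/"No" scoring (1 / 0.5 / 0),
# averaged into one score per area; the sample answers are invented.
SCORE = {"Yes": 1.0, "Partial": 0.5, "No": 0.0}

def area_score(answers):
    """Average score of the questionnaire answers for one area."""
    return sum(SCORE[a] for a in answers) / len(answers)

answers_by_area = {
    "A": ["Yes", "Yes", "Partial"],  # ICT infrastructure and connectivity
    "D": ["No", "Partial", "No"],    # marketing and sales processes
}
print({area: round(area_score(ans), 2) for area, ans in answers_by_area.items()})
# {'A': 0.83, 'D': 0.17}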

VII. CONCLUSION

The presented method shows how to determine the initial parameters of the initial collaboration between enterprises in a DBE, under the rules of the operational management of the evaluation of processes, products and services, within the planned costs and profits in a given interval. It also shows how to make the redesign of processes in accordance with the characteristics of the digital business environment. Moreover, adaptability to other business entities is imperative for success. With precisely planned changes, the performances of each enterprise during the process of e-business transformation in the DBE are successively improved. This is the way to create the conditions for further expansion of cooperation, the redesign of several processes, and the inclusion of more enterprises. The analysis refers to the initial connections between enterprises with a process with one product or service. The model can also be used for situations when the redesign of complex processes is needed, with more products and services. In the case of multiple products or services, it is necessary to make modifications in the quantity factors and the prices of each product. In this case, for multiple processes, adjustments of the fixed and variable costs should be made. The analysis can be used for the evaluation of the e-business characteristics of clusters, and for comparing the characteristics of different clusters. According to the findings related to e-business and DBE, by which regional performance is necessary for global performance, this model can also be used in the future to analyze the potential for DBE connections between entities from different countries.

REFERENCES

[1] S. Lilischkis (Ed.), "ICT and e-Business for an Innovative and Sustainable Economy", 7th Synthesis Report of the Sectoral e-Business Watch (2010), European Commission e-Business w@tch, http://www.empirica.com/themen/ebusiness/documents/EBR09-10.pdf
[2] C. Broser, C. Fritsch, O. Gmelch, G. Pernul, R. Schillinger and S. Wiesbeck, "Analysing Requirements for Virtual Business Alliances - The Case of SPIKE", First International ICST Conference, DigiBiz 2009, Springer, Heidelberg, vol. 21, pp. 35-44, June 2009.
[3] F. Nachira, P. Dini, A. Nicolai, "A Network of Digital Business Ecosystems for Europe: Roots, Processes, and Perspectives", Digital Business Ecosystems, Office for Official Publications of the European Communities, Luxembourg, 2007, www.digital-ecosystems.org.
[4] S. Chong, "Business Process Management for SMEs: an exploratory study of implementation factors for Australian wine industry", in Journal of Information Systems and Small Business, vol. 1, no. 1-2, pp. 41-58, 2007.
[5] P. Antoniela, "The Territorial Prospective of DBE. Case studies of Technology transfer and Digital Ecosystems Adoption", Digital Business Ecosystems, Office for Official Publications of the European Communities, Luxembourg, pp. 158-162, 2007, www.digital-ecosystems.org.
[6] S. Mallek, N. Daclin, V. Chapurlat, "An Approach for Interoperability Requirements Specification and Verification", Enterprise Interoperability, Third International IFIP Conference IWEI2011, Stockholm, pp. 89-101, March 2011.
[7] P. Dini, et al., "Beyond interoperability to digital ecosystems: regional innovation and socio-economic development led by SMEs", in International Journal of Technological Learning, Innovation and Development, Lorena, 2008, vol. 1, pp. 410-426.
[8] S. Blanc, Y. Ducq and B. Vallespir, "A Graph Based Approach for Interoperability Evaluation", Enterprise Interoperability II, New Challenges and Approaches, R.J. Gonclaves, J.P. Muller, K. Mertins, M. Zelm, pp. 273-276, 2010.
[9] F. Nachira, "Towards a Network Of Digital Business Ecosystems Fostering the Local Development", discussion paper, http://www.digital-ecosystems.org/doc/discussionpaper.pdf, 2002.
[10] E-Business Watch Survey, Methodology Report and Questionnaire, http://www.ebusiness-watch.org/about/documents/, 2006.
[11] M.E. Alemany, F. Alarcon, F.C. Lario and R. Poler, "Conceptual Framework for the Interoperability Requirements of Collaborative Planning Process", Enterprise Interoperability IV, Making the Internet of the Future for the Future of Enterprises, K. Popplewell, J. Harding, R. Poler, R. Chalmeta, pp. 25-34, 2012.
[12] N. Hormazabal, J. de la Rosa, J. Lopardo, "Monitoring Ecosystem Characteristics on Negotiation Environment", in Digital Business, L. Telesca, K. Stanoevska-Slabeva, V. Rakocevic (eds.), Springer, Heidelberg, LNICST vol. 21, pp. 82-89, 2009.
[13] L.P. Ritzman, L.J. Krajewski, M.K. Malhotra, Operations Management: Processes and Supply Chains, Prentice Hall, pp. 26-43, 2009.
[14] K.C. Laudon, J.P. Laudon, Management Information Systems, Prentice Hall, 2009.
Physical Medicine Devices with Centralized Management over Computer Network

Vladimir Ciric, Vladimir Simic, Teufik Tokic, Ivan Milentijevic, Oliver Vojinovic
Faculty of Electronic Engineering, University of Nis, Serbia
Email: [email protected]

Abstract—Physical therapy cabinets are provided with various devices which shorten the time needed for patients' recovery and healing. These are the devices which use diadynamic currents, interferential currents, ultrasound waves, vacuum impulses, pulsed electromagnetic fields, etc. The aim of this paper is to develop a new generation of devices for physical medicine, based on the existing products of the Elektromedicina company, with a key feature that allows the devices to become networked modules in a centralized system for physical therapy. Both local and remote aspects of setting the treatment parameters on the devices, as well as the monitoring of treatment progress on the centralized control station, will be implemented. This paper addresses the system's architecture, the network communication protocol, and the user interfaces both for the devices and for the centralized server console. The application layer protocol for the communication of the control station and the physical therapy devices is developed. The proposed protocol provides plug-and-play capability of physical therapy devices by handling device discovery. The initial implementation proved that the networked physical therapy devices can respond to all requests in a timely manner.

I. INTRODUCTION

Physical therapy assumes the examination, treatment, and instruction of human beings to detect, assess, prevent, correct, alleviate, and limit physical disability, bodily malfunction and pain from injury, disease, and any other bodily and mental conditions. It includes the use of physical measures, activities, and devices, for preventative and therapeutic purposes [1].

The cabinets for physical therapy are provided with various devices which shorten the time needed for patients' recovery and healing. These are the devices which use diadynamic currents, interferential currents, ultrasound waves, vacuum impulses, pulsed electromagnetic fields, etc. [1]. There are about 150 medical institutions in Serbia which have cabinets for physical therapies, with about 3500 devices being used. Well-known worldwide manufacturers of such devices are Gymna-Uniphy, Shock Master, Enraf Nonius and Siemens, while Elektromedicina d.o.o. is the manufacturer with the highest number of sold devices in Serbia.

A typical physical medicine cabinet has tens of standalone devices for different purposes. Usually the patients are sequentially treated with different physical therapy devices, applying different procedures for a prescribed period of time. A typical scenario for a patient is to be treated with several different methods in one day, and to repeat the same procedure every other day for one or two weeks [1].

Each device has treatment parameters that are set according to the prescribed therapy. Depending on the treatment and the development of the patient's recovery, the device parameters are in most of the cases set manually on each device before each treatment.

A lot of effort with different goals has been invested in recent years to interconnect different medical devices [2], [3]. The goals are mainly oriented towards data acquisition and the alarming of hazard situations. In most of the cases the developed systems are proprietary and patented [4], [5]. However, there is a set of open standards for the network communication of medical equipment, such as X.73 [6], [7].

In 2011, partially supported by the Ministry of Education, Science, and Technological Development of Republic of Serbia, under the Grant No TR32012, the project "Intelligent Cabinet for Physical Medicine - ICPM" was launched at the Faculty of Electronic Engineering, University of Nis, Serbia. The aim of the project is to develop a new generation of devices for physical medicine, based on the existing products of the Elektromedicina company. The key feature of the new generation of devices is to allow a device to become a networked module in the centralized system for physical therapy. The devices should be integrated into a centralized system for network management and data acquisition, based on an adjusted modification of the open X.73 standard [6], in the same manner regardless of the type of the device.

In this paper, three new features that make up the integrated centralized system for physical medicine and introduce new added values to the devices for physical therapy are presented: (1) remote semi-automatic or automatic setup of parameters and device control; (2) the ability to automatically record the progress of the treatment; (3) a new user interface and management console for the centralized control station. Automatic control of the devices refers to the setting of treatment parameters on a specific device at the scheduled time for the patient. Within the framework of the ICPM, the new generation of devices for physical medicine is developed. The functional features of the previous generation, in the sense of standalone operation, were retained.

This paper focuses on the design of the system's architecture, the design and implementation of the protocol for network communication, and the design of the user interface for both the devices and the centralized control station.

The paper is organized as follows: Section 2 gives a brief overview of Elektromedicina's previous generation of physical therapy devices and discusses the requirements for the new generation of the devices; in Section 3 the new user interface for the new generation of physical medicine devices is presented; Section 4 is devoted to the system's architecture and network communication protocol; in Section 5 the user interface of the
management console on the centralized server is presented; in Section 6 the implementation results are discussed, while in Section 7 the concluding remarks are given.

II. BACKGROUND AND INTEGRATED SYSTEM REQUIREMENTS

A typical cabinet for physical medicine contains numerous standalone physical therapy devices of different types. It is common that one patient is treated by different devices [1]. The six different types of Elektromedicina's devices that are included in the ICPM are: Eksposan™, Magnemed™, Intermed™, Vakumed™, Diaton™ and Sonoton™ (Fig. 1). The devices are specialized to treat different parts of the human body using different physical agents. Each device allows an operator to choose an initial set of therapy parameters, and to monitor and change their values in the course of the therapy.

Figure 1. Types of Elektromedicina's devices that are included in the ICPM

The block diagram of a device for physical therapy, regardless of its type, is shown in Fig. 2. The operator attaches the applicator to the patient's body and sets the therapy parameters using the user interface, as shown in Fig. 2. The type of applicator depends on the type of the device. Devices like Diaton™ and Eksposan™, which use different types of electrical current for the treatment, are equipped with electrodes, while Vakumed™, Magnemed™ and Sonoton™ use vacuum pumps, electrical coils, and ultrasound applicators, respectively.

Figure 2. The block diagram of a device for physical therapy

The main project goal for the Integrated Centralized System for Physical Medicine (ICSPM) is to redesign the existing physical therapy devices of the Elektromedicina d.o.o. company by introducing networking capabilities, providing the interconnectivity needed for the support of system integration. The integrated system should provide: the management of the physical therapy database, automatic synchronization between the devices, and the recording and updating of the history of the patients' therapy. The devices should be monitored and managed by the unique control station within the ICSPM. The centralized system should contain a database of patients' records with the defined therapies and the course of the recovery. One of the goals of centralization is to integrate the devices to such an extent that, after the identification of the patient, the parameters are automatically sent from the central manager to the device, and the treatment is continuously monitored.

The main requirements are:
1) the devices should have an extension which allows them to be monitored and managed by the unique control station within the ICSPM;
2) the ability to set the parameters on each device manually or automatically over the computer network;
3) the scheduler;
4) a database with patients' records, with scheduled treatments, treatment history and device parameters for each treatment;
5) a common user interface for all device types.

Having in mind the significant number of devices produced by the Elektromedicina company, decreasing the costs of the transition to the new technology becomes an issue of great importance. The cost-effectiveness led us to the reuse of the analog drivers and applicators (Fig. 2).

In order to fulfill the requirement for the new user interface on the devices, as well as the ability to connect the devices to a local Ethernet network, microcontroller-based logic was attached to the existing analog drivers. The block diagram of the ICSPM device is shown in Fig. 3. Gray shaded blocks in Fig. 3 represent the modules which are added in order to fulfil the requirements. The chosen microcontroller is the Microchip PIC18F87J60, with a built-in Ethernet controller.

The user/operator interface, shaded in Fig. 3, is completely redesigned in order to provide the possibility of setting the device parameters for a treatment, as well as the new parameters related to network operation.

III. USER INTERFACE OF THE NEW GENERATION OF PHYSICAL THERAPY DEVICES

As there is usually a barrier in medical staff acceptance of new computer systems and technologies, part of the research was devoted to the ergonomics of the new system. The central part of the Human-Computer Interaction (HCI) research was devoted to methodologies and processes for interface design and implementation, quality estimation, and the development of a model for intuitive interaction. In order to overcome the barrier for the acceptance of the ICSPM devices by different types of medical staff, a common user interface for the devices is designed. The interface is driven by the microcontroller, as shown in Fig. 3. The chosen interface is a 2x16 character display.
Figure 3. The block diagram of an ICSPM device

Having in mind that there are six different types of physical medicine devices with different functions and different parameters, and that all of them should be supported by a single user interface, the starting point in the interface design was an analysis of the current user interfaces. Table I gives the summary of the keys provided on the front panel in the previous generation of the devices, with their functions. As can be noticed from Table I, Intramed has the largest number of keys, 7 in total. However, apart from the start and stop button, which should be present on each device, the other keys are reserved for the parameter setup. If we put the parameter setup options within the user menu, then the requirement for a common user interface on all devices can be achieved. In this case all six devices should have the following five keys:
1) start/stop – controls the beginning and the end of the therapy,
2) up – moving up within the menu, or parameter value increase,
3) down – moving down within the menu, or parameter value decrease,
4) enter – enter the selected menu item, or remember the parameter value,
5) back – one level up in the menu, or cancel the parameter value.

The menu is organized as shown in Fig. 4. The initial screen in Fig. 4 gives the operator the basic information about the device type, notifying that the device is in the "ready" state. The most common task performed on the device is the starting of the therapy. Within the designed menu, this is provided with the most recent parameters, which are recorded as the "predefined therapy 1" option (Fig. 4). The therapy can be started from the ready state by pressing the start key twice (Fig. 4). While the therapy is in progress, pressing the start button will cause the first parameter to be displayed in "full-screen" mode, letting the operator change the value by pressing the up/down keys. The real-time parameters displayed during the therapy on Vakumed™ are shown in Fig. 5a, while the screens for parameter changing are shown in Fig. 5b.

Figure 4. The organization of the user menu on the ICSPM device

Figure 5. Parameter display in real-time during the therapy

From Fig. 4 it can be noticed that a part of the interface is devoted to the communication parameters. By moving down through the main menu, starting from the initial screen, the operator can reach the menu item "Communication parameters", where the network parameters can be set.

IV. DATA ACQUISITION AND CONTROL PROTOCOL

In order to be able to set up the treatment parameters remotely, in a semi-automatic or automatic manner over the computer network, and to record the progress of the treatment in the ICSPM, each device is provided with the Ethernet network adapter that
is embedded in the chosen PIC microcontroller (Fig. 3) [8]. The ICSPM is designed as a hub-and-spoke topology, having the control station as the hub, as shown in Fig. 6. In other words, the devices can communicate with the control station only, but not with each other.

TABLE I. THE SUMMARY OF THE CONTROL FUNCTIONS ON THE FRONT PANEL IN THE PREVIOUS GENERATION OF THE DEVICES

key 1 (all devices): start/stop
key 2: Eksposan™ – modulation type (pm, if, g); Intramed™ – select function (k, i1, i2); Sonoton™ – select function; Magnemed™ – channel selection (1, 2); Vakumed™ – pulse mode (15, 30, 60); Diaton™ – select function (g, pg, d)
key 3: Eksposan™ – select function (sp, e1, e2); Intramed™ – frequency change; Vakumed™ – constant mode type
key 4: Intramed™ – upper/lower frequency; Vakumed™ – water; Diaton™ – polarity (+, -)
key 5: Eksposan™ – set F current (up/down); Intramed™ – time (0-60 min.); Sonoton™ – time (0-19 min.); Magnemed™ – time (0-95 min.); Vakumed™ – vacuum intensity (0-600 mbar); Diaton™ – time (0-15 min.)
key 6: Eksposan™ – set G current (up/down); Intramed™ – current (0-60 mA); Sonoton™ – intensity (0-3 W/cm²); Magnemed™ – frequency (0-50 Hz); Diaton™ – base current (0-5 mA)
key 7: Intramed™ – frequency (1-200 Hz); Magnemed™ – magnetic field (0-10 mT); Diaton™ – dose current (0-29 mA)

Figure 6. The topology of the ICSPM

The design and implementation of the communication protocol is inspired by the X73 PoC-MDC set of standards [6]. To operate in a coupled manner, two devices follow the next four steps: connection, association, configuration and operation. In the phase of connection, the device queries the local network for the discovery of the control station. During the phase of association, the device registers its ID with the control station and retrieves the network ID [6], [7]. In the phase of configuration, the device queries the control station to retrieve the physical therapy database. It analyzes the types of therapies and updates its local therapy database. In the operation phase, the device sends periodic hello packets and can be polled by the control station to start, stop or change therapy parameters.
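On the device side, these four phases can be read as a simple state machine. The Python sketch below outlines it; the states follow the paper, while the event names and the transition table are hypothetical placeholders rather than the actual firmware logic.

# A minimal sketch of the device-side protocol phases; the event names
# are placeholders, not the actual firmware API.
from enum import Enum, auto

class Phase(Enum):
    CONNECTION = auto()     # discover the control station on the LAN
    ASSOCIATION = auto()    # register the device ID, retrieve the network ID
    CONFIGURATION = auto()  # synchronize the local therapy database
    OPERATION = auto()      # send hello packets, accept start/stop/change

TRANSITIONS = {
    (Phase.CONNECTION, "station_found"): Phase.ASSOCIATION,
    (Phase.ASSOCIATION, "id_registered"): Phase.CONFIGURATION,
    (Phase.CONFIGURATION, "db_synced"): Phase.OPERATION,
    (Phase.OPERATION, "station_lost"): Phase.CONNECTION,
}

phase = Phase.CONNECTION
for event in ("station_found", "id_registered", "db_synced"):
    phase = TRANSITIONS.get((phase, event), phase)
print(phase)  # Phase.OPERATION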
The coding and decoding of messages in X73 was found not to be applicable in a straightforward manner on the particular embedded microcontroller, due to its complexity [2]. The protocol message coding is simplified to reflect the number and type of the physical therapy parameters [8].

The communication protocol between the physical therapy device and the control station is designed to use the UDP transport layer protocol with port numbers 161 and 162. As the UDP protocol doesn't offer transmission and flow control, the basic sequencing and acknowledgment of messages are provided at the application level.

The message format is shown in Fig. 7. There are 18 different message types, which are used in different scenarios for different purposes. The sign "D" in Fig. 7 stands for the field delimiter.

Figure 7. The message format

The following scenarios are identified:
• starting of the therapy with predefined parameters from the control station;
• recording the event if the therapy is started/stopped from the device itself;
• pausing or stopping the therapy from the control station;
• changing the therapy parameters from the control station;
• recording the event if the parameters are changed from the device itself;
• periodic interchange of status information between the devices and the control station;
• sending of unsolicited messages from the device about the starting or stopping of therapies, the changing of parameters or the changing of device states;
• registering a new therapy if it doesn't exist in the control station database.

The complete list of message codes and a detailed description of the message exchange during each scenario can be found in the technical report [9].
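A delimiter-separated format of this kind admits a very light codec. The sketch below shows one plausible framing and parsing of such messages over UDP in Python; the concrete fields, the message code, and the ";" delimiter character are assumptions made for illustration, since the exact layout is defined in the technical report [9].

# A minimal sketch of a delimiter-separated message codec over UDP, in
# the spirit of Fig. 7; the field layout and delimiter are assumed.
import socket

DELIM = ";"

def encode(seq, code, device_id, payload=""):
    return DELIM.join([str(seq), str(code), device_id, payload]).encode()

def decode(data):
    seq, code, device_id, payload = data.decode().split(DELIM, 3)
    return int(seq), int(code), device_id, payload

# A hypothetical "start therapy" request addressed to one device.
msg = encode(seq=1, code=10, device_id="VAKUMED-03", payload="T=15,P=600")
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# sock.sendto(msg, ("192.168.0.42", 161))  # the address is illustrative
print(decode(msg))  # (1, 10, 'VAKUMED-03', 'T=15,P=600')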

For the sake of illustration, a UML sequence diagram for one of the mentioned scenarios is presented in Fig. 8. The scenario illustrated in Fig. 8 is the scenario of starting the therapy on a device from the server.

Figure 8. The sequence of messages sent over the network when a therapy on a device is started from the control station

After the command to start the therapy on a particular device is received (Fig. 8), the user interface on the station notifies the module which implements the protocol on the control station to send a start message to the device (start therapy(), Fig. 8). The module that implements the protocol on the device side gets the message, takes the necessary steps, and confirms the start to the control station. It should be noted that this scenario must be preceded by the message exchange which connects, and associates, the device to the control station [9]. Besides these similarities, the implemented protocol has less overhead compared with the full X73 protocol stack.

V. CONTROL STATION MANAGEMENT CONSOLE

The management console of the control station is shown in Fig. 9. Due to the importance of particular parts of the user interface, the patient data management, as well as the information about available devices, are omitted. Once the patient is identified, and his record is opened, the operator on the central control station gets the screen shown in Fig. 9.

The screen is divided into 5 sections. In order to be able to reference the sections of the figure, the sections are enumerated by the numbers 1 to 5 in Fig. 9.

The main design goal was to provide all of the patient's data, including the data about the scheduled therapies, once the record is opened. These data are provided within the sections enumerated as 1 to 3 in Fig. 9. Section 1 shows the basic information about the patient, while sections 2 and 3 give the information about the scheduled treatments and the history of treatments, respectively.

Sections 4 and 5 display the current status of the devices in the ICSPM. The status of the devices on which a therapy is in progress is shown in section 4, along with the elapsed and estimated time for the treatment. If a device is in the ready state, and a treatment isn't running, the device isn't displayed in this section. In addition, section 5 displays the event messages about the devices, such as: a new device detected on the network, therapy parameters changed on the device using the interface of the device itself, etc. These messages are visible for a short period of time, and then they disappear from the list of events.

If there are no therapies in progress, and there are no events from the devices, sections 4 and 5 are hidden, and the screen contains sections 1 to 3 only. As mentioned earlier, section 2 displays the scheduled treatments for the patient whose record is opened, putting the first coming treatment on the top. The first coming treatment is the only one that has buttons for starting and editing the therapy parameters next to it, on the right side, as shown in Fig. 9. When the treatment is started, the start button icon switches to a stop button, the edit button remains the same, and the progress is shown in section 4 of the screen.

The third row in section 2 shows the type of the device (Fig. 9). The first coming treatment is the only one where the particular ID of the device is shown, because at this point the control station has the information about which device is available. This is not the case with the treatments scheduled for some future time. The problem of the availability of the devices in a particular period of time is addressed in the scheduler part of the interface, but it addresses only the number of available devices of a given type, and does not associate the patient with the device until the time for the treatment comes.

Having all relevant information available on the same screen met one of the basic rules of HCI design, and indirectly gave the opportunity to the institution to choose between a regular and a touch screen while implementing the ICSPM.

Figure 9. The management console of the control station

VI. IMPLEMENTATION RESULTS

The physical therapy devices are upgraded with a microcontroller-based board, using the Microchip PIC18F87J60 with an integrated Ethernet controller (Fig. 3). The protocol stack is compiled using the MikroC compiler within the MikroC PRO IDE and runs on the MikroC TCP/IP stack in the OSA cooperative multitasking real-time operating system (RTOS) for Microchip PIC controllers. The size of the compiled protocol core is 44KB. The corresponding component, as well as the user interface on the control station, is developed in C# with .NET 3.5.

For the sake of illustration, Table II gives the total number of different messages that can be exchanged between a device within the ICSPM and the control station through the network, and the total number of different events caused by the messages within the control station, as well as within the device itself.

TABLE II. THE NUMBER OF MESSAGES EXCHANGED BETWEEN THE SYSTEM MODULES

Message / # of different messages
Device protocol → Station protocol: 15
Station protocol → Device protocol: 19
Station protocol → Station interface: 20
Station interface → Station protocol: 8
Device protocol → Device interface: 15
Device interface → Device protocol: 15

Table III shows the number of messages exchanged during one working day of 10 hours, between 12 physical therapy devices and the control station, when therapies are scheduled every 30 minutes. Having in mind that the average message size is 80B, and that the total quantity of data exchanged per day for the given example is less than 1MB, it can be concluded that the protocol is very efficient both in terms of network throughput and, more importantly, in terms of the burden that it causes on the limited-capability microcontroller.

TABLE III. THE NUMBER OF MESSAGES EXCHANGED OVER THE NETWORK IN THE TYPICAL CABINET

Scenario / # of messages / # of repetitions per day / data sent [KB]
Device registering: 5 / 1 / 0.4
HELLO messages: 2 / 3600 / 562.5
Starting of therapy: 2 / 20 / 3.2
Stopping the therapy: 2 / 20 / 3.2
Parameters change: 2 / 100 / 16.0
Synchronization: 24 / 1 / 1.9
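The daily traffic figure quoted above is easy to verify with a back-of-the-envelope calculation; the snippet below reproduces the Table III totals in Python from the table's rows and the 80 B average message size.

# Reproducing the Table III totals: messages per scenario times
# repetitions per day, at an average message size of 80 bytes.
AVG_MSG = 80  # bytes

scenarios = {  # name: (# of messages, repetitions per day)
    "Device registering": (5, 1),
    "HELLO messages": (2, 3600),
    "Starting of therapy": (2, 20),
    "Stopping the therapy": (2, 20),
    "Parameters change": (2, 100),
    "Synchronization": (24, 1),
}

total = sum(m * r * AVG_MSG for m, r in scenarios.values())
print(round(total / 1024, 1), "KB per day")  # 586.6 KB, i.e. below 1 MB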

VII. CONCLUSION

Physical therapy cabinets are provided with various devices which shorten the time needed for patients' recovery and healing. In this paper a new generation of devices for physical medicine is presented. The devices are based on the existing products of the Elektromedicina company, with a key additional feature that allows the devices to become networked modules in the centralized system for physical therapy. Both local and remote aspects of setting the treatment parameters on the devices, as well as the monitoring of treatment progress on the centralized control station with data acquisition, are implemented. The paper presents the ICSPM as an integrated system, keeping the focus on the system's architecture, the network communication, and the user interface on both the devices and the centralized server console. The new user interface on the devices and the user interface of the control station are presented. The application layer protocol for the communication of the control station and the physical therapy devices is developed. The protocol design is led by the concepts of the ISO/IEEE X73-PoC-MDC series of standards. To deal with the limited resources of a microcontroller-based interface, which is attached to each physical therapy device, the adopted concepts are simplified accordingly. The proposed protocol provides plug-and-play capability of physical therapy devices through the handling of device discovery and the synchronization of therapy databases. The initial implementation proved that the networked physical therapy devices can respond to all requests in a timely manner.

ACKNOWLEDGMENT

The research was supported in part by the Serbian Ministry of Education, Science and Technological Development (Project TR32012).

REFERENCES

[1] American Physical Therapy Association, Today's Physical Therapist: A Comprehensive Review of a 21st-Century Health Care Profession, January 2011, pp. 85-86.
[2] I. Martínez et al., Implementation Experience of a Patient Monitoring Solution based on End-to-End Standards, 29th International Conference of the IEEE on Engineering in Medicine and Biology Society - EMBS, August 2007, pp. 6425-6428.
[3] Hak Jong Lee et al., Ubiquitous healthcare service using Zigbee and mobile phone for elderly patients, International Journal of Medical Informatics, Vol. 78, No. 3, 2009, pp. 193-198.
[4] Kris R. Holtzclaw, Router device for centralized management of medical device data, U.S. Patent, No. US 2007/0255348 A1, 2006.
[5] Satoru Miwa, System for centralized management of medical data, U.S. Patent, No. 4974607, 1990.
[6] Galarraga M, Serrano L, Martínez I, de Toledo P., Standards for medical device communication: X73 PoC-MDC. Stud Health Technol Inform. Vol. 121, 2006, pp. 242-56.
[7] Jianchu Yao, Steve Warren, Applying the ISO/IEEE 11073 Standards to Wearable Home Health Monitoring Systems, The Journal of Clinical Monitoring and Computing, Vol. 19, No. 6, December 2005, pp. 427-436.
[8] Vladimir Simic, Vladimir Ciric, Teufik Tokic, Ivan Milentijevic, Communication Protocol Design for Physical Therapy Devices, 18th Conference YuInfo, Kopaonik, Serbia, 2012, pp. 396-401.
[9] Vladimir Ciric, Vladimir Simic, Teufik Tokic, Darko Tasic, Emina Milovanovic, Igor Milovanovic, Design and Implementation of Network Communication Protocol for Physical Medicine Devices, Technical report, Faculty of Electronic Engineering, 2012.
Conceptual Model of External Fixators for Fractures of the Long Bones

Dragan Pavlović*, Marko Veselinović*, Milan Zdravković*, Miroslav Trajanović*, Milan Mitković**
* Faculty of Mechanical Engineering, University of Niš, Niš, Serbia
** Orthopaedic and Traumatology Clinic, Clinical Center Niš, Faculty of Medicine, University of Niš, Niš, Serbia
Abstract - The efficiency and effectiveness of a surgery for bone fractures can be achieved by making the proper decisions in a short period of time, based on complete and updated information on the status, the type of fracture and the type of fixators used for a particular fracture. This way, the risk of possible complications caused by a late intervention can be reduced. The application of ontologies contributes to achieving this goal. This paper presents the development of a conceptual model of external fixators used for fractures of long bones. This conceptual model is represented by the ontological framework in which the product ontology is mapped with the ontologies of bones and fractures. In this paper, we present only the process in which the product ontology is extended to describe the two types of the external fixators, namely: the external skeletal fixator "Mitković" and the hybrid external fixator.

I. INTRODUCTION
One of the major challenges of modern health care organizations is to improve the quality of health services. To achieve this goal, health care organizations are using standardized clinical protocols in many medical domains [1]. These protocols are now represented in a variety of different formats, languages and formalisms. This variety is considered a significant obstacle for the interoperability of the models, as well as of the respective systems that are using those models.
One way to resolve this problem, namely, to achieve a unique representation of the clinical models and protocols, is to use ontologies.
According to [2], "An ontology is an explicit specification of a conceptualization." Ontologies can provide a significant contribution to the design and implementation of information systems in the medical domain. The role of ontologies in the integration and harmonization of heterogeneous knowledge sources has already been considered by many researchers, especially in the field of clinical guidelines and evidence-based medicine [3].
The aim of the work behind this paper is to demonstrate that ontologies can help in making decisions regarding conceptually different notions in healthcare, i.e. medical products and anatomy features.
Since these notions are handled in different information systems, within or outside the clinical domain, we indirectly aim at demonstrating that these systems can be made interoperable. Namely, based on the common, inter-related models, the respective systems that are using these models may exchange the relevant information, which is by default understood by all of these systems.
Specifically, in this paper, we present the development of an ontological model of external fixators that are used for fractures of long bones. The paper considers two types of external fixators:
- the external skeletal fixator "Mitković";
- the hybrid external fixator.
Long bones are bones of the limbs and can be grouped into two categories:
- the bones of the upper limbs, i.e. bones of the arm, which include: Humerus, Radius, Ulna, etc.;
- the bones of the lower limbs, i.e. leg bones, which include: Femur, Tibia, Fibula, etc.
The ontology is developed by extending the existing Product ontology. Some features of the bones and fractures for which the specific fixators are used are also modeled, to reflect the possible relationships between the fixator ontology concepts and the co-related concepts of the bones and fractures ontology.

II. ONTOLOGIES IN MEDICINE
The use of ontologies in medicine is mainly focused on data management, i.e. medical terminologies. Data collection (grouping) is becoming one of the most important issues with which the researchers in the clinical domain are faced. Due to the inconsistency of the formats which are used to represent data, it is very difficult to develop generic computer algorithms for their interpretation. Researchers tend to represent the knowledge of their domain in an independent and neutral format, so that data can be shared and reused on different platforms. This problem can be solved by using ontologies. Ontologies provide a common framework for structured knowledge representation. Ontological frameworks provide common vocabularies for concepts, definitions of concepts, relations and rules, allowing a controlled flow of knowledge into the knowledge base [4].
The domain of anatomy is the domain of medicine in which, so far, ontologies are most commonly used. In the medical domain, anatomy is a fundamental discipline that represents the basis for most medical fields [5]. Formal anatomical ontologies are an important component of the informatics healthcare infrastructure [6], but they are also informatics tools used to explore biomedical databases.
The structural relationship that is primarily used in these ontologies is the part_of relationship, because the smaller anatomical entities are naturally seen as the components of the larger ones [7].
There are plenty of anatomical ontologies, clinical ontologies and ontologies of other domains in medicine; some of the most frequently used are the Foundational Model of Anatomy (FMA) and the Edinburgh Human Developmental Atlas (EHDA) ontologies. The anatomy of the adult human is comprehensively represented in the FMA [8], while the embryonic structures are modeled in the EHDA [9].
The above mentioned ontologies, as well as the others that are used in the medical domain, cannot be applied in the development of the ontology of external fixators. The fixators are a type of product which is used in medicine, and they do not directly represent the human and/or anatomy.

III. ONTOLOGICAL MODEL OF EXTERNAL FIXATORS
This chapter gives a description of the ontology of external fixators. The ontology is created by using OWL (Web Ontology Language) in the software package Protégé. OWL, adopted by the World Wide Web Consortium (W3C), is a semantic markup language designed for publishing and sharing ontologies on the World Wide Web. OWL was developed by expanding the Resource Description Framework (RDF) vocabulary, based on the experience of developing the DAML+OIL Web ontology language [10].
As it was mentioned before, the uniqueness of the stated problem is that it needs to combine the conceptual models of the products with the models of the anatomical features.
For the representation of the fixators, the Product ontology [11] is selected, for the reasons of its simplicity vs. the fulfillment of the requirements related to modeling the fixators and their features. The Product ontology is mapped to the UNSPSC product classification scheme [12], by using the UNSPSC-SKOS ontology as a mediator. SKOS [13] is a family of formal languages, built upon RDF and RDFS, for the representation of thesauri, classification schemes, taxonomies and other types of structured controlled vocabularies.
The Product ontology consists of the following classes (concepts): part, feature and product; one transitive and one non-transitive property: hasPart and hasFeature, with the appropriate inverse properties isPartOf and isFeatureOf; as well as one data property: hasValue. In addition, one class (inferred class) is defined as:

assembly ≡ (∃hasPart.Part) ⊓ (∃isPartOf.Part)

Some concepts and features, related to the alignment of the Product ontology with the UNSPSC-SKOS ontology, are discarded in this overview.
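For illustration, the core of the Product ontology described above could be sketched in OWL, given here in Turtle syntax. This is a minimal sketch: the namespace prefix is a placeholder, and the exact IRIs and axioms of the published Product ontology [11] may differ.

    @prefix :     <http://example.org/product#> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    # The three core concepts
    :product a owl:Class .
    :part    a owl:Class .
    :feature a owl:Class .

    # The transitive part-whole property and its inverse
    :hasPart  a owl:ObjectProperty , owl:TransitiveProperty ;
              owl:inverseOf :isPartOf .
    :isPartOf a owl:ObjectProperty .

    # The non-transitive feature property and its inverse
    :hasFeature  a owl:ObjectProperty ;
                 owl:inverseOf :isFeatureOf .
    :isFeatureOf a owl:ObjectProperty .

    # The data property carrying concrete values of features
    :hasValue a owl:DatatypeProperty .

    # The inferred class: assembly ≡ (∃hasPart.part) ⊓ (∃isPartOf.part)
    :assembly a owl:Class ;
        owl:equivalentClass [
            a owl:Class ;
            owl:intersectionOf (
                [ a owl:Restriction ; owl:onProperty :hasPart  ; owl:someValuesFrom :part ]
                [ a owl:Restriction ; owl:onProperty :isPartOf ; owl:someValuesFrom :part ]
            )
        ] .

With such an equivalence axiom, a reasoner classifies as an assembly every individual that is asserted both to have a part and to be a part of one.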
After importing the Product ontology in Protégé, it is extended with the specific information about the external fixators.
The first step was to create a subclass fixator, which belongs to the class product. There are many different types of fixators, but this paper considers only two types of external fixators, represented as certain instances. The external fixator "Mitković" is labeled as the instance external_fixator, while the hybrid fixator is labeled as the instance hybrid_external_fixator. Each of these fixators consists of certain elements. Hence, the next step in the extension of the Product ontology was to create the subclasses of the part class that represent these elements. These elements are, as follows:
- Rod;
- Screw;
- Lateral supporting element;
- Clamping ring on the lateral supporting element 1;
- Clamping ring on the lateral supporting element 2;
- Screw nut;
- Washer;
- Clamping ring plate on the clamp ring 1;
- Clamping ring plate on the clamp ring 2;
- Ring;
- Wire.
Each of these elements represents a subclass of the part class, and each of them contains instances corresponding to the specific elements of the two fixators. For instance, the subclass lateral_supporting_element contains the instance lateral_supporting_element_1.
The structural dimensions of the fixator elements depend on certain dimensions, i.e. features, so the feature class contains a subclass named dimension. The class dimension contains subclasses of the characteristic features that may affect the structural dimensions of the fixator elements. These features are:
- Diameter of the bone;
- Diameter of the lateral supporting element;
- Diameter of the limb;
- Diameter of the ring;
- Diameter of the rod;
- Diameter of the screw;
- Distance from fracture;
- Length of the bone.
Note that some of the features above are features of the bone and not of the fixator itself. However, these features are represented at the level of the product, in order to facilitate the selection of the proper fixators based on the features of the bone and its fracture. This aspect of the work behind this paper is published elsewhere.
Each of the features of the specific parts is modeled by the respective individuals. For example, the class diameter_of_the_bone contains the instance diameter_of_the_bone_1.
Figure 1 shows the class hierarchy of the ontology of the external fixators. The only element of the considered external fixators that has variable dimensions, depending on the above mentioned features, is the rod. The dimension of the rod, i.e. its length, depends on the length of the bone (feature length_of_the_bone), while the other elements of the fixators have standard dimensions, regardless of the length of the bones or other features.
In the paper, as an example, two lengths for each of the bones are used. Hence, the subclass length_of_the_bone contains two instances for each of the long bones, i.e. a total of 12 instances (length_of_the_bone_1, length_of_the_bone_2, ..., length_of_the_bone_12).
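A minimal Turtle sketch of this extension, under the same assumed placeholder namespace and with the names used in the paper, could read as follows (only a few of the listed elements and features are shown):

    @prefix :     <http://example.org/product#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    :fixator rdfs:subClassOf :product .
    :external_fixator        a :fixator .   # the external skeletal fixator "Mitković"
    :hybrid_external_fixator a :fixator .

    # Fixator elements as subclasses of part, with example individuals
    :rod                          rdfs:subClassOf :part .
    :lateral_supporting_element   rdfs:subClassOf :part .
    :lateral_supporting_element_1 a :lateral_supporting_element .

    # Characteristic dimensions as subclasses of feature
    :dimension              rdfs:subClassOf :feature .
    :length_of_the_bone     rdfs:subClassOf :dimension .
    :diameter_of_the_bone   rdfs:subClassOf :dimension .
    :diameter_of_the_bone_1 a :diameter_of_the_bone .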

Each instance is defined by the data property hasValue and a length given in millimeters. Hence, for example, the length of the bone of 385.22 mm, represented with the corresponding instance, is defined as:

length_of_the_bone_1 hasValue 385.22 mm

For the external fixators, there are five different rods in standard dimensions, which can be used for different lengths of bones. Therefore, the subclass rod contains 10 instances, 5 per each type of fixator (rod_1, rod_2, ..., rod_10), and each instance is associated, through the non-transitive property hasFeature, with a certain feature length_of_the_bone. This means that one length of the rod can be applied for fractures of long bones whose lengths are in a certain range.
The external fixator "Mitković" consists of the following elements: Rod, Screw, Lateral supporting element, Clamping ring on the lateral supporting element 1, Clamping ring on the lateral supporting element 2, Screw nut, Washer, Clamping ring plate on the clamp ring 1 and Clamping ring plate on the clamp ring 2. The instances rod_1, rod_2, rod_3, rod_4 and rod_5 are elements of the external fixator "Mitković", while rod_6, rod_7, rod_8, rod_9 and rod_10 are elements of the hybrid external fixator.
The following rules, related to the selection of the specific parts relative to the length of the bones, are given:
- for the length of the bones up to 250 mm, rod_5 or rod_10 are used;
- for the length of the bones of 250-300 mm, rod_4 or rod_9 are used;
- for the length of the bones of 300-350 mm, rod_3 or rod_8 are used;
- for the length of the bones of 350-400 mm, rod_1 or rod_7 are used; and
- for the length of the bones over 400 mm, rod_2 or rod_6 are used.

Figure 1. Class hierarchy of the ontology of external fixators

The relations between the rod instances and length_of_the_bone are given below. Each rod corresponds to different lengths of the bones. For reasons of simplicity, the representations below use the labels of the corresponding lengths.

rod_1  hasFeature length_of_the_bone_(1, 4, 10)
rod_2  hasFeature length_of_the_bone_(2, 3)
rod_3  hasFeature length_of_the_bone_(5, 6, 7)
rod_4  hasFeature length_of_the_bone_12
rod_5  hasFeature length_of_the_bone_(8, 9, 11)
rod_6  hasFeature length_of_the_bone_(2, 3)
rod_7  hasFeature length_of_the_bone_(1, 4, 10)
rod_8  hasFeature length_of_the_bone_(5, 6, 7)
rod_9  hasFeature length_of_the_bone_12
rod_10 hasFeature length_of_the_bone_(8, 9, 11)
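Assuming the same placeholder namespace as in the previous sketches, these instance-level assertions translate directly into Turtle, for example:

    @prefix :    <http://example.org/product#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    # A concrete bone length, given in millimeters
    :length_of_the_bone_1 a :length_of_the_bone ;
        :hasValue "385.22"^^xsd:decimal .

    # rod_1 is applicable to the bone lengths represented by the instances 1, 4 and 10
    :rod_1 a :rod ;
        :hasFeature :length_of_the_bone_1 , :length_of_the_bone_4 , :length_of_the_bone_10 .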
The multitude of the rods above defines the families of the external fixator "Mitković" (external_fixator_1, ..., external_fixator_5) and of the hybrid external fixator (hybrid_external_fixator_1, ..., hybrid_external_fixator_5). The transitive property hasPart connects the instances of the class fixator with the instances of the subclasses of the part class that represent the integral elements of the fixators.
The relations between the instances external_fixator_1, ..., 5 and the instances of the subclasses of the part class are given below:

external_fixator_1,...,5
  hasPart rod_1,...,5
  hasPart screw_1
  hasPart lateral_supporting_element_1
  hasPart clamping_ring_on_the_lateral_supporting_element_1_1
  hasPart clamping_ring_on_the_lateral_supporting_element_2_1
  hasPart screw_nut_1
  hasPart washer_1
  hasPart clamping_ring_plate_on_the_clamp_ring_1_1
  hasPart clamping_ring_plate_on_the_clamp_ring_2_1

The hybrid external fixator has the identical integral elements as the external fixator "Mitković", with two additional elements: ring and wire.
The relations between the instances hybrid_external_fixator_1, ..., 5 and the instances of the subclasses of the part class are given below:

hybrid_external_fixator_1,...,5
  hasPart rod_6,...,10
  hasPart screw_1
  hasPart lateral_supporting_element_1
  hasPart clamping_ring_on_the_lateral_supporting_element_1_1
  hasPart clamping_ring_on_the_lateral_supporting_element_2_1
  hasPart screw_nut_1
  hasPart washer_1
  hasPart clamping_ring_plate_on_the_clamp_ring_1_1
  hasPart clamping_ring_plate_on_the_clamp_ring_2_1
  hasPart ring_1
  hasPart wire_1
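Under the same assumptions, the composition of one member of each family could be sketched in Turtle as follows:

    @prefix : <http://example.org/product#> .

    :external_fixator_1 a :fixator ;
        :hasPart :rod_1 , :screw_1 , :lateral_supporting_element_1 ,
                 :clamping_ring_on_the_lateral_supporting_element_1_1 ,
                 :clamping_ring_on_the_lateral_supporting_element_2_1 ,
                 :screw_nut_1 , :washer_1 ,
                 :clamping_ring_plate_on_the_clamp_ring_1_1 ,
                 :clamping_ring_plate_on_the_clamp_ring_2_1 .

    # The hybrid variant shares the elements above and adds the ring and the wire
    :hybrid_external_fixator_1 a :fixator ;
        :hasPart :rod_6 , :ring_1 , :wire_1 .

Since hasPart is declared transitive, querying the parts of a fixator instance also returns any nested sub-parts, which is what makes the assembly inference sketched earlier possible.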
Figure 2 below presents a graphical representation of the ontology of external fixators, with all its elements and the relationships between them.

Figure 2. Graphical representation of the ontology of external fixators
IV. CONCLUSION
With the variety of models, standards, protocols and other formalisms for representing medical concepts and processes, the healthcare domain is one of the most diversified fields and test-beds for ontologies. Many researchers have already demonstrated a number of advantages of using ontologies in healthcare. Ontologies can help in building more interoperable information systems in healthcare. They can facilitate the transfer, re-use and sharing of patient data. Finally, ontologies can support the integration of the necessary knowledge and information in healthcare.
In this paper, we present an approach in which a medical product can be represented by using ontologies. Such a representation, when combined with the anatomy models, as well as with the models of other concepts, such as the bone fractures, can facilitate the critical decisions which are currently made in extreme conditions, e.g. during surgeries, and thus suffer from many possible risks.
The presented ontology of external fixators makes the basis for further work, particularly for the development of an ontological model of fractures of long bones and for determining the semantic queries for defining the impact of various factors on the selection of the external fixators used for fractures. Currently, these factors, namely the relevant features of the bones, are modeled as the features of the fixator parts.
In the future work, the ontology of fixators will be integrated in an ontological framework, consisting of the product ontology, the bone ontology and the fracture ontology, with modeled relationships between the relevant concepts and features.

ACKNOWLEDGMENT
This paper is part of the project III41017 Virtual human osteoarticular system and its application in preclinical and clinical practice, funded by the Ministry of Education and Science of Republic of Serbia for the period 2011-2014.

REFERENCES
[1] G. Jiang, K. Ogasawara, A. Endoh, and T. Sakurai, "Context-based ontology building support in clinical domains using formal concept analysis," International Journal of Medical Informatics, vol. 71, pp. 71-81, 2003.
[2] T. Gruber, "A translation approach to portable ontology specifications," Knowledge Acquisition, vol. 5, pp. 199-220, 1993.
[3] D. M. Pisanelli, Ontologies in Medicine, IOS Press, Netherlands, 2004.
[4] R. K. Saripalle, Current Status of Ontologies in Biomedical and Clinical Informatics, University of Connecticut.
[5] C. Rosse, J. L. Mejino, B. R. Modayur, R. Jakobovitz, K. P. Hinshaw, and J. F. Brinkley, "Motivation and organizational principles for anatomical knowledge representation," JAMIA, vol. 5, pp. 17-40, 1998.
[6] A. Burger, D. Davidson, and R. Baldock, Anatomy Ontologies for Bioinformatics, Springer, New York, 2008.
[7] J. Bard, "The AEO, an ontology of anatomical entities for classifying animal tissues and organs," Frontiers in Genetics, vol. 3, pp. 1-7, 2012.
[8] C. Rosse and J. L. Mejino, "A reference ontology for biomedical informatics: the Foundational Model of Anatomy," Journal of Biomedical Informatics, vol. 36, pp. 478-500, 2003.
[9] C. Mungall, G. Gkoutos, C. Smith, M. Haendel, S. Lewis, and M. Ashburner, "Integrating phenotype ontologies across multiple species," Genome Biology, vol. 11, pp. 1-16, 2010.
[10] M. Dean and G. Schreiber, OWL Web Ontology Language Reference, W3C Recommendation, 2004.
[11] M. Zdravković and M. Trajanović, "Integrated product ontologies for inter-organizational networks," Computer Science and Information Systems, vol. 6, pp. 29-46, 2009.
[12] M. Klein, "DAML+OIL and RDF Schema representation of UNSPSC," 2002. Available from: http://www.cs.vu.nl/~mcaklein/unspsc/
[13] M. van Assem, V. Malaise, A. Miles, and G. Schreiber, "A method to convert thesauri to SKOS," Proceedings of the 3rd European Semantic Web Conference, ESWC 2006, Budva, 2006.
Making Sense of Complexity of Enterprise Integration by Using the Cynefin Framework

Mila Mitić
Belgrade University, Mihajlo Pupin Institute, Belgrade, Serbia
Abstract—Enterprise integration is a complex problem. However, it is not often considered in that way. The complexity of the integration has to be understood. The Cynefin framework could be used as scaffolding for making sense of enterprise integration. In this paper, it is considered how the Cynefin dynamics could be used for understanding possible and desirable approaches to solving the problem of enterprise integration complexity.

I. INTRODUCTION
Enterprise integration is recognized as the underlying enabler of business agility and is considered crucial to the success and future agility of the business ([1]).
There have been many problems with enterprise integration. Many enterprises devoted significant resources to their enterprise integration solutions, which turned out to be disasters ([2]-[6]). Many side effects and drifts were recognized ([6]). There were complaints that it was easier to fit the organization to the enterprise system solution than to adapt the technical solution to serve the organizational requirements ([7]).
In fact, integration issues are not handled well in traditional systems engineering practices ([8]). It was suggested to use system-of-systems thinking and to explore domains in which the order is emerging ([8]), as well as to adopt complex adaptive systems (CAS) techniques to manage complexity while providing flexibility and adaptive behavior ([9]).
In other words, the problem of enterprise integration was recognized as a complex one.
Gartner, one of the world's leading information technology (IT) research and advisory companies, has recently recognized the Cynefin framework as one of the few tools that can help us to cope with complex problems relating to IT applications (as cited in [10]). The Cynefin recognizes the complex domain of knowledge; it is based on CAS theory.
That is how the question of how we could use the Cynefin framework in coping with the problems of enterprise integration was raised. This paper is a preliminary attempt at using the Cynefin for discovering possible and desirable approaches to solving the problem of enterprise integration complexity.
The challenges to enterprise integration are presented in Section II of the paper. A short description of the Cynefin framework is given in Section III. Making sense of enterprise integration by using the Cynefin framework and the Cynefin dynamics is given in Section IV. The concluding remarks (given in Section V) point out that the Cynefin should not be neglected as scaffolding for making sense of enterprise integration.
(The presented work has been financially supported by the Ministry of Education and Science of Republic of Serbia under the project "New technologies in intelligent transportation systems – implementation in the urban and suburban settings", 2011-2014, reg. no. TR 36005.)

II. CHALLENGES TO ENTERPRISE INTEGRATION
To understand the challenges to enterprise integration, it is important to understand its purpose, its nature and the ways it could be achieved.

A. Enterprise Integration Purpose and Nature
Integration solutions have to enable business agility in the conditions of a high rate of organizational, social and technology changes ([7]). Enterprise integration is demanded by customers ([5]). It begins with business problems ([1]) and draws IT deeper than ever into the central nervous system of enterprises ([11]).
Business groups within enterprises are rethinking how to interact with customers, partners, and suppliers; manufacture goods; and operate, organize, and manage the business to achieve some common purpose in a given context ([1]). The essence of enterprise integration is the recursive interoperation of constituent systems to compose a system that achieves a specific purpose in a specific situation ([8]).
Integrated solutions are creative solutions developed to address situational needs ([12]). They can be inventions / imitations ([13]). As such, they deal with technological, organizational, semantic, and legal issues ([14]).

B. The Way toward Enterprise Integration
To understand the possible ways to achieve enterprise integration, it is necessary to understand the issues relating to enterprises themselves, as well as the ones relating to integration.

1) Enterprise Issues
The challenges facing enterprises today are different from those of the past ([15]). For much of the twentieth century, enterprises operated in a relatively stable environment and had a mechanistic design. Enterprises were characterized by machine-like standard rules, procedures, and a clear hierarchy of authority. Enterprises were highly formalized and centralized, with most decisions made at the top.
Enterprises were designed and managed on an impersonal, rational basis, as if they were all alike, neglecting social context and human needs. Employees were expected to do as they were told, to perform activities according to a specific job description. Formal systems were in place to manage information, guide communication and control, and detect deviations from established standards and goals.
Since the 1980s, enterprises have undergone very great and far-reaching changes ([15]). The ultimate goal of the enterprise is business agility, i.e. the ability to adapt to change rapidly. There is the dominant belief that enterprises must adapt to change or perish ([1]).
However, to some extent, enterprises are still imprinted with the hierarchical, formalized, mechanistic approach. Enterprise management is still guided by Newtonian science, which suggests that the world functions as a well-ordered machine ([15]).
At the same time, some new perspectives on enterprises have emerged. Enterprises are seen as social entities, made up of people and their relationships with one another, and linked to the external environment. In other words, it is considered that enterprises are open systems, with a mostly organic design ([15]).
Such enterprises are much looser and more adaptive. There are few rules or formal control systems. The hierarchy of authority is looser and not clear-cut. People have to find their own way through the system to figure out what to do. The emphasis is on collaborative teamwork rather than hierarchy. Self-directed teams are the fundamental work unit in highly organic enterprises.
People work across department and organizational boundaries to solve problems. There are open lines of communication with customers, suppliers, and even competitors, to enhance learning capability.
However, an enterprise does not always perform better with a strong organic design. Sometimes standard procedures, formal rules, and a mechanistic approach can be highly effective ([15]).

2) Integration Issues
Because of the way enterprises and their information systems were organized in the past, integration has been addressed as a technology and infrastructure topic ([1]). Since the scientific management and administrative principles approaches were powerful in enterprises, infrastructure was seen as a large management information system, in which systems and applications might be heterogeneous, but control and resource allocation ought to be centralized. It was considered that infrastructure had to be built, aligned, and controlled ([16]).
However, new approaches to enterprise management require different approaches to infrastructure. It was recognized that infrastructure - an installed base of software, hardware, institutional arrangements, cognitive frames, and imaginaries that actors bring to, and routinely enact in, a situation of action - is a formative context ([16]). In other words, infrastructures constitute the background condition for action, enforcing constraints, giving direction and meaning, and setting the range of opportunities for undertaking new actions.
Enterprises will need to continually evolve and optimize their infrastructures to provide better response time to their customers and to changes in business conditions ([1]). Old infrastructure cannot be changed instantly. New infrastructure has to be integrated with the old one ([16]).
That is why a new infrastructure management approach is required. It has to be a mixture of releasing and cultivating ideas, with the Internet and the intranet as a technical infrastructure. It would contribute to the distribution of knowledge within and across the business, and to governing the invention of alternative forms of work, the revision of the existing institutional arrangements, and the ideas for their further transformation ([16]).
Great diversity within enterprises can bring vitality and many benefits. However, it also brings a variety of challenges. It is necessary to maintain a strong corporate culture and support diversity at the same time; to balance work and family concerns; and to cope with the conflict brought about by varying cultural styles ([15]).
Beside interoperability at the technical level, it is necessary to provide semantic, organizational and legal interoperability ([14]). The precise meanings of exchanged information have to be preserved and understood by all parties. Processes in which different organizations achieve a mutually beneficial goal have to be coordinated. Legislation has to be aligned so that exchanged data is accorded proper legal weight.
In other words, enterprise integration requires a change of infrastructure management. There is a need for changes of the technical infrastructure, institutional arrangements and cognitive frames of people, considering the current state and needs.
Enterprise integration requires an open culture and open communications ([2]), as well as cross-organizational teams ([11]). Cross-organizational teams have to foster innovation throughout the company, for example, by deploying intranet-based forums and wikis for searching for promising ideas. They have to establish the architecture and management practices essential for business integration, for example, by identifying integration opportunities, channeling resources to them, and reconfiguring enterprise systems to support cross-business collaboration.

C. Challenges
Thus, the challenges to enterprise integration are great. There is a need for changing our view on enterprises as machines and on enterprise integration as a technical problem. We have to find a way in which enterprise integration really serves the enterprise's needs.
The challenges cannot be addressed by traditional modeling and engineering approaches ([8]). Enterprises, as open systems, do not require design-time interoperability. Run-time interoperability is needed, because the operational context is changing continuously and the developers of those systems cannot know a priori the systems with which they will interoperate ([12]).

III. CYNEFIN – A SCAFFOLDING FOR MAKING SENSE OF THE WORLD
The Cynefin framework is scaffolding through which and by which people can make sense of their world in different ways ([17]). It helps people to break out of old ways of thinking and to consider intractable problems in new ways ([18]).

A. Cynefin Framework
Cynefin (pronounced kun-ev'in) is a Welsh word; its meaning is the place of one's birth and of one's upbringing, the environment in which one lives and to which one is naturally acclimatized ([18]). Cynefin is a multi-ontological framework for making sense. It can also be used as a model for categorizing knowledge domains. A short description of the Cynefin is given in this section according to [18]-[20]. The framework is shown in Fig. 1.

Figure 1. The Cynefin framework: the complex domain (probe – sense – respond; emergent practice), the complicated domain (sense – analyse – respond; good practice), the chaotic domain (act – sense – respond; novel practice) and the simple domain (sense – categorize – respond; best practice), with disorder in the center.
The underlying idea of the Cynefin is that ontology, epistemology and phenomenology (i.e. the way things are, the way we know things, and the way we perceive things) have to be aligned to make our actions authentic to the situation.
The framework distinguishes five domains of knowledge: simple, complicated, complex, chaotic, and disorder. The simple and the complicated domains are ordered domains. The complex and the chaotic domains are un-ordered ones. Un-order does not mean the lack of order, but a different kind of order – emergent order. The domain of disorder involves two states: inauthenticity and transition. Inauthenticity is a state of not knowing which domain we are in, and it is undesirable. Transition is a state of moving to another domain to achieve some change.
Each domain of the framework is characterized by a different cause-effect relationship. That is why different domains require different approaches to management, different models for decision-making and different tools. Managing in each domain should be based on the stability afforded by strong connections and on the freedom and renewal afforded by weak connections, without hardening strong connections so much as to destroy flexibility, and without relaxing weak connections so much as to permanently remove useful patterns.
In the ordered domains, cause-effect relationships are stable. The relationship can easily be understood in the simple domain. Understanding the relationship in the complicated domain requires some experts' analysis and interpretation. There is a cause-effect relationship in the complex domain but, because of the great number of causal relationships and feedbacks, it is difficult to perceive it; in fact, it can be perceived only backwards. A cause-effect relationship does not have to be perceivable in the chaotic domain. In the disorder, cause and effect are not connected at all.
That is why the simple domain is the domain of standard procedures and best practices, while the complicated one is the domain of experts, analysis and good practices. The complex domain is the domain of patterns and emerging practices, and the chaos is the domain of novel practices. In the un-ordered domains, there is no dominance of rationality, but a dominance of cultural factors, inspired leadership, gut feeling, and other complex factors resulting from the interactions of various factors through space and time. In the disorder, people are pulled to the domain where they feel best, and it is difficult to determine the required action.
In the ordered domains, the links to the center are strong and usually expressed by structures that, in some way, constrain the behavior of actors. In the un-ordered domains, the links to the center are weak. That is why traditional top-down management does not succeed in them. Links between actors are strong in the complicated and complex domains, but weak in the simple and chaotic ones.
The prime dynamic is between the complex and complicated domains. Ideas emerge in the complex domain. When sufficient evidence for the validity of the idea is created and a high level of its acceptance is achieved, it is possible to shift the idea from exploration to exploitation. Some constraints on behavior are imposed and the shift to the complicated domain is made.
A movement from the complicated into the complex domain, to allow new ideas to emerge, is facilitated by relaxing constraints and reducing management from the center. Stable material moves from the complicated to the simple domain and, because of huge changes in the world, does not stay in it forever. A movement between the simple and the complicated domain is an incremental improvement.
A movement from an ordered domain to the chaotic one may be intended to quit the usual way of thinking and encourage new ones. Entry into the chaotic domain may be unplanned and caused by some mistakes. The domain may be exited by setting some constraints to move back to the simple or complex domain (depending on the size of the change required in the system).

B. Cynefin v45
The Cynefin v45 has been recently completed by bringing together previously developed models of the simple, complicated, complex, and chaotic domains (rotated by 45°). This version of the Cynefin can help us to understand desirable movements across the framework. A short description of the Cynefin v45 relevant for this paper is given in this section according to [10] and [17].
In each model, there are two dimensions and three values along each dimension. In other words, the models are 3x3 matrices. The nine fields are not rigid categories, but a way of giving perspective on a domain typology. They help us to understand domain dynamics.
The leftmost and rightmost fields of the domain models (in v45) are transition fields to other domains. The regular path of movement within a domain is the horizontal line (in v45). Slipping from it requires management action for returning to it. Otherwise, slipping into the disorder and chaos is likely.
Legitimate areas of a domain are in green color. The blue color is used for the areas where some management action is needed. Undesirable areas are colored in red.
Since the complex domain is the domain from where enterprise integration should be considered, an overview of it is given further in this section. The model is shown in Fig. 2.
1) Complex Domain Model
The dimensions of the model are evidence and degree of acceptance. Evidence denotes the degree to which any need or requirement is structured / defined / understood. The other dimension denotes the degree to which different interest groups agree on the needs and the nature of what is needed.
The degree of acceptance increases from cogniscenti, through orthodoxy, to synchrony. Cogniscenti denotes the narrow acceptance of a small in-group. Synchrony means that there is an overwhelming belief at the enterprise level. Orthodoxy is the intermediary state; it denotes a dominant view.
The evidence increases from gut feel and intuition, through inductive and case-based, to beyond reasonable doubt. Evidence beyond reasonable doubt assumes some notions of sufficiency. Inductive or case-based evidence assumes some degree of sufficiency, and contingencies. It also encompasses propositions that we can neither prove nor refute. The evidence labeled with "gut feel, intuition" denotes the dependence on gut feel based on rich experience or, in extremis, on having an attempt.
The complex domain can be entered from the chaotic domain, from the field of managed innovation, which denotes the area of deliberate entrance into the chaos in the belief that the ability to create the constraints is known.
The regular path of movement in the domain is across the horizontal line, in both directions. The focus in the complex domain should be on safe-to-fail probes rather than on fail-safe design. The main dynamic is the horizontal line. When a new idea emerges, it will be understood by a small group of believers and the evidence base will necessarily be weak (the field of safe-to-fail parallelism). As more work is done, evidence accumulates and more people buy into the idea (the field of serious investment). Eventually, there is a clear and near universal consensus (the field of production ready to exploit), and a shift from exploration to exploitation, i.e. from the complex to the complicated domain, can be made. (The shift is into the field of focus on opportunities, where people with different views check whether the exploration is complete and focus on forming the strategy for the exploitation of the idea.)
The reverse path is also valid, when a need for exploration is recognized because of contextual changes.
Since the field of safe-to-fail parallelism is on the edge of chaos, most of the opportunities lie in it. Experiments can be created for checking some view. They are not designed to succeed, but to form insight and understanding of what is possible. They can run in parallel and even contradict each other, because the domain is unknowable. Most people start from the central field, assuming that the direction is known or knowable, but failing to start from the field of safe-to-fail parallelism reduces scanning and resilience.
The undesirable area in this domain - the area of a high risk of collapse into inauthentic disorder or the chaos - is the group think field. In this field, there is a high degree of acceptance of an idea which is not well understood. This is where assumptions can be followed without proper testing or consideration of the need. It is often the area of retaining the attractiveness of a past practice long after its applicability has fallen to the changing context. It is where dominant ideas cannot be challenged because of a fear culture.
The other fields of the domain require some management actions. When there is a high degree of acceptance of some idea but not sufficient evidence, challenge evidence is the action applied to break the old idea up. This action is also applied when evidence is gathered through research based on assumptions that have already become out of date.
If a dominant view is not supported by evidence, it is necessary to break it up, and review evidence. The field is an early warning area, because it means that the enthusiasm for an idea is growing faster than its evidence base. This happens because people very often seize the shakiest evidence to support the ideas of people who have some power. Such ideas have to be disrupted.
If there is not sufficient evidence for an idea, but it is accepted by someone who understands the situation and is able to manage the organization, he or she may act as a coach to the followers of the idea, in order to shift it to the horizontal line. The field of coaching & mediation requires some changes to make the idea closer to the dominant ideas. Often, a simple change of language or matching to some corporate goal can help.
If someone with authority has some interest in the idea and respects or believes in it, but knows that the organization as a whole will not accept it, she or he may create a safe space, free of normal controls, to allow the idea to be developed and proved in practice. In other words, sandpits & skunk works can be allowed.
The field of heretics & mavericks is the area in which a small group of people believes in some idea that is contrary to the dominant view. The group needs assistance to shift to the space of coaching and mediation, or of sandpits and skunk works.

Figure 2. The complex domain model: the regular path runs from safe-to-fail parallelism, through serious investment, to production ready to exploit, entered from the chaotic domain via managed innovation and exited to the complicated domain via focus on opportunities; above the path lie the fields of break it up / review evidence, challenge evidence and the undesirable group think field; below it lie coaching & mediation, sandpits & skunk works, and heretics & mavericks.
IV. CYNEFIN – A SCAFFOLDING FOR MAKING SENSE OF ENTERPRISE INTEGRATION
The Cynefin points out that enterprise integration and enterprise business agility should be achieved by the movement between the complex and complicated domains, with a possible transition over the chaotic domain.
Since enterprise integration is a complex problem and it requires finding emerging patterns, the complex domain is the starting point for the consideration of enterprise integration. That means that the complex domain model (with its links to other domains) may be used in considering the enterprise integration problem.
We have to find where we are in the domain and take suitable actions accordingly. It is desirable to come to the regular path of movement across the domain.
Many enterprises would start enterprise integration from the group think field of the complex domain, because of the overwhelming belief that enterprise management and enterprise integration are not complex problems but complicated ones. The field represents a high risk of a sudden collapse into the chaos or disorder. It is undesirable, and management action is needed for leaving it. It is necessary to break up the idea of a purely technological innovation and the idea that success is possible without the consideration of the organizational and other relevant aspects of enterprise integration. To decrease the degree of acceptance of the old idea, the experiences from failed attempts of enterprise integration, as well as from enterprise information systems drift, could be used.
The complex domain could be entered from the chaotic one after breaking all links, in order to stop an enterprise integration project performed only from the technological perspective, or upon recognizing some business risk due to un-integrated business and IT applications. Managing innovation requires forcing constraints to allow a new approach to enterprise integration and enterprise agility.
The complex domain could be entered from the complicated domain too, from the field of focus on opportunities. That may happen when a need for the exploration of possible solutions to the problems of an enterprise integration project in progress, or of un-integrated business and IT applications, is recognized in time.
There is a possibility that some group of people has sufficient evidence about the validity of some approach to enterprise integration, based on their prior experiences in IS development. Management should not ignore those heretics & mavericks, but try to find a way to help them to prove their ideas. The best way is probably to create a space for sandpits and skunk works.
Sandpits and skunk works should facilitate solving the problems of enterprise integration. One or more groups of talented people who combine broad business knowledge, technology expertise, and social skills can work on building relationships both within and outside the enterprise. They would work in the safe space, without the formal and usual controls, and would explore possible ways of achieving enterprise integration, respecting the specificity of the situation.
They could explore a way for fostering integration in the enterprise. They could search for high-potential ideas for business agility, monitor the experiences of early adopters of new technologies, give advice for managing innovation, help in developing pilot projects and prototypes, and provide expertise in achieving business agility and the skills necessary for enterprise integration in the specific situation.
If there is inductive evidence for the validity of some idea and a high degree of its acceptance, it is necessary to challenge the evidence by finding some evidence opposite to it. If there is inductive evidence for the validity of some idea, a low degree of its acceptance, and someone who has power in the enterprise believes in the idea, he should coach and mediate in order to increase its acceptance by clarifying its importance for business agility.
In any case, safe-to-fail parallel probes are crucial for the emergence of desirable patterns of enterprise integration, possible ways of integrating business processes and the related data exchange. They can be real or thought experiments. They have to seek promising ideas and possible opportunities from the interactions between technologies and business in the concrete context (economic, political, legal, and cultural), in order to provide business agility by enterprise integration.
In other words, they have to seek possible ways to combine local perspectives into enterprise integration and build an understanding of the linkages between, and the limitations of, different factors important for enterprise integration (for example, the relevant legislation, business processes, business ideas, possible interpretations of data in / from another context). They also have to find ways to build open communication and an open culture with respect for diversity; satisfy the work and personal needs of people; interact with customers, partners, and suppliers; as well as operate, organize, self-organize, and manage the business. The emergent patterns would show which technologies and other ideas are useful for the enterprise, help in determining the integration purpose, scope and type, and show who can probably work and organize together.
The emergent pattern increases the evidence for a new enterprise integration solution and the degree of its acceptance, regardless of whether it means invention (being first in doing something) or imitation (introducing or using something which has already been done elsewhere). It reveals what a serious investment is. Further considerations of how the idea can be implemented increase the evidence that the approach is a valid one, as well as the degree of acceptance of what can be done.
It is necessary to form conditions for the exploitation of the idea and for moving it to the complicated domain. In particular, there is a need for building a work environment that allows the idea to return from the complicated domain to the complex one. It requires some changes in enterprise and IS development and management. For example, traditional IS development regulated by project contracts is not suitable.
All of that requires changes in the approaches to both enterprise management and enterprise integration. It is necessary to make conditions for work in the complex domain. The cause-effect relationships are un-knowable here. They should be determined by a mixture of releasing and cultivating ideas, using the existing enterprise infrastructure.
Some patterns should emerge. That is why the work has to be based on principles rather than rigid rules, stories rather than vision, heuristics rather than rules, diversity rather than regulating, and evolution rather than a defined future state.
All this shows how hard a problem enterprise integration is. It probably requires much movement in both directions along the regular path of the complex domain, as well as slipping from it and returning to it. Those movements should allow the integration of the old and new approaches to enterprise management and IS development. They are dependent on situational needs and have to be explored in the collaborative work of people who have different views and knowledge, and who are probably not accustomed to facing the problem as a complex one.

V. CONCLUDING REMARKS
This paper is an attempt at breaking up the dominant belief in the traditional, technical approach to enterprise integration, and a preliminary attempt at using the Cynefin framework for enterprise integration. Talking about new approaches to enterprise integration can be understood as a transition – a state of moving enterprise integration from the ordered domains to the emergent one.
It is important to understand the nature of enterprise integration and enterprises, in order to make conditions for the development and use of better approaches to enterprise integration. Only when we understand the nature of enterprise integration can we choose the right ways to support it.
The Cynefin framework can help us to understand that there are un-ordered domains and that we have to leave the complicated domain in which we are inclined to work. The methods of the complicated and simple domains do not allow us to solve complex problems in the right way. Enterprise integration problems are no exception.
Enterprise integration is a complex problem, and so it requires probes and exploration, not only experts' thinking. There are many open issues. The Cynefin dynamics could help us to understand the nature of the problems and the possible movements toward better solutions.
The consideration of enterprise integration through the Cynefin framework points to problems already recognized in the information systems field. For example, many researchers and experts have argued that IS development is a complex, social problem. The importance of parallel probes and hackers for achieving a sustainable advantage of enterprises was recognized ([16]), as was the fact that the IS domain is characterized by the interactions between technology, business, and people ([21]).
In other words, there are many problems which are not new. Replacing one approach to enterprise management and IS development by another has not solved the problems. A new way of thinking is required.
That is why the Cynefin framework should not be neglected as scaffolding for making sense of enterprise integration, as well as for making sense of IS and enterprise problems in general. Without understanding the nature of enterprises and enterprise integration, we will hardly be able to choose the right way for enterprise survival and success.

REFERENCES
[1] B. Gold-Bernstein and W. Ruh, "The Business Imperative for Enterprise Integration," in Enterprise Integration: The Essential Guide to Integration Solutions, Addison-Wesley, 2004, chapter 1.
[2] J. E. Scott and I. Vessey, "Managing Risks in Enterprise Systems Implementations," Communications of the ACM, vol. 45, pp. 74-81, April 2002.
[3] M. I. Hwang and H. Xu, "The Effect of Implementation Factors on Data Warehousing Success: An Exploratory Study," Journal of Information, Information Technology, and Organizations, vol. 2, pp. 1-14, 2007.
[4] R. Atkinson, Enterprise Resource Planning (ERP) The Great Gamble: An Executive's Guide to Understanding an ERP Project, Xlibris Corporation, 2013.
[5] B. Manouvrier and L. Ménard, Application Integration: EAI, B2B, BPM and SOA, Wiley, 2008.
[6] I. Ignatiadis and J. Nandhakumar, "Enterprise Systems, Control and Drift," in Enterprise Information Systems: Concepts, Methodologies, Tools and Applications, Information Resources Management Association, IGI Global, 2010, pp. 1209-1232.
[7] E. A. Stohr and J. V. Nickerson, "Intra Enterprise Integration: Methods and Direction," in Competing in the Information Age: Align in the Sand, 2nd edition, J. N. Luftman (ed.), Oxford University Press, 2003, pp. 227-251.
[8] G. Morel, H. Panetto, F. Mayer, and J. P. Auzelle, "System of Enterprise-Systems Integration Issues: An Engineering Perspective," invited plenary paper, Proceedings of the IFAC CEA 2007 Conference on Cost Effective Automation in Networked Product Development and Manufacturing, Mexico, 2007.
[9] J. Sutherland and W. J. van den Heuvel, "Enterprise Application Integration and Complex Adaptive Systems," Communications of the ACM, vol. 45, pp. 59-64, October 2002.
[10] D. Snowden, "Making Sense of Complexity," keynote, 5th Lean, Agile & Scrum Conference, LAS 2013, Zurich, September 2013. Available at: http://www.lean-agile-scrum.ch/las-2013/
[11] J. I. Cash, Jr., M. J. Earl, and R. Morison, "Teaming Up to Crack Innovation and Enterprise Integration," Harvard Business Review, vol. 86, pp. 90-100, November 2008.
[12] D. Carney, D. Fisher, E. Morris, and P. Place, Some Current Approaches to Interoperability, Carnegie Mellon University, CMU/SEI-2005-TN-033, August 2005.
[13] B. Godin, καινοτομία: An Old Word for a New World, or, The De-Contestation of a Political and Contested Concept, Project on the Intellectual History of Innovation, Working Paper No. 9, Montreal, 2011.
[14] European Commission, European Interoperability Framework (EIF) for European public services, Annex 2 to the Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of Regions 'Towards interoperability for European public services', Bruxelles, December 2010.
[15] R. L. Daft, "Organizations and Organization Theory," in Organization Theory & Design, 11th ed., Cengage Learning, 2013, pp. 2-53.
[16] C. Ciborra, The Labyrinths of Information: Challenging the Wisdom of Systems, Oxford University Press, 2002.
[17] D. J. Snowden, blog at www.cognitive-edge.com/blog, entries: 6149, 6086, 6081, 6078, 6076, 6074, 6040, 6039, 6038, 6004, 5989, 5820, 5818, 5792, 5667, 5666, 3234.
[18] C. F. Kurtz and D. J. Snowden, "The New Dynamics of Strategy: Sense-making in a Complex and Complicated World," IBM Systems Journal, vol. 42, no. 3, pp. 462-483, 2003.
[19] D. J. Snowden, "Complex Acts of Knowing: Paradox and Descriptive Self-awareness," Journal of Knowledge Management, vol. 6, no. 2, pp. 100-111, 2002.
[20] D. J. Snowden, "The Origins of Cynefin," 2010. Available at: http://cognitive-edge.com/uploads/articles/The_Origins_of_Cynefin-Cognitive_Edge.pdf
[21] A. S. Lee, "Thinking about Social Theory and Philosophy for Information Systems," in Social Theory and Philosophy for Information Systems, J. Mingers and L. Willcocks (eds.), Wiley, 2004, pp. 1-26.