Abstraction in Artificial Intelligence and Complex Systems

Lorenza Saitta · Jean-Daniel Zucker

(Pierre Soulages, Peinture 260 × 202 cm, 19 juin 1963)
"A painting is an organization, a set of relations between forms (lines, colored surfaces), upon which the meanings we give to it are made and unmade" (Französische Abstrakte Malerei, exhibition catalogue, Stuttgart, 1948)

Lorenza Saitta: Dipartimento di Scienze e Innovazione Tecnologica, Università degli Studi del Piemonte Orientale, Alessandria, Italy
Jean-Daniel Zucker: International Research Unit UMMISCO 209, Research Institute for Development (IRD), Bondy, France

Preface
When we started writing this book we were aware of the complexity of the task,
but we did not imagine that it would take us almost three years to complete it.
Furthermore, during the analysis and comparison of the literature from different
fields, it clearly emerged that important results have been achieved, but that much
more important ones are still out of reach. The spirit of the book thus changed, almost by itself, from the intended assessment of the past into a stimulus for the future. We would be happy if the reader, instead of being content with the ideas we propose, took them as a motivation and starting point to go beyond them.
We present a large selection of works on abstraction in several disciplines;
nonetheless many relevant contributions to the field have been necessarily left out,
owing to the sheer amount of pages they would fill. We apologize for the missing
citations.
In this book we present a model of abstraction, the KRA model, but this is not
the core of the book. It has a limited scope and serves two main purposes: on the
one hand it shows that several previous proposals of abstraction theories have a
common root and can be handled inside a unified framework, and, on the other, it
offers a computational environment for performing abstraction by applying a set of
available, domain-independent operators (programs). In fact, there is still a gap
between general abstraction theories, mostly elegant logical formulations of rep-
resentation changes, and concrete approaches that heavily rely on specific domain
characteristics. The KRA model is meant to be something in between: the
domain-independence of the abstraction operators achieves both generality (it can
cover a broad spectrum of applications and application domains), and synergy (by
instantiating in different contexts some code written just once).
Independently of the model, we believe that the basic ideas on which it relies
are more important than the model itself. These ideas are certainly debatable; some readers might find that our view of abstraction is exactly what they have always looked for, whereas others might think that abstraction is something else entirely. Both reactions are welcome: what matters is to trigger interest in the subject
and stimulate more research.
The book is not intended to be a textbook: it is targeted to scientists working on
or using abstraction techniques, without limitation of fields. Computer scientists,
Artificial Intelligence researchers, artists, cognitive scientists, mathematicians, and
curious minds can read the book. Some parts are more formalized, and they may
look complex at first sight. However, we believe that the greater part of the content can be grasped by intuition.
Finally, we mention that we have set up a companion Web site (http://
www.abstractionthebook.com), where implemented operators are uploaded.
Anyone interested in abstraction is welcome to contribute to it.
The authors would like to thank Yann Chevaleyre for his invaluable help and
expertise in abstraction in Reinforcement Learning, Laurent Navarro and Vincent
Corruble for their help in abstraction in multi-agent systems, and Nicolas Reg-
nauld, from Edinburgh University, for providing them with both test data and his
expertise on relevant operations and measures on buildings in Cartography.
Lorenza would like to thank her husband Attilio, who contributed, with insightful discussions, to shaping the content of the book, and who also provided two of the appendices.
Jean-Daniel would like to thank his wife (Miao), children (Zoé, Théo, Arthur
and Nicolas), family, colleagues, and friends (especially Jeffrey, Bernard, Joël,
Vincent, Laurent and Alexis) for encouraging and tolerating him through the long
hours of writing and longer hours of rewriting, and Pierre Encrevé for his unfailing
availability and communicative passion for Soulages.
Finally, the authors are deeply grateful to Pierre Soulages, to the Escher
Company, and to the Museo Larco (Lima, Peru), who allowed them to illustrate
their idea with magnificent works, and also to all the authors who granted per-
mission to publish some of their figures, contributing to the visual and conceptual
enrichment of this book.
Lastly, Lorenza and Jean-Daniel are grateful to Melissa Fearon and Courtney Clark, at Springer, for their patience in waiting for this book to be completed.
Contents

1 Introduction
  1.1 Summary
4 Definitions of Abstraction
  4.1 Giunchiglia and Walsh's Theory
  4.2 Abstraction in Philosophy
    4.2.1 Wright and Hale's Abstraction Principles
    4.2.2 Floridi's Levels of Abstraction
  4.3 Abstraction in Computer Science
  4.4 Abstraction in Databases
    4.4.1 Miles Smith and Smith's Approach
    4.4.2 Goldstein and Storey's Approach
    4.4.3 Cross' Approach
  4.5 Granularity
    4.5.1 Hobbs' Approach
    4.5.2 Imielinski's Approach
    4.5.3 Fuzzy Sets
    4.5.4 Rough Sets
  4.6 Syntactic Theories of Abstraction
    4.6.1 Plaisted's Theory of Abstraction
    4.6.2 Tenenberg's Theory
    4.6.3 De Saeger and Shimojima's Theory
  4.7 Semantic Theories of Abstraction
    4.7.1 Nayak and Levy's Theory
    4.7.2 Ghidini and Giunchiglia's Theory
  4.8 Reformulation
    4.8.1 Lowry's Theory
    4.8.2 Choueiry et al.'s Approach
    4.8.3 Subramanian's Approach
  4.9 Summary
13 Conclusion
  13.1 Ubiquity of Abstraction
  13.2 Difficulty of a Formal Definition
  13.3 The Need for an Operational Theory of Abstraction
  13.4 Perspectives of Abstraction in AI
References
Index
Chapter 1
Introduction
1 See Chap. 2.
the aim of capturing what they might have in common. From this comparison one has the feeling that coming up with a theory of abstraction, both sufficiently general to cover all of its uses and, at the same time, "operational", is a task doomed to fail from the outset.
Given the acknowledged importance of abstraction in human reasoning, it is likely
that an analogously basic role should be played by abstraction in the design of “intel-
ligent” artefacts. Researchers in Artificial Intelligence (AI) have indeed proposed
various theories of abstraction,2 based on different principles. However, the difficulty of transforming these theories into procedures able to generate, in a possibly automatic way, abstractions useful in practice has suggested targeting less ambitious
goals. As we are interested, in the end, in computational models of abstraction (even
though limited in scope), the work done in AI may be a primary source of inspiration,
as well as a term of reference to match theories proposed elsewhere.
Even amid the multiplicity of interpretations, there is a general agreement that
abstraction plays a key role in representing knowledge and in reasoning. The first
intuitive idea of abstraction that comes to mind, especially in everyday life, is that
of something which is far from the sensory world, and can only exist in the realm of
thought. For instance, most people think of Mathematics as an essentially abstract
discipline, and a branch of modern art has assumed abstraction as its very defini-
tion (see, as an example, Fig. 1.1). This interpretation complies with the etymological
meaning, in the sense that “to abstract” is to take away all aspects that can be captured
with our senses. In abstract art objects are stripped of their mundane concreteness to
leave their bare essence. An important work relating art, abstraction and neurophys-
iology has been done by Zeki [581], who tried to explain how the brain perceives
art. In doing so, he claims that abstraction is the common ability that underlies the
functioning of most cells in the visual system, where abstraction is, in this context,
“the emphasis on the general property at the expense of the particular”.
2 See Chap. 3.
Fig. 1.2 Satellite image of the center of Torino (left): buildings and monuments are visible. The
same area can be described by considering just the street network (right): this abstract map is more
convenient for moving around the city
At each step of refinement more details are possibly taken into account, generating
a sequence of solutions, each one more detailed than the previous one. In this case
we may speak of a hierarchy of levels of abstraction, with the highest levels poor
in details, and the lowest ones very rich. An example of a hierarchy is reported in
Fig. 1.3.
As we will see, the hierarchical approach is widespread in Computer Science and
Artificial Intelligence. However, choosing the correct level of detail to work with,
on any given problem, is a crucial step; in fact, a poor choice may be harmful to the
solution.
A sensitive issue in defining abstraction, one which is at the core of a hot debate, is its relation with generalization, defined as the process of extracting common properties from a set of objects or situations. Sometimes abstraction and generalization have simply been equated. It is clear that, being a matter of definition, nothing prevents us, in principle, from defining abstraction as generalization. However, this equation
does not allow one to see possibly useful differences, which can be observed if the
two concepts are taken apart. Then, hypothesizing a certain type of relation between
generalization and abstraction is not a question of correctness or truth, but of conve-
nience. The discussion on the links between generalization and abstraction should
also include the notion of categorization; this triad is fundamental for the conceptu-
alization of any domain of thought.7
Another dimension along which abstraction can be considered is related to infor-
mation content. Abstracting, from this perspective, corresponds to reducing the
amount of information that an event or object provides. This information can be
hidden or lost, according to the view of abstraction as a reversible or irreversible
process. Clearly this aspect of abstraction is strictly related to the ideas of levels of detail and hierarchies introduced above.

Fig. 1.3 Example of hierarchy in the field of animal classification. The lower the level, the more details are added to the characterization of the animals
Fig. 1.4 a The components of a computer are perceived as constituting a unique object.
b Abstraction substitutes a single object for a set of objects, thus reducing their number
the problem of abstraction definition, even though he makes use of an intuitive notion
thereof, but clearly his work can be put in relation with this fundamental problem.8
Also Archer et al. [21] link abstraction to information handling. They claim that
“Abstraction is probably the most powerful tool available to managing complexity”.
To tame the complexity of a problem they see two ways: reducing information or
condensing it. Reducing information can be related to the previously introduced idea
of selecting the most relevant aspects of a problem and deleting details, whereas
condensation is a form of aggregation. As in Brooks’ perspective, abstraction is the
bridge between an extremely rich sensory input and what we actually keep of it.
Globally, all the perspectives on the definition of abstraction outlined above converge on a change of representation. In fact, it is often true that finding an adequate
representation for a problem may be the hardest part of getting a solution. The generic
process of abstraction is represented in Fig. 1.5. Of course, the change of representa-
tion must be goal-oriented, i.e., useful to solve a given problem, or to perform a task
more easily. Moreover, not any change of representation is an abstraction, and it is
necessary to circumscribe abstraction’s scope. Intuitively, an abstract representation
should be "simpler" than the original one. In this way, abstraction is strictly related
to the notion of simplicity; however, this link does not make its definition any easier,
as simplicity seems to be an equally elusive notion.9
8 See Chap. 3.
9 See Chap. 10.
Fig. 1.5 Abstraction process for Problem Solving. Step 1 concerns a representation change justi-
fied by the need to reduce the computational complexity to solve a ground problem. Step 2 involves
solving the abstract problem. Step 3 refines the abstract solution to obtain one in the ground repre-
sentation space. The overhead of the representation changes (Steps 1 and 3) needs to be taken into
account to assess the abstraction usefulness
In order to make sense of the various definitions, theories, and practices of abstrac-
tion in different disciplines and contexts, it is necessary to go further, carrying out a
comparative analysis of the alternative approaches, with the aim of identifying com-
monalities (however vague) and differences (either superficial or essential ones), in
order to possibly define a set of properties that abstraction should satisfy for a given
class of tasks. Furthermore, it is useful and clarifying to contrast/compare abstraction
with the notions of generalization, categorization, approximation, and reformulation
in general.10
Based on the results of the comparison, we come up with a model of abstraction,
the KRA model, which tries to bring back this notion to its perceptive source.11 The
model does not have the ambition to be universal; on the contrary, it is targeted to
the task of conceptualizing the domain of a given application field. In essence, it is
first and foremost suited to model abstraction in systems that can be experimentally
observed.
As we have already said, it is essential, in order to fully exploit the power of
abstraction, that this notion becomes “operational”; in other words, even though
finding a “good” abstraction is still a matter of art, it should nevertheless be possible
to identify a set of operators that can be (semi-)automatically applied when a given
pattern of pre-conditions is discovered in the problem at hand. Operators may be
12 See Chap. 7.
13 See Chap. 12.
14 See Chap. 10.
15 See Chaps. 5, and 9.
an important way in which this concept can be instantiated is the reduction of the
computational complexity of programs. Clearly, if, on the one hand, abstraction
reduces the complexity of problem solution, on the other hand, its application has a
cost; this cost has to be traded off against the beneficial effects of the simplification. Then, choosing the "correct" abstraction requires finding a delicate balance between
different costs.16
Even in the absence of a general theory of abstraction, there are significant appli-
cations of this notion in different fields and domains. It is thus interesting to look at a
set of selected applications, in order to show the advantages that abstraction
provides, both in terms of simplification of the conceptualization of a domain and of
problem solving.17
1.1 Summary
In this chapter the book’s content is outlined. Investigating abstraction and its
computational properties involves a sequence of steps, the first one being collecting
and comparing various notions of abstraction used in a variety of disciplines, from
Philosophy to Art, from Computer Science to Artificial Intelligence. Then, in view
of building a computational model, it is necessary to set some boundaries around
the notion, distinguishing it from generalization, approximation, reformulation, and
so on. A computational model of abstraction tries to capture its essential properties,
and makes abstraction operational by means of operators.
As abstraction is often linked to simplicity, the relations between different defin-
itions of abstraction and different definitions of simplicity (or complexity) must be
investigated. In general, abstraction is employed, in problem solving, for reducing
the computational complexity of a task. As abstracting has a cost in itself, a balance has to be struck between this cost and the cost reduction obtained by finding an abstract solution.
Abstraction is not only interesting per se, but we believe it is at the basis of other
forms of reasoning, for instance analogy. Finally, in order to show the utility of
using abstraction in general, the existing models are compared, and some domains of application are described in detail.
Chapter 2
Abstraction in Different Disciplines

2.1 Philosophy
1 http://plato.stanford.edu/entries/abstract-objects/
One of the first attempts to pin down the idea of abstraction was made in Greek philosophy, most notably by Plato, who proposed a distinction between
the forms or ideas (abstract, ideal entities that capture the essence of things) and the
objects in the world (which are instantiations of those ideas) [420]. According to
him, abstraction is simple: ideas do not exist in the world, they do not have substance
or spatial/temporal localization, but their instantiations do. In this approach we may
recognize the basic reflex of associating abstraction with being far from the sensible
world, and of capturing the "essence" of things; however, Plato's ideas still have their own kind of existence in some other realm, like "idols in a cavern", from where they shape reality and have causal power.
The foundation of abstract reasoning was set later on by Aristotle, who perfected
the symbolic methods of reasoning, and whose views dogmatically entered the whole
body of Medieval Philosophy. According to Aristotle, there are three types of abstrac-
tion:
• Physical abstraction—Concrete objects are deprived of their specific attributes
but keep their material nature. For instance, starting from the physical reality
of an individual man, the physical, universal characteristics of all men can be
apprehended.
• Mathematical abstraction—Sensory characteristics of embodied objects are
ignored, and only the intelligible ones are kept.
• Metaphysical abstraction—Entities are considered as disembodied, leaving apart
any connotation linked to their realizations. Metaphysics starts not from things, but
from the idea of things (res or aliquid) and tries to discover the essence contained
in that idea.
In Philosophy the idea of abstraction has been mainly related to two aspects of
reasoning: on the one hand, generalization, understood as a process that reduces the
information content of a concept or an observable phenomenon, typically in order
to retain only information which is relevant for a particular purpose. Abstraction,
thus, results in the reduction of a complex idea to a simpler concept, which allows
the understanding of a variety of specific scenarios in terms of basic ideas.
On the other hand, abstraction has been investigated in connection with the very
nature or essence of things, specifically in order to ascertain their epistemological or
ontological status. Abstract things are sometimes defined as those things that do not
exist in reality, do not have a spatio/temporal dimension, and are causally inert. By contrast, a physical object is concrete because it is a particular individual that is
located at a particular place and time.
Originally, the “abstract/concrete” distinction was a distinction between words or
terms. Traditionally, grammar distinguishes the abstract noun “whiteness” from the
concrete noun “white” without implying that this linguistic contrast corresponds to
a metaphysical distinction. In the seventeenth century this grammatical distinction
was transposed to the domain of ideas. Locke supported the existence of abstraction
[339], recognizing the ability to abstract as the quality that distinguishes humans
from animals and makes language possible. Locke speaks of the general idea of a
triangle which is “neither oblique nor rectangle, neither equilateral nor scalenon,
but all and none of these at once”.
Locke’s conception of an abstract idea, as one that is formed from concrete ideas
by the omission of distinguishing details, was immediately rejected by Berkeley, and
then by Hume. Berkeley argued that the concept of an abstract idea is incoherent
because it requires both the inclusion and the exclusion of one and the same property
[53]. An abstract idea would have to be general and precise at the same time, general
enough to include all instances of a concept, yet precise enough to exclude all non-
instances. The modern empiricism of Hume and Berkeley denies that the mind can attain knowledge of the universals through the generalization process. The mind does not perform any abstraction but, on the contrary, selects a particular and makes, out of it, a template of all particular occurrences that are the only possible realities.
For Kant there is no doubt that all our knowledge begins with experience [280],
i.e., it has a concrete origin. Nevertheless, by no means does it follow that everything derives from experience. For, on the contrary, it is possible that our knowledge is a
compound of sensory impressions (phenomena) and of something that the faculty of
cognition supplies from itself a priori (noumena). By the term “knowledge a priori”,
therefore, Kant means something that does not come from the sensory input, and that is independent of all experience. Opposed to this is "empirical knowledge",
which can be obtained only a posteriori, namely through experience. Knowledge
a priori is either pure or impure. Pure a priori knowledge is not mixed up with
any empirical element. Even though not set by Kant himself in these terms, the
counterposition between a priori (or pure) knowledge and a posteriori or empirical
knowledge mirrors the dichotomy between abstract and concrete knowledge. From
this point of view, abstraction is not (directly) related to generalization or concept
formation, but represents some sort of a priori category of human thinking.
The Kantian Enlightenment, with its predilection for the intellect, was strongly criticized by Hegel [240], who considered it as the philosophical abstraction of
everything, both real and phenomenological. According to Hegel, the philosophers
of his time had so abstracted the physical world that nothing was left. Hegel rejected
this line of reasoning, concluding in contrast that “What is real is rational—what
is rational is real”. He set out to reverse this trend, moving away from the abstract
and toward the concrete. Hegel viewed the phenomenological world (what can be
sensed by humans or manmade instruments) and the conceptual (thoughts and ideas)
as equal parts of existence. Hegel thought that abstraction inherently leads to the
isolation of parts from the whole. Eventually, abstraction leads to the point where
physical items and phenomenological concepts have no value.
Abstraction plays a central role also in Marx’s philosophy. By criticizing Hegel,
Marx claims that his own method starts from the “real concrete” (the world) and
proceeds through “abstraction” (intellectual activity) to the “thought concrete” (the
whole present in the mind) [355]. In one sense, the role Marx gives to abstraction is
the simple recognition of the fact that all thinking about reality begins by breaking
it down into manageable parts. Reality may be in one piece when lived, but to be
thought about and communicated it must be parceled out. We “see” only some of what
lies in front of us, “hear” only part of the noises in our vicinity; in each case, a focus
is established, and a kind of boundary set within our perceptions, distinguishing what
is relevant from what is not. Likewise, in thinking about any subject, we focus on
only some of its qualities and relations. The mental activity involved in establishing
such boundaries, whether conscious or unconscious, is the process of abstraction.
A complication in grasping Marx’s notion of abstraction arises from the fact that
Marx uses the term in four different senses. First, and most important, it refers to the
mental activity of subdividing the world into the mental constructs with which we
think about it, which is the process that we have been describing. Second, it refers to
the results of this process, the actual parts into which reality has been apportioned.
That is to say, for Marx, as for Hegel before him, “abstraction” functions as a noun
as well as a verb, the noun referring to what the verb has brought into being. But
Marx also uses "abstraction" in a third sense, where it refers to particularly ill-fitting mental constructs. Whether because they are too narrow, or take in too
little, or focus too exclusively on appearances, these constructs do not allow an
adequate grasp of their subject matter. Taken in this third sense, abstractions are the
basic unit of ideology, the inescapable ideational result of living and working in an
alienated society. “Freedom”, for example, is said to be such an abstraction whenever
we remove the real individual from “the conditions of existence within which these
individuals enter into contact” [356]. Omitting the conditions that make freedom
possible makes “freedom” a distorted and obfuscated notion.
Finally, Marx uses the term “abstraction” in a fourth sense, where it refers to a
particular organization of elements in the real world (having to do with the functioning
of capitalism). Abstractions in this fourth sense exist in the world and not, as in
the case with the other three, in the mind. In these abstractions, certain spatial and
temporal boundaries and connections stand out, just as others are obscure or invisible, making what is in practice inseparable appear separate. It is in this way that
commodities, value, money, capital are likely to be misconstrued from the start.
Marx labels these objective results of capitalist functioning "real abstractions", and it is to these abstractions that he refers when he says that in capitalist society "people are governed by abstractions" [356]. In conclusion, we can say that
Marx’s abstractions are not things but rather processes. These processes are also,
of necessity, systemic relations. Consequently, each process acts as an aspect, or
subordinate part, of other processes, grasped as clusters of relations.
In today’s Philosophy the abstract/concrete distinction aims at marking a line
in the domain of objects. An important contribution was given by Frege [181].2
Frege’s way of drawing this distinction is an instance of what Lewis calls the Way
of Negation [329]. Abstract objects are defined as those that lack certain features
possessed by paradigmatic concrete things. Contemporary supporters of the Way of Negation now modify Frege's criterion by requiring that abstract objects be non-spatial and/or causally inefficacious. Thus, an abstract entity can be defined as a
non-spatial (or non-spatio/temporal), causally inert thing.
The most important alternative to the Way of Negation is what Lewis calls the
Way of Abstraction [329]. According to the tradition in philosophical Psychology,
abstraction is a specific mental process in which new ideas or conceptions are formed
by considering several objects or ideas and omitting the features that distinguish
them. Nothing in this tradition requires that ideas formed in this way represent or
correspond to a distinctive class of objects. But it might be maintained that the
distinction between abstract and concrete objects should be explained by reference
to the psychological process of abstraction or something like it. The simplest version
of this strategy would be to say that an object is abstract if it is (or might be) the
referent of an abstract idea, i.e., an idea formed by abstraction.
Starting from an observation by Frege, Wright [568] and Hale [230] have devel-
oped a “formal” account of abstraction. Frege points out that terms that refer to
abstract entities are often formed by means of functional expressions, for instance,
the direction of a line, the number of books. When such a function f (a) can be
defined, there is typically an equation of the form:

f(a) = f(b) ⟺ a ≅ b,   (2.1)

where ≅ is an equivalence relation over the arguments of f (for directions, the parallelism of lines). These equations are called abstraction principles,4 and appear to have a special
meaning: in fact, they are not exactly definitions of the functional expression that
occurs on the left-hand side, but they hold in virtue of the meaning of that expression.
To understand the term “direction” requires to know that “the direction of a” and
“the direction of b” refer to the same entity if and only if the lines a and b are
parallel. Moreover, the equivalence relation that appears on the right-hand side of
the equation comes semantically before the functional expression on the left-hand
side [403]. Mastery of the concept of “direction” presupposes mastery of the concept
of parallelism, but not vice versa. In fact, the direction is what a set of parallel lines
have in common.
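A minimal computational sketch of this principle (our illustration; the representation of lines by point pairs and the canonical slope are assumptions, not part of Wright and Hale's account): two lines receive the same "direction" object exactly when they are parallel, mirroring f(a) = f(b) ⟺ a ≅ b.

from fractions import Fraction

def direction(line):
    # A line is given by two points ((x1, y1), (x2, y2)); its "direction"
    # is represented by a canonical slope, so that all parallel lines
    # (one equivalence class) share one and the same object.
    (x1, y1), (x2, y2) = line
    if x1 == x2:
        return 'vertical'
    return Fraction(y2 - y1, x2 - x1)

a = ((0, 0), (2, 2))
b = ((1, 5), (3, 7))   # parallel to a
c = ((0, 0), (1, 3))   # not parallel to a
assert direction(a) == direction(b)   # f(a) = f(b), since a and b are parallel
assert direction(a) != direction(c)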
An in-depth discussion of the concrete/abstract distinction in Philosophy, with
a historical perspective, is provided by Laycock [321].5 He starts by considering
the two dichotomies “concrete versus abstract”, and “universal versus particular”,
which are commonly presented as being mutually exclusive and jointly exhaustive
categories of objects. He claims that “the abstract/concrete, universal/particular
… distinctions are all prima facie different distinctions, and to thus conflate them
can only be an invitation to further confusion”. For this reason he suggests that the
first step to clarify the issues involved with the dichotomies is to investigate the
relationship between them.
Regarding the dichotomy of concrete and abstract objects, he notices that “this
last seems particularly difficult. On the one hand, the use of the term “object” in this
context strongly suggests a contrast between two general ontic categories. On the
other hand, though, the adjective abstract is closely cognate with the noun “abstrac-
tion”, which might suggest “a product of the mind”, or perhaps even “unreal” or
“non-existent” …”. This dichotomy has “at least two prominent but widely divergent
interpretations. On the one hand, there is an ontic interpretation, and there is a purely
semantic or non-objectual interpretation, on the other hand. Construed as ontic, the
concrete/abstract dichotomy is commonly taken to simply coincide with that of uni-
versal and particular.” This interpretation has been adopted, for instance, by Quine
[437]. On the contrary, the semantic interpretation of the dichotomy was accepted
by Mill [373] and applied to names: “A concrete name is a name which stands for a
thing; an abstract name is a name which stands for an attribute of a thing.”
According to Barsalou and Wiemer-Hastings [37], concrete and abstract concepts
differ in their focus on situational contexts: concrete concepts focus on specific
objects and their properties in situations, whereas abstract concepts focus on events
and introspective properties.
Once the distinction between concrete and abstract has been introduced, it is a
small step ahead to think of varying degrees of abstraction, organized into a hierarchy.
The study of reality at different levels has been the object of various kinds of "levelism", from epistemological to ontological. Even though some of the past
hierarchical organizations of reality seem obsolete, Floridi claimed recently [175]
that the epistemological one is tenable, and proposed a “theory of the levels of
abstraction”. At the basis of this theory there is the notion of “observable”. Given
a system to be analyzed, an observable is a variable whose domain is specified,
together with the feature of the system that the variable represents.6 Defining an
observable in a system corresponds to a focalization on some specific aspect of
the system itself, obtaining, as a result, a simplification. It is important to note that
an observable is properly defined only with respect to its context and use. A level
of abstraction (LoA) is nothing else than a finite and non-empty set of observables.
Different levels of abstraction for the same system are appropriate for different goals.
Each level “sees” the system under a specific perspective. The definition of a level of
abstraction is only the first step in the analysis of a system. In fact, taken in isolation,
each observable might take on values that are incompatible with those assumed by
some others. Then, Floridi introduces a predicate over the observables, which is true
only if the values assumed by the observables correspond to a feasible behavior of
the system. A LoA with an associated behavior is called a moderated LoA.
As previously said, different LoAs correspond to different views of a system. It
is thus important to establish relations among them. To this end, Floridi introduces
the concept of Gradient of Abstraction (GoA), which is a finite set {Li | 1 ≤ i ≤ n} of moderated LoAs, and a set of relations relating the observables belonging to pairs
of LoAs. A GoA can be disjoint or nested. Informally, a disjoint GoA is a collection
of unrelated LoAs, whereas a nested one contains a set of LoAs that are refinements of one another.
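A hedged sketch of these definitions in Python (the traffic-light system and all names below are ours, purely illustrative): an observable is a named variable with a domain, a LoA is a set of observables, and a moderated LoA adds a behavior predicate ruling out infeasible joint values.

# A LoA as a finite, non-empty set of observables (name -> domain).
traffic_light_loa = {
    'color': {'red', 'amber', 'green'},
    'walk_sign': {'walk', 'dont_walk'},
}

def behavior(values):
    # Moderation predicate: true only for feasible joint values, e.g.
    # pedestrians never see "walk" while the cars' light is "green".
    return not (values['color'] == 'green' and values['walk_sign'] == 'walk')

def is_feasible(loa, values):
    in_domains = all(values[name] in domain for name, domain in loa.items())
    return in_domains and behavior(values)

print(is_feasible(traffic_light_loa, {'color': 'red', 'walk_sign': 'walk'}))    # True
print(is_feasible(traffic_light_loa, {'color': 'green', 'walk_sign': 'walk'}))  # False

A nested GoA would then relate such LoAs, for instance by mapping the finer domain {red, amber, green} onto a coarser one such as {stop, go}.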
6 An observable does not necessarily correspond to a physically measurable entity, because the
system under analysis may be a conceptual one.
(they viewed the world through the senses) [47]. As an example, he mentions that
the abstract word “anger” corresponds, in the ancient Hebrew, to “nose”, because a
Hebrew sees anger as “the flaring of the nose”.
In a sense, language as a whole is an abstraction, because it substitutes a "name" for the real thing. And this is another way of considering abstraction in language.
By naming an entity, we associate with the name a bundle of attributes and functions
characterizing the specific instances of the entity. For example, when we say car, we think of a closed vehicle with four wheels and a steering wheel, even though many details
may be left unspecified, such as the color, the actual shape, and so on. The ontological
status of the “universal” names has been debated, especially in the late Medieval time,
with positions ranging from the one of Roscelin,7 who claimed that universals are
nothing more than verbal expressions, to that of Guillaume de Champeaux,8 who, on the contrary, maintained that the universals are the real thing.
Independently of their ontological status, words stand for common features of perceived entities, and they are considered abstractions derived from extracting the
characterizing properties of classes of objects. The word tree, for instance, represents
all the concrete trees that can exist. This view is based on a referential view of the
meaning of words. Kayser [283] challenges this view, proposing an inferential view of word semantics: words are premises of inference rules, and they end up
denoting classes of objects only as a side-effect of the role they play. Barsalou sees
the process of naming an object as a way to simplify its representation, by endowing
it with invisible properties that constitute its very nature [38]. An interesting aspect
of naming is the interaction between vision and language [144, 165]. Assigning a
name to a seen object implies recognizing its shape, identifying the object itself and
retrieving a suitable word for it. The name can then act as the semantics of an image.
The role of the name as an abstraction of the concrete thing also plays a relevant
role in magics. According to Cavendish [89], “the conviction that the name of a thing
contains the essence of its being is one of the oldest and most fundamental of magical
beliefs.... For the magical thinker the name sums up all the characteristics which make
an animal what it is, and so the name is the animal’s identity.” For instance, burying
a piece of lead, with the name of an enemy written on top together with a curse,
was, supposedly, a way of killing the enemy. Viewed from this perspective, the name is quite dangerous to a person, who can easily be harmed through his/her name. For this reason, in many primitive societies a man had two names: one to be used in everyday life, and another, the real one, kept secret. For similar reasons
also the names of gods and angels were often considered secret. An Egyptian myth
tells that the goddess Isis, in order to take over the power of the sun-god Ra, had to
discover his name. Magical power or not, a name is, after all, a shortcut allowing a
complex set of properties to be synthesized into a word.
7 French philosopher, who lived in France in the second half of the XII century. His work is lost, but references to it can be found in the works of Saint Anselm and Peter Abelard.
8 French philosopher, who lived in the late XII century in Paris. He was also a teacher of Peter Abelard, who later convinced him to change his opinion about universals.
An approach relevant to both abstraction and language, even though not explicitly
stated so, is described by Gärdenfors [193]. He discusses the representations needed
for language to evolve, and he identifies two main types: cued and detached. A cued
representation “stands for something that is present in the current external situation
of the representing organism”. On the contrary, a detached representation may stand
for objects or events that are neither present in the current situation nor triggered by
some recent situation. Strictly connected with these representations are the notions of
symbol, which refers to a detached representation, and signal, which refers to a cued
one. Languages use mostly symbols. Animals may show even complex patterns of
communication, but these are patterns of signals, not symbols. Gärdenfors’ distinc-
tion closely resembles the distinction between abstract and concrete communication;
in this context an abstract communication may involve things that have been, that
could be, or that are not localized in time and space. A signal system, instead, can
only communicate what is here and now.
In natural language, abstraction also enters as a figure of style. In fact, abstraction
is a particular form of metonymy, which replaces a qualifying adjective by an abstract
name. For example, in La Fontaine’s fable Les deux coqs (VII, 13) the sentence “tout
cet orgueil périt … (all this pride dies)", refers, actually, to the dying cock.
2.3 Mathematics
9 See http://www.wordiq.com/definition/Abstraction_(mathematics).
visible experiment.” Then, Roşu shows that the notion of behavioral abstraction is a
special case of a more general abstraction technique, namely information hiding.
Another technical notion of abstraction is presented by Antonelli in a recent paper
[20]. Starting from the abstraction principle (2.1), introduced by Wright [569] and
Hale [230] and reported in Sect. 2.1, he defines an abstraction operator, which assigns
an object—a “number”—to the equivalence classes generated by the equinumerosity
relation, in such a way that each class is associated with a different object. According
to Antonelli, this principle is what is needed to formalize arithmetic following the
"traditional Frege-Russell strategy of characterizing the natural numbers as abstracta
of the equinumerosity relation.”
More precisely, numbers, as abstract objects, are obtained by applying an abstrac-
tion operator to a concept (in Frege's sense). However, in order to be an abstraction, such a mapping from concepts to objects must respect a given equivalence relation
[19]. In the case of numbers, the principle of numerical abstraction, or Hume’s Prin-
ciple, postulates an operator Num assigning objects to concepts in such a way that concepts P and Q are mapped to the same object exactly when as many objects fall under P as fall under Q. The object Num(P) can be regarded as "the number
of P”.
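As a toy illustration (ours, not Antonelli's formalism), Hume's Principle can be mimicked in Python for finite concepts, modeling a concept extensionally as the set of objects falling under it:

def Num(P):
    # Toy rendering of Hume's Principle for finite concepts: equinumerous
    # concepts receive one and the same "number" object.
    return len(P)   # the cardinality acts as a canonical proxy object

P = {'Mercury', 'Venus', 'Earth'}
Q = {'red', 'green', 'blue'}
R = {'yes', 'no'}
assert Num(P) == Num(Q)   # a bijection exists between P and Q
assert Num(P) != Num(R)

Here len plays the role of the ordinary object "recruited" as a proxy for each equivalence class of concepts under equinumerosity.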
Antonelli calls this view of abstraction deflationary [19], because it denies
that objects obtained via abstraction enjoy a special status: they are “just ordinary
objects, recruited for the purpose of serving as proxies for the equivalence classes
of concepts generated by the given equivalence relation.” Abstraction principles are
linguistically represented by introducing a “term-forming” operator Φ(P), which
stands for the possibly complex predicate expression P.
An interesting overview of the notion of abstraction in Mathematics is given
by Ferrari, who tries to establish connections with other fields, such as Cognitive
Science, Psychology, and mathematical education practice [166]; the reason is that
“abstraction has been early recognized as one of the most relevant features of Math-
ematics from a cognitive viewpoint as well as one of the main reasons for failure in
Mathematics learning.”
By looking at the history of Mathematics, Ferrari acknowledges that abstract
objects have been characterized by a certain degree of both generalization and decon-
textualization. However, he points out that maybe their primary role is in creating
new concepts, when, specifically, a (possibly complex) process or relation is reinterpreted as a (possibly simpler) object, as in Antonelli's approach [19, 20]. An example
is provided by the arithmetic operations, which, at the beginning, are learned as pro-
cedures, but then become objects whose properties (for instance, associativity) can
be investigated. This transition is called encapsulation [142] or reification [482].
Ferrari argues that generalization, decontextualization and reification are all basic
components of abstraction in Mathematics, but that abstraction cannot be identi-
fied with any single one of them. For instance, generalization, defined as an exten-
sional inclusion relation, cannot exhaust the abstraction process, which also includes
recognition of common properties, adoption of a compact axiom set, and definition of a notation system to deal with newly defined concepts. Even though generaliza-
tion and decontextualization do not coincide, generalization implies a certain degree
Fig. 2.1 Specification templates for procedural and data abstraction. When assigning a name to the
procedure, its inputs and outputs are defined. For data, their structure and the applicable operations
are defined
For Liskov and Guttag [333], “abstraction is a many-to-one map.” It ignores irrel-
evant details; all its realizations must agree on the relevant details, but may differ on
the irrelevant ones. Abstraction is defined by Liskov and Guttag by means of speci-
fications. They introduce templates for procedural and data abstraction, examples of
which are reported in Fig. 2.1.
As we will see in Chap. 7, abstraction operators can be represented with Abstract
Procedural Types. Let us now introduce examples of procedural and data abstraction
in order to clarify these notions.
Example 2.1 Suppose that we want to write a procedure for searching whether an
element y appears in a vector X without specifying the actual program to do it. We
can define the following abstract procedure:
pname = Search(X, y) returns({true, false})
requires X is a vector, y is of the same type as the elements of X
modifies ∅
effects Searches through X, and returns true if y occurs in X, else returns false
end pname
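The specification leaves the implementation free; any program agreeing with the stated effects realizes the abstraction. One possible realization, sketched here in Python:

def search(X, y):
    # One concrete implementation of the abstract procedure Search(X, y):
    # scan through X and report whether y occurs in it.
    # Nothing is modified, as required by the "modifies" clause.
    for element in X:
        if element == y:
            return True
    return False

print(search([3, 1, 4, 1, 5], 4))   # True
print(search([3, 1, 4, 1, 5], 9))   # False

A binary search over a sorted vector would be an equally valid realization: callers relying only on the specification cannot tell the difference.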
Data abstraction, on the other hand, consists in defining a type of data and the
operations that manipulate it. Data abstraction makes a clear separation between the
abstract properties of a data type and its concrete implementation.
Example 2.2 Let us define the data type complex number z as a pair (x, y) of real numbers, with some associated operations, such as, for example, Real(z), Imaginary(z), Modulus(z) and Phase(z).
dname = complex is pair of reals (x, y)
Overview A complex number has a real part, x, and an imaginary one, y, such that z = x + iy, where i = √−1. In polar coordinates z has a modulus and a phase.
Operations Real(z) = x
Imaginary(z) = y
Modulus(z) = √(x² + y²)
Phase(z) = arctg(y/x)
end dname
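This data abstraction can be realized, for instance, by the following Python class (a sketch of ours; the internal pair of reals stays hidden behind the four operations):

import math

class Complex:
    # Concrete realization of the data abstraction "complex": clients use
    # only the operations below, never the underlying representation.
    def __init__(self, x, y):
        self._x = x
        self._y = y

    def real(self):
        return self._x

    def imaginary(self):
        return self._y

    def modulus(self):
        return math.sqrt(self._x ** 2 + self._y ** 2)

    def phase(self):
        # atan2 refines arctg(y/x) by handling x = 0 correctly
        return math.atan2(self._y, self._x)

z = Complex(3.0, 4.0)
print(z.modulus())   # 5.0

A polar-coordinate representation could replace the pair (x, y) without any change visible to clients, which is precisely the separation the abstraction guarantees.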
Data and procedural abstractions have been reunited in the concept of Abstract Data
Type (ADT), which is at the core of object-oriented programming languages. An ADT
defines a data structure, with associated methods, i.e., procedures for manipulating
the data. An ADT offers the programmer an interface, used to trigger methods, which is separated from the actual implementation, which the programmer does not
need to see. Thus, abstraction, in this context, realizes information hiding. Even
though the notion of ADT has been around for a while, a modern description of it is
provided by Gabbrielli and Martini [186]. ADTs are only one step in the evolution
of object-oriented programming, because they are passive entities, which can only
be acted upon by a controlling program; on the contrary, the notion of object goes
further, by introducing interaction possibilities via message passing, and a sort of
autonomy in letting an object invoke operations on other objects. The relationship
between classes, objects and data abstraction has been investigated by Fisher and
Mitchell [170], who compare three approaches to class-based programming, namely,
one called “premethods”, and two others called “prototype”. The authors claim that
object-based methods are superior to class-based ones.
Introducing an ADT leads spontaneously to the idea of several nested layers
of abstraction [170]. A data type may be part of an is-a hierarchy, organized as a
tree, where each node has one father, but may have several children. The advantage of defining such a hierarchy is that it is not necessary to define every
node; on the contrary, a child node automatically inherits the properties of the father
(unless specified otherwise) through downward inheritance rules, but, at the same
time, it may have some more specific properties added. For instance, if an animal
is defined as a being that eats, moves, and reproduces, a bird can inherit all the
above properties, with the addition of has-wings. The is-a relation between a type
and a sub-type is called by Goldstein and Storey an inclusion abstraction [217].
These authors define other types of abstraction as well, which will be described in
Sect. 4.4.2.
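The animal example above translates directly into such a hierarchy; a minimal Python sketch (our rendering of downward inheritance):

class Animal:
    # Properties defined once at the "animal" node of the is-a tree.
    def eats(self):
        return True

    def moves(self):
        return True

    def reproduces(self):
        return True

class Bird(Animal):
    # A child node: inherits eats, moves and reproduces downward,
    # and adds a more specific property of its own.
    def has_wings(self):
        return True

tweety = Bird()
print(tweety.eats(), tweety.has_wings())   # True True (inherited + added)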
Colburn and Shute [111] make a point of differentiating Computer Science from empirical sciences, because the latter have concrete models in the form of
experimental apparata as well as abstract mathematical models, whereas the former
has only software models, which are not physically concrete. Going further along
this line, the authors claim that the fundamental nature of abstraction in Computer
Science is quite different also from the one in Mathematics with respect to both the
primary product (i.e., the use of formalism), and the objectives.
The main products of abstraction in Mathematics are inference structures (theorems and their proofs), while in Computer Science they are interaction patterns (pieces of soft-
ware). Interactions can be considered at many levels, starting from the basic ones
between instruction and data in memory, up to the complex interactions occurring in
multi-agent systems, or even those between human users and computers. For what
concerns formalism, the one of Mathematics is rather “monolithic”, based on set the-
ory and predicate calculus, whereas formalism in Computer Science is “pluralistic”
and “multilayered”, involving programming languages, operating systems [481], and
networks [100]. Looking at the objectives of abstraction, Colburn and Shute make an
interesting distinction: in Mathematics the construction of models involves getting
rid of inessential details, which they call an act of information neglect, whereas in
Computer Science writing programs involves information hiding: the details that are invisible at a given level of abstraction cannot really be eliminated, because they are essential at some lower level. This is true for programming languages, but
also for operating systems, and network architectures.
A teaching perspective in considering abstraction in Mathematics and Computer
Science is taken by both Leron [326] and Hill et al. [249]. Leron claims that in
Mathematics “abstraction is closely related to generalization, but each can also
occur without the other.” In order to support his claim, he offers two examples; the
first is the formula (a + b)² = a² + 2ab + b², which is generalized (but not abstracted)
when its validity is extended from natural numbers (a and b) to rational ones. On
the other hand, the same formula is abstracted when it is considered to hold for any
two commuting elements in a ring. The second example consists in the description
“all prime numbers less than 20”, which is more abstract (but not more general)
than “the numbers 2, 3, 5, 7, 11, 13, 17, 19”. In Computer Science the separation
between the high level concepts, used to solve a problem, and the implementation
details constitutes what Leron calls an abstraction barrier. Above the barrier the
problem is solved using suitably selected abstraction primitives, whereas, below
the barrier, one is concerned with the implementation of those primitives. Looking
at the mathematical examples we may see that Leron attributes to generalization
an extensional nature. Moreover, he notices that proofs of abstractly formulated
theorems gain in simplicity and insights. Finally, he makes a distinction between
descriptions of objects in terms of their structure and in terms of their functionalities,
and claims that abstraction is more often linked to the functional aspects.
For their part, Hill et al. [249] claim that “abstraction is a context-dependent, yet
widely accepted aspect of human cognition that is vitally important for success in
the study of Computer Science, computer programming and software development.”
They distinguish three types of abstraction: conceptual, formal, and descriptive. Con-
ceptual abstraction is the ability to move forward and backward between a big picture
and small details. Formal abstraction allows details to be removed and attention to
be focalized in order to obtain simplifications. Finally, descriptive abstraction is the
ability to perceive the essence of things, focalizing on their most important character-
istics; this type of abstraction also allows “salient unification and/or differentiation”,
namely it is related to generalization.
Abstraction not only plays a fundamental role in Computer Science in general
(namely, in discussing programming philosophy), but it also offers powerful tools
to specific fields. One is software testing, where abstraction has been proposed as
a useful mechanism for model-based software testing [345, 428]. Another one is
Database technology. In databases three levels of abstraction are usually considered:
the conceptual level, where the entities that will appear in the database are defined, as
well as their inter-relationships; the logical level, where the attributes of the entities
and the keys are introduced, and the physical level, which contains the actual details of
the implementation. Abstraction increases from the physical to the conceptual level.
Beyond this generic stratification, in a database it is often crucial to select an
appropriate level of abstraction for the very data to be stored. If data are stored at too fine a grain the database may reach an excessive size, whereas at too coarse a grain important distinctions might be masked. The issue
is discussed, among others, by Calders et al. [87], who say that “a major problem
… is that of finding those abstraction levels in databases that allow significant
data aggregation without hiding important variations.” For instance, if a department
store has recorded every day the number and type of items sold, storing these raw data over a period of three years may mask some trends that could have been
apparent if the data were aggregated, say, by weeks or months. In order to select the
appropriate level, database designers exploit hierarchies over the values of variables.
For instance, for a time variable, hour, day, week, month, and year constitute a
hierarchy of values of increasing coarseness. In an analogous way, city, region,
country constitute a hierarchy for a location variable.
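As a small illustration of climbing such a value hierarchy (the data below are invented), the following Python fragment aggregates daily counts up to the month level, hiding daily variation to make coarser trends visible:

from collections import defaultdict

daily_sales = {                      # fine-grained data: items sold per day
    '2013-01-03': 12, '2013-01-17': 20,
    '2013-02-05': 7,  '2013-02-21': 9,
}

monthly_sales = defaultdict(int)
for day, count in daily_sales.items():
    month = day[:7]                  # climb the hierarchy: day -> month
    monthly_sales[month] += count    # aggregation hides the daily detail

print(dict(monthly_sales))           # {'2013-01': 32, '2013-02': 16}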
In relational algebra, several of the operators can be interpreted, in an intuitive sense, as abstraction operators. For instance, given a relational table R with attributes (A1, …, An) on the columns, the projection operator π_{Ai1,…,Air}(R) hides R's columns that are not mentioned in the operator. In an analogous way, the selection operator σ_φ(R) selects only those tuples for which the logical formula φ is true, hiding the remaining ones. These operators clearly obey the principles of information hiding, because the omitted columns or rows of R are not deleted, but only hidden: they may be visualized again at any time.
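With a table encoded as a list of dictionaries, the two operators can be written non-destructively, as in the following sketch (ours): the original table R survives intact, so the hidden rows and columns can be shown again at will.

R = [
    {'name': 'Anna',  'city': 'Torino', 'age': 34},
    {'name': 'Marco', 'city': 'Milano', 'age': 51},
]

def project(table, *attributes):
    # The projection pi: keep only the named columns; R itself is untouched
    return [{a: row[a] for a in attributes} for row in table]

def select(table, phi):
    # The selection sigma: keep only the tuples satisfying the formula phi
    return [row for row in table if phi(row)]

print(project(R, 'name'))                      # hides the city and age columns
print(select(R, lambda row: row['age'] < 40))  # hides the tuples with age >= 40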
Miles Smith and Smith [372] address the issue of abstraction in databases directly.
They say that “an abstraction of some system is a model of that system in which cer-
tain details are deliberately omitted. The choice of the details to omit is made by
considering both the intended application of the abstraction and also its users. The
objective is to allow users to heed details of the system which are relevant to the
application and to ignore other details.” As in some systems there may be too many
relevant details for a single abstraction, a hierarchy can be built up, in which some
details are temporarily ignored at any given level. In Codd’s model of a relational
database [109] abstraction requires two steps: first, a relational representation com-
patible with the intended abstraction’s semantics must be found. Second, the meaning
of this representation must be explicitly described in terms of data dictionary entries
and procedures. As we will see in Sect. 4.7.1, a similar approach is adopted by Nayak
and Levy [395] for their semantic model of abstraction.
In Computer Science another important aspect is software verification. According
to Yang et al. [572] formal program verification must cope with complex computa-
tions by means of approximations. Abstract interpretation [117] is a theory for defin-
ing sound approximations, and also a unifying framework for different approximate
methods of program verification tools. Therefore, abstract interpretation is widely
exploited in several fields, such as static analysis, program transformation, debugging,
and program watermarking. In their paper the authors describe the foundations of
abstracting a program’s fixpoint semantics, and present a state of the art on the subject.
Fig. 2.2 “La trahison des images” (1928–29). Magritte’s painting is an “image” of a pipe, not the
“real thing”
Whatever “art” might be, according to Gortais [219] “as a symbolic device, art,
whether figurative or not, is an abstraction”. This statement is well illustrated by
Magritte’s picture of a pipe (see Fig. 2.2), where the sentence “Ceci n’est pas une
pipe”10 refers to the fact that the painting “represents” a pipe but it is not the “real
thing”. Certainly, if we look at a person, an event, or a landscape in the world, any
attempt to reproduce it, be it through a painting, a sculpture, a novel, or a piece of music,
leaves out something existing in the original. In this respect, art tries to get at the
essence of its subject, and hence it is indeed an abstraction of reality, if abstraction
is intended as a process of getting rid of irrelevancies. On the other hand, a work of art
is exactly such because it makes present something that was not present before, and
may reveal what was not visible before. Moreover, art’s true value is in the emotional
relation with the public. “Each work of art will resonate in its own way over the whole
range of human emotions and each person will be touched in a different way” [219].
Art involves an abstract process, exploiting a communication “language” using a
set of symbols. In visual arts, this language is based on colors, forms, lines, and so
on. The art language of Western cultures had, in the past, a strict link with the reality
that was to be communicated: arts were figurative. Later on, the language acquired
more and more autonomy, and (at least parts of) the arts became abstract [219].
Thus, abstract art does not aim at representing the world as it appears, but rather at
composing works that are purposefully non-representational and subjective. The use
of non-figurative patterns is not new, as many of them appear on pottery and textiles
from pre-historical times. However, these patterns were elements of decoration, and
did not have necessarily the ambition to be called “art”. A great impulse to the
abandon of faithfulness to reality, especially in painting, was given by the advent of
photography. In fact, paintings were also intended to transmit to posterity the faces
of important persons, or memories of historical events. Actually, a complex interplay
exists among figurative works, abstract works, and photography. All three may show
different degrees of abstraction, and all three may or may not be classified as art at
all: history, context, culture, and social constraints, all play a role in this evaluation.
Even before photography, some painters, such as James McNeill Whistler, stressed
the importance of transmitting visual sensations rather than precise representations of
objects. His work Nocturne in Black and Gold, reported in Fig. 2.3, is often considered
a first step toward abstract art.
A scientific approach to abstract art was proposed by Kandinsky [279], who
defined some primitives (points, lines, surfaces) of a work of art, and associated
an emotional content with them. In this way it was possible to define a syntax and a
language for art, free from any figurative meaning. However, the primitives were
fuzzy (when does a point start to be perceived as a surface?), and the proposed
language proved difficult to apply. Kandinsky, with Malevich, is considered
a father of abstract pictorial art. An example of Malevich’s work is reported in
Fig. 2.4.
In Fig. 2.5 an even more abstract painting, by the contemporary French painter
Pierre Soulages, is reported. He says: “J’aime l’autorité du noir. C’est une couleur
qui ne transige pas. Une couleur violente mais qui incite pourtant à l’intériorisation.
A la fois couleur et non-couleur. Quand la lumière s’y reflète, il la transforme, la
transmute. Il ouvre un champ mental qui lui est propre.”11
Since the eighteenth century it has been thought that an artist would use abstraction
to uncover the essence of a thing [377, 588]. The essence was reached by throwing
away the peculiarities of instances, and keeping the universal and essential aspects.
At the beginning, this idea of abstraction did not necessarily imply moving away from
the figurative. But, once it was accepted that the goal of art was to attain the essence,
and not to faithfully represent reality, the door to non-figurative art was open.
An example of this process is reported in Fig. 2.6, due to Theo van Doesburg, an
early abstract painter who, together with Piet Mondrian, founded the journal De
Stijl. In 1930 he published the Concrete Art Manifesto, in which he explicitly denied
that art should take inspiration from nature or feelings. The text of the Manifesto is
reported in Appendix A. Actually, it sounds rather surprising that the type of totally
abstract art delineated in the Manifesto should be called “concrete art”.
11 “I love the authority of black. It is a color that does not make compromises. A violent color,
but one that stimulates interiorization. At the same time a color and a non-color. When the light is
reflected on it, it is transformed, transmuted. It opens a mental field which is its own.”
Fig. 2.5 Painting by Pierre Soulages (2008). Bernard Jacobson Gallery (Printed with the author’s
permission)
Fig. 2.6 Studies by Theo van Doesburg (1919). From nature to composition
2.6 Cognition
In everyday language, abstraction is often described as the process of moving from
the concrete to “the abstract”. However, the name stands for a large variety of different
cognitive phenomena, so that it is difficult to come up with a unifying view.
In Cognitive Science the term “abstraction” occurs frequently; even though with
different meanings and in different contexts, it is mostly associated with two other
notions, namely, category formation and/or generalization. Barsalou and co-workers
have handled the subjects in several papers (see, for instance, [34]). In particular, a
direct investigation of the concept of abstraction led Barsalou to identify six different
meanings of the word [35]:
• Abstraction as categorical knowledge, meaning that knowledge of a specific cat-
egory has been abstracted out of experience (e.g., “Ice cream tastes good”).
• Abstraction as the behavioral ability to generalize across instances, namely the
ability to summarize behaviorally the properties of a category’s members (e.g.,
“Bats live in caves”).
• Abstraction as summary representation of category instances in long-term memory
(for instance, the generation of a template for a category).
• Abstraction as schematic representation, i.e., keeping critical properties of a cate-
gory’s members and discarding irrelevant ones, or distorting some others to obtain
an idealized or caricaturized description (e.g., generating a “line drawing” carica-
ture starting from a person’s picture).
• Abstraction as flexible representation, i.e., making a representation suitable to a
large variety of tasks (categorization, inference, …).
• Abstraction as an abstract concept, referring to the distance of a concept from the
tangible world (“chair” is less abstract than “truth”).
In connection with the above classification of abstraction types, Barsalou intro-
duces three properties of abstraction: Interpretation, Structured Representation, and
Dynamic Realization. Regarding interpretation, Barsalou agrees with Pylyshyn [435]
on the fact that cognitive representations are not recordings, but interpretations of
experience, a process based on abstraction: “Once a concept has been abstracted
from experience, its summary representation enables the subsequent interpretation
of later experiences.” Moreover, concepts are usually not interpreted in isolation,
but they are connected via relationships; then, abstractions assemble components of
experience into compound representations that interpret complex structures in the
world. Finally, abstraction offers dynamic realization, in the sense that it manifests
itself in a variety of ways, which makes it difficult to define univocally.
Similar to the notion of category is the one of concept. And, in fact, abstraction is
also viewed as the process of concept formation, i.e., the process aimed at identifying
the “essence” in the sensorial input [522].
An interesting discussion concerns the comparison between abstraction theories
in classical Artificial Intelligence (where Barsalou sees them based on predicate
calculus), and in connectionism. Barsalou identifies an abstraction as an attractor
for a statistical combination of properties; here the abstraction is represented by
the active units that characterize the attractor. The connectionist view of abstraction
suffers from the problem of concept complexity, as neural nets have difficulties in
representing structured scenarios.
a fast one, where children seem to use general rules about the nature of words and
lexical categories, and they become able to perform second-order generalization,
namely distinctions not between categories but between features allowing category
formation.13
The idea of an increasing children’s ability to handle abstraction agrees with
Piaget’s genetic epistemology [417], where he distinguishes empirical abstraction,
focusing on objects, and reflective abstraction, in which the mental concepts and
actions are the focus of abstraction. Young children primarily use empirical abstrac-
tion to organize the world, and then they increasingly use reflective abstraction to
organize mental concepts. The basis for Piaget’s notion of abstraction is the ability
to find structures, patterns or regularities in the world.
An interesting point is made by Halford, Wilson, and Phillips [231], who draw
attention to the role relational knowledge plays in the process of abstraction and in
analogy. In their view, the ability to deal with relations is the core of abstract
thinking, and this ability increases with the phylogenetic level, and also with age
in childhood. The reason is that the cognitive load imposed by processing relational
knowledge depends on the complexity of the relations themselves; actually, the num-
ber of arguments of a relation makes a good metric for conceptual complexity. In fact,
the cost of instantiating a relation is exponential in the number of arguments. These
observations, corroborated by experimental findings, led the authors to conclude that
associative processing is not noticeably capacity limited, but that there are, on the
contrary, severe capacity limitations on relational processing.
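A quick way to see the claimed exponential growth is to count the candidate instantiations of a k-ary relation over a domain of n objects, which number n^k. The fragment below (illustrative only, with an invented four-object domain) makes the count explicit:

```python
from itertools import product

# Candidate instantiations of a k-ary relation over n objects: n ** k,
# i.e., exponential in the arity k.
objects = ["a", "b", "c", "d"]  # n = 4
for k in (1, 2, 3, 4):
    print(k, len(list(product(objects, repeat=k))))
# 1 4 / 2 16 / 3 64 / 4 256
```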
According to Welling, abstraction is also a critical aspect of creativity [556].
He claims that the “abstraction operation, which has often been neglected in the
literature, constitutes a core operation for many instances of higher creativity”.
On a very basic level, abstraction can be uncovered in the principles of perceptual
organization, such as grouping and closure. In fact “it is a challenging hypothesis
that these perceptual organizations may have formed the neurological matrix for
abstraction in higher cognitive functions”. Abstract representation is a prerequisite
for several cognitive operations such as symbolization, classification, generalization
and pattern recognition.
An intriguing process, in which abstraction is likely to play a fundamental role, is
fast categorization of animals in natural scenes [132, 158, 211]. It has been observed
that humans and non-human primates are able to classify a picture as containing a
living being (or some similar task) after an exposure to the picture of only 30 ms, and
with a time constraint of at most 1 s (the median is actually 400 ms) for manifesting
recognition. The speed at which humans and monkeys can perform the task (answers
may be reached within 250 ms, with a minimum of 100 ms [211]) is puzzling, because
it suggests that the visual analysis of the pictures must occur in a single feed-forward
wave. One explanation is that recognition happens on the basis of a dictionary of
generic features, but how these features are represented and combined in the visual
system is not clear. We have here a typical case of abstraction, where the important
discriminant features are selected and used to achieve quick decisions. The specific
features involved may have been learned during the evolution of the species, as
recognizing a living being (typically, a predator or a prey) may be crucial for survival.
It is interesting to note that color (which requires a rather long analysis) does
not play a significant role in the recognition, as the same recognition accuracy is
reached with gray-scale images. The fact that color does not play an essential part
suggests that the sensory computations necessary to perform the task rely on the first
visual information available for processing. In fact, color information travels along
a relatively slow visual pathway (the parvocellular system), and the decision might
be taken even before it gains access to mental representations.
13 For instance they learn that solid things are named by their shapes (e.g., a glass “cube”), and
According to recent findings [132], recognition might exploit both global aspects
of the target and some intermediate diagnostic features. An important one is the
size of the animal’s body in the picture; in fact, humans are quite familiar with the
processing of natural photographs, so that they may have an implicit bias about
the scale of an animal target within a natural scene. However this does not seem to
be true for monkeys.
A hypothesis about the nature of the processing was investigated very recently by
Girard and Koenig-Robert [211]. They argue that fast categorization could rely on
the quantity of relevant information contained in the low spatial frequencies, because
the latter allow a quick hypothesis about the content of the image to be built
up. It would be very interesting to come up with a theory of abstraction capable of
explaining (or, at least, describing) such a challenging phenomenon.
Another curious cognitive phenomenon, in which abstraction plays a crucial role,
is “change blindness” [327, 452, 491, 492], first mentioned by the psychologist
W. James in his book The Principles of Psychology [274]. This phenomenon arises
when some distracting element prevents an observer from noticing even big changes
occurring in a scene he/she is looking at. Change blindness occurs both in the
laboratory and in real-world situations, when changes are unexpected. It is a symptom
of a large abstraction, performed on a scene, which has the effect of discarding a
large portion of the perceptual visual input, deemed to be inessential to one’s current
goal. For example, in an experiment a video shows some kids playing with a ball;
asked to count how many times the ball bounces, all observers failed to see a man
who traverses the scene holding an open umbrella.14 Clearly, abstraction is strongly
connected to attention, on the one hand, and to the goal, on the other.
Recent studies on the phenomenon include neurophysiological approaches
[11, 85], investigation of social effects (changes between images are noticed more
easily when individuals work in teams than when they work individually) [530], and level of exper-
tise of the observer (experts are less prone to change blindness, because they can reach
a deeper level in analyzing a problem than a novice) [161].
A field where the development of computational models of abstraction could be
very beneficial is spatial cognition. According to Hartley and Burgess, “the term
spatial cognition covers processes controlling behaviors that must be directed at […]”
14 psych.ubc.ca/~rensink/flicker/download/.
The notion of granularity has been addressed also by Euzenat [154–156] in the
context of object representation in relational systems. He defined some operators for
changing granularity, subject to suitable conditions, and used this concept to define
approximate representations, particularly in the time and space domains.
A very interesting link between abstraction and the brain’s functioning is provided
by Zeki [580–582], who gives to the first part of his book, Splendors and Miseries
of the Brain, the title “Abstraction and the Brain”. Zeki suggests that behind the
large variety of functions performed by the cells in the brain on inputs of differ-
ent modalities there is a unifying functionality, which is the ability to abstract. By
abstraction Zeki means “the emphasis on the general property at the expense of the
particular”. As an example, a cell endowed with orientation selectivity responds to
a visual stimulus along a given direction, for instance the vertical one. Then, the cell
will respond to any object vertically oriented, disregarding what the object actually
is. The cell has abstracted the property of verticality, without being concerned with
the particulars. The ability to abstract is not limited to the cells in the visual system,
but extends to all sensory areas of the brain, as well as to higher cognitive properties
and judgmental levels.
According to Zeki [582], the brain performs another type of abstraction, which
is the basis for perceptual constancy. Perceptual constancy allows an object to
be recognized under various points of view, luminance levels, distances, and so on.
Without this constancy, the recognition of objects would be an almost impossible task.
An excellent example is color constancy: even though the amounts of red, green, and
blue of a given surface change under different illuminations, our brain attributes the
same color to the surface. Abstraction, in this context, is then the capability of the brain
to capture the essence of an object, independently of the contextual conditions of
the observation. As a conclusion, Zeki claims that “a ubiquitous function of the
cerebral cortex, one in which many if not all of its areas are involved, is that of
abstraction” [582].
2.7 Vision
Vision is perhaps the field where abstraction is most fundamental and ubiquitous,
both in human perception and in artificial image processing. Without the ability to
abstract, we could not make sense of the enormous number of pixels continuously
arriving at our retina. It is abstraction that allows us to group pixels into objects, to
discard irrelevant details, to visually organize in a meaningful way the world around
us. Then, abstraction necessarily enters into any account of vision, either explicitly
or implicitly. In the following we will just mention those works that make more or
less explicit reference to some kind of abstraction.
One of the fundamental approaches to vision, strictly related to abstraction, is the
Gestalt theory [558]. “Gestalt” is a German word that roughly means “form”, and
the Gestalt Psychology investigates how visual perception is organized, particularly
concerning the part-whole relationship. Gestalt theorists state that the “whole” is
Fig. 2.7 a A case of clear separation between foreground and background. b A case of ambiguous
background in Sky and Water II, Escher, 1938 (Permission to publish granted by The M.C. Escher
Company, Baarn, The Netherlands)
greater than the sum of its parts, i.e., the “whole” carries a greater meaning than
its individual components. In viewing the “whole”, a cognitive process takes place
which consists of a leap from comprehending the parts to realizing the “whole”.
Abstraction is exactly the process by which elements are grouped together to
form meaningful units, reducing thus the complexity of the perceived environment.
According to Simmons [489], parts are grouped together according to function as
well; in this way the functional salience of parts [538] determines the granularity
level from the functional point of view, which often, but not always, coincides with
the level suggested by the perceptual one (gestalt).
The Gestalt theory proposes six grouping principles, which appear to underlie the
cognitive organization of the visual input. More precisely:
• Foreground/Background—Visual processing has the tendency to separate figures
from the background, on the basis of some feature (color, texture, …). In complex
images several figures can become foreground in turn. In some cases, the relation
fore/background is stable, whereas in others the mind oscillates between alternative
states (see Fig. 2.7).
• Similarity—Things that share visual characteristics (shape, size, color, texture, …)
will be seen as belonging together, as in Fig. 2.8a. The same happens for elements
that show a repetition pattern. Repetition is perceived as a rhythm, producing a
pleasing effect, as in Fig. 2.8b.
• Proximity—Objects that are close to one another appear to form a unit, even
if their shapes or sizes radically differ. This principle also concerns the effect
generated when a collection of elements becomes more meaningful than their
separate presence. Examples can be found in Fig. 2.9.
Fig. 2.8 a The set of circles in the middle of the array is perceived as a unit even though the
surrounding squares have the same color and size. b A pleasant repeated arrangement of bicycles
in Paris
Fig. 2.9 a The set of squares is perceived as two separate entities (left and right), even though the
squares are all identical. b A ground covered by leaves, where the individual leaves do not matter
singularly, but only their ensemble is perceived
One of the first and most influential works, with very strict links to abstraction, is
Marr’s proposal of vision as a process going through a series of representation
stages [352, 353]. Particularly relevant for our purposes is the sketchy 3-D repre-
sentation by means of a series of “generalized cones”, as illustrated in Fig. 2.13.
The successive stages of a scene representation, from the primal sketch to the 3-D
description, can be considered as a series of levels of abstraction. Another fundamen-
tal contribution to the modeling of human vision was provided by Biederman [59],
who introduced the idea that object recognition may occur via segmentation into
regions of deep concavity and the spatial arrangement of the latter. Components can
be represented by means of a small set of geons, i.e., generalized cones detectable
in the image through their curvature, collinearity, symmetry, parallelism, and co-
termination. As the geons are free to combine with one another, a large variety of
objects can be represented. A Principle of Componential Recovery asserts that the
identification of two or three geons in an object representation allows the whole object
to be recovered, even in the presence of occlusion, rotation, and severe degradation.
Fig. 2.12 The symmetry of Notre Dame de Paris appeals to our sense of beauty
Fig. 2.13 Organization of shape information in a 3-D model description of an object based on
generalized cone parts. Each box corresponds to a 3-D model, with its model axis on the left side
of the box and the arrangement of its component axes on the right. In addition, some component
axes have 3-D models associated with them, as indicated by the way the boxes overlap (Reprinted
from Marr [353])
Fig. 2.14 The abstraction technique combines structural information (left) with feature information
(right) (Reprinted with permission from de Goes et al. [125])
Fig. 2.16 Example of a picture (left) and its rendering with lines and color regions (right) (Reprinted
with permission from DeCarlo and Santella [129])
observer to easily extract the core meaning of a picture, leaving aside details. A
human user interacts with the system, and simply looks at an image for a short
period of time, in order to identify meaningful content of the image itself. Then, a
perceptual model translates the data gathered from an eye-tracker into predictions
about which elements of the image representation carry important information.
In order to cope with the increased resolution of modern cameras, image
processing requires a large amount of memory to store the original pictures. Hence,
different techniques of image compression are routinely used. Image compression can
be lossy or lossless. Lossy compression methods exploit, among other approaches,
color reduction, Fourier (or other) transforms, or fractals. They throw away part of
the content of an image to accomplish a trade-off between memory requirements and
fidelity. For instance, in natural images the loss of some details can go unnoticed,
but allows a large economy in memorization space. Lossy compression can be seen
as an abstraction process, which (irreversibly) reduces the information content of
an image.
When reduction in information is not acceptable, a lossless compression is suit-
able. There are many methods that can be used, including run-length encoding, chain
codes, deflation, predictive coding, or the well-known Lempel-Ziv-Welch algorithm.
Lossless compression is a process of image transformation, because the content of
the image is preserved, while its representation is made more efficient.
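As an illustration of the lossless case, here is a minimal sketch of run-length encoding, one of the methods mentioned above; the data are invented, and real codecs are of course far more elaborate:

```python
def rle_encode(data):
    """Run-length encoding: a simple, lossless compressor for pixel rows."""
    runs, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        runs.append((data[i], j - i))  # (value, run length)
        i = j
    return runs

def rle_decode(runs):
    return [value for value, length in runs for _ in range(length)]

row = [0, 0, 0, 255, 255, 0]               # a row of pixel intensities
assert rle_decode(rle_encode(row)) == row  # lossless: content is preserved
print(rle_encode(row))                     # [(0, 3), (255, 2), (0, 1)]
```

The representation becomes more compact for long uniform runs, while the round trip through decode recovers the image exactly, matching the characterization of lossless compression as a pure transformation of the representation.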
A technique related to abstraction, which is widely used in graphics, is the Level of
Detail (LOD) approach, described by Luebke et al. [348]. In building up graphic sys-
tems, there is always a conflict between speed and fluidity of rendering, and realism
and richness of representation. The field of LOD is an area of interactive computer
graphics that tries to bridge the gap between performance and complexity by accu-
rately selecting the precision with which to represent the world. Notwithstanding
the great increase in the power of the machines devoted to computer graphics,
the problem is still current, because the complexity of the needed models has
increased even faster.
The idea underlying LOD, illustrated in Fig. 2.17, is extremely simple: in render-
ing, objects that are far away, small, or less important are given much less detail than
closer or more important ones. Concretely, several versions of the same object are
created, each one faster to render and with fewer details than the preceding one. When composing
a scenario, for each object the most suitable LOD is selected.
The creation of the various versions starts from the most detailed representation
of an object, the one with the greatest number of polygons. Then, an abstraction
mechanism reduces progressively this number, trying to keep as much resemblance as
possible with the original one. In recent years several algorithms have been described
to automate this simplification process, which in the past was performed manually.
As the generated scenes are to be seen by humans, an important issue is to investigate
what principles of visual perception may suggest the most effective simplification
strategies.
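A minimal sketch of the selection step might look as follows (the version thresholds and polygon counts are invented for illustration; real engines use screen-space error metrics and smooth transitions between levels):

```python
# A hypothetical object with pre-built versions of decreasing polygon count.
lod_versions = [
    {"max_distance": 10.0,  "polygons": 10000},      # close: full detail
    {"max_distance": 50.0,  "polygons": 1500},
    {"max_distance": 200.0, "polygons": 200},
    {"max_distance": float("inf"), "polygons": 20},  # far: crude silhouette
]

def select_lod(distance, versions):
    """Pick the coarsest version still adequate for the viewing distance."""
    for version in versions:
        if distance <= version["max_distance"]:
            return version

print(select_lod(3.0, lod_versions)["polygons"])    # 10000
print(select_lod(120.0, lod_versions)["polygons"])  # 200
```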
An approach inspired by the LOD has been described by Navarro et al. [394]
to model and simulate very large multi-agent systems. In this case the trade-off is
between the amount of details that must be incorporated into each agent’s behav-
ior and the computational power available to run the simulation. Instead of a pri-
ori choosing a given level of detail for the system, the authors propose a dynamic
approach, where the level of detail is a parameter that can be adjusted dynamically
and automatically during the simulation, taking into account the current focus and/or
special events.
2.8 Summary
Chapter 3
Abstraction in Artificial Intelligence
type of mapping. However, a given mapping is not necessarily an abstraction: some
additional constraints are needed for it to qualify as one. Usually,
the constraints require that the solution of the problem at hand, in the abstract space,
be “easier”, in some sense, than the solution in the ground one.
As already mentioned in Chap. 1, it has been well known since the beginning of AI
[14, 371, 462, 490] (and even before [424]) that a “good” representation is key to
solving problems successfully. From another perspective, AI theories of abstraction
can be embedded in the framework of representation changes.
Representation changes considered in abstraction theories broadly fall into one
of four categories:
• perceptive/ontological (mapping between sensory signals/objects)
• syntactic (mapping between predicates, namely words of formal languages)
• semantic (mapping between semantic interpretations in logical languages)
• axiomatic (mapping between logical theories)
Historically, the first explicit theory of abstraction started at the axiomatic level.
Plaisted [419] provided a foundation of theorem proving with abstraction, which he
sees as a mapping from a set of clauses to another one that satisfies some properties
related to the deduction mechanism. Plaisted introduced more than one abstraction,
including a mapping between literals and a semantic mapping. A more detailed
description of his work will be given in Chap. 4. Later on Tenenberg [526] pointed
out some limitations in Plaisted’s work, and defined abstraction at a syntactic level
as a mapping between predicates, which preserves logical consistency. Giunchiglia
and Walsh [214] have extended Plaisted’s approach and reviewed most of the work
done at the time in reasoning with abstraction. They informally define abstraction as
a mapping, at both axiomatic and syntactic level, which preserves certain desirable
properties, and leads to a simpler representation. Recently, Kinoshita and Nishizawa
have provided an algebraic semantics of predicate abstraction in the Pointer Manip-
ulation Language [288].
Nayak and Levy [395] have proposed a theory of abstraction defined as a mapping
at the semantic level. Their theory defines abstraction as a model level mapping
rather than predicate mapping, i.e., abstraction is defined at the level of formula
interpretation. More recently, De Saeger and Shimojima [464] have proposed a theory
of abstraction based on the notion of channeling. This theory considers abstractions
as theories themselves, allowing the nature of the mapping at the different levels to
be defined formally (axiomatic, syntactic, and semantic).
For abstractions at the ontological level we can mention the seminal works by
Hobbs [252] and Imielinski [269]. Hobbs’ approach aims at generating, out of an
initial theory, a computationally more tractable one, by focusing on the granularity
of objects or observations. Similarly, Imielinski proposed an approximate reasoning
framework for abstraction, by defining an indistinguishability relation among objects
of a domain. While all the aforementioned models rely on symbolic representation,
Saitta and Zucker [468] proposed a theory which adds the possibility of explicitly
defining abstraction at the observation (perception) level. The associated KRA model
will be described in Chap. 6.
In the following we will describe the basic aspects of the theoretical approaches
to abstraction mentioned so far, with the aim of providing the reader with an intuition
of the ideas behind them. The formal treatment of some of the models will be presented
in Chap. 4.
As Logic has been the formalism most used to represent knowledge since the start
of AI research, it is not surprising that the first models of abstraction were defined
within some logical formalism. Plaisted considered clauses in First Order Predi-
cate Logic (FOL) as a knowledge representation formalism [419]. His definition
of abstraction consists in a mapping f from a clause A to a “simpler” set of clauses
B = f (A). The mapping is such that B has always a solution if A does, but not
vice-versa. The idea is that a solution for B may act as a guide for finding a solution
for A with a reduced effort. To be valid, the mapping must satisfy some properties
which are considered “desirable” from the resolution mechanism point of view.
Plaisted provided several examples of abstractions; for instance, the ground
abstraction associates to a clause the set of all its ground instances (which may
be infinite), whereas the deleting argument abstraction reduces the arity of a clause.
Among other cases, Plaisted also considers predicate mapping, where two or more
predicates are mapped to a single one.
Example 3.1 (Predicate mapping). Let us consider a simple problem represented in
FOL, namely making plane ticket reservations for a family. Let us suppose that there
are two predicates in the ground theory, namely Son and Daughter, which describe
the family relationship between a father John and two of his kids, Zoe and Theo. An
abstraction may consist in replacing the two predicates of the ground theory by a
single one in the abstract theory, for example, Kid. In the abstract theory John would
simply have two kids. Indeed, the abstract representation contains fewer predicates, and
supports faster processing when the distinction between girl and boy is not relevant,
as is the case for both place occupancy and ticket price.
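A predicate mapping of this kind is easy to emulate; the following sketch (a toy Python rendering of Example 3.1, not Plaisted’s formal machinery) rewrites ground facts through the mapping:

```python
# Ground facts about John's family (Example 3.1), as predicate tuples.
ground_facts = {("Son", "Theo", "John"), ("Daughter", "Zoe", "John")}

# The abstraction: a mapping between predicate symbols.
predicate_map = {"Son": "Kid", "Daughter": "Kid"}

def abstract(facts, mapping):
    """Rewrite each fact with its abstract predicate symbol."""
    return {(mapping.get(p, p), *args) for p, *args in facts}

print(abstract(ground_facts, predicate_map))
# {('Kid', 'Theo', 'John'), ('Kid', 'Zoe', 'John')}
```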
Even though the notion of predicate mapping is very intuitive and appealing,
Tenenberg [526] pinpointed a problem in it, which is related to the generation of
false evidence. In fact, reasoning with the corresponding abstract theory may lead
to inconsistencies. For instance, in Example 3.1, the presence of an axiom stating
that a Son is a Boy may lead to the conclusion, in the abstract theory, that a Kid is a Boy,
thus implying that Zoe is a Boy as well. Plaisted was aware of this problem, which he
called the “false proof” problem. Tenenberg tried to solve the problem by defining an
abstraction as a predicate mapping (or a clause mapping), in which only consistent
clauses are kept. Unfortunately, checking consistency is only semi-decidable.
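The false-proof phenomenon can be reproduced in the same toy setting; the sketch below (illustrative only) naively maps the axiom “a Son is a Boy” along with the facts and derives the false fact Boy(Zoe):

```python
# Abstract facts obtained from the mapping Son, Daughter -> Kid.
abstract_facts = {("Kid", "Theo", "John"), ("Kid", "Zoe", "John")}

# Naive syntactic image of the ground axiom "a Son is a Boy":
# Kid(x, y) -> Boy(x).
abstract_rule = ("Kid", "Boy")

derived = {(abstract_rule[1], args[0])
           for p, *args in abstract_facts if p == abstract_rule[0]}
print(derived)
# {('Boy', 'Theo'), ('Boy', 'Zoe')} -- Boy(Zoe) is false evidence

# Nayak and Levy's semantic construction, discussed later in this section,
# would instead build the weaker axiom Kid(x) -> Boy(x) OR Girl(x),
# which blocks this inference.
```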
About ten years after Plaisted’s seminal work, Giunchiglia and Walsh [214] have
proposed a more general theory of abstraction, which integrates both predicate and
clause mapping. According to these authors, the majority of the abstractions in prob-
lem solving and theorem proving may be represented as a mapping between formal
systems. They also note that most abstractions modify neither the axioms nor the infer-
ence rules, and are therefore in most cases a pure mapping between languages. Giunchiglia
and Walsh’s goals in introducing a theory of abstraction included understanding the
meaning of abstraction, investigating the formal properties of the operators for a
practical implementation, and suggesting ways to build abstractions.
A useful distinction among abstractions, introduced by Giunchiglia and Walsh, is
among Theorem-Decreasing (TD), Theorem-Increasing (TI), and Theorem-Constant
(TC) abstractions. In a TI-abstraction the abstract space has more theorems than the
ground one, while the opposite happens for a TD-abstraction. In TC-abstractions,
instead, ground and abstract spaces have exactly the same theorems. Giunchiglia
and Walsh have argued that the abstractions useful for the resolution of problems are
the TI-abstractions, because they preserve all existing theorems.
Example 3.2 (TI-Abstraction). Going back to Example 3.1, let us consider the book-
ing of a plane ticket for the family between Hanoi and Paris. If the axiom represent-
ing the constraint that the whole family is traveling on the same plane is dropped,
an online booking system could find more possible flights. Therefore, the abstrac-
tion consisting in dropping the axiom that the family should fly together is a TI-
abstraction.
Example 3.3 (TD-Abstraction). Let us consider now the axiom that states that two
cities A and B are connected by a flight if there is a city C that is directly connected
bidirectionally to A and B. In the ground space this axiom allows trips that have
several stops. A flight from Hanoi to Paris may then be booked even though there is
a stop in Saigon or Frankfurt. The abstraction consisting in removing such an axiom is
a TD-abstraction. Indeed, in the abstract space there will be far fewer possible flights,
as only direct flights will be considered.
Examples 3.2 and 3.3 show what might not be intuitive at first thought, i.e., that by
simplifying a representation from a syntactical point of view, one can either increase
or decrease the number of solutions (described in the theory as theorems).
The shortcoming of the syntactic theory of abstraction described above is that,
while it captures the final result of an abstraction, it does not explicitly
capture the underlying justifications or assumptions that lead to the abstraction, nor
the mechanism that generates the abstraction itself.
Nayak and Levy [395], extending Tenenberg’s work, have proposed a seman-
tic theory of abstraction to address the shortcomings of syntactic theories. Indeed,
they view abstraction as a two step process. The first step consists in abstracting
the “domain” of interpretation, and the second one in constructing a set of abstract
formulas that best capture the abstracted domain. This semantic theory yields abstrac-
tions that are weaker than the base theory, i.e., they are a proper subset of TD-
abstractions. Nayak and Levy introduce two important notions: Model Increasing
abstractions (MI), which are a proper subset of TD-abstractions, and Simplifying
Assumptions (SA), which allow abstractions to be evaluated according to the relia-
bility of the assumptions themselves.
Example 3.4 Going back to Example 3.1, a simplifying assumption could be that
for the task of finding an airplane route, the difference between daughter and son
is not relevant, because what counts is that they are kids. The sets of models of the
two predicates Son and Daughter would then be merged into a single set of models,
corresponding to a new predicate, namely Kid. As for the axioms stating that a “Son
is a Boy” and a “Daughter is a Girl”, their abstract counterparts would both be
constructed from the models of the ground predicates, and not mapped syntactically.
One possible outcome of this construction is an abstract axiom stating that a “Kid is
a Boy OR a Girl”. Such mapping of models does not introduce false evidence.
Abstractions of the ontological type deal with the objects in a domain, and aim
at reducing the number of different objects by declaring some equivalence rela-
tion among subsets of them. The best known approaches to this type of abstraction
are those by Hobbs [252] and Imielinski [269]. Hobbs introduces the concept of
granularity, linked to an indistinguishability relation. Two objects are said to be
indistinguishable (and hence treated as the same object) if they satisfy exactly the
same predicates in a set R of relevant ones, defined a priori on the basis of domain
knowledge. R partitions the objects into equivalence classes, each one represented by
a single symbol. Equivalence between objects is used to make FOL theories tractable.
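Hobbs’ construction is easy to emulate on a finite domain; the following sketch (with an invented domain and invented relevant predicates) partitions objects by the subset of predicates in R they satisfy:

```python
def indistinguishability_classes(objects, relevant_predicates):
    """Partition objects by the subset of relevant predicates they satisfy."""
    classes = {}
    for obj in objects:
        # The "signature" records which relevant predicates hold for obj.
        signature = tuple(p(obj) for p in relevant_predicates)
        classes.setdefault(signature, []).append(obj)
    return list(classes.values())

# Toy domain: integers; relevant predicates chosen a priori.
R = [lambda n: n % 2 == 0, lambda n: n > 10]
print(indistinguishability_classes(range(1, 15), R))
# [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10], [11, 13], [12, 14]]
```

Each resulting class can then be represented by a single symbol in the abstract theory, exactly as described above.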
Imielinski starts from the notion of error in numerical values, and tries to extend it
to FOL knowledge bases used to answer queries. One way of introducing an “error”
in a logical knowledge base is to apply Hobbs’ indistinguishability relation, making
objects in the domain collapse into equivalence classes. Reasoning with the abstract
knowledge base is called limited reasoning by Imielinski.
The syntactic theories presented before are good at characterizing and classifying
existing abstractions, but they fail to offer a constructive approach to the problem of
creating abstractions. Moreover, these approaches manipulate symbols that are not
related to real-world objects: they are not grounded. The semantic theory of abstraction
proposed by Nayak and Levy [395] provides the basis for a constructive approach,
but is substantially limited to one type of abstraction, namely, predicate mapping.
There are other approaches to abstraction, which address the grounding problem and
consider non logical representations, such as images or signals. These approaches are
at the perception level as opposed to more formal levels, where information is already
encoded in symbols. Such type of perception mapping is particularly important in
the signal analysis and media community.
Example 3.5 (Perceptual Abstraction). A very common example of perceptual
abstraction is changing the resolution of a digital screen. When the resolution is
lowered, some details, which were visible at the higher resolution, are no longer visible.
Another example of perceptual abstraction, easy to implement with simple image
processing tools, is to change a color image into a black and white one.
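Both operations can be realized, for instance, with the Pillow imaging library; in the sketch below the file names are placeholders:

```python
from PIL import Image  # requires the Pillow package

img = Image.open("scene.jpg")  # "scene.jpg" is a placeholder path

# Resolution abstraction: hide fine detail by downsampling.
low_res = img.resize((img.width // 4, img.height // 4))

# Color abstraction: discard chromatic information, keep luminance.
gray = img.convert("L")

low_res.save("scene_low.jpg")
gray.save("scene_gray.jpg")
```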
Although the theories mentioned above are all “theories of abstraction”, they are quite
different in their content and applicability, depending on the representation level at
which abstraction is considered. In Fig. 3.2 a summary of the types of approaches
proposed in the literature is presented, together with their associated seminal papers.
Notwithstanding the large use of abstraction in different tasks and domains, the-
oretical frameworks to model abstraction are only a few, and also rather old. More
general theories of representation change, such as Korf’s [297], are too general,
as they do not consider any particular representation language or formalism. As
such, they are not easily made applicable. One of the difficulties comes from the fact
that choosing a “good” abstraction is not easy, given that the very notion of “good”
is strongly task-dependent. It is then important to understand the reasons that justify
the choice of a particular abstraction, and the search for better ones. A comprehen-
sive theory of the principles underlying abstraction would be useful for a number of
reasons; in fact, from a practical point of view it may provide:
• the means for clearly understanding the different types of abstraction and their
computational cost,
• the semantic and computational justifications for using abstraction,
• the framework to support the transfer of techniques between different domains,
• suggestions to automatically construct useful abstractions.
Finally, we may also mention that, in AI, abstraction has often been associated with the
idea of problem reformulation, which goes back to Lowry [344]. Lowry is concerned
with design and implementation of algorithms; both specifications and algorithms
are viewed as theories, and reformulation is defined as a mapping between theories.
His system STRATA works in three steps: first, it removes superfluous distinctions
in the initial conceptualization of the problem supplied by the user. Then, it designs
an abstract algorithm, exploiting pre-defined algorithm schemas, which are seen
Fig. 3.3 In ABSTRIPS a complete plan is developed at each level of abstraction before descending
to a more detailed level. First, a plan that uses actions with the highest criticalities is found. Then
this plan is iteratively refined to reach one that satisfies all the less critical preconditions. Each
abstract plan is an esquisse of a final plan (Reprinted from Nilsson [402])
language is the basis of most languages for expressing automated planning inputs in
today’s solvers.
A STRIPS problem instance is composed of an initial state, a set of goal states, and
a set of actions. Each action description includes a set of preconditions (facts which
must be established before the action can be performed), and a set of postconditions
(facts that are added or removed after the action is performed). The search space can
be modeled as a graph, where nodes correspond to states, and arcs correspond to
actions. A plan is a path, that is, a sequence of states together with the arcs linking
them. ABSTRIPS first constructs an abstraction hierarchy for a given problem space,
and then uses it for hierarchical planning, as illustrated in Fig. 3.3. More precisely,
ABSTRIPS assigns criticalities (i.e., relative difficulties) to the preconditions of each
action. First, a plan that uses actions with the highest criticality is found. Then this
plan is iteratively refined to reach one that satisfies all the less critical preconditions.
The degree of criticality induces thus a hierarchy of abstract spaces. Each abstract
plan is an esquisse of a final plan.
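The criticality mechanism can be sketched as follows (the action, criticality values, and predicate names are invented; ABSTRIPS assigns criticalities automatically rather than taking them as given):

```python
# A STRIPS-like action with criticalities attached to its preconditions.
action = {
    "name": "move(box, roomA, roomB)",
    "preconditions": {"door_open(A,B)": 1,
                      "at(robot, roomA)": 2,
                      "holding(box)": 3},
    "add": {"at(box, roomB)"},
    "delete": {"at(box, roomA)"},
}

def abstract_action(action, threshold):
    """Keep only the preconditions at least as critical as `threshold`."""
    kept = {p for p, c in action["preconditions"].items() if c >= threshold}
    return {**action, "preconditions": kept}

# At level 3 the planner only worries about the most critical precondition;
# the resulting abstract plan is then refined at levels 2 and 1.
print(abstract_action(action, 3)["preconditions"])  # {'holding(box)'}
```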
Another historical system using a hierarchical approach has been PRODIGY,
in which abstraction has been integrated with Explanation-Based Learning (EBL)
[290].
Although abstraction and hierarchical planning seem central to controlling com-
plexity in real life, several authors argue that it has not been used as extensively as it
could have been. The reason might lie in the fact that there is still much work to be
done in order to better understand all the different ways of doing optimal abstraction
planning. As an example, even though ABSTRIPS was proposed in 1974, it is only
twenty years later that Knoblock did a thorough analysis of the algorithm [289]. As
a result it became clear that ABSTRIPS implicitly assumed that the low criticality
2 A constraint C is often called global when processing C as a whole gives better results than
processing any conjunction of constraints that is semantically equivalent to C [56].
More recently, the framework for abstraction in CSPs has been extended to Soft
Constraints [63]. Soft constraints, as opposed to hard constraints, are represented
as inequalities, and may correspond to preferences [458]. Although very flexible
and expressive, they are also very complex to handle. The authors have shown that
“processing the abstracted version of a soft constraint problem can help us in finding
good approximations of the optimal solutions, or also in obtaining information that
can make the subsequent search for the best solution easier”.
The semiring-based CSP framework proposed by Bistarelli et al. [63] has been
extended by Li and Ying [332], who propose an abstraction scheme for soft con-
straints that uses semiring homomorphism. To find optimal solutions of the concrete
problem, one works first on the abstract problem for finding its optimal solutions,
and then uses them to solve the concrete problem. In particular, the authors find
conditions under which optimal solutions are preserved under mapping.
A method for abstracting CSPs represented as graphs has been proposed by
Epstein and Li [152]. Through a local search, they find clusters of tightly connected
nodes, which are then abstracted and exploited by a global searcher.
An improvement of the scalability of CSPs has been obtained via reformulation.
Bayer et al. [44] describe four reformulation techniques that operate on the vari-
ous components of a CSP, by modifying one or more of them (i.e., query, variable
domains, constraints) and detecting symmetrical solutions to avoid generating them.
Reformulation for speeding up solving CSPs has also been proposed by Charnley
et al. [93].
A very interesting, but isolated, approach to CSPs is described by Schrag and
Miranker [479]. They start by considering the phase transition between solvability and
unsolvability existing in CSPs, and try to apply domain abstraction to circumvent
it. Domain abstraction is an efficient method for solving CSPs, which is sound and
incomplete with respect to unsatisfiability; hence, its application is useful only when
both the ground and the abstract problems are unsatisfiable. The authors have char-
acterized the effectiveness of domain abstraction, and found that this effectiveness
itself undergoes a phase transition, dropping suddenly when the loosening of con-
straints, generated by the abstraction, increases. Finally, they developed a series
of analytical approximations to predict the location of the phase transition of the
abstraction effectiveness.
some property. Examples of this type of abstraction in graphs are given by Saitta
et al. [466], Bauer et al. [43], Boneva et al. [67], Bulitko et al. [84], Harry and
Lindquist [234], whereas abstraction in hierarchical Hidden Markov Models has
been handled by Galassi et al. [187], Fine et al. [169], and Murphy and Paskin [387].
3.6 Summary
The brief survey of abstraction in AI gives an overview of the different concepts that
are frequently associated or used to define abstraction. Although this chapter does
not account for all the research carried out on abstraction in AI,3 it allows the main
concepts that are common to many studies to be identified.
The notion of “detail” is often associated with that of relevance for a class of tasks
[64, 320, 513]. Details which are hidden are indeed defined as being “less relevant”
to these tasks. In Machine Learning, for example, Blum and Langley [64] have given
several definitions of attribute relevance. Their definitions are related to measures
that quantify the information brought by an attribute with respect to other attributes,
or the class, or the sample distribution.
In practice, the choice among sets of alternative abstractions may be difficult,
given their large number and under-constrainedness, and the fact that abstraction must
preserve some additional “desirable property”. These “desirable” properties differ
according to the field where abstraction is used. In problem solving, for example, a
classical desired property is the “monotonicity” [289, 462]. This property states that
operator pre-conditions do not interfere once abstracted. Another useful property is
the “downward refinement property” [27], which states that no backtracking in a hierarchy
of abstract spaces is necessary to build the refined plan. In theorem proving, a desirable property
states that for each theorem in the ground representation there exists a corresponding
abstract one in the abstract representation (TI-abstraction). In Machine Learning,
a desirable property states that generalization order between generated hypotheses
should be preserved [208]. In Constraint Satisfaction Problems a desirable property
is that the set of variables that are abstracted into one have “interchangeable supports”
[182, 183]. A domain independent desirable property states that some order relation
is preserved [252].
Finally, the notion of simplicity is essential to characterize abstract representations.
All these notions will be useful to establish a definition of abstraction in the next
chapter.
3 For example abstraction in games [205, 486], or abstraction in networks [466, 472, 586], or
abstraction in Multiple Representation Modeling (MRM) [13, 41, 123, 192], or many others.
Chapter 4
Definitions of Abstraction
them, because it stresses one or a few aspects over all the others. In conclusion,
the important issues in abstraction can be summarized as follows:
1. Simplicity—There is a general agreement that abstraction should reduce the com-
plexity of tasks. Even though simplicity is not easy to define as well (see Chap. 10),
an intuitive notion may nevertheless help us in several contexts.
2. Relevance—Abstraction is largely supposed to capture relevant aspects of prob-
lems, objects, or perceptions. It is then a mechanism suitable to select those
features that are useful to solve a task.
3. Granularity—An entity can be described at different levels of detail; the fewer
details a description provides, the more abstract it is. By progressively reducing
the amount of detail kept in a description, a hierarchy of abstractions is obtained.
Details or features may be hidden, and hence removed from descriptions (selective
abstraction), or aggregated into larger units (constructive abstraction). Granularity
is also linked to the notion of scale.
4. Abstract/concrete status—Abstraction is connected with the idea of taking a dis-
tance from the sensory world. The dichotomy applies to ideas, concepts, or words,
which can be classified as either concrete or abstract. This issue is also related to
the nature of abstraction as a state or a process.
5. Naming—When a name is given to an entity, this name stands for the properties
and attributes characterizing the entity; in a sense, the name captures its essence.
Abstraction is also naming.
6. Reformulation—Abstraction can be achieved through a change of representation.
Even though reformulation is most often used for problem formalization, it can
also be applied to data. Representation changes may involve either description
languages, or the described content, or both.
7. Information content—Abstraction is related to the amount of information an
entity (object, event, …) provides.
When introducing a formal definition, we will analyze which among the above issues
are specifically targeted. In this chapter the review of the theoretical models proposed
for abstraction shall mostly follow a chronological order. An exception is the first
work by Giunchiglia and Walsh [214], which is described first. The reason is that
this work introduces some notions that are useful to classify and compare abstraction
models, and we will use them for this purpose.
Some foundations of abstraction have been set by Giunchiglia and Walsh [213, 214],
who tried to provide a unified framework for handling abstraction in reasoning, at
the same time defining their own theory. The authors’ central goal was to provide a
general environment for the use of abstraction in automated deduction. Giunchiglia
and Walsh start from the definition of a formal system:
Definition 4.1 (Formal system) A formal system is a pair Σ = (Θ, L), where Θ is
a set of well-formed formulas (wff) in the language L.
Abstraction is then defined as a mapping between formal systems, preserving some
desirable properties (specified later on).
Definition 4.2 (Abstraction) An abstraction f : Σ1 → Σ2 is a pair of formal
systems (Σ1 , Σ2 ), with languages L1 and L2 , respectively, and an effective, total
function fL : L1 → L2 . Σ1 is called the “ground” space and Σ2 the “abstract” one,
whereas fL is called the “mapping function”.
In a formal system the set of theorems of Σ, denoted by TH(Σ), is the minimal set of
well formed formulas, including the axioms, that is closed under the inference rules
(used to perform deduction). Being oriented to theorem proving, the authors choose
provability as the central notion, and classify abstraction mappings with respect to
this notion.
Definition 4.3 (T ∗ -Abstraction) An abstraction f : Σ1 → Σ2 is said to be:
1. Theorem Constant (TC) iff, for any wff α, α ∈ TH(Σ1 ) iff fL (α) ∈ TH(Σ2 ),
2. Theorem Decreasing (TD) iff, for any wff α, if fL (α) ∈ TH(Σ2 ) then α ∈
TH(Σ1 ),
3. Theorem Increasing (TI) iff, for any wff α, if α ∈ TH(Σ1 ) then fL (α) ∈ TH(Σ2 ).
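For finite toy theories the three conditions can be checked mechanically; the fragment below (with an invented three-formula language) is only meant to make the quantifiers in Definition 4.3 concrete:

```python
def is_TI(th_ground, th_abstract, f):
    """Every ground theorem maps to an abstract theorem."""
    return all(f(a) in th_abstract for a in th_ground)

def is_TD(th_ground, th_abstract, f):
    """Every wff whose image is an abstract theorem is a ground theorem."""
    return all(a in th_ground for a in WFF if f(a) in th_abstract)

def is_TC(th_ground, th_abstract, f):
    return is_TI(th_ground, th_abstract, f) and is_TD(th_ground, th_abstract, f)

# Toy "languages": wffs are strings; f maps both p and q onto r.
WFF = {"p", "q", "r"}
f = {"p": "r", "q": "r", "r": "r"}.get
print(is_TI({"p"}, {"r"}, f), is_TD({"p"}, {"r"}, f))  # True False
```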
A graphical representation of the various types of abstraction is reported in Fig. 4.1.
Fig. 4.1 Classification of abstraction mappings according to provability preservation. The set of
theorems TH(Σ2) can be either identical, or a proper subset, or a proper superset of the abstractions
of the theorem set TH(Σ1)
Giunchiglia and Walsh do not consider TC-abstractions any further, because they
are too strong, and hence not very useful in practice. Furthermore, Giunchiglia and
Fig. 4.2 The combinations of the Deductive/Abductive and Positive/Negative modes of using
abstraction
4.2 Abstraction in Philosophy
The first model that we consider (see Sect. 2.1) was proposed by Wright [569] and
Hale [230], following an idea of Frege, and is based on the notion of Abstraction
Principle.
Definition 4.8 (Abstraction Principle) Let f (x) be a function, defined on a variable
x ranging over items of a given sort. An Abstraction Principle is an equation of the form:
f (x) = f (y) iff E(x, y),
where E(x, y) is an equivalence relation over items of the sort. For instance, with the
function train(x),1 the principle reads:
train(x) = train(y) iff (x and y are carriages) and (x and y are connected).
In words, the above principle states that two carriages, connected together, share
(belongs to) the same train. By applying the above definition we may conclude that
the train is an abstract entity. Hale and Wright have proposed more sophisticated
accounts of Abstraction Principles (see also [168]), but it is still unclear whether
their new approaches are free from counterexamples.
1 The function train(x) is defined as train : CARS → TRAINS, and train(x) provides the identifier
of the train that contains car x.
2 See Sect. 2.1
of the system at hand, as it only focuses on some specific aspects; on the other,
LoAs are intended to capture exactly those aspects that are relevant to the current
goal. For instance, if we want to choose a wine for a special dish, we may define a
“tasting LoA”, including bouquet, sweetness, color, acidity; if, instead, we want to
buy a bottle of wine, a “purchasing LoA”, including maker, vintage, price, and so
on, is more useful. LoAs allow multiple views of a system, but they are not sufficient
to completely describe it; the additional notion of behavior is needed.
Definition 4.13 (Behavior) The behavior of a system, at a given LoA, consists of a
predicate Π , whose free variables are observables at that LoA. The substitutions of
values for the observables that make the predicate true are called the system behaviors.
A moderated LoA is a LoA together with a behavior.
When the observables of a LoA are defined, it is usually the case that not all the
combinations of possible values for the observables are realizable. The behavior
aims at capturing only those combinations that are actually possible.
LoAs are then linked to the notion of granularity in describing systems, and
Floridi takes a further step by allowing multiple LoAs. To this aim, the notion of
relation must be recalled.
Definition 4.14 (Relation) Given a set A and a set C, a relation R from A to C is a
subset of the Cartesian product A × C. The reverse of R is the set {(y, x)|(x, y) ∈ R},
where x ∈ A and y ∈ C.
A relation R from A to C translates any predicate p(x) on A to a predicate qR[p] on
C, such that qR[p](y) is true at just those y ∈ C that are the image through R of some
x ∈ A satisfying p, namely:

qR[p](y) ≡ ∃x ∈ A [R(x, y) ∧ p(x)]
In order to see more precisely the meaning of the introduced relation, let us define
the cover of R:
COV (R) = {(x, y)| x ∈ A, y ∈ C and R(x, y)}
Then:
COV (R−1 ) = {(y, x)| x ∈ A, y ∈ C and R(x, y)}
qR [p] is a predicate whose instances are in that subset of C that contains the images
of the points in COV (p(x)).
Example 4.1 Let A be the set of men and C the set of women. Let x ∈ A, y ∈ C,
and Son be the relation Son ⊆ A × C, linking a male person with his mother. Let
moreover p be the predicate Student. Then:

qSon[Student](y) ≡ ∃x ∈ A [Son(x, y) ∧ Student(x)]

We have that COV(qSon[Student](y)) = {mothers (a subset of C) whose sons are
students (a subset of A)}.
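The translation of a predicate through a relation is easy to operationalize when R and p are given extensionally. The following Python fragment is only an illustrative sketch (the individuals and the helper q are our own, hypothetical choices):

# Relation Son ⊆ A × C as (son, mother) pairs, and the predicate Student on A
son = {("Tom", "Mary"), ("Bob", "Jane"), ("Tim", "Jane")}
student = {"Tom"}

def q(p, R, y):
    # qR[p](y) holds at just those y that are images through R of some x satisfying p
    return any(x in p for (x, yy) in R if yy == y)

# The cover of qSon[Student]: mothers whose sons are students
print({y for (_, y) in son if q(student, son, y)})  # {'Mary'}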
Finally, the main notion of Floridi’s account of abstraction is provided by the
following definition:
Definition 4.15 (Gradient of abstraction) A Gradient of Abstraction (GoA) is a
finite set of moderated LoAs Li (0 ≤ i ≤ n), together with a family of relations
Ri,j ⊆ Li × Lj (0 ≤ i ≠ j < n), relating the observables of each pair (Li, Lj) of
distinct LoAs in such a way that:
1. relation Ri,j is the reverse of relation Rj,i for i ≠ j,
2. the behavior pj at Lj is at least as strong as the translated behavior.
The meaning of Definition 4.15 can be better understood by looking at Fig. 4.4. We
have two LoAs, namely Li, with observables {X1, · · · , Xn}, and Lj, with observables
{Y1, · · · , Ym}. Observable Xr takes values xr in Λr (1 ≤ r ≤ n), whereas observable
Ys takes values ys in Λs (1 ≤ s ≤ m). Given a relation between pairs of observables,
Ri,j(Xr, Ys) ⊆ Li × Lj, the first condition of Definition 4.15 simply says that
Rj,i(Ys, Xr) = R⁻¹i,j(Xr, Ys), i.e., the relation between Lj and Li is the reverse of the
one between Li and Lj. The second condition is a bit more complex. Let Πi(X1, · · · , Xn)
be a behavior of Li, and let Πj(Y1, · · · , Ym) be a behavior of Lj. Transforming Πi
with Ri,j yields the translated behavior qRi,j[Πi](Y1, · · · , Ym), which the second
condition requires to be implied by Πj.
Fig. 4.4 Correspondence between LoAs established by the relation Ri,j(Xr, Ys)

Fig. 4.5 Correspondence between behaviors established by the relation Ri,j(Xr, Ys)
The situation is described in Fig. 4.5, where we can see that, according to Giunchiglia
and Walsh’ classification, Floridi’s abstraction is theorem decreasing. In fact, each
true behavior in Lj (the “concrete” LoA), which implies Πi ’s transformed predicate
qRi,j [Πi ], has a corresponding behavior Πi which is true.
For the sake of illustration, let us introduce an example.
Example 4.2 Let Li = {X} and Lj = {X, Y } be two LoAs, where X and Y assume
values in R+ . The relation Ri,j is a relation of inclusion, namely Li ⊂ Lj . Let Πi be
a behavior of Li , whose cover is COV (Πi ) = DX = [c1 , c2 ]. Any instantiation of
Πi , let us say Πi (a), is transformed into a vertical line x = a in Lj , as described in
Fig. 4.6.
Then, the cover of the predicate qRi,j[Πi] is the vertical strip defined by
c1 ≤ x ≤ c2. For each behavior Πj inside the strip, the corresponding Πi is true.
Fig. 4.6 States of a LoA Li, consisting of a single observable X, can be represented by points on the
X axis, whereas states of a LoA Lj, consisting of the pair of observables X and Y, can be represented
by points on the (X, Y) plane. A state corresponding to a true behavior in Li is, for example, the
point a ∈ DX = [c1, c2]. As all values of Y are compatible with X = a, all points on the vertical
line x = a correspond to true behaviors of Li. As long as the behavior Πj has a cover included in
the strip c1 ≤ x ≤ c2, there is always a corresponding behavior in Li which is true. However, if
this is not the case, as for Π′j(X, Y), a corresponding true behavior on Li may not exist
If Πj does not imply qRi,j[Πi], as happens, for instance, for Π′j, there may not
exist an a′ such that Πi(a′) is true.
Two GoAs are equal iff they have the same moderated LoAs, and their families
of relations are equal. Condition (1) in Definition 4.15 states that the behavior mod-
erating each lower level LoA is consistent with the one of higher level LoAs. This
property links among each other the LoAs of the GoA. Definition 4.15 only asserts
that the LoAs are related, but it does not specify how. There are two special cases of
relatedness: “nestedness” and “disjointness”.
Definition 4.16 (Disjoint GoA) A GoA is called disjoint if and only if the Li ’s are
pairwise disjoint (i.e., taken two at a time, they have no observable in common), and
the relations are all empty.
A disjoint GoA is useful to describe a system under different, non-overlapping
points of view.
Definition 4.17 (Nested GoA) A GoA is called nested if and only if the only non-
empty relations are those between Li and Li+1 (0 ≤ i < n − 1), and, moreover, the
reverse of each Ri,i+1 is a surjective function3 from the observables of Li+1 to those
of Li.
3 We recall that a surjective function is a function whose image is equal to its codomain. In other
words, a function f with domain X and codomain Y is surjective if for every y ∈ Y there exists at
least one x ∈ X such that f (x) = y.
(λ⁽¹⁾red ≤ Wavelength ≤ λ⁽²⁾red) ∨ (Wavelength = yellow) ∨ (Wavelength = green)
The sequence consisting of the LoAs La and Lg forms a nested GoA. Informally, the
smaller, abstract space {red, yellow, green} is a projection of the larger, concrete
one. The relevant relation associates to each value c ∈ {red, yellow, green}
a band of wavelengths perceived as that color. Formally, R(Color, Wavelength) is
defined to hold if and only if each time the color is red the wavelength is in the
appropriate, corresponding interval:
Color = red ⟺ λ⁽¹⁾red ≤ Wavelength ≤ λ⁽²⁾red
they are not descriptive but operational: in other words, they define the operations
that can be done on T. From this point of view, the definition of an ADT is in line
with the cognitive approach of classifying/recognizing objects by their functions (see
Sect. 2.6).
The typical way of working with ADTs is encapsulation, meaning that (imple-
mentation) details are not lost, but hidden inside the higher level definition of the
type. A generic definition of an ADT is as follows:
type name
definition
scalar or structured type definition
operations
procedures and functions
end name
Liskov and Guttag’s templates, reported in Sect. 2.4, are instances of this one [333].
The definition of an ADT involves the abstraction issues of simplicity (no implemen-
tation details included in the definition), of relevance (only the defining aspects are
considered), of granularity (the type can be defined at different levels of detail), and
of naming (the name stands for all its properties).
Example 4.5 (ADT queue) An example of an ADT is provided by the type Q =
queue:
type queue
definition
Finite sequence q of elements of type T
operations
Init(q): Initialize the queue to be empty
Empty(q): Determine whether the queue is empty
Full(q): Determine whether the queue is full
Append(q, x): Add a new item at the end of the queue (if not full)
Head(q): Retrieve the first item in the queue (if not empty)
Remainder(q): Delete the first item in the queue (if not empty)
end queue
The type Q is a composite one, because it makes reference to the type T of its
elements. Using the ADT queue, a sequence of operations on a queue q can be
described without the need to specify any programming language.
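When a concrete implementation is eventually needed, the queue ADT can be realized, for instance, in Python. The class below is only a minimal sketch: the method names mirror the operations listed above, and the fixed capacity is an assumption we introduce to give Full(q) a meaning:

class Queue:
    """A bounded FIFO queue realizing the ADT operations of Example 4.5."""

    def __init__(self, capacity=10):   # Init(q); the capacity is our assumption
        self.items = []
        self.capacity = capacity

    def empty(self):                   # Empty(q)
        return len(self.items) == 0

    def full(self):                    # Full(q)
        return len(self.items) >= self.capacity

    def append(self, x):               # Append(q, x): add at the end (if not full)
        if not self.full():
            self.items.append(x)

    def head(self):                    # Head(q): retrieve the first item (if not empty)
        if not self.empty():
            return self.items[0]

    def remainder(self):               # Remainder(q): delete the first item (if not empty)
        if not self.empty():
            self.items.pop(0)

Any client manipulating queues only through these six operations is insulated from the implementation: replacing the internal list with, say, a circular buffer would leave the client code untouched, which is precisely the encapsulation discussed above.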
The view of abstraction in ADTs is shared by Floridi, in that both accounts of
abstraction move from the abstract to the concrete: first an ADT is defined, then
concrete implementations follow. The relation between the ADT and the
implementation is again a TD-abstraction.
4.4 Abstraction in Databases
The models proposed by Miles Smith and Smith [371, 372] for abstraction in
databases, even though quite old, are still fundamental, as mentioned in Sect. 2.4. In a
couple of papers they have defined two kinds of abstraction to be used in relational
databases: "aggregation" abstraction and "generalization" abstraction.
Aggregation Abstraction This type of abstraction transforms a relation among
several named objects into a higher-level named object. For example, a relation
between a person, a hotel, a room, and a date may be abstracted as the
aggregated object “reservation”. This transformation is realized through the
introduction of a new type of entities, i.e., aggregate, defined as follows (using
Hoare's structures [251]):
type name = aggregate[key]
A1 : Attribute1
A2 : Attribute2
...............
An : Attributen
end
Component objects of the type name appear in the aggregate as attributes, whereas
the content inside the squared parentheses denotes the key, i.e., the subset of attributes
that uniquely identify the aggregate. The type name defines a table scheme, whose
columns are the attributes A1 , · · · , An .
Example 4.6 (Miles Smith and Smith [371]) Let us define an aggregate reservat-
ion as follows:
type reservation = aggregate [Number]
# : [key] Number
P : Person
H : Hotel
nR : Number of rooms
SD : Starting date
D : Duration
end
The aggregate has a key, the attribute Number, which uniquely identifies the reservation.
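In a modern language the aggregate can be mimicked, for instance, by a record type. The sketch below is only illustrative (the field names and types are our own choices):

from dataclasses import dataclass
from datetime import date

@dataclass
class Reservation:
    number: int          # the key: uniquely identifies the aggregate
    person: str          # component objects appear as attributes
    hotel: str
    n_rooms: int
    starting_date: date
    duration: int        # length of the stay, in days

# The type defines a table scheme; the key indexes its rows
reservations = {}
r = Reservation(42, "A. Smith", "Hotel du Lac", 1, date(2013, 7, 1), 3)
reservations[r.number] = r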
Fig. 4.8 A hierarchy of generic objects including various kinds of vehicles. The root is at level 0.
(Derived with permission from Miles Smith and Smith [371])
var R : generic
sk1 : (R1,1 , . . . , R1,p1 )
...............
skm : (Rm,1 , . . . , Rm,pm )
of aggregate [key]
s1 : R1
...............
sn : Rn
end
For instance, the generic object vehicle of Fig. 4.8 can be defined as:

var vehicle : generic
MC : (Land, Air, Water)
PC : (Motorized, Man-powered, Wind-propelled)
of aggregate [ID]
ID : Identifier
M : Manufacturer
P : Price
W : Weight
MC : Medium category
PC : Propulsion category
end
As we can see, the generic object “vehicle” is defined as an aggregate of the entities
ID, M, P, W, MC, and PC. Then, two of the components of the aggregate, namely
MC and PC, have been selected to form generalizations. More precisely, they have
been chosen to create a double partition of the vehicles according to their “Medium
category”, with values {Land, Air, Water}, and according to their “Propulsion cat-
egory”, with values {Motorized, Man-powered, Wind-propelled}. In this way two
clusters have been created, and each instance of vehicle belongs to one of the
cluster elements. The other attributes, i.e., ID, M, P, W are common to all instances,
and are assigned to the generic object. Figure 4.9 shows the resulting tabular repre-
sentation.
In order to encompass both cases of aggregation abstraction and generalization
abstraction, the two conditions for well-definedness introduced before are substi-
tuted by the following five:
• Each R-individual4 must determine a unique Ri -individual.
• No two R-individuals determine the same set of Ri -individuals for all Ri in key.
• Each Ri,j -individual must also be an R-individual.
• Each R-individual classified as Ri,j must also be an Ri,j -individual.
• No Ri,j-individual is also an Ri,k-individual for j ≠ k.
Aggregation and generalization abstractions can only model simple systems, if
used in isolation, while their power greatly increases when used in combination.
Taking inspiration from the work just described, Goldstein and Storey [217] provide a
model for data abstraction which is a refinement of Miles Smith and Smith's one [371, 372].

Fig. 4.9 Relation corresponding to the generic object vehicle. This generic object has two
clusters of nodes as alternative partitions of its instances:

ID   M        P       W      C1 (Medium)   C2 (Propulsion)
v1   Mazda    65.4    10.5   Land          Motorized
v2   Schwin   3.5     0.1    Land          Man-powered
v3   Boeing   7,900   840    Air           Motorized
v4   Acqua    12.2    1.9    Water         Wind-propelled

They keep the aggregation and generalization (renamed inclusion) mechanisms
for abstraction, and add one more, i.e., association. The model specifies a number of
dimensions that have to be defined for each type of abstraction mechanism: Semantic
meaning, Property set, Roles, Transitivity, and Mapping.
Let us look more closely at the abstraction operations.
Inclusion Inclusion describes an is-a relation between a supertype (generic) and a
subtype (specific). The most important property is inheritance (anything true for
the generic is also true for the specific). Abstraction via inclusion is transitive.
There is a many-to-one mapping between specifics and generic. An example is
“Secretary is-a Employee”. Inclusion acts on classes of entities, i.e., the generic
types defined by Miles Smith and Smith.
Aggregation A relationship among objects is considered as a higher level (aggre-
gate) object. There are three kinds of aggregation: (1) An attribute can be an aggre-
gation of other attributes; (2) An entity can be an aggregation of entities and/or
attributes, (3) A relationship can be an aggregation of entities and attributes.
Aggregates have the property of partial inheritance from the components, and
may have emergent properties (properties that do not pertain to components but
only to the aggregate). Each component plays a peculiar role in the aggregate,
and it may be relevant (interesting, but optional), characteristic (required), or
identifying (defining the aggregate). Aggregation is transitive. An example is the
computer of Fig. 2.4.
Association A collection of members is considered as a higher-level (more
abstract) set, called an entity set type. Details of the member objects are sup-
pressed and properties of the object set emphasized. For association there is no
inheritance property, but there are derived properties. Members of an associa-
tion are not required to have different roles, and the mapping between members
and the set entity type is unrestricted. Association is transitive. An example is a
“forest” with respect to its component trees.
The approaches to abstraction proposed by Miles Smith and Smith and by
Goldstein and Storey can be labelled as semantic. Proposed in the mainstream
of Computer Science, they address the issues of feature selection
(generalization/inclusion), feature construction (aggregation), and hierarchical
representation at different levels of detail. Marginally, they also address the issue
of naming (ADT definition).
More recently, Cross [118] defined, in the context of object-oriented methods, some
dimensions along which abstraction mechanisms can be considered. Indeed, these
methods provide support for important abstraction principles, as illustrated in the
following.
Classification/Instantiation In classification, things which are similar are
grouped together. The properties that make the things alike are abstracted out
In case the group is created by a user according to his/her opinions about mem-
bership, the slot Predicate in the above definition is absent.
Generalization/Specialization Generalization is the process of creating a type
that has properties that are common to several other more specific types. The
generalized type, referred to as the supertype, defines these common properties.
Each one of the more specific types, referred to as a subtype, contains those
properties that are essential for its definition as a specialization of the super-
type. Specialization is the reverse of generalization, i.e., it creates a subtype with
more specific properties than the supertype. An important characterization of the
generalization/specialization principle is that it supports a hierarchical structur-
ing mechanism for conceptual specialization. Generalization abstraction can be
defined as follows:
Interface name
Extent: instances-of-name
A1 : attribute1
..............
An : attributen
Relationship Set : {R1 , . . . , Rm }
end
Here is an example:
Interface professor: employee
Extent: professors
A1 : rank {full, associate, assistant}
A2 : research-keywords
Relationship Set : <student> supervises
inverse: student :: supervised-by
end
Aggregation/Decomposition Aggregation creates a unique entity starting from
its components, so that the grouped entities are linked to the type via a part-of
relation. The inverse relation is composed-of. The components may have different
types among themselves, and do not have the same type as the aggregate object.
The aggregation type may be hierarchical as well. For example, a car type can be
defined as an aggregation of other types such as wheel, body and engine. The
type’s body may be created as an aggregation of other types such as hood, door
and handle. The intensional definition of car would still include a classification
group, but it also includes the various types that are parts of a car.
Even though the operations of Classification and Generalization look quite sim-
ilar, there is nevertheless a substantial difference, in that Classification acts on
instances and forms a type, whereas Generalization acts on types and builds up
a super-type.
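In an object-oriented language these principles surface directly. The following Python fragment is only a schematic illustration (all class names are our own):

class Employee:                      # generic (supertype)
    def __init__(self, name):
        self.name = name

class Professor(Employee):           # generalization/specialization: is-a Employee
    def __init__(self, name, rank):
        super().__init__(name)
        self.rank = rank             # property essential to the specialization

class Car:                           # aggregation: components linked via part-of
    def __init__(self, wheels, body, engine):
        self.wheels = wheels         # components need not share the aggregate's type
        self.body = body
        self.engine = engine

p = Professor("Ada", "full")         # classification/instantiation: p is an instance
assert isinstance(p, Employee)       # inheritance: what holds for Employee holds for p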
4.5 Granularity
The classical works on granularity, in the context of abstraction, go back to the late
80’s, with Hobbs’ [252] and Imielinski’s [269] approaches.
Hobbs is concerned with tractability of models of (parts of) the world. He assumes
that the world is described by a global, First Order Logic theory T0 , and his goal is
to extract from this theory a smaller, more computationally tractable, local one. Let
P0 be the set of predicates of T0 , and S0 their domain of interpretation. Let moreover
R be that subset of P0 which is relevant to the situation at hand. The idea behind
Hobbs’ approach is that elements of S0 can be partitioned into equivalence classes,
by defining an indistinguishability relation on S0:

(x ∼ y) iff |x − y| < ε    (4.7)

where ε is a small, positive number, which can, for example, quantify the precision of
measurement. If definition (4.7) has to be traced back to definition (4.3), some set of
relevant predicates must be identified. To this aim, definition (4.3) must be extended
by allowing partial predicates to be relevant as well; the new definition reads then:
(x ∼ y) iff ∀p ∈ R : (p(x) and p(y) are both defined) ⇒ (p(x) ≡ p(y)) (4.8)
In order to distinguish between x and y, we must find a relevant partial predicate that
is true for one and false for the other. Finally, a system can be described at different
granularity levels, each one articulated with the others.
Example 4.10 (Hobbs [252]) Let T be a body’s temperature in ◦ C, and let p(x, t) be
the relevant predicate “temperature x is around t”, with t a real number. By varying t
in R we obtain an infinite set of relevant predicates. Suppose furthermore that p(x, t)
is true for t − 3 ≤ x ≤ t + 3, false for x < t − 3 − ε and for x > t + 3 + ε, and undefined
otherwise (see Fig. 4.10).
Two temperatures x1 and x2 are distinguishable if there exists a t such that p(x1, t)
is true and p(x2, t) is false (or vice versa, owing to the symmetry between x1 and
x2). Both p(x1, t) and p(x2, t) must be defined, so that the intervals AB and CD in
Fig. 4.10 The relevant predicate, for a given t, is true if x lies on the segment BC, it is false if x
lies to the right of D or to the left of A, and it is undefined otherwise. In other words, in order to
be distinguishable two temperatures x1 and x2 must lie one in the "true" interval and the other in
one of the two "false" intervals. The remaining intervals AB and CD do not count, because the
predicate is undefined in them. Then, in order to distinguish x1 from x2, it must be |x2 − x1| > ε
Fig. 4.10 are irrelevant. As a consequence, one of the two, say x1, must be inside the
interval BC, whereas x2 must be either to the left of point A or to the right of point D.
In both cases it must be:

|x2 − x1| > ε

If |x2 − x1| ≤ ε, no t exists for which x1 and x2 can be distinguished.
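Hobbs' construction lends itself to a direct sketch. Below, relevant predicates are modeled as Python functions returning True, False, or None (for "undefined"); the predicate family and the finite sampling of t are our own illustrative choices:

EPS = 0.5  # width of the "undefined" buffer around the truth interval

def around(t):
    """Partial relevant predicate 'temperature x is around t' (cf. Example 4.10)."""
    def p(x):
        if t - 3 <= x <= t + 3:
            return True
        if x < t - 3 - EPS or x > t + 3 + EPS:
            return False
        return None  # undefined in the buffer zones
    return p

def indistinguishable(x, y, predicates):
    """Definition (4.8): x ~ y iff no predicate defined on both tells them apart."""
    for p in predicates:
        px, py = p(x), p(y)
        if px is not None and py is not None and px != py:
            return False
    return True

preds = [around(t / 10) for t in range(0, 400)]   # a finite sample of values of t
print(indistinguishable(36.6, 36.8, preds))  # True:  |x2 - x1| <= EPS
print(indistinguishable(36.0, 37.0, preds))  # False: |x2 - x1| >  EPS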
Imielinski's research [269] had the same motivations as Hobbs' with respect to
simplifying reasoning; in fact, he called his approach to abstraction "limited" reasoning
(i.e., weaker than First Order Logic). In his view, one of the problems in achieving
this simplification is that, differently from numerical computations, logical reasoning
lacks a proper notion of error, and hence that of approximation. Imielinski proposes
then a definition of error sufficiently general to cover both automated reasoning and
numerical computations.
Imielinski starts from a knowledge base containing some formulas of the type
M(X, v), denoting a measure v for the variable X. Then, taking into account the fact
that measurements may be affected by precision limits, he substitutes those formulas
with weakened ones, such as M(X, int), where int is an interval of possible values for v.
Given a property p of the original formulas (for instance, being true), some of the
substitutions preserve p and some do not.
Definition 4.18 (Error in a knowledge base) The error in a knowledge base is the
set of all formulas that do not preserve a given property p.
Example 4.11 (Imielinski [269]) Let us suppose that we measure the volume of a
body X in m3 with a maximum error of 1 m3 . The formula Vol(X, 124) may or may not
be exactly true in the real world, because it just tells us that, due to the measurement
error, the actual value of the volume will be in the interval [123, 125] m3 (property
p). Then, the approximate formula ϕ1 = ∃v[Vol(X, v ∈ [123, 125])] preserves p.
Actually, p is preserved by any formula ∃v[Vol(X, v ∈ [a, b])] with a ≤ 123 and
b ≥ 125. On the other hand, the formula ϕ2 = ∃v[Vol(X, v ∈ [123.9, 124.1])] does
not preserve p, because the true value could be, for instance, 123.2, and hence the
formula is part of the error.
Imielinski calls local the notion of error just introduced. He also defines a global
error, which results from the replacement of the whole knowledge base by the
“rounded up” one. The global error is simply the set of all formulas that are not
guaranteed to preserve the properties (usually the truth) of the original knowledge
base.
Imielinski’s notion of error is general, but not easy to apply to generic approxima-
tion schemes; then, he concentrates on the same type of abstraction as Hobbs, namely
domain abstraction. More precisely, he defines a knowledge base KB as a finite set
of formulas in some First Order Logic language, a query as an open formula ϕ, an
answer to the query as the set of all substitutions of domain constants for variables
in ϕ, such that the resulting closed formula is a logical consequence of KB. The
domain D of the knowledge base is the set of all objects occurring in the KB. On the
domain D an equivalence relation R (reflexive, symmetric, and transitive) is defined.
Relation R may represent the relevant features of the domain, or is supposed to hide
some features of the external world, or may correspond to the error in measurements.
The equivalence relation R induces a partition of the constant names into equiva-
lence classes, denoted by [a], where [a] can be considered the representative of the
class. If we substitute all constants in KB with their equivalence classes, a simplified
KB is obtained, from which approximate answers to queries can be derived.
Example 4.12 (Imielinski [269]) Let a knowledge base contain the following long
disjunction:
P(a, b) ∨ P(a1, b1) ∨ · · · ∨ P(an, bn)
If, for instance, connect(New York, San Francisco) is true, then the predicate will
be rewritten as connect(New York State, California). Suppose we want
to answer the query Q =“Find all direct or one-stop flights from New York to Seattle”.
If the original KB contained connect(New York, San Francisco) and con-
nect(Los Angeles, Seattle), the abstract one contains connect(New York
State, California) and connect(California, Washington). In the
“liberal” interpretation the trip (New York State, California, Washing-
ton) will be added (incorrectly) to ANS(Q), while in the conservative interpretation
it will not. On the other hand, if the original KB contained connect(New York,
San Francisco) and connect(San Francisco, Seattle), this trip will not
be added (even though correct) to ANS(Q), because it is known that each state con-
tains several cities, and the connection could potentially be incorrect.
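The city-to-state abstraction above can be sketched directly. In the fragment below the knowledge base is reduced to a set of connect facts, and the "liberal" interpretation answers the one-stop query on the abstract KB (the encoding and helper names are ours):

# Equivalence relation on constants: each city collapses into its state
state = {"New York": "New York State", "San Francisco": "California",
         "Los Angeles": "California", "Seattle": "Washington"}

kb = {("New York", "San Francisco"), ("Los Angeles", "Seattle")}

# Abstract KB: every constant is replaced by its equivalence class
abstract_kb = {(state[a], state[b]) for (a, b) in kb}

def one_stop_liberal(src, dst):
    """Answer the one-stop query on the abstract KB (liberal interpretation)."""
    return any((state[src], m) in abstract_kb and (m, state[dst]) in abstract_kb
               for m in set(state.values()))

# The liberal interpretation adds the trip, although no ground trip exists:
print(one_stop_liberal("New York", "Seattle"))  # True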
In addition to the abstraction operations reported in Sect. 4.4.3, Cross [118] shows
how the Fuzzy Set theory [576] can be used to implement them. Independently from
other definitions, fuzzy sets can be considered an abstraction per se, in at least two
respects. First of all, they allow a form of attribute value abstraction. As an
example, let us consider a numerical variable X, assuming values x in the interval
[0, ∞), and let “Big” be a fuzzy set whose membership function μ(x) is reported in
Fig. 4.11. When we say that “X is Big”, we abstract from the actual value taken on
by X, and we just retain the compatibility measure of X with the concept Big. For
X in the interval (10,100) we still can make some difference between values, as the
membership μ(x) differentiates the degree of compatibility of x with the meaning
of Big. For values of X either in the interval [0,10] or in the interval [100, ∞),
we consider all the values of X as equivalent with respect to the relevant predicate
Fig. 4.11 Fuzzy set “Big”, whose membership function μ(x) is defined over the real axis where
the variable X takes values
membership(X, μ(x)) ("the membership of X in the fuzzy set Big is μ(x)"). Then:

(x1 ∼ x2) iff μ(x1) = μ(x2) ∈ {0, 1}
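A minimal sketch of the membership function of Fig. 4.11 follows; the breakpoints 10 and 100 come from the text, whereas the linear ramp between them is our assumption, the exact shape of μ being given only graphically:

def mu_big(x):
    """Membership of x in the fuzzy set 'Big' (cf. Fig. 4.11)."""
    if x <= 10:
        return 0.0            # all such values are equivalent: definitely not Big
    if x >= 100:
        return 1.0            # all such values are equivalent: definitely Big
    return (x - 10) / 90.0    # graded compatibility in between (assumed linear)

print(mu_big(5), mu_big(55), mu_big(500))  # 0.0 0.5 1.0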
Another, even more interesting way of achieving abstraction with fuzzy sets is the
use of linguistic variables, introduced by Zadeh [577]. A linguistic variable is a
variable that takes as values words or sentences in a language. For example, Age is
a linguistic variable if its values are linguistic rather than numerical, i.e., young,
very young, old, …rather than 20, 21, 22, 80. Formally:
Definition 4.19 (Linguistic variable) A linguistic variable is a quintuple ⟨L, T(L),
U, G, M⟩, where L is the name of the variable, T(L) is the term-set of L, i.e., the
collection of its linguistic values, U is a universe of discourse, G is a syntactic rule
which generates the terms in T (L), and M is a semantic rule that associates with
each linguistic value x its meaning μ(x), which is a fuzzy subset of U.
As Zadeh puts it [577], “the concept of a linguistic variable provides a means of
approximate characterization of phenomena which are too complex or too ill-defined
to be amenable to description in conventional quantitative terms”. Using a linguistic
variable can be considered a special case of discretization.
A role similar to the fuzzy sets’ one, but more oriented to approximation, is played by
the Rough Set theory, introduced by Pawlak [414]. The idea is based on information
systems and a notion of indiscernability.
Definition 4.20 (Information system) An information system S is a 4-tuple (U, A,
Λ, f), where U is the universe, i.e., a finite set of N objects, A is a finite set of
attributes, Λ = ∪A∈A ΛA is the union of the domains of the attributes (ΛA is the
domain of A), and f : U × A → Λ is a total decision function such that f(x, A) ∈ ΛA
for every A ∈ A, x ∈ U.
In Definition 4.20 f is a function that associates to an object x a value f (x, A) of
attribute A. A subset X of U is called a concept. We can now introduce the indis-
cernability relation.
Definition 4.21 (Indiscernability) A subset of attributes Aind ⊆ A defines an
equivalence relation, called indiscernability relation, IND(Aind), on U², such that:

(x, y) ∈ IND(Aind) iff f(x, A) = f(y, A) for every A ∈ Aind
Indiscernability, in the rough set theory, originates from the fact that removing the
attribute subset A-Aind from A leaves some objects with the same description; hence,
for those objects, the function f assumes the same value for every attribute A ∈ Aind ,
making them indistinguishable.
Given an information system S and an indiscernability relation IND(Aind), let
AS = ⟨U, IND(Aind)⟩ be the approximation space in S.
Definition 4.22 (Upper/Lower bounds) Given an approximation space AS and a
concept X, the Aind-lower approximation LXAind and the Aind-upper approximation
UXAind of the set X can be defined as:

LXAind = {x ∈ U : [x]Aind ⊆ X}
UXAind = {x ∈ U : [x]Aind ∩ X ≠ ∅}
In Definition 4.22, [x]Aind denotes the equivalence class of x with respect to the
attribute set Aind . Finally, we can define a rough set.
Definition 4.23 (Rough set) A rough set R is the pair (LX Aind , UX Aind ).
A rough set is thus a pair of crisp sets, one representing a lower bound and the other
representing an upper bound of a concept X .
Example 4.14 Let U be the set of points P in a plane, and let A = {X, Y } be the
set of attributes, representing the coordinates X and Y in the plane. Let moreover
ΛX = ΛY = (−∞, +∞). Then, Λ = ΛX ∪ ΛY .
The function f : U × A → Λ will be:
f (P, X) = x ∈ ΛX
f (P, Y ) = y ∈ ΛY
Fig. 4.12 Upper (pink + yellow regions) and lower (yellow region) approximations of a concept
X = Oval, defined as a region in the 2D plane. [A color version of this figure is reported in Fig. H.4
of Appendix H]
Let us choose as Aind the whole A. Then we can define equivalence classes among
points as follows:

(P1 ∼ P2) iff ⌊x1/Δ⌋ = ⌊x2/Δ⌋ and ⌊y1/Δ⌋ = ⌊y2/Δ⌋

with a given Δ. The plane will be divided into squares of side Δ, such that all points
inside a square are considered equivalent, as represented in Fig. 4.12.
inside a square are considered equivalent, as represented in Fig. 4.12.
Given a concept X , defined extensionally as an oval region of the plane, the lower
approximation consists of all squares that are totally inside the oval, whereas the
upper approximation consists of all the squares that, at least partially, overlap the
oval.
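The construction of Example 4.14 can be sketched as follows: the squares of side Δ act as equivalence classes, and an oval concept is approximated from below and from above. The oval's parameters and the finite sampling of the plane are our own simplifications:

from math import floor

DELTA = 0.5  # side of the squares (the equivalence classes)

def square_of(p):
    """Equivalence class of a point: the grid square containing it."""
    x, y = p
    return (floor(x / DELTA), floor(y / DELTA))

def in_oval(p):
    """The concept X: an oval region of the plane (parameters are illustrative)."""
    x, y = p
    return (x / 3.0) ** 2 + (y / 2.0) ** 2 <= 1.0

# A finite sample of the universe, fine enough to probe every square
universe = [(i * 0.05, j * 0.05) for i in range(-80, 81) for j in range(-80, 81)]

classes = {}
for p in universe:
    classes.setdefault(square_of(p), []).append(p)

lower = {s for s, pts in classes.items() if all(in_oval(p) for p in pts)}
upper = {s for s, pts in classes.items() if any(in_oval(p) for p in pts)}

assert lower <= upper   # the rough set: a pair of crisp sets bounding X
print(len(lower), len(upper))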
Even though the rough set theory is based on a notion of indiscernability similar to
the one used by Hobbs [252] and Imielinski [269] for granularity, its use is different,
because it is not used per se, but as a first step to provide approximations of sets.
4.6 Syntactic Theories of Abstraction

Within the realm of logical representations, Plaisted [419] was the first to propose
a general theory of abstraction oriented to theorem proving, and, specifically, to
resolution.
Plaisted considered a First Order Logic language in clausal form (see Appendix
D), and defined a generic abstraction mapping as a mapping between a (ground)
clause C and a set f (C) of (abstract) clauses. The idea is to transform a problem A
into a simpler one B, such that B has certainly a solution if A does, but B may also
have additional solutions. According to Giunchiglia and Walsh’s classification [214]
this mapping is a TI-abstraction.
A cardinal notion in Plaisted’s approach is subsumption. Let x denote a vector
of variables {x1, x2, . . . , xn} and let A be a set of constants. A substitution θ is an
assignment xi = ai (1 ≤ i ≤ n), with ai ∈ A. Given a clause C(x), the notation Cθ
stands for C(a) = C(a1, a2, . . . , an).
Definition 4.24 (Subsumption) A clause C1 subsumes a clause C2 (denoted by
C1 ≤ C2) if there exists a substitution θ such that C1θ is a subset of C2.
Fig. 4.13 a Resolution in First Order Logic. b Abstract resolution after removing all variables from
clauses. The trees in (a) and (b) have the same structure
f : P1 → P2
into containers, because they certainly have some property distinguishing them
(otherwise they would be the same object). A weakening of the requirement is to
allow abstractions of the form bottle(x) ∨ glass(x) → container(x).
The intuitive notions discussed so far are then formalized by Tenenberg. First
of all, let Cf be the set of predicates (or clauses) that map into C under predicate
mapping f , i.e.:
Cf = {D|f (D) = C}
Definition 4.28 simply states that, among all the correspondences between predi-
cates from the ground to the abstract language, only those that preserve consistency
are kept. In fact, Tenenberg proves that g(Φ) preserves consistency across predicate
mapping. However, restricted predicate mappings are no longer abstraction mappings
according to Plaisted’s definition, because the upward-solution property is not pre-
served. On the contrary, restricted mappings do have a downward-solution property,
since for every clause derivable from the abstract theory there is a specialization
of it derivable from the original theory. Restricted mappings are TD-abstractions,
according to Giunchiglia and Walsh [214], because a solution may not exist in the
abstract theory, but, if it does, the original problem has a corresponding solution.
It has to be noted that g(Φ), as defined above, is undecidable, since it requires
determining whether Φ ⊢ D for every clause D mapping to each candidate clause in
the abstract clause set. In practice, the search for derivability can always be arbitrarily
bounded, and if no proof is obtained within this bound, it can be assumed that the
clause is not derivable. In this way, consistency is still preserved between the original
and the abstract theory, the abstract theory being simply weaker than it theoretically
could be (it has fewer theorems). Let us see an example of a restricted predicate
mapping.
Example 4.16 (Tenenberg [526]) Let a be a constant and Φ be the set of clauses
reported in Fig. 4.14. Let f be a predicate mapping associating each predicate in
Φ to itself, except for bottle(x) → glass-container(x) and glass(x) →
glass-container(x).
The abstracted clauses are the following ones:
1’) glass-container(x) ⇒ made-of-glass(x)
2’) glass-container(x) ⇒ graspable(x)
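Applying a predicate mapping to a clause set is mechanical, as the following sketch shows; clauses are encoded as (antecedent, consequent) pairs of predicate names, and the clause set is only a partial reconstruction of Fig. 4.14:

# Predicate mapping f of Example 4.16 (identity on unmapped predicates)
f = {"bottle": "glass-container", "glass": "glass-container"}

clauses = [("bottle", "made-of-glass"),   # 1) bottle(x) => made-of-glass(x)
           ("bottle", "graspable"),       # 2) bottle(x) => graspable(x)
           ("glass", "made-of-glass"),
           ("glass", "graspable")]

def abstract(clause):
    """Rewrite every predicate occurring in the clause through f."""
    return tuple(f.get(p, p) for p in clause)

abstract_clauses = {abstract(c) for c in clauses}  # duplicates collapse
for a, b in abstract_clauses:
    print(f"{a}(x) => {b}(x)")
# glass-container(x) => made-of-glass(x)
# glass-container(x) => graspable(x)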
tokens; tokens are equivalent if they are of the same types. A classification is type-
extensional if there are no two distinct equivalent types, and it is token-extensional
if there are no two distinct equivalent tokens.
A classification can be seen as a table in a very simple database, where only two
columns are available: tokens and types. However, unlike a row in a relational database,
channel theory treats each token as a first-class object,6 and hence each token is the
key of itself. By treating tokens as first-class objects, relationships can be modeled
using an infomorphism.
Definition 4.30 (Infomorphism) Given two classifications A = ⟨typ(A),
tok(A), ⊨A⟩ and B = ⟨typ(B), tok(B), ⊨B⟩, an infomorphism f : A ⇄ B
from A to B is a pair of functions (f∧, f∨), such that f∧ : typ(A) → typ(B)
and f∨ : tok(B) → tok(A), satisfying the following property:

For every type α in A and every token b in B: b ⊨B f∧(α) iff f∨(b) ⊨A α
An infomorphism formalizes the correspondence in the information structure of two
classifications; it states that the regularities in a domain, captured by classifications,
are compatible. More precisely, an infomorphism is intended to model transfer of
information from one view of a system to another one; for instance knowing that “a
mountain’s side has a particular distribution of flora can carry information about the
local micro-climate” [480]. An infomorphism is more general than an isomorphism
between classifications. For example, an infomorphism between classifications A
and B might map two or more types in A onto a single type β in B, provided that,
from B's point of view, the two types are indistinguishable; this means that, for any
two such types α and α′ in A and all tokens b in B, f∨(b) ⊨A α if and only if
f∨(b) ⊨A α′. It must be noticed that two types indistinguishable for B may be
distinguishable in A. In fact, there may be tokens in A outside the range of f∨ for
which, for example, a ⊨A α but not a ⊨A α′.
Dually, two tokens of B may be mapped onto the same token in A, provided that
those tokens in B are indistinguishable with respect to the set of types β in B for
which there exists some α such that f ∧ (α) = β. Again, this does not mean that these
same tokens in B are wholly indistinguishable in B. For example, there may be types
outside the range of f ∧ classifying them differently. Thus, “an infomorphism may be
thought of as a kind of view or filter into the other classification” [40].
In practice, it may be difficult to find infomorphisms between arbitrary classifica-
tions. If the correspondence is too easy, then the morphism would not be interesting.
If it is too stringent, it would not be applicable (Fig. 4.15).
An example of infomorphism can be taken from chess.
Example 4.17 (Seligman [40]) Consider a game of chess, observed and analysed by
a group of experts. The observations can be represented by an event classification
G in which the actual moves of the game are classified by the experts into types of
varying precision: “White now has a one-pawn advantage”, “Black has control of
6 A first class item is one that has an identity independent of any other item.
Fig. 4.15 A graphical representation of an infomorphism (Adapted with permission from De Saeger
and Shimojima [464])
Fig. 4.16 A channel connecting two infomorphisms
Fig. 4.17 A channel representing the telegraph (Reprinted with permission from Seligman [480])
π : Interpretations(Lb ) → Interpretations(La )
BMW(x) ⇒ EuropeanCar(x)
Let us first consider a (syntactic) predicate abstraction that associates the abstract
predicate ForeignCar(x) to both EuropeanCar(x) and JapaneseCar(x).
Then, Ta will be:
ForeignCar(x) ⇒ Car(x)
Toyota(x) ⇒ ForeignCar(x)
BMW(x) ⇒ ForeignCar(x)
This abstraction considers the difference between a European and a Japanese car
irrelevant to the goal of the current reasoning. Let us now add to Tb the following
axioms:
EuropeanCar(x) ⇒ Fast(x)
JapaneseCar(x) ⇒ Reliable(x)
Applying the previous abstraction, we obtain:
ForeignCar(x) ⇒ Fast(x)
ForeignCar(x) ⇒ Reliable(x)
These last axioms, added to the previously obtained ones, may lead to the conclusion
that a Toyota is fast and that a BMW is reliable, even though these conclusions are
not warranted in the base theory.
Let us now consider how this example can be handled in Nayak and Levy’s theory.
As π preserves the universe of discourse, we have that π∀ = (v1 = v1 ), which is
satisfied by all elements. The extension of the predicate ForeignCar(x) is the
union of the extensions of the predicates JapaneseCar(x) and EuropeanCar(x).
Hence:
πForeignCar (v1 ) =JapaneseCar(v1 ) ∨ EuropeanCar(v1 )
The extension of the other predicates (except JapaneseCar and EuropeanCar,
which are not in the abstract theory) is unchanged.
Another interesting point raised by Nayak and Levy is that TI-abstractions, which
admit false proofs, are better viewed as MI-abstractions in conjunction with a set
of simplifying assumptions; false proofs emerge when the assumptions are violated.
We may see how this happens in the following example.
Example 4.19 Let us consider Imielinski’s domain abstraction [269], and let the
base theory contain the axioms {p(a, b), ¬p(c, d)}. If we assume that a and c
are equivalent, and that b and d are equivalent, then the abstract theory becomes
{p(a, b), ¬p(a, b)}, which is inconsistent. As the base theory is consistent, this
abstraction violates Theorem 4.2 and cannot be an MI-abstraction.
Suppose now that the equivalence relation is a congruence, i.e., for every n-ary
relation p and terms ti, t′i (1 ≤ i ≤ n) such that ti and t′i are equivalent, the base
theory entails p(t1, . . . , tn) ⇔ p(t′1, . . . , t′n). In this case domain abstraction is indeed
an MI-abstraction, and the simplifying assumption is that the "equivalence relation
is a congruence".
A few years after the publication of Giunchiglia and Walsh’ paper [214], Ghidini and
Giunchiglia [203] proposed a “model-theoretic formalization of abstraction, where
abstraction is modeled as two representations, the ground and the abstract ones,
modeling the same phenomenon at different levels of detail”.
In this revisited approach abstraction is simply defined as a mapping function
f : L0 → L1 between a ground language L0 and an abstract language L1 , where
a language is a set of wffs. The function f preserves the names of variables, and is
total, effective, and surjective, i.e.:
• for each symbol s ∈ L0, f(s) is defined,
• for each symbol s′ ∈ L1, there is a symbol s ∈ L0 such that s′ = f(s),
• if f(s) = s0 and f(s) = s1, then s0 = s1.
Moreover, f only maps atomic formulas in the languages, keeping the logical
structure unmodified; for this reason it is called an “atomic abstraction”.
Definition 4.38 (Atomic abstraction) The function f : L0 → L1 is an atomic
abstraction iff:

• f(α ◦ β) = f(α) ◦ f(β) for all binary connectives ◦,
• f(□α) = □f(α) for all unary connectives □,
• f(Qx.α) = Qx.f(α) for all quantifiers Q.
Atomic abstractions can be further classified as term abstractions, which operate
on term symbols, and formula abstractions, which operate on predicates, and map
ground formulas to abstract ones. A typical atomic abstraction is symbol abstraction,
which lets different ground constants (or function symbols, or predicates) collapse
into a single abstract one. Another one is arity abstraction, which reduces the number
of arguments in functions or predicates.
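Applied to a formula tree, an atomic abstraction only rewrites the leaves, leaving connectives and quantifiers untouched. A sketch, with formulas encoded as nested tuples and the symbol abstraction as a dictionary (the encoding is our own):

f = {"bottle": "container", "glass": "container"}  # a symbol abstraction

def abstract(formula):
    """Homomorphic application of f: only atoms are rewritten."""
    op = formula[0]
    if op == "atom":                      # ('atom', predicate, variable)
        _, pred, var = formula
        return ("atom", f.get(pred, pred), var)
    if op in ("forall", "exists"):        # ('forall', variable, body)
        _, var, body = formula
        return (op, var, abstract(body))
    return (op,) + tuple(abstract(a) for a in formula[1:])  # connectives

phi = ("forall", "x", ("or", ("atom", "bottle", "x"), ("atom", "glass", "x")))
print(abstract(phi))
# ('forall', 'x', ('or', ('atom', 'container', 'x'), ('atom', 'container', 'x')))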
The meaning of an abstraction is then defined in terms of Local Model Semantics
(LMS) [202]. In order to make this section self-contained, we briefly recall the
underlying theory, proposed by Ghidini and Giunchiglia [203]. The theory tries to
formalize the notion of context, i.e., of the environment where some reasoning is
performed according to a partial view of a system. If a system is observed under
different points of view, each observer can reason using the information he/she has
gathered. However, as the observed system is the same, the partial views of the various
observers must agree to some extent. Indeed, not all the system’s states collected by
one observer are a priori compatible with all the system’s states collected by another.
This problem is, for instance, a classical one in Medical Informatics, when a 3-D
image has to be reconstructed from a series of 2-D images.
In order to illustrate the problem, we introduce the same example that Ghidini
and Giunchiglia used themselves [203] for the same purpose.
Example 4.20 (Ghidini and Giunchiglia [203]) Let us suppose that we have a trans-
parent box, subdivided into a 2 × 3 grid of sectors; in each sector a ball can be put.
Fig. 4.19 a Two observers look at the same transparent box from orthogonal directions. b Edges
connect states that are compatible in O1 and O2's views (Reprinted with permission from Ghidini
and Giunchiglia [202])
An observer O1 looks at the box in one direction, whereas another observer O2 looks
along an orthogonal one. As aligned balls cover one another, O1 can only observe
four states of the box, namely, “ball to the left”, “ball to the right”, “no ball”, or “balls
to the left and to the right”. Observer O2 can see eight different states, according to
the presence of no ball, one ball, two balls, or three balls in the three visible sectors.
The various states are reported in Fig. 4.19.
Let L1 and L2 be the languages in which O1 and O2 describe their observations.
These are propositional languages, where P1 = {l, r} and P2 = {l, c, r} are the sets
of propositional variables in L1 and L2, respectively. Let Mi be the set of models of
Li (i = 1, 2). Models in Mi are called local models, because their truth is assessed
independently from other views of the system. Let now ci (i = 1, 2) be a subset of Mi .
The set ci belongs to the power set of Mi . Let c = (c1 , c2 ) be a compatibility pair,
i.e., a pair of subsets of models that are compatible in the two views. A compatibility
relation C = {c} is a set of compatibility pairs. Then:
C ⊆ 2M1 × 2M2
Finally, a model is a compatibility relation which is not empty and does not contain
the empty pair. A special case occurs when |ci | = 1 (i = 1, 2); in this case, C ⊆
M1 × M2 .
In the example of the box, the local models of L1 are:

M1 = {∅, (l), (r), (l, r)}

whereas those of L2 are:

M2 = {∅, (l), (c), (r), (l, c), (l, r), (c, r), (l, c, r)}
r ⊆ domg × doma
A domain relation r represents the relation between the domains of the ground and
abstract models. All domain relations are considered total and surjective functions;
in other words, for all d1 , d2 ∈ doma , if (d, d1 ) ∈ r and (d, d2 ) ∈ r, then d1 = d2 .
Moreover, Ghidini and Giunchiglia assume that all local models in Mi (i = g, a)
agree on the interpretation of terms. This means that elements in Mi may only differ
in the interpretation of predicates.
The preservation of meaning across abstraction is formalized by means of a com-
patibility relation.
Definition 4.41 (Compatibility relation) Given Mg and Ma and a domain relation
r ⊆ domg × doma , a compatibility pair c = (cg , ca ) is defined as either a pair of
local models in Mg and Ma , or the empty set ∅. Moreover, a compatibility relation
C is a set C = {c} of compatibility pairs.
It is easy to see that a model satisfies a term abstraction if the domain relation maps
all the ground terms (tuples of terms) into the corresponding abstract terms (tuples
of terms). A graphical representation of term abstraction is reported in Fig. 4.21. The
fact that constants c1 and c2 are abstracted into the same constant c in La is captured,
at the semantic level, by imposing that both the interpretations d1 and d2 of c1 and
c2 in domg are mapped into the interpretation d of c in doma .
Abstraction of functions is similar, but presents an additional difficulty: if functions
g1(x) and g2(x) are collapsed into g(x), it is not clear what value should be attributed
to g(x). Different choices are available to the user (max, min, or other aggregation
operations).
The last notion to introduce is the satisfiability of formula abstraction.
Definition 4.43 (Satisfiability of formula abstraction) Let f : Lg → La be a for-
mula abstraction. Let C be a model over Mg , Ma , and r ⊆ domg × doma . C is said
to satisfy the formula abstraction f if for all compatibility pairs (cg , ca ) in C:
• For all p1, . . . , pn ∈ Lg and p ∈ La such that f(pi) = p (1 ≤ i ≤ n):

if cg ⊨ pi(x1, . . . , xm, . . . , xn)[d1, . . . , dm, . . . , dn]
then ca ⊨ p(x1, . . . , xm)[r(d1), . . . , r(dm)]

if cg ⊭ pi(x1, . . . , xm, . . . , xn)[d1, . . . , dm, . . . , dn]
then ca ⊭ p(x1, . . . , xm)[r(d1), . . . , r(dm)]
Definition 4.43 states that a model satisfies a formula abstraction if the satisfiability
of formulas (and of their negations) is preserved throughout abstraction. Finally, given
a model C and an abstraction f, C is said to satisfy f if it satisfies all the term and
formula abstractions.
For the sake of exemplification we report the following example, provided by
Ghidini and Giunchiglia themselves, in which they show how Hobbs’s example,
reported in Example 4.9, can be reformulated in their approach.
Example 4.21 (Ghidini and Giunchiglia [203]) Let Lg and La be the ground and
abstract languages introduced in Example 4.9, and let f : Lg → La be an abstraction
mapping. Let domg and doma be two domains of interpretation, containing
all the constants of Lg and La, respectively; let moreover r ⊆ domg × doma be a
domain relation that follows directly from f, i.e., a domain relation that satisfies the
constraint:

r = {(c, f(c)) | c ∈ domg}
Let mg and ma be a pair of local models over domg and doma , which interpret
each constant c into itself. Let C be any model over r containing these compatibility
pairs. Then C satisfies the granularity abstraction on constants “by construction”. Let
us now restrict ourselves to a C that satisfies also the granularity abstraction on the
predicate symbol on. It is easy to see that if mg satisfies the formula on(b, x, y, z),
and the block b is on the table, then ma satisfies the formula on(b, int(x),
int(y)).
4.8 Reformulation
The notion of abstraction has been often connected to the one of reformulation, with-
out equating the two. In this section we mention three approaches on reformulation,
which are explicitly linked to abstraction.
Fig. 4.22 Problem reformulation scheme: a user specification is abstracted into an abstract
specification, from which an abstract algorithm is designed and then implemented as a concrete
algorithm (Reprinted with permission from Lowry [342])
One of the first theories on reformulation, in connection with abstraction, has been
proposed by Lowry [342, 344], who described the system STRATA, which reformu-
lates problem class descriptions, targeting algorithm synthesis.
A problem class description consists of input–output specifications, and a domain
theory describing the semantics of objects, functions and relations in the specifica-
tions. Data structures are Abstract Data Types (ADT), generated by STRATA. ADTs
are considered as theories, whose symbols denote the functions of interest, and whose
axioms are given abstractly. An ADT hides implementation details, while making
the essential properties of the functions explicit. Figure 4.22 graphically describes
the reformulation process.
Reformulation is a representation mapping between theories. Given a problem
specification in a problem domain theory, STRATA finds an equivalent problem spec-
ification in a more abstract problem domain theory. This type of abstraction is called
behavioral abstraction, because it concerns input–output (IO) behavior, and the refor-
mulation involved is similar to Korf’s homomorphism [297]. Behavioral abstraction
occurs by merging models of the concrete theory that are identical with respect to
IO behavior. Abstractions are searched for in a space with a semi-lattice structure,
where more and more abstract (tractable) formulations are found moving toward the
top, whereas implementations are at the bottom.
In order to apply behavioral abstraction, behavioral equivalence schemas are used,
such as:
In1 ≅beh In2 iff ∀Out : R(In1, Out) ↔ R(In2, Out)
Out1 ≅beh Out2 iff ∀In : R(In, Out1) ↔ R(In, Out2)
Methods for generating behavioral equivalence theorems are the kernel method and
the homomorphism method [343].
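When the IO relation R is finite, behavioral equivalence can be computed directly: inputs standing in R with exactly the same outputs are merged. The toy relation below is our own illustration:

from collections import defaultdict

# A toy IO relation R, given extensionally as (input, output) pairs
R = {(1, "x"), (2, "x"), (3, "y"), (4, "x"), (4, "y")}

out_sets = defaultdict(set)
for i, o in R:
    out_sets[i].add(o)

# In1 ~beh In2 iff forall Out: R(In1, Out) <-> R(In2, Out)
classes = defaultdict(list)
for i, outs in sorted(out_sets.items()):
    classes[frozenset(outs)].append(i)

print(list(classes.values()))  # [[1, 2], [3], [4]]: inputs 1 and 2 merge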
4.9 Summary
Not so many theories of abstraction have been proposed in the literature in the
last decades. Even though abstraction has a fundamental role in many disciplines,
only in Computer Science and Artificial Intelligence have some computational models
been put forward. In Artificial Intelligence most models exploit some logical
context/language.
After an initial enthusiasm and optimism, the complexity of the aspects of abstraction
and the variety of the contexts of its use have dissuaded researchers from looking
for general theories, and suggested concentrating the efforts on more limited, but
practically useful notions. In fact, to the best of our knowledge, none of the general
logical theories proposed went beyond some simple, didactical examples: the
elegant formulations at the theoretical level fail to cope with all the details that must
be specified for actual application to real-world problems.
The attitude toward applicability was, since the beginning, at the core of the idea
of Abstract Data Types, which had pragmatic goals and limited scope, and were not
presented as a general theory, but as an effective and useful tool. Something similar
can be said for abstraction in databases, which is the other subfield where abstraction
has been treated somewhat formally.
The confinement to the realm of theory also affected approaches to irrelevance,
even though they were promising. Actually, an effective theory of irrelevance
could be exactly the missing link between a problem and the identification of the
fundamental aspects needed to solve it.
Chapter 5
Boundaries of Abstraction
In this chapter we come up with the properties that, in our view, abstraction
should have, as they emerge from the analysis in the previous chapters. We will also
relate abstraction to its cognate notions, mainly generalization and approximation.
As we have seen, notwithstanding the recognized role abstraction plays in many
disciplines, there are very few general theories of abstraction, and most of them are
quite old and difficult to apply. Abstraction is an elusive and multi-faceted concept,
difficult to pin down and formalize. Its ubiquitous presence, even in everyday life,
contributes to overloading its meaning. Thus, we are aware that finding general
properties and a definition of abstraction that covers all its meanings and usages is
likely to be an impossible task; hence, we focus on a notion of abstraction targeted
to domains and tasks whose conceptualization is largely grounded on observations.
A definition of abstraction may be useful for several reasons, such as:
• Clarifying the exact role abstraction plays in a given domain.
• Defining abstraction operators with a clear semantics.
• Establishing what properties are or are not preserved when modifying a system
description.
• Eliciting knowledge, by establishing clear relations between differently detailed
layers of knowledge.
This chapter will be kept at an informal level, as it is meant to provide an intuitive
and introductory understanding of the issues, whereas a more formal treatment will
be presented in the next chapters.
5.1 Characteristic Aspects of Abstraction
In order to be at the same time useful in practice and sufficiently general with respect
to its intended goal, abstraction should comply with the following characterizing
features:
• It should be related to the notion of information, and, specifically, of information
hiding.
• It should be an intensional property and not an extensional one of a system.
• It should be a relative notion and not an absolute one.
• It should be a process and not a state.
Each one of the above features, which are not independent from one another, will be
elaborated upon in the remainder of this chapter.
As we have seen in Chaps. 2 and 3, there are mainly three broad and intertwined
points of view from which to start for a definition of abstraction:
• To move away from the physical world, ignoring concrete details.
• To restructure a problem aiming at its simplification, which is the basis for the
reformulation view of abstraction.
• To forget or factorize irrelevant details, which is linked to the notion of relevance,
information hiding, and aggregation.
Even though the first of these points of view matches our intuition, it does not
lend itself to a computational treatment, as it includes philosophical, linguistic, and
perceptive aspects which are hard to assess and quantify. Concerning the second
perspective, simplicity is as hard to define as abstraction itself, and grounding the
definition of abstraction on it turns out to be somewhat circular. Then, we remain
with the last point of view, relating abstraction to information reduction. As we
will see, taking this perspective is, prima facie, less satisfying for our intuition than
considering abstract "what does not fall under our senses"; on closer inspection,
however, it does not conflict with that intuition. Moreover, the information-based
view nicely integrates with the simplicity-based one.
Clearly, we first have to define information. We may resort to readily available formal notions of information, choosing among alternative ones according to our needs (see Grünwald and Vitányi [223] for a discussion). The two
classical definitions are Shannon's “information” [483] and Kolmogorov's “algorithmic information” (or complexity) [295], which seem quite different, but are nevertheless related by strong links. Both definitions aim at measuring “information”
in bits. In both cases, the amount of information of an entity is the length of its
description. But in Shannon’s approach entities are supposed to be the outcomes
of a known random source, and the length of the entity description is determined
solely by the stochastic characteristics of the source and not by the individual entities.
Concretely, Shannon information of a message (a string of symbols) is the minimum
expected number of bits to transmit the message from the random source through an
error-free channel. On the contrary, Kolmogorov algorithmic information depends
exclusively upon the individual features, and it is defined as the length of the shortest
(universal) computer program that generates it and then halts. In Shannon’s words:
“The fundamental problem of communication is that of reproducing at one point,
either exactly or approximately, a message selected at another point ”. In Kolmogorov
complexity the quantity of interest is the minimum number of bits from which a
particular message can effectively be reconstructed. The link between Shannon’s
and Kolmogorov’s notions is established by a theorem that states that the average
Kolmogorov complexity over all the possible strings is equal to Shannon information.
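To make the contrast tangible, here is a minimal Python sketch (ours, not part of the original text): the empirical Shannon estimate depends only on symbol frequencies, i.e., on an assumed random source, whereas a general-purpose compressor such as zlib exploits the individual regularity of the string, giving a crude upper bound on its algorithmic information.

import math
import zlib
from collections import Counter

def shannon_bits(message: str) -> float:
    # Expected total code length in bits, assuming the symbols are drawn
    # i.i.d. from the empirical distribution of the message itself.
    n = len(message)
    counts = Counter(message)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy * n

def kolmogorov_upper_bound_bits(message: str) -> int:
    # Kolmogorov complexity is uncomputable; the length of any compressed
    # version of the string is only an upper bound on it.
    return 8 * len(zlib.compress(message.encode("utf-8")))

msg = "abab" * 100
print(shannon_bits(msg))                 # 400.0: one bit per symbol
print(kolmogorov_upper_bound_bits(msg))  # far fewer bits: the string is regular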
Notwithstanding their elegance and appropriateness, the two definitions just mentioned are not suitable to be used, as they stand, in an approach to abstraction. In fact, the Kolmogorov information measure is uncomputable, whereas we need a notion that can be easily understood and computed in concrete systems. On the other
hand, Shannon's probabilistic notion of information, based on an ensemble of possible messages (objects), does not suit our needs either. In fact, we would like a notion of information that depends on single objects (as Kolmogorov's does) and is not probabilistic (as Shannon's is). However, the notion of information
that we search for should, in some sense, be reducible to Shannon’s or Kolmogorov’s
definitions in special cases. The reason why the probabilistic approach is not well
suited to our approach is twofold. First of all, the set of messages from the source
should be known in advance. When messages are transmitted, this requirement is
usually met. But when the “source” is the world, and the “message” is a perceived
part of it, the definition of the set of messages and of a superimposed probability
distribution is out of reach. In addition, our intuition tells us that information need not always be probabilistic; when we read a book or speak with someone, we may acquire new pieces of information, something we were not aware of before, that must be integrated with our current knowledge, and which is not stochastic. Clearly, if the part of the world of interest can be reduced to a known set of
possible alternatives, Shannon’s information notion may apply.
Based on the above discussion, we need a definition of information suggested by
pragmatic reasons: it has to be well suited to support the definition of abstraction, but
it does not need to be general outside its intended application. Moreover, it should
reduce to either Shannon’s or Kolmogorov’s definitions, when applicable.
The notion of information that we will use in this book starts from the consideration
of a system S. We observe this system with a set of sensors Σ. The system can
be described as being in a (usually very large) set of possible states, Ψ , and the
observations tell us which one is appropriate to describe S. The states ψ ∈ Ψ can
be represented in terms of the measurements captured by the sensors Σ. Knowledge
of ψ provides information about the various parts of S and of their interactions, and
allows some distinctions to be made between entities and their properties.
If we use a different set of observation tools, Σ′, obtained by changing the sensors, the system S does not change, but the image that we have of it does; in particular, S can now be described as being in a different set of possible states Ψ′. The state ψ ∈ Ψ, describing S when observed with Σ, becomes a state ψ′ ∈ Ψ′ when observed with Σ′. If the sensors in Σ′ are less sensitive, more coarse-grained, or are a subset of those in Σ, some distinctions in ψ will no longer be possible, and the information we gather on S is smaller in amount and/or less detailed. If this is the case, we say that knowing the state ψ′ is less informative, or provides less information, than knowing the state ψ. As a consequence, we say that (the description of) state ψ′ is “more abstract” than (the description of) state ψ.
Abstraction is then linked to the change in the amount of information that can
be obtained by “measuring” the system S with different observation tools.1 We
recall that information may be reduced either by omitting (hiding) part of it, or by
aggregating details into larger units. In the next chapter we will introduce the formal
definition of abstraction based on these ideas.
To make more intuitive the notions introduced so far, let us consider a simple
example.
Example 5.1 Suppose that we have a system S including geometrical figures, such as polygons. Two objects a and b in the system may be known to be polygons. Then, S is in the state ψ = (“a is a polygon”, “b is a polygon”). If we observe more carefully, we may notice that a is a square and b is a hexagon. Then, we may consider a new state ψ′ = (“a is a square”, “b is a hexagon”). In this case, state ψ is less informative, and hence more abstract, than state ψ′, because ψ′ allows squares to be distinguished from hexagons, which was not the case in ψ.
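The example can be turned into a small Python sketch (the “is-a” map below is our own illustrative assumption): abstracting a state rewrites each property upward in the hierarchy, and the more abstract state supports fewer distinctions.

# Hypothetical "is-a" map: both shapes abstract to "polygon".
IS_A = {"square": "polygon", "hexagon": "polygon"}

def abstract(state):
    # Replace each property by its parent in the hierarchy, if any.
    return tuple((obj, IS_A.get(prop, prop)) for obj, prop in state)

psi_prime = (("a", "square"), ("b", "hexagon"))  # the more detailed state ψ′
psi = abstract(psi_prime)                        # the more abstract state ψ

# ψ can no longer distinguish a from b, whereas ψ′ can.
distinct = lambda s: len({prop for _, prop in s})
assert distinct(psi) < distinct(psi_prime)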
Linking abstraction to information reduction is at least coherent with the view of
abstraction as a mechanism that “distills the essential”, i.e., that selects those features
of a problem or of an object that are more relevant, and ignores the others. This
corresponds to the cognitive mechanism of focus of attention, also described by
Gärdenfors,2 and well known in Cognitive Sciences.
The informal definition of abstraction introduced earlier is far from satisfying; in fact, we did not say anything about the link between the states ψ and ψ′. One
may expect, in fact, that a precise definition of abstraction ought to limit the possi-
ble transformations between ground and abstract states. Nevertheless, even though
informal, the definition is sufficient to go ahead with the other aspects characterizing
abstraction.
When we observe a system in the world, we have to deal with entities (objects, events,
actions, . . .) and with their descriptions. The entities themselves are the extensional
1 As will be discussed in the next chapter, the system S need not be a physical one, and
“measuring” is to be taken in a wide sense, as discussed also by Floridi [175].
2 See Sect. 2.2.
Fig. 5.1 Incas used quipus to memorize numbers. A quipu is a cord with knots that assume position-dependent values. An example of the complexity a quipu may reach (Reprinted with permission from Museo Larco, Pueblo Libre, Lima, Peru) [A color version of this figure can be found in Fig. H.5 of Appendix H]
part of the system, characterized by their number N; the size of the system increases linearly with N. On the other hand, the description of the system is its intensional part, which, for a description to be useful, should increase at most sub-linearly with N. In Kolmogorov's theory of complexity, an object whose description has the size of the object itself is incompressible.
When we say that abstraction is an intensional property, we mean that it pertains
to the description of the observed entities, and not to collections thereof. During
evolution, humans, in order to organize the inputs they receive from the world into a cognitively coherent body of knowledge, faced the problem of going beyond an extensional apprehension of the world, by “inventing” concepts. A typical example,
as discussed in Sect. 2.3, has been to move from physically counting objects (see
Fig. 5.1) to the notion of “number”.
Without entering into a disquisition on the subtleties of the concept of “concept”, we will equate a concept C with a “set Z of sufficient properties”. Any object
satisfying Z is declared to be an instance of C. For example, the Oxford Dictionary defines the concept C = vehicle as “a thing used for transporting people or goods”. As we can see, a concept can also be defined in terms of functionalities.
Concepts can be more or less detailed, according to the defining properties, and they
may form hierarchies. For example, a vehicle can be terrestrial, marine, or aerial. An
example of a possible hierarchical organization for vehicle is reported in Fig. 5.2.
In this figure we may notice that there are two types of nodes and edges. Oval nodes
correspond to descriptions, whereas rectangle nodes correspond to instances, i.e.,
particular objects that satisfy the descriptions. Each oval node adds some new property to the description of its father node, and is linked to it by an “is-a” relation. Thus, nodes lower in the hierarchy are more detailed than nodes higher up, and they provide more information about the objects that satisfy the properties. The lowest level contains the objects themselves, which are, in fact, the most detailed descriptions.
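The double reading of such a hierarchy can be sketched in a few lines of Python (the node names and plates below are placeholders, not those of the actual figure): read as concepts, the extension of a node is the union of the extensions of its children, so it grows going up.

CHILDREN = {
    "vehicle": ["terrestrial", "marine", "aerial"],
    "terrestrial": ["car", "truck"],
}
# Leaf instances, identified by (hypothetical) plates.
INSTANCES = {"car": {"AB123CD"}, "truck": {"EF456GH"}}

def extension(node):
    # The instances covered by a node: its own plus all of its children's.
    own = INSTANCES.get(node, set())
    return own.union(*(extension(c) for c in CHILDREN.get(node, [])))

print(extension("car"))      # {'AB123CD'}
print(extension("vehicle"))  # both plates: a more general concept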
Fig. 5.2 A possible hierarchical organization of the concept vehicle = “thing used for
transporting people or goods”. Transportation may occur on land, sea, or air. A vehicle can be
used to transport people or goods, and so on. The instances of car are specific cars, uniquely
identified with their (Italian, in the figure) plate
part-of
part-of
part-of
Fig. 5.3 A computer is constituted by several parts, such as screen, keyboard, loudspeakers, mouse
and body. The body has many components inside, among which there is the internal hard disk,
which, in turn, has its own components. Also the mouse has several parts in its interior. Then,
compound objects can be decomposed into parts at several nested levels
A “part-of” hierarchy, like the one in Fig. 5.3, also suggests to us what to “see”. Notice, however, that a “part-of” hierarchy does not have the same extensional interpretation as the “is-a” hierarchy.
In summary, abstraction acts on an object description: we cannot abstract the
object, because we cannot change what it is, but we can abstract its description.
Fig. 5.4 a Picture of a poppy field. If we only have this picture, it is impossible to say whether it is concrete or abstract. b The same picture in black and white. By comparison, the latter is less informative than the colored one, because the information referring to the color has been removed; then picture (b) is more abstract than picture (a) [A color version of this figure is reported in Fig. H.6 of Appendix H]
From the above considerations it follows that labeling an entity as abstract or concrete in absolute terms is beyond hope, and that we can only speak of abstraction as a relative notion, and not as an absolute one. In other words, all we can say is that something is more abstract than something else. Then, abstraction has to be considered as a relation that induces a partial order on entities.
In order to explain our choice, let us look at an example. In Fig. 5.4a a picture of
a poppy field is reported. There are no clear and undisputable grounds for labeling
this picture as abstract or concrete. In fact, if we reason from the point of view of
closeness with the reality, the picture is not the “true” poppy field, and then it should
be abstract (see also Fig. 2.2). On the other hand, if we judge from the point of
view of abstract art, it has a close resemblance with the original, and then it should
be rather labeled as concrete. From the point of view of the ability to capture the
essential aspects of the original, again we do not know what to say: maybe there
are important details that the picture did not capture (for instance, the pistils), or the image may even be too detailed (maybe only the perception of a red field, as in impressionist art, would matter). But, if we look at the picture in Fig. 5.4b, and
we compare picture (a) with picture (b), we are immediately able to say that picture
(b) is more abstract. In fact, the information about the color has been removed,
leaving the rest unchanged. We want to stress that only the pictures are compared with respect to the “more abstract than” relation, because the original poppy field, of course, did not change, as discussed in Sect. 5.1.2. We may notice that
5.1 Characteristic Aspects of Abstraction 125
picture (b) in Fig. 5.4 is more abstract than picture (a) even according to the notion
of abstraction as taking a distance from the senses. In fact picture (b) has a poorer
sensory quality than its colored counterpart.
Defining abstraction as a relative notion agrees with Floridi’s idea of Gradient of
Abstraction (GoA), discussed in Sect. 4.2.2, and, specifically, with nested GoAs.
Moving to a subtler domain, let us consider concepts. We have seen that concepts, too, are labeled as “abstract” or “concrete”, according to whether they
refer to abstract or concrete things in the world. However, also in this case the distinction is not easy to make. In fact, if considering a chair concrete (a classical example of a concrete thing) looks quite reasonable, classifying freedom as abstract (a classical example of an abstract thing) might be challenged: one could say that experiencing freedom (or its opposite, slavery) deeply affects one's life in a very concrete way. Clearly, this discussion involves the notion of abstraction as distance from the sensorial world, which is unable to provide an always uncontroversial labeling.
The examples of abstraction as moving up and down an “is-a” or a “part-of”
hierarchy, described in the preceding section, are good instances of the relativity of
the notion itself. In fact, while it is impossible to label any concept node (is-a) or
any structural node (part-of) as abstract or concrete, it is very natural to compare
two of them if they are linked by the same relation. Clearly, abstraction induces
only a partial order among entities, and the nodes car and truck, in Fig. 5.2, are
incomparable. Given two entities, the means that can be used to actually compare
them according to the more-abstract-than relation will be introduced in Chap. 6.
If abstraction is not an absolute but a relative notion, the process with which a more
abstract state is generated from a more detailed one becomes crucial.
Let S be a system (with objects, events, . . .), and let Ψ be the set of its possible states, determined by the sensors used. Each state ψ corresponds to a description of S.
Taking two of these states, say ψ1 and ψ2 , we would like to compare them with respect
to the more-abstract-than relation. Except in cases where the relation is obvious (as,
for instance, in the cases of Figs. 5.2 and 5.3), it would not be easy, or it would be even
impossible, to make a comparison. In fact, if abstraction were an absolute notion,
for any two states we could say whether they are comparable (and then which one
is the more abstract) or whether they are incomparable, only by looking at the states
themselves. In fact, it would be necessary to define a function I(ψ), depending only
on the state ψ, which represents the information that the description corresponding
to ψ conveys about the system under analysis. In this case, any pair of states ψ1
and ψ2 such that I(ψ2 ) < I(ψ1 ) would imply that ψ2 is more abstract than ψ1 ,
even though ψ1 and ψ2 are unrelated. On the contrary, with the generative view of
abstraction that we propose, ψ2 must be obtained from ψ1 through an identifiable
process. By taking the position that the comparison among two states, with respect
to their relative abstraction level, depends on the path followed for going from one
to the other, the comparison may require additional information.
In order to understand the importance of taking into account the abstraction process itself, let us consider two examples. In Fig. 5.5 Escher's beautiful lithograph Liberation is reported. Suppose that we only have access to the view of the birds at the
top and of the triangles at the bottom of the drawing. If we want to link them according
to their information content, we cannot but conclude that they are unrelated. In fact,
birds and triangles, taken in isolation, do not have anything meaningful in common.
On the contrary, if we have access to the whole drawing, we see quite clearly how the
triangles are obtained from the birds through a sequence of detail eliminations and
approximations. Then, knowing the process of transformation from one state into
another allows the birds and the triangles to be related, by saying that the triangles
are indeed modified representations of the birds.
Another example can be found in Fig. 5.6, taken from Il vero modo et ordine per
dissegnar tutte le parti ie membra del corpo humano,3 by Odoardo Fialetti, printed in
Venice in 1608. Here a study on the techniques for drawing a human eye is illustrated.
3 “The true way and order for drawing all parts and members of the human body”.
Fig. 5.6 From Fialetti’s “Il vero modo et ordine per dissegnar tutte le parti ie membra del corpo
humano”, 1608. One among a set of studies for drawing eyes
In the series of eyes it is really hard, without looking at the intermediate steps, to relate the top-left-most and the bottom-right-most drawings. However, the relation between the two clearly appears if we consider the whole process of stepwise transformations.
Abstraction has been considered a process also in Mathematics, where the concept
of number is reached, according to Husserl, through a counting process leaving
aside all the properties of a set of objects, except their numerosity. Lewis [329] explicitly defines abstraction as a process of removing details from the concrete.4 Finally,
Staub and Stern's approach to abstraction5 combines, as we do, the idea of abstraction as a process with that of abstraction as a relative notion; in fact, these authors claim that concepts are obtained by reasoning, starting from the concrete world. Along the
reasoning chain abstraction increases, so that the farther from the concrete world
a concept is along the chain, the more abstract it is. As an example, real numbers
are more abstract than integers. Even though this approach shares with our view the
ideas of process and relativity of abstraction, we do not reach the same conclusions
as Staub and Stern, regarding numbers, because they do not acknowledge the role of
information reduction along the abstraction process.
Considering abstraction as a process raises two important issues. The first one
is to investigate whether the process has a preferential direction, and whether it is
reversible. The second one is the identification of the abstraction processes themselves. Concerning the first issue, we must remember that we have defined abstraction
as an information reduction mechanism, whatever this means. A part of the world,
namely a system S, contains, in nuce, all the features and details that can possibly
be detected. It is then necessary to decide what features of the system are to be
considered and measured, and which ones are not. The result of this selection is
the most detailed description dg of the system that we decide to keep, and also the
Fig. 5.7 A color picture has been transformed into a black and white one. If the color is to be added again, there is no clue for performing this addition correctly, if it is not known how the color was originally removed [A color version of this figure is reported in Fig. H.7 of Appendix H]
The last aspect of abstraction to be discussed in this chapter is its effect on the information that is removed. According to the view of Abstract Data Types in Computer Science, the idea behind abstraction is that information is not deleted from a description at a given level, but only hidden, so that it can be seen at lower (more detailed) levels, and also recovered when needed. This information hiding
is also called encapsulation [142]. For example, in Roşu’s approach, abstraction is
explicitly defined as information hiding.7 As we have discussed in Sect. 2.4, Colburn
and Shute [111] contrast information hiding in writing programs with information
neglect in Mathematics.
If we think of the reversibility problem mentioned earlier, the loss of the information removed at a given level would completely hinder the concretion process. In fact, any lost information cannot be recovered without seeking it again in the real world.
5.2 Boundaries of Abstraction
Having now introduced our basic ideas about abstraction, we can see how they help in setting boundaries between abstraction and cognate notions. In particular, we will discuss the relations between abstraction, on the one hand, and generalization, categorization, approximation, simplification, and reformulation, on the other.
Throughout Chaps. 2 and 3 we have seen that abstraction is very often linked to the
notion of generalization, and sometimes even identified with it, as, for instance, by Colunga and Smith [112] or by Thinus-Blanc [528]. Of course the relation between
abstraction and generalization depends on their respective definitions. If abstraction
is defined as generalization, the identification cannot but be correct. However, we claim that this identification is not appropriate, because it masks distinctions that are important and useful to preserve.
The first and most important distinction between abstraction and generalization
is that the first is an intensional property, i.e., it pertains to descriptions, whereas
generalization is extensional, i.e., it pertains to instances. In order to clarify this
distinction, we have to start from somewhat afar, and precisely from Frege's notion of concept [181]. Given a First Order Logic language L, let ϕ(x) be a formula with free variables x = (x1, . . . , xn). This formula represents a “concept”, i.e., the set of all n-tuples of objects a = (a1, . . . , an), in the chosen domain A, which satisfy ϕ(x).
Fig. 5.8 Examples of concepts according to Frege. a COV(ϕ) is the extension of the concept Mother(x, y), i.e., the set of pairs of people such that y is the mother of x. b COV1 is the extension of the concept Mother(x, b), i.e., the set of b's children, and COV2 is the extension of the concept ∃x[Mother(x, y)], i.e., the set of women that have at least one child in A. COV2 is the projection of COV(ϕ) onto the y axis
Let us call this set COV (ϕ).9 Formula ϕ does not have a truth value associated to it,
but an “extension”. It is not necessary that all the variables in ϕ are free: some may
be bound, but at least one must remain free.
We notice that this definition of concept is coherent with the view of concept
as a set of properties introduced in Sect. 2.6. In fact, formula ϕ(x) specifies what
properties the instances must satisfy.
Example 5.2 Let ϕ(x, y) = Mother(x, y) be a concept and let A be a given set of people. Let the upper right quadrant of the space (x, y) contain the set of all pairs (x, y) ∈ A × A, and let COV(ϕ) be the extension of ϕ(x, y), i.e., the set of all pairs (x, y) of people such that y is the mother of x, as shown in Fig. 5.8a. We may
bind a variable either by instantiating it to a constant, or by using a quantifier. Let us
see what concepts we obtain through these operations.
Let us first set y = b; the concept Mother(x, b) has the only free variable x, and represents the set of children of b, some of whom (the subset COV1 on the x axis) belong to the set A. On the contrary, if we bind x to a, we obtain the concept
Mother(a, y), which represents the set of mothers of a; as there is only one mother
for each person, this concept has either an extension consisting of a unique point (if
a and his/her mother belong to A), or it is void.
Consider now the existential quantifier applied to x, i.e., ∃x[Mother(x, y)]. This is a concept with free variable y, and represents the subset COV2 of persons y that are mothers of some children. On the contrary, the concept ∃y[Mother(x, y)] represents
the set of people whose mother is included in the set A.
Finally, let us consider the universal quantifier applied to x, i.e., ∀x[Mother(x, y)].
This concept has y as free variable, and represents the set of women y that are mothers of all persons in A: clearly a void concept, because y belongs to A but cannot be the mother of herself. On the contrary, the concept ∀y[Mother(x, y)] represents the set of people x whose mothers are the whole population of A; again, clearly a void concept.
When a formula does not have any free variable, it becomes a sentence; it does
not have an extension associated to it, but has a truth value.
Example 5.3 Let the set A be the whole of humanity at a given time instant. The formula ∀x ∃y [Mother(x, y)] is not a concept, because it does not have any free variable, but is a sentence that has value true, because it asserts that each person has
a mother.
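Frege's distinction between concepts (extensions) and sentences (truth values) is easy to mimic in Python over a finite domain; the tiny domain and the Mother pairs below are our own assumptions, chosen only to mirror Examples 5.2 and 5.3.

A = {"ann", "bea", "carl"}
MOTHER = {("bea", "ann"), ("carl", "ann")}  # (x, y): y is the mother of x

# Concept Mother(x, y): two free variables, extension COV(ϕ) ⊆ A × A.
COV = {(x, y) for x in A for y in A if (x, y) in MOTHER}

# Bind y to a constant: concept Mother(x, ann), free variable x.
COV1 = {x for x in A if (x, "ann") in MOTHER}

# Bind x existentially: concept ∃x Mother(x, y), free variable y.
COV2 = {y for y in A if any((x, y) in MOTHER for x in A)}

# Bind everything: the sentence ∀x ∃y Mother(x, y), with a truth value.
sentence = all(any((x, y) in MOTHER for y in A) for x in A)

print(COV1, COV2, sentence)  # {'bea', 'carl'} {'ann'} False (ann's mother is not in A)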
Concepts can be compared according to their extension: a concept C1 is more
general than concept C2 iff COV (C1 ) ⊇ COV (C2 ). We notice that the more-general-
than relation, in order to be assessed, needs to compare sets of concept instances.
Hence, generality cannot be attributed to sentences, which do not have associated
extensions. On their part, sentences, with their truth value, provide information. They
are the intensional counterpart of concepts. Being an intensional property, abstraction
is related to sentences. As a conclusion, the following differences between abstraction
and generalization can be assessed:
• Abstraction is an intensional notion and has to do with information, whereas generalization is an extensional notion and has to do with instance covering.
• Abstraction can be applied to a single entity, generalization only to sets of entities.
• Abstraction is related to sentences, generalization to concepts (in Frege’s sense).
• Abstraction and generalization can be performed by means of different operators.10
Abstraction and generalization not only can be distinguished, but they are, in a
sense, orthogonal. In Fig. 5.9 we see that they can be combined in all possible ways,
generating a bi-dimensional space, where one axis corresponds to abstraction and the
other to generalization. It is possible to find descriptions that are general and abstract,
general and concrete, specific and abstract, or specific and concrete. The separation of
abstraction from generalization “solves”, to a certain extent, Berkeley’s objection to
the idea of abstraction (see Sect. 2.1); in fact, a concept can, at the same time, be very
precise and cover many instances. Moreover, the separation agrees with Laycock’s
view, reported in Sect. 2.1, that there are two dichotomies: “abstract/concrete” and
“universal/particular”. The first one directly corresponds to the Abstraction axis in
Fig. 5.9, whereas the second one can be mapped onto the Generalization axis in
the same figure. Unfortunately, starting from the ontic view of abstraction, other
philosophers, such as Quine (see Sect. 2.1), made the two dichotomies coincide again.
10 For instance, aggregation is only meaningful for abstraction, whereas omitting details pertains to both abstraction and generalization.
Fig. 5.9 Abstraction and generalization can be combined in every possible way. In the bottom-left corner there is a picture of one of the authors, which is specific (only one instance) and concrete (all the skin, hair, face, . . . details are visible). In the bottom-right corner there is a version of the picture which is specific (only one instance, as the person is still recognizable) and abstract (most details of the appearance are hidden). In the top-left corner the chimpanzee-human last common ancestor is represented with many physical details, thus making the picture still concrete; however, many monkeys or humans satisfy the same description, so that this is an example of a concrete but general concept. Finally, in the top-right corner there is a representation of a human head according to Marr [353] (see Fig. 2.13); the head is abstract (very few details of the appearance) and general (any person could be an instance) [A color version of this figure is reported in Fig. H.8 of Appendix H]
Figure 5.9 has to be read along two dimensions: the pictures may be viewed either
as concepts, or as descriptions of concepts (sentences). In the first interpretation, they
must be compared according to their extension, which increases from bottom to top.
In the second interpretation, they must be compared according to the amount of
information they provide, which decreases from left to right.
Clearly, even though Fig. 5.9 shows that it is not always the case, a more abstract description very frequently corresponds to a more general concept.
In fact, during the process of abstraction, details are increasingly removed, and the
set of properties that instances must satisfy shrinks. This concomitance might be a
reason for the confusion between abstraction and generalization.
A second aspect that differentiates abstraction from generalization consists in
the possibly different nature of their related operators. If we consider the hier-
archy in Fig. 5.2, we can make two observations. The first is that, if nodes are
viewed as concepts, by going up the hierarchy, more and more general concepts are
found, because their extension is the union of the extensions of the children nodes.
On the other hand, if the nodes are viewed as descriptions, they become more and more abstract going up, because the information that they provide about the instances of the associated concepts is less and less detailed. Then, in this case, an increase in generality goes together with an increase in abstractness.
If we now look at Fig. 5.3, we see that what was said for the hierarchy in Fig. 5.2
is not applicable here. In fact, the nodes in the “part-of” hierarchy can only be
interpreted as descriptions, whose abstraction level increases going up, and not as
concepts whose generality increases. The nodes in this hierarchy are incomparable
from the more-general-than relation point of view.
In several disciplines where abstraction is used and deemed important this notion
has been related to (and sometimes defined on the basis of) a mechanism for extracting
common features from a variety of instances. By considering what has been said
earlier in this chapter, this mechanism might underlie generalization, rather than
abstraction. In fact, the abstraction process, in order to be performed, does not need to
look at several instances, but it can be applied to single objects, so that commonalities
with other objects do not matter. In addition, abstraction is a process that hides features instead of searching for them. Nevertheless, since searching for common features means deleting the differing ones, both searching for shared features across instances and forgetting irrelevant ones end up producing a more abstract description. In fact, abstraction ignores irrelevant features, which are likely to be those that are accidental to an instance and do not belong to its essence. Then, even though generalization and abstraction are different mechanisms with different goals, their results may sometimes coincide, which explains again why they are often confused.
After discussing the differences between generalization and abstraction, we may
look into their possible links. To this aim, let us consider a question that has been discussed at some length in the Machine Learning literature: Is the assertion s1 = “Yves lives in France” more general than the assertion s2 = “Yves lives in Paris”? [172].
Actually, this question, as it is formulated, is ill-posed. First of all, both s1 and s2
are sentences and not concepts; as such, they do not have an extension associated to
them, but a truth value. As we have discussed earlier, generalization is an extensional
property and, then, it makes no sense to assign to either s1 or s2 a generality status.
On the other hand, assertion s1 provides much less information about the place where
Yves lives than s2 , and then, we are willing to say that s1 is a more abstract description
of Yves’ domicile than s2 . Now, let us consider the set of people living in Europe, and
let lives(x, France) be the concept whose extension COV (lives(x, France)) is
the subset of all people living in France. In an analogous way, let lives(x, Paris)
be the concept whose extension COV (lives(x, Paris)) is the subset of all people
living in Paris. As Paris is in France, then

COV(lives(x, Paris)) ⊆ COV(lives(x, France))

and, hence, the concept lives(x, France) is more general than the concept lives(x, Paris).
After investigating the connections between abstraction and generalization, let us try
to relate abstraction and approximation. The task here is more difficult, because the
precise definition of the more-general-than relation in terms of extensions is not paralleled by anything similar for approximation. We can decompose the problem into two
parts: defining approximation first, and discussing the link approximation/abstraction
later.
The Oxford Dictionary defines approximation as “a value or quantity that is nearly
but not exactly correct ” or “a thing that is similar to something else, but is not exactly
the same ”. Then, intuitively, approximation is related to the notion of controlled error.
When describing a system, some part of its description or behavior is replaced by
another one. In principle, any part could be replaced with anything else, depending
on the reasons underlying the substitution. When considering approximation in the
context of abstraction, the main reason is usually simplification. We may recall here
Hobbs’ proposal [252] of considering approximate values in a numerical interval as
indistinguishable.
Example 5.4 (Pendulum) Let us consider the simple pendulum represented in
Fig. 5.10, embedded in the gravitational field. If we suppose that the pendulum starts
from the position θ = θ0 > 0 with null velocity, and that there are no dissipative
forces, it will oscillate between position θ = θ0 and θ = −θ0 with a period T .
Solving the equation of motion

θ̈ = −(g/r) sin θ,

where r is the length of the pendulum and g the gravity acceleration, provides the exact period

T = 4 √(r/g) K(sin(θ0/2)),    (5.1)

where K is the Complete Elliptic Integral of the First Kind.
If we assume that θ0 is small, i.e., that the pendulum swings in the vicinity of the
position θ = 0, an approximate (but simpler to solve) equation of motion is obtained,
namely

θ̈ = −(g/r) θ
This linearized equation provides an approximate value for the period, i.e.,

Ta = 2π √(r/g)    (5.2)
When θ0 = 0, K(0) = π/2, and hence the error is 0, as it must be. When θ0 = π/2, K(sin π/4) ≈ 1.854, and the relative error is about 0.15, namely the approximate value does not differ from the true one by more than 15 % of the latter.
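The numbers can be checked with a few lines of Python, assuming the standard exact-period formula T = 4√(r/g)·K(sin(θ0/2)); note that SciPy's ellipk expects the parameter m = k², not the modulus k.

import numpy as np
from scipy.special import ellipk

def exact_period(theta0, r=1.0, g=9.81):
    # T = 4*sqrt(r/g)*K(k), with modulus k = sin(theta0/2).
    k = np.sin(theta0 / 2.0)
    return 4.0 * np.sqrt(r / g) * ellipk(k**2)

def approx_period(r=1.0, g=9.81):
    # Small-angle approximation, Eq. (5.2).
    return 2.0 * np.pi * np.sqrt(r / g)

T, Ta = exact_period(np.pi / 2), approx_period()
print((T - Ta) / T)  # ≈ 0.15: the relative error at theta0 = pi/2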
Approximation may also be done in discrete systems, where it is not always easy
to describe what replaces what, and how good the approximation is. This was actually
Fig. 5.11 A running man has been approximated by replacing his body parts with polygons. The approximation still allows both the human body and its running posture to be recognized
the starting point of the abstraction theory proposed by Imielinski [269], as described
in Sect. 4.5.2.
The following example clarifies this case.
Example 5.5 Let us consider the running man in Fig. 5.11. The parts of the body
have been approximated by means of polygons, and yet the body and its running
attitude are clearly recognizable. However, in this case it is very difficult to define
an approximation error.
The schematic representation of a man in Fig. 5.11 recalls Marr's 3-D representation with generalized cones (geons), reported in Fig. 2.13, which has to be considered an approximation of the human body as well.
A last example comes from Computer Graphics, where 3-D objects are represented
as networks of meshes (see Fig. 5.12). The mesh representation strikes a compromise between realism in object rendering and computational complexity: increasing the number of meshes increases the realism of the image, but also the computational cost.
In summary, approximation occurs when some part (a variable or a value) of
a system is not hidden but replaced with something else, usually with the aim of
achieving simplicity: a simpler description, a simpler behavior, a simpler solution.
The approximation of Example 5.4 reduces the complexity of solving the equation
of motion, whereas the ones of Figs. 5.11 and 5.12 generate simplified descriptions.
5.3 Summary
Abstraction has been related, in the literature, to other notions, such as generalization,
approximation, and reformulation. It is thus important to try to set boundaries among
these notions (or mechanisms) in such a way that the effects and properties of the modifications applied to system descriptions are clearly understood.
A recurring theme in defining all the above mechanisms is simplicity: all of them aim at simplifying a problem or its solution. This common striving for simplicity has sometimes generated confusion. Even though it is true that simplification is the ultimate goal of abstraction, generalization, approximation, and reformulation, the workings of these mechanisms may nonetheless be very different from one another. In addition, as there is no precise definition of simplicity either, things become even more intricate, because different notions of simplicity may be implicitly or explicitly invoked.
In our approach we use information as a common denominator to all notions. This
allows a clear characterization of the various notions to be provided, even though only
focused on the problem of knowledge representation. With this choice, all the above
mechanisms can be described as acting on spaces containing states of a dynamical
system. The mechanism, be it abstraction or any of the others, is modeled as a process
moving from one state to another.
Concerning abstraction, it has been identified as a mechanism that handles system descriptions, providing information about the system itself, and modifying the amount of information provided by hiding or aggregating details. Only changes in the information are considered, so that it is not necessary to assess whether an entity is abstract or concrete; all that matters is a partial order among entities, generated by the more-abstract-than relation.
Abstraction is not viewed as a mapping between two existing spaces, but as a
generative process, which, starting from one space (called “ground”), generates the
other (the abstract one) with less information. As a consequence, in the more abstract
space there are only states with at least one antecedent in the ground one. In this case,
no “spurious” state [587] may appear.
Chapter 6
The KRA Model
Fig. 6.1 a A task to be performed (a query) requires both measurements (observations) from the
world and a theory. b In order to perform the task of detecting the presence of a person in a corridor,
a camera is used. The output of the camera (the observations) is processed by suitable algorithms
(theory)
As sketched in Fig. 6.1, the task (query) requires (at least) two sources of
information: measurements of observables in S, obtained through a set of sensors,1
and a task-dependent theory, namely a priori information about the structure of S,
its functioning, and its relations with the rest of the world. An additional source of
information may include some general background knowledge. As we will see in the
following, tracking the sources of the information is very important. Q may assume
various formats, according to the nature of S. For example, in symbolic systems
Q may be a closed logical formula (a sentence) to be proved true, or an open one
(a “concept”) whose extension has to be found. In continuous systems Q may be a
set of variables whose values have to be computed. Analogous observations can be made for the measurements and the theory, which must comply with the format of
the query.
Concerning the source of all the components in Fig. 6.1a, the measures clearly
come from the world, whereas the theory and the query itself are usually provided
by a user, who will receive the answers. For the moment we just consider abstraction
from the representation point of view, leaving a discussion of the interaction between
theory and observations for a later chapter.
1As already mentioned, the term “sensor” has to be intended in a wide sense, not only as a physical
mechanism or apparatus. Acquiring information consists in applying a procedure that supplies the
basic elements of the system under consideration.
6.1 Query Environment, Description Frame, and Configuration Space
Let us start from the query. As already said, the query Q represents the task to
be performed on a system S, and it may assume different formats. The query is
provided by the user, and, in order to answer it, we need to reason and/or execute
some procedure on data observed in S.
The choice of the sensors Σ (either natural or manmade measurement apparata) needed to acquire information about S biases all that we can know of S, both directly (through measurements) and indirectly (through inference). The outputs from the sensors are the observations. We assume that observations consist of the specification
of the objects that can be present in S, of the values of some attributes of the objects,
of functional relations among objects (functions), and of some inter-relationships
among sets of objects (relations). We are now in a position to introduce the notion of description frame.
Definition 6.1 (Description frame) Given a set of sensors Σ, let ΓTYPE, ΓO, ΓA, ΓF, and ΓR be the sets of all types, identifiers of objects, attributes, functions, and relations, respectively, potentially observable in a system S by means of Σ. The description frame of Σ is the 5-ple Γ = ⟨ΓTYPE, ΓO, ΓA, ΓF, ΓR⟩.
The set ΓO includes labels of the objects that can possibly be detected by Σ in
the system S. Here, the notion of object is considered as an undefined primitive, and
we rely on an intuitive definition of objects as elementary units (physical objects,
images, words, concepts, . . .) appearing in the system S to be described. Objects
are typed, i.e., assigned to different classes, each one characterized by potentially
different properties. For instance, an object can be of type human, whereas another
of type book. If no specific type is given, then objects will be of the generic type
obj. We will denote by ΓTYPE the set of types that objects can have, and by ΓO,t
the subset of objects of type t ∈ ΓTYPE.
The set of attributes ΓA = {(A1, Λ1), (A2, Λ2), . . . , (AM, ΛM)} consists of descriptors of the objects. Each attribute Am (1 ≤ m ≤ M) may take values either in a discrete set Λm = {v1^(m), . . . , v|Λm|^(m)}, or in a continuous one, i.e., a (proper or improper) subset of the real axis R, namely Λm = [a, b] ⊆ R. When suitable, we will consider the type of the objects as an attribute A0, whose domain is Λ0 = ΓTYPE, so that |Λ0| is the number of types. For each type of objects only a subset of the attributes defined in ΓA is usually applicable. Let ΓA,t = {(A1^(t), Λ1^(t)), . . . , (AMt^(t), ΛMt^(t))} ⊆ ΓA be the subset of attributes applicable to objects of type t. The set Λi^(t) ⊆ Λi (1 ≤ i ≤ Mt) is the set of values that objects of type t can take on. Let |ΓA,t| = Mt ≤ M. Associating the attributes and their domains to the types also has the advantage of allowing some specific values, characteristic of a type,
to be specified. For instance, given the attribute Color with domain ΛColor =
{yellow, red, white, orange, pink, blue, green}, we may associate to flowers of type poppy the attribute (Color^(poppy), {red}). This is an easy
way to represent ontologies, where the attributes characterizing each node can be
specified.
The set ΓF = {f1, f2, . . . , fH} contains some functions fh (1 ≤ h ≤ H), with arity th, such that:

fh : DOM(fh) → CD(fh)
The domain DOM(fh) contains a set of th-ples, each one with an associated value in the co-domain CD(fh). We assume that all arguments of fh take values in ΓO, so that DOM(fh) = ΓO^th. The co-domain can be either ΓO or another discrete or continuous value set. Notice that functions are, at this point, only empty shells, because ΓO
value set. Notice that functions are, at this point, only empty shells, because ΓO
contains only a set of identifiers, as previously mentioned. Then, function fh has to
be intended as a procedure that, once the values of its tuple of arguments is actually
instantiated, associates to the tuple a value in CD(fh ). As an example, let Mother:
ΓO → ΓO be a function. The semantics of this function is that, once the identifier
of a particular person x is given, the procedure Mother provides the identifier of x’s
mother.
Finally, the set ΓR contains some relations Rk (1 ≤ k ≤ K), each one of arity tk, such that:

Rk ⊆ ΓO^tk
Each argument of a relation can only take values in ΓO . As it happens for functions,
also Rk is a procedure that, given an instantiation of the tuple of its arguments, is able
to ascertain whether the tuple satisfies the relation. As an example, let us consider
relation RFatherOf ⊆ ΓO × ΓO . To each pair (x1 , x2 ) of persons, RFatherOf determines
whether x1 is actually the father of x2 .
When an attribute, function, or relation is not applicable to an object (or set of objects), we denote its value by NA (Not Applicable). It is also possible that some value, even though applicable, is not known; in this case we set it to UN (UNknown).
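Definition 6.1 translates naturally into a small data structure; the Python sketch below is our own rendering (the field names and the example values mirror Example 6.1 further on, and are not prescribed by the model).

from dataclasses import dataclass

NA, UN = "NA", "UN"  # not-applicable / unknown markers

@dataclass
class DescriptionFrame:
    """A rendering of the 5-ple Γ = ⟨Γ_TYPE, Γ_O, Γ_A, Γ_F, Γ_R⟩."""
    types: set[str]              # Γ_TYPE
    objects: dict[str, str]      # Γ_O: object identifier -> its type
    attributes: dict[str, set]   # Γ_A: attribute name -> value domain Λ_m
    functions: dict[str, int]    # Γ_F: function name -> arity t_h ("empty shells")
    relations: dict[str, int]    # Γ_R: relation name -> arity t_k

gamma = DescriptionFrame(
    types={"obj"},
    objects={"a": "obj", "b": "obj", "c": "obj"},
    attributes={"A1": {0, 1}, "A2": {True, False}},
    functions={"f": 1},          # f: Γ_O -> Γ_O, still uninstantiated
    relations={"R": 2},          # R ⊆ Γ_O × Γ_O
)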
The description frame Γ generates the totality of the descriptions that can be formed with the elements specified in it (i.e., objects, attributes, functions, and relations). The set of these descriptions is the configuration space. In order to formally define the configuration space, we have to look in more detail at how a description can be built up with the elements of Γ.
First of all, any object has its type t associated to it, and has an identifier o ∈ ΓO,t. An object is described by a vector:

(o, t, v1^(t), . . . , vMt^(t))

Concerning functions, let FSET(fh) denote the set of all th-ples that can potentially satisfy fh; not all the tuples in FSET(fh) are usually observed, but only a subset of v of them (0 ≤ v ≤ |FSET(fh)|). Then, in order to capture the actual situation, it is useful to introduce the following definition.
Definition 6.2 (FCOV) Given a function fh of arity th, let FCOV(fh) be a cover of fh, namely a set of tuples satisfying the function. Then:

FCOV(fh) ⊆ FSET(fh)
An analogous reasoning can be made for relations. We define by RSET(Rk) the set of all tk-ples (x1, x2, . . . , xtk) that can potentially verify Rk in a system with N objects. In analogy with functions, we introduce the following definition:
Definition 6.3 (RCOV) Given a relation Rk of arity tk, let RCOV(Rk) be a cover of Rk, namely a set of tk-tuples (x1, . . . , xtk) satisfying Rk. It is:

RCOV(Rk) ⊆ RSET(Rk)

A configuration, namely a complete description of the system, is then a triple:

ψ = ⟨{(on, tn, v1^(tn), . . . , vMtn^(tn)) | 1 ≤ n ≤ N}, {FCOV(fh) | 1 ≤ h ≤ H}, {RCOV(Rk) | 1 ≤ k ≤ K}⟩
The description frame and the configuration space are defined before any observation is made in the world. Let us now consider a system S, and let us collect all
the measures performed on it in a structure, called a P-Set, and denoted P. The
name comes from the fact that P is a perception, i.e., it contains the measures and
information “perceived” in the world. As we assign a primary role to P, we call our
model of abstraction “perception-based”.
Definition 6.5 (P-Set) Given a system S and a set of sensors Σ, let Γ be its
associated description frame, and Ψ the corresponding configuration space.
A P-Set P, containing the specific observations made on the system S, is a 4-ple P = ⟨O, A, F, R⟩, where O is the actual set of identifiers of (typed) objects observed in S, and A, F, and R are specific instantiations, on the actual objects belonging to O, of the attributes, functions, and relations defined in Γ.
In particular:

A = {(on, tn, vj1^(tn)(on), . . . , vjMtn^(tn)(on)) | 1 ≤ n ≤ N},

F = {FCOV(f1), . . . , FCOV(fH)}

Analogously:

R = {RCOV(R1), . . . , RCOV(RK)}
The relation between a P-Set and a configuration lies in the possibility of leaving
some values unspecified. If no UN appears in a P-Set, then the P-Set is exactly one configuration. If some values are set to UN, then the P-Set corresponds to a set of configurations, precisely the set of all those configurations obtained by replacing each UN with any legal value.
In order to clarify the links between the description frame, the configuration space,
and a P-Set, let us introduce a simple example.
Example 6.1 Let Σ be a set of sensors allowing N objects, all of the same type, to be observed in a system S. Then, ΓTYPE = {obj} and ΓO = {o1, . . . , oN | N ≥ 1}. The set ΓA = {(A1, {0, 1}), (A2, {true, false})} includes two attributes with sets of values Λ1 = {0, 1} and Λ2 = {true, false}, respectively. The set of functions, ΓF = {f : ΓO → ΓO}, includes a single function, and the same holds for the set of relations, namely ΓR = {R(x, y) ⊆ ΓO^2}.
In this simple case we can find all possible configurations. The possible combinations of attribute values are four; each one can be assigned to any of the N objects, obtaining the set ΨA(N). Hence, |ΨA| = 4^N. In the description of objects, the type has been omitted, as it is the same for all.
2 Notice that the names of objects are unique, so that they are the key to themselves.
Since FSET(f) contains N pairs (x, f(x)), a cover FCOV(f) can be any subset of them, and then:

|ΨF| = Σ_{v=0}^{N} (N choose v) = 2^N
Analogously, RSET(R) contains the N² pairs of ΓO × ΓO, and then:

|ΨR| = Σ_{v=0}^{N²} (N² choose v) = 2^(N²)
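Putting the three counts together — on the assumption, suggested by the definition of configuration, that the attribute, function, and relation components vary independently — a worked check for N = 3 gives:

\[
|\Psi| = |\Psi_A|\cdot|\Psi_F|\cdot|\Psi_R| = 4^N \cdot 2^N \cdot 2^{N^2}
= 4^3 \cdot 2^3 \cdot 2^9 = 64 \cdot 8 \cdot 512 = 262144 \quad (N = 3)
\]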
Suppose now that we take only partial observations of S. We may obtain, for instance, the following P-Set:

P = ⟨ A = {(a, 1, UN), (b, 0, true), (c, 0, false)},
      FCOV(f) = {(b, UN)},
      RCOV(R) = {(a, b)} ⟩
P corresponds to the set of six configurations {ψ1, ψ2, ψ3, ψ4, ψ5, ψ6}, where:

ψ1 = ⟨{(a, 1, false), (b, 0, true), (c, 0, false)}, {(b, a)}, {(a, b)}⟩
ψ2 = ⟨{(a, 1, false), (b, 0, true), (c, 0, false)}, {(b, c)}, {(a, b)}⟩
ψ3 = ⟨{(a, 1, false), (b, 0, true), (c, 0, false)}, {(b, b)}, {(a, b)}⟩
ψ4 = ⟨{(a, 1, true), (b, 0, true), (c, 0, false)}, {(b, a)}, {(a, b)}⟩
ψ5 = ⟨{(a, 1, true), (b, 0, true), (c, 0, false)}, {(b, c)}, {(a, b)}⟩
ψ6 = ⟨{(a, 1, true), (b, 0, true), (c, 0, false)}, {(b, b)}, {(a, b)}⟩
Obviously, one of the possible configurations coincides with the exact one, namely ψ2.
A description frame more sophisticated than the previous one is described in the
following example.
Example 6.2 Let Σ be a set of sensors that recognize geometric objects in a plane, their attributes, and their relative positions. We define a description frame3

Γ = ⟨ΓTYPE, ΓO, ΓA, ΓF, ΓR⟩, with ΓTYPE = {point, segment, figure}.
Objects of type point do not have dimensions, objects of type segment are uni-dimensional, whereas objects of type figure are 2-dimensional.
The sensors provide four attributes, i.e., ΓA = {(Color, ΛColor), (Shape, ΛShape), (Size, ΛSize), (Length, ΛLength)}. Color captures the wavelength reflected by the objects, and the corresponding sensor is able to distinguish four shades:

ΛColor = {black, red, blue, green}

Attribute Shape captures the spatial structure of the objects, and can distinguish among four values (the set ΛShape).
Attribute Size captures the spatial extension of the objects, and can assume three
values:
ΛSize = {small, medium, large}
3 We may notice that the perception of the world does not provide names to the percepts, but it
limits itself to register the outcomes of a set of sensors Σ, grouping together those that come from
the same sensors, and classifying them accordingly. This is an important point, because it allows
the information about a system to be decoupled from its linguistic denotation; for instance, when we
see an object on top of another, we capture their relative spatial position, and this relation is not
affected by the name (ontop, under, supporting, . . .) that we give to the relation itself. Or, if we see
some objects all the same color (say, red) we can perceptually group those objects, without knowing
the name (“red”) of the color, nor even that the observed property has name “color”. The names are
provided from outside the system.
Attribute Length captures the linear extension of the objects, and can assume positive
real values:
ΛLength = R+ .
The set R contains the real numbers, of type real. This type does not come from the
observation process, but it is known a priori, and is part of the background knowledge
about the sensors.
Attribute Color is applicable to objects of type segment and figure, attributes
Shape and Size are applicable to objects of type figure, whereas attribute Length
is applicable to objects of type segment. Moreover, all segments are black, and no
figure is black. Then:
ΓA,point = ∅
ΓA,segment = {(Color, {black}), (Length, R+)}
ΓA,figure = {(Color, {red, blue, green}), (Shape, ΛShape), (Size, ΛSize)}
The functions Radius and Center capture functional links between objects of type figure and some of their elements. Finally, we consider three binary relations among objects. In the scene of Fig. 6.2 the following objects are observed:

Opoint = {A, B, C, D, E, F, G, H, O}
Osegment = {AB, BD, DC, CA, EF, HF, GH, GE, OP}
Ofigure = {a, b, c, d}
Out of the many possible combinations of values for the attributes Color, Shape and
Size, the following ones have been associated to the objects of type figure:
Fig. 6.2 A geometrical scenario with various geometrical elements [A color version of this figure
is reported in Fig. H.9 of Appendix H]
We may notice that the sides of triangle a have not been observed as single entities.
In the above assignments, ℓ, h, and r are numbers in R+. The functions Radius and Center are observed on a unique point of their domain; then, FCOV(Radius) = {(c, OP)}, FCOV(Center) = {(c, O)}, and F = {FCOV(Radius), FCOV(Center)}.
Finally, the covers of the observed relations are collected in the set R, completing the P-Set of the scene.

A further description frame can be defined for a linguistic domain, where the sensors detect the words of sentences and their grammatical roles. For instance:

ΛArticle-Kind = {definite, indefinite}
ΛNoun-Kind = {common, proper}
ΛVerb-Kind = {transitive, intransitive, auxiliary}
Oarticle = {a, the}
Overb = {adopting, mean, deny, . . . , depends}
The sensor set Σ is the source of any experience and information about a system
S under analysis, where concrete objects (the “real things”) reside. However, most
often the world is not really known, because we only have a mediated access to it,
through our “perception” (or some measurement apparata). Then, given a specific
system S, what is important, for an observer, is not the world per se, but how s/he
perceives it. During the act of perceiving the percepts “exist” only for the observer,
and only during the period in which they are observed. Their reality consists in
some stimuli generated in the observer. As an example, let us consider looking at a
landscape: as long as we look at it, the landscape “exists” for us, but when we turn our head or close our eyes, it is no longer there. Then, the simple perception of an
object is something that cannot be used outside the perception act itself.
In order to let the stimuli become available over time, they must become data,
organized in a structure DS. The first and very basic one can be the observer’s
memory, where stimuli of the same kind, coming from the same type of experience,
are put together: images with images, sounds with sounds, color with color, and so
on. The content of memory can be recalled, but can neither be shared with others nor
acted upon, as it is. Clearly, for an artificial agent, an “artificial” memory structure
must be considered. This structure is an extensional representation [545] of the
perceived world, in which those stimuli, perceptively related one to another, are
stored together. In the case of symbolic systems, information pieces can be stored in
tables. Then, the memory consists of a set of tables, i.e., a relational database scheme,
DS, where relational algebra operators can be applied.4 The query environment is unable to provide answers to the query without actual data; this is why we have introduced the notions of P-Set and of configuration space. The actual observations populate the
structure DS, generating an actual dataset D. If a relational database is used, then
DS is its scheme, whereas D is the populated database. Then, the relation between
DS and D is analogous to that between Γ and P.
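To make the analogy concrete, the following Python sketch (our own illustration, anticipating the procedure BUILD-DATA of Fig. 6.8, not the book's implementation) derives a relational scheme DS from a toy description frame; the dictionary layout and all names are assumptions made only for this example:

```python
# A minimal sketch: a description frame as plain Python data, and the
# relational scheme DS derived from it (one OBJ table, one t-ATTR table
# per type, one table per function and one per relation).
gamma = {
    "types": ["point", "segment", "figure"],
    "attributes": {"segment": ["Color", "Length"],
                   "figure": ["Color", "Shape", "Size"]},
    "functions": {"Radius": 1, "Center": 1},     # name -> arity
    "relations": {"R_ontop": 2, "R_leftof": 2, "R_sideof": 2},
}

def build_data_scheme(gamma):
    """Mimic BUILD-DATA: derive the scheme DS from the frame Gamma."""
    ds = {"OBJ": ["ID", "Type"]}
    for t, attrs in gamma["attributes"].items():
        ds[t.upper() + "-ATTR"] = ["ID"] + attrs
    for f, arity in gamma["functions"].items():
        ds[f.upper()] = [f"X{i}" for i in range(1, arity + 1)] + [f]
    for r, arity in gamma["relations"].items():
        ds[r.upper()] = [f"X{i}" for i in range(1, arity + 1)]
    return ds

print(build_data_scheme(gamma))
```

Populating these tables with actual observations then yields the database D, exactly as Γ, once observed, yields P.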
Data can be accessed by the observer, but cannot be communicated to other agents,
nor reasoned upon. To this aim, it is necessary to assign to the elements of Γ , and, as
a consequence, to the elements of DS, names which are both sharable among users
and meaningful to them. These names constitute a vocabulary V of a language L,
which has a double role: on the one hand, it offers an intensional view of the perceived
world, as a single symbol can stand for a whole table. On the other, it offers the
building blocks for expressing a theory. Notice that L must be able to express all the
information specified by DS.
Even though there may be a large choice of languages, depending on the nature
of the system under consideration and of its properties, we will concentrate here
on (some subset of) Predicate Logic.5 Hence, the elements of V enter the definition
of a language L = ⟨C, X, O, P, F⟩. In L, C is a set of constants associated, in
a one-to-one correspondence, with the objects in ΓO (namely, CO ), and with the
elements of Λ = ⋃_{m=1}^{M} Λm (namely, CA ). If continuous attributes exist, then the
set R is considered as well. X is a set of variables. F is the set of names of functions,
associated, in a one-to-one mapping, to the functions in ΓF .
For the set P of predicates, things are a little more complex. The predicates are
the basic elements that allow the theory to be expressed and inferences to be made.
Then, they should be able to describe in an intensional way DS. In this case data are
to be expressed as ground logical formulas to be manipulated by a logical engine.
For this reason, the set P is the union of four subsets, each corresponding to one
component of the P-Set:
P = PTYPE ∪ PA ∪ PF ∪ PR
The set PTYPE contains predicates referring to the types of objects present in the
system, namely:
PTYPE = {type(x)|∀ type ∈ ΓTYPE }
The set PA contains predicates referring to the values of attributes that can be assigned
to objects:
PA = ⋃_{m=1}^{M} {am (x, v) | v ∈ Λm }
The meaning of am (x, v) is “object x has value v for attribute Am ”. The set PF
contains a predicate for each fh ∈ ΓF of arity th :
PF = ⋃_{h=1}^{H} {fh (x1 , . . . , xth , y)}
The meaning of fh (x1 , . . . , xth , y) is “y is the value of fh (x1 , . . . , xth )”. The set PR
contains a predicate for each Rk ∈ ΓR of arity tk :
PR = ⋃_{k=1}^{K} {rk (x1 , . . . , xtk )}
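As an illustrative sketch (ours, reusing the hypothetical frame encoding of the previous sketch), the four predicate families can be enumerated mechanically from Γ:

```python
# Sketch: derive the predicate set P = P_TYPE, P_A, P_F, P_R (name -> arity)
# from a description frame stored as in the previous sketch.
def build_predicates(gamma):
    p_type = {t: 1 for t in gamma["types"]}                   # type(x)
    p_attr = {a.lower(): 2 for attrs in gamma["attributes"].values()
              for a in attrs}                                 # a_m(x, v)
    p_fun = {f.lower(): k + 1 for f, k in gamma["functions"].items()}
    p_rel = {r.lower(): k for r, k in gamma["relations"].items()}
    return {**p_type, **p_attr, **p_fun, **p_rel}

# e.g. build_predicates(gamma)["radius"] == 2, since radius(x, y)
```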
For the sake of exemplification, let us show, in the next example, how a DS can
be obtained once Γ is defined, in the case in which DS is a relational database.
Example 6.5 Let us consider again the description frame introduced in Example 6.2,
and let us build up the corresponding data structure DS.
The first table to be defined is the OBJ table, assigning a unique identifier to each
object considered in the scene, specifying, at the same time, its type. The scheme
of this table is then OBJ = [ID, Type]. As objects of different types have usually
different attributes associated to them, we define a table of attributes for each type.
With respect to a single table with all objects and all attributes, this choice has the
advantage that it avoids considering a possibly large number of entries with value NA.
As a consequence, we define a set of tables t-ATTR (∀t ∈ ΓTYPE ), each one of them
following the scheme t-ATTR = [ID, Aj1(t) , . . . , AjMt(t) ].
Regarding functions, each one generates a table corresponding to its cover; for
each function fh ∈ ΓF , a table F-H, with scheme F-H = [X1 , . . . , Xth , fh ], will be
created. The first th columns correspond to the arguments of fh , and the last one contains
the associated value of the function. In an analogous way, each relation Rk ∈ ΓR
is associated to a table representing its cover; more precisely, RK = [X1 , . . . , Xtk ],
where the columns correspond to the arguments of relation Rk .
When actual observations are taken, a specific description of a system is acquired
in P. As previously said, these observations are to be memorized in a populated
database D.
Example 6.6 Let Γ be the description frame introduced in Example 6.2. In Γ objects
are partitioned into three types, namely point, segment, and figure. Table OBJ
will thus be the one reported in Fig. 6.3. Objects of a given type can be extracted
from OBJ using the relational algebra selection operator; for instance, σType=figure (OBJ) returns the four rows of type figure.
Fig. 6.3 The table OBJ assigns to each object in the scene a unique identifier, ID, as well as its
type
Fig. 6.4 Tables SEGMENT-ATTR and FIGURE-ATTR, reporting the attribute values of the objects
of type segment and figure, respectively, occurring in the scenario. The objects of type point
do not have attributes associated, and hence there is no corresponding table. The segment OP does
not have a color, as it is not a true segment, but it only denotes the radius of the circle c. The values
ℓ, b, h, r stand for generic real numbers
Fig. 6.5 Tables RADIUS and CENTER, corresponding to the FCOV s of the functions Radius and
Center, defined in the scenario. Each function is unary, i.e., it has arity 1
Fig. 6.6 For each relation in the set ΓR = {Rontop , Rleftof , Rsideof } a table has been built up to
collect the tuples satisfying them. Each of these tables is an RCOV
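In code, the selection mentioned above is a one-line filter; the following sketch (ours, not the book's machinery) applies σ with Type = figure to an OBJ table stored as a list of rows:

```python
# Sketch: the OBJ table of Fig. 6.3 as a list of dicts, and the relational
# algebra selection operator sigma applied to it.
OBJ = ([{"ID": o, "Type": "point"} for o in "ABCDEFGHO"] +
       [{"ID": o, "Type": "segment"}
        for o in ["AB", "BD", "DC", "CA", "EF", "HF", "GH", "GE", "OP"]] +
       [{"ID": o, "Type": "figure"} for o in "abcd"])

def select(table, **conditions):
    """sigma: keep the rows satisfying all the given equality conditions."""
    return [row for row in table
            if all(row[k] == v for k, v in conditions.items())]

print(select(OBJ, Type="figure"))   # the rows of a, b, c, d
```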
For the scenario at hand, the set PA contains the predicates referring to the values of attributes
that can be assigned to objects, while the set PF contains the predicate radius(x, y), with the semantics “object x has radius y”, and
the predicate center(x, z), with the semantics “object x has center z”.
Finally, the set PR contains predicates associated to each Rk ∈ ΓR .
The semantics of the predicate ontop(x, y) is that “object x is located on top of object
y”, the one of the predicate leftof(x, y) is that “object x is located to the left of object
y”, and the semantics of the predicate sideof(x, y) is that “object x belongs to the
contour of object y”.
All the above introduced predicates represent what can be said with the language
L referring to the chosen P-Set. The actual instantiations of the predicates that are
true in P are the following ones (for the P of Example 6.3):
The above ground atoms form the subset of the Herbrand base containing the
predicates true in S.
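A small sketch (again with our own, hypothetical table encoding) shows how the populated tables can be turned into the ground atoms true in P:

```python
# Sketch: generate ground atoms from the populated tables; values equal
# to "UN" are skipped, since they carry no information.
def ground_atoms(obj_table, attr_tables):
    atoms = [f'{row["Type"]}({row["ID"]})' for row in obj_table]
    for rows in attr_tables.values():
        for row in rows:
            for attr, val in row.items():
                if attr != "ID" and val != "UN":
                    atoms.append(f'{attr.lower()}({row["ID"]}, {val})')
    return atoms

figure_attr = [{"ID": "a", "Color": "red", "Shape": "triangle", "Size": "UN"}]
print(ground_atoms([{"ID": "a", "Type": "figure"}],
                   {"FIGURE-ATTR": figure_attr}))
# ['figure(a)', 'color(a, red)', 'shape(a, triangle)']
```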
The examples reported above illustrate what a description frame looks like, inde-
pendently from any task. Then, the aspects of a system captured by it may or may
not be relevant, as shown in the following example.
Example 6.8 Let us suppose that, in the scenario composed of geometric elements,
each figure x has associated the ratio ζ(x) between its contour length and its surface
area. This ratio has dimension [length−1 ]; it does not matter what unit is used to
measure length, but this unit has to be the same for all figures. We want to answer
the query:
Q = Argmax_{x ∈ Ofigure} ζ(x)
The answer is an object (or a set of objects) o∗ ∈ Ofigure , whose ratio ζ(o∗ ) is the
maximum over all objects in Ofigure .
In order to answer the query, we need to define, first of all, the functions Area(x),
Contour-length(x), and ζ(x), which provide, respectively, the area of a figure, the
length of its contour (i.e., the perimeter for a polygonal figure, and the circumference
for a circle), and the ratio between the latter and the former.
The function Sum(x, y) (Sum : R² → R) computes the sum of the two
numbers x and y, whereas predicate diff(x, y) states that x and y are to be bound to
different constants. In an analogous way, the constants 2 and 4 are integer numbers
that belong to N, and have type natural, whereas π ∈ R and has type real.
Finally, the function ζ is simply defined as ζ(x) = Divide(Contour-length(x), Area(x)).
Summing up, the descriptive elements needed to answer Q are:
Types = {figure, segment}
Attributes = {Shape, Length}
Functions = {Radius(x), Area(x), Contour-length(x), ζ(x), Prod(z, w),
Power2(z), Sum(z, w), Divide(z, w)}
Relations = {Rsideof , Rbaseof , Rheightof , Rdiff }
Both the types and the attributes are to be inserted in Γ , because the type of an object
and its attribute values must be observed in the world. Then:
ΓTYPE = {figure, segment} and ΓA = {(Shape, ΛShape ), (Length, ΛLength )}
The needed functions are in part to be observed, and in part are given a priori. More
precisely, the function Radius(x) must be observed, i.e.,
ΓF = {Radius(x)},
whereas the remaining functions are inserted into the theory T . In fact, they are
either computed from more elementary information, or provided by the background
knowledge. Then:
T = {Area(x), Contour − length(x), ζ(x), Prod(z, w),
Power2(z), Sum(z, w), Divide(z, w)}
Concerning the relations, three of them must be observed, whereas Rdiff is only added
to the theory:
ΓR = {Rsideof , Rbaseof , Rheightof }
T = T ∪ {Rdiff }
All the introduced descriptive elements are inserted into the language L = ⟨C, X, O, P, F⟩.
When a specific P-Set is observed, the corresponding sensor outcomes are inserted
into the database D. The semantics of the functions and relations not grounded on
the observations (such as Rdiff or Prod(x, y)) is considered implicitly given.
Finally, we have to provide, in the theory, the means to answer the query, namely,
the algorithm SOLVE(Q, QE), reported in Fig. 6.7. SOLVE returns ANS(Q), i.e.,
the set of objects whose ζ’s value is the largest. Notice that more than one object
may have the same maximum value of ζ.
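A sketch of SOLVE for this particular Q (ours; it assumes ζ has been made computable, via the theory T, for every observed figure, and the name zeta is hypothetical) could be:

```python
# Sketch: SOLVE for Q = Argmax of zeta over the observed figures.
# Returns ANS(Q): possibly more than one object attains the maximum.
def solve(figures, zeta):
    best = max(zeta(x) for x in figures)
    return {x for x in figures if zeta(x) == best}

# Usage: solve({"b", "c", "d"}, zeta), with zeta(x) = contour(x) / area(x)
```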
By considering Example 6.8, we may notice that the description frame Γ , chosen
in Example 6.2, has some measures, such as the color, that are not useful to solve
the task described in Example 6.8, and then, are irrelevant. On the other hand, the
base and the height of triangle a (see Fig. 6.2) are not observed, so that the query
of Example 6.8 cannot be solved exactly. Then, in Γ of Example 6.2 some relevant
aspects of the problem have been overlooked. In order to solve a task, all and only
the relevant aspects of the problem at hand should be captured. In practice, it may
not be possible to obtain a perfect match between query and observations, so that the
query might be solved only partially, or approximately, or not at all. Unfortunately,
it is not always possible to select observations and theory in a coherent way. Then,
one usually tries to collect more information than the one actually used.
Fig. 6.8 Procedure BUILD-DATA automatically generates the database scheme DS starting
from Γ , and the database D starting from P
Fig. 6.9 After generating the database scheme DS , algorithm BUILD-LANG(DS ) constructs the
language L
Pr(ψ) = (1/Z) e^{−E(ψ)/(kB T)}
where E(ψ) is the “energy” of ψ, kB is Boltzmann’s constant (kB =
1.38066 · 10⁻²³ J/K), and T is a “temperature”. For such a situation, it makes sense
to speak of the entropy S of ψ and related notions. In this book, however, we only
consider the deterministic case, where no probability is assigned to the outcomes of
the sensors.7 On the other hand, observations may not identify exactly the state of
system S, if some value is UN (unknown).
In order to introduce our definition of abstraction, based on information reduction,
we need first to make more precise the notion of information that we will rely upon.
Luckily enough, we do not need to come up with an absolute value of information, but
we only need some tool to determine whether, in some transformation, a reduction
of information occurred. In order to reach this goal, we make use of the relationship
between informativeness and generality discussed in Sect. 5.2.1.
Given a set of sensors, Σ, used to observe a system S, let Γ and Ψ be the
description frame and the configuration space associated to Σ, respectively. Ψ usually
contains a large number of states. When we apply Σ to S, our ignorance about the
state of S is reduced, because we gather, in a P-Set P, some information about S.
In the ideal case, when no variable takes on the UN (unknown) value, a single state
is left, and the system’s state is perfectly identified, i.e., P corresponds to a unique
configuration (or state) ψ ∈ Ψ . On the contrary, if some of the variables in P assume
the value UN, P selects a subset of states in Ψ . Then, we can say that, in general,
P ⊂ Ψ . We have now to introduce some definitions.
Definition 6.7 (State compatibility) Given a configuration space Ψ , containing the
possible descriptions of a system S, and an actual set of observations P, a config-
uration ψ ∈ Ψ is compatible with P iff no value in ψ contradicts any of the values
specified by P for the variables. If a variable x (an attribute, a function’s argument, . . .)
may take value in Λ but has, instead, a value UN in P, then any value in Λ for x is
compatible with UN.
Definition 6.8 (COMP) Given a description frame Γ and the configuration
space Ψ associated to the set of sensors Σ, let COMP(P) be the subset of configu-
rations (states) in Ψ that are compatible with an observed P-Set P.
When P is completely specified, the corresponding COMP(P) contains a unique
configuration, the state ψ corresponding to P itself. We can now introduce a funda-
mental definition.
Definition 6.9 (Same space “Less-Informative-Than” relation between P-Sets)
Given two P-Sets P1 and P2 , belonging to a configuration space Ψ , we will say that
P1 is less informative than P2 (denoted P1 ⪯ P2 ) iff COMP(P2 ) ⊂ COMP(P1 ). If
COMP(P1 ) ≡ COMP(P2 ), the two P-Sets are equally informative.
Definition 6.9 allows two P-Sets belonging to the same configuration space to be
compared with respect to the amount of information they convey. Two configurations
ψ1 and ψ2 in the same space are either equally informative, if they coincide, or
incomparable if they are distinct.
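These definitions are directly executable on small, finite configuration spaces. The following sketch (an illustration under our own encoding of configurations as value tuples, one position per variable) computes COMP(P) and the less-informative-than test:

```python
from itertools import product

# Sketch: a configuration is a tuple of values; a P-Set is such a tuple
# where some positions may hold the unknown value "UN".
def comp(p, domains):
    """COMP(P): all configurations compatible with the observations P."""
    choices = [dom if v == "UN" else [v] for v, dom in zip(p, domains)]
    return set(product(*choices))

def less_informative(p1, p2, domains):
    """P1 less informative than P2 iff COMP(P2) is a proper subset."""
    return comp(p2, domains) < comp(p1, domains)

domains = [["red", "blue", "green"], ["small", "large"]]
print(less_informative(("UN", "small"), ("red", "small"), domains))  # True
```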
Suppose now that we observe a system S with a given set of sensors, Σg , which
defines a configuration space Ψg . Let QE g = ⟨Qg , Γg , DS g , Tg , Lg ⟩ be the query
environment built on Σg . Abstracting amounts to generating a new description frame Γa ,
different from Γg ; in this way the model KRA offers a unifying and operational
framework to previous models of abstraction.
7 The probabilistic setting is a possible extension of the KRA model, very briefly mentioned in
Chap. 10.
We will now formalize the concepts introduced in the above discussion. Let us
start from the process used to generate Γa .
Definition 6.10 (Generative process) Given a description frame Γg , with the asso-
ciated configuration space Ψg , let Π be a process that generates Γa starting from Γg
(denoted Γg →Π Γa , or Γa = Π (Γg )), and the configuration space Ψa starting from
Ψg (denoted Ψg →Π Ψa , or Ψa = Π (Ψg )), respectively. Π is said to be a generative
process for (Γa , Ψa ) with origin (Γg , Ψg ).
Process Π acts on the description frame, before any observation is made. It is
sufficient to define the kind of abstraction one wants to perform, but not to fill in
all the details that are needed for actually abstracting a set of observations Pg . In
fact, Π establishes what are the descriptive elements that a simplified set of sensors
allows to be observed on a system S, independently of any actual S. In addition,
we have to provide a program which implements the modifications defined by Π .
For instance, Π may state that two ground functions f1 and f2 collapse to the same
abstract function f . This is sufficient to define Γa . However, when implementing this
abstraction on a Pg we need to specify what value f shall take on. The implementation
program is embedded in Π , as will be discussed in the next chapter. By assuming
that the program is given, we can use the notation ψg →Π ψa to indicate that a ground
configuration ψg can be transformed into a more abstract one using the program
embedded in Π . Then, we can introduce the definition that follows.
Definition 6.11 (Configuration space generation) The notation Ψg →Π Ψa , or Ψa =
Π (Ψg ), is equivalent to saying that:
Ψa = {ψa | ∃ψg ∈ Ψg : ψg →Π ψa }
Notice that all ψa ∈ Ψa have at least one source ψg in Ψg , and possibly more than
one. Concerning P-Sets, we use the following definition:
Definition 6.12 (P-Set generation) A P-Set Pa is obtained from a P-Set Pg through
Π in the following way:
Pa = {ψa | ∃ψg ∈ Pg : ψg →Π ψa }
Fig. 6.10 Graphical illustration of the link between Ψg and Ψa . The P -Set Pg contains a single
configuration ψg ; then, COMPg (Pg ) = ψg . The transformed Pa has COMPa (Pa ) = ψa in the
space Ψa . Given Pa , more than one configuration in Ψg is compatible with it. Then, COMPg (Pa )
is a proper superset of ψg
The compatibility set of an abstract configuration ψa is the set of its possible ground sources:
COMPg (ψa ) = {ψg | ψg →Π ψa }
A graphical illustration of COMPg (ψa ) is reported in Fig. 6.10. We can extend the
notion of compatibility set from configurations to P-Sets as follows.
Definition 6.15 (Compatibility set for P-Sets) Given a generative process Π and
an “abstract” P-Set Pa , the compatibility set COMPg (Pa ) of Pa is the set of ground
configurations which are compatible with Pa , i.e.:
COMPg (Pa ) = ⋃_{ψa ∈ Pa} COMPg (ψa )
Given a generative process Π that transforms Pg into Pa , the quantity
ξ(Pa , Pg ) = log₂ (|COMPg (Pa )| / |COMPg (Pg )|)
is the abstraction ratio of the transformation. The values of ξ(Pa , Pg ) are always
positive, and higher values correspond to higher degrees of abstraction. This ratio
is meaningful only for P-Sets.
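Under the same finite encoding used earlier, the ratio is immediate to compute (a sketch, with our own function names and toy sets):

```python
from math import log2

# Sketch: abstraction ratio xi(Pa, Pg) from the two ground compatibility
# sets; since COMPg(Pg) is contained in COMPg(Pa), the ratio is >= 0.
def abstraction_ratio(comp_g_pa, comp_g_pg):
    return log2(len(comp_g_pa) / len(comp_g_pg))

print(abstraction_ratio(set(range(8)), set(range(2))))   # 2.0
```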
Before proceeding any further, we provide an example to informally illustrate the
concepts introduced so far.
Example 6.9 Let us consider a camera Σ, which provides pictures of resolution
256 × 256 pixels, each with a gray level in the integer interval [0, 255]. Objects are
all of type pixel, and they have three attributes associated to them, namely the
X coordinate, with domain ΛX = [1, 256], the Y coordinate, with domain ΛY =
[1, 256], and the intensity I, with domain ΛI = [0, 255]. Neither functions nor
relations are considered. We can define the following description frame:
Γg = ⟨{pixel}, {pi,j | 1 ≤ i, j ≤ 256}, {(X, ΛX ), (Y, ΛY ), (I, ΛI )}, ∅, ∅⟩
If we want to lower the resolution of the taken picture, we can aggregate non over-
lapping groups of four adjacent pixels to form one, called a square z, where square
is a new type of object. Then, the generated description frame is as follows:
Γa = ⟨{square}, {p(a)i,j | 1 ≤ i, j ≤ 128}, {(X (a) , {1, . . . , 128}), (Y (a) , {1, . . . , 128}),
(I (a) , {0, . . . , 255})}, ∅, ∅⟩
Several choices are possible for the attributes of the new squares; for instance, the abstract
X coordinate could be that of the leftmost ground pixel of the square, or the rightmost one. All these choices
are contained in the definition of the program associated to Π . One possibility is the
following:
Process Π (Pg )
for h = 0 to 127 do
  for k = 0 to 127 do
    z2h+1,2k+1 ← (p2h+1,2k+1 , p2h+2,2k+1 , p2h+1,2k+2 , p2h+2,2k+2 )
    p(a)h+1,k+1 = z2h+1,2k+1
    X (a) (p(a)h+1,k+1 ) = h + 1
    Y (a) (p(a)h+1,k+1 ) = k + 1
    I (a) (p(a)h+1,k+1 ) = (1/4) [I(p2h+1,2k+1 ) + I(p2h+2,2k+1 ) + I(p2h+1,2k+2 ) + I(p2h+2,2k+2 )]
  end
end
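The process is easy to reproduce with array code; the sketch below (our own, keeping the averaged intensity as a floating point value for simplicity) performs the same 2 × 2 aggregation:

```python
import numpy as np

# Sketch: Process Pi as NumPy code; each non-overlapping 2x2 block of
# pixels becomes one abstract pixel whose intensity is the block mean.
def downsample(img):                      # img: (256, 256) intensities
    blocks = img.reshape(128, 2, 128, 2)  # pair up rows and columns
    return blocks.mean(axis=(1, 3))       # I^(a) = mean of 4 intensities

img = np.random.randint(0, 256, (256, 256))
assert downsample(img).shape == (128, 128)
```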
Given a generic I (a) , the intensities of the original pixels satisfy the equation:
I1 + I2 + I3 + I4 = 4 I (a) (6.3)
One can then count the number n(I (a) ) of different 4-tuples of intensities in [0, 255] satisfying (6.3).
Fig. 6.11 The pixels in the grid are grouped four by four. Rules are given to compute the attributes
of the so formed squares
The number n(I (a) ) is very large, and the process is indeed an abstraction.
An abstraction operator is a special case of an abstraction process with just one
step. By concatenating or composing abstraction operators we obtain more complex
abstraction processes. Composition of abstractions, and their properties, have also
been considered, among others, by Giunchiglia and Walsh [214], and by Plaisted
[419].
Definition 6.22 (Abstraction process) An abstraction process is a simultaneous or
sequential composition of abstraction operators.
Abstraction processes will be considered in more detail in Chap. 7.
We may observe that abstraction processes can be applied repeatedly. In the fol-
lowing, when moving from one level of abstraction to the next, we may, for the sake
of simplicity, speak of a “ground” and of an “abstract” space. The sequential appli-
cation of abstractions makes in turn “ground” the space that was “abstract” before
and so on, obtaining a multi-level hierarchy of more and more abstract representation
frames. This possibility complies with a relative notion of abstraction, because one
representation is “abstract” only with respect to the previous ones.
Let us go back to the KRA model, and see what role an (elementary)
abstraction operator ω plays in it. By applying ω to Γg a new description frame Γa is
obtained. Γa specifies what descriptive elements can be used to represent systems
in the abstract space. The space Ψa is generated accordingly from Ψg . Given an
actual observation Pg of a system, the application of ω to it generates an abstract
“observation” Pa . We write, symbolically, Pa = ω(Pg ); the information hidden by ω,
i.e., the difference Δ between Pg and Pa , is memorized, so that Pg can be recovered
from Pa and Δ. Analogous operators act on the other components of the query environment:
Pa = ω(Pg )
DS a = δ(DS g )
La = λ(Lg )
Ta = τ (Tg )
The overall abstraction is thus described by the 4-tuple Ω = (ω, δ, λ, τ ).
In Fig. 6.12 the full abstraction model KRA is reported. There are two reasons why
we do not define an operator for the query Q: on the one hand, the query remains,
usually, the same, modulo some syntactic modifications dictated by the more abstract
language La . On the other hand, if another query (related to Qg ) is to be solved in
Fig. 6.12 The KRA model of abstraction. The reason why abstraction on Γg is set apart is that the
model assumes that abstraction must be defined first on Γg , by applying operator ω. In essence, the
basic act of abstracting is performed when selecting a particular set of descriptors for a system S .
In fact, after observation is performed, and a Pg is acquired, no other information comes from
the world. If the observations are too detailed, the user may decide to simplify them by removing
some of the information provided, obtaining thus a new, more abstract set of observations. An
automated procedure BUILD-DATA generates the component DS g starting from Γg , and BUILD-LANG
the component Lg . Once the more abstract Γa is obtained, the same procedures can build up DS a and
La starting from Γa and Pa . However, to avoid wasting computational effort, it is possible
to apply suitable operators to each pair of the corresponding QE ’s components. Theory Ta has to
be generated directly, because it is not derivable from Pa . The same is true for the query
the more abstract space, then only the user can do such a modification, and hence an
abstraction operator is not needed.
The role of the operators δ and λ is to avoid going through BUILD-DATA and
BUILD-LANG to build DS a and La starting from Γa . Instead, these operators can
be applied directly to the corresponding components of the observation frame. Notice
that Ta has to be generated directly from Tg by means of τ in any case, because it
cannot be derived from Γa . Of course, the operators acting on the various components
are not independent from one another.
Before moving ahead, we can make some comments about the whole approach.
As we may see from Fig. 6.12, the first and basic abstraction process actually takes
place in the transition from the “true” world to the “perceived” one. After that,
the construction of DS and L does not require any further abstraction, because DS
and L do not contain less information than the perception itself. The theory is not
derived from Γ , because it is independent. The schema of Fig. 6.12 can be contrasted
with the one of Fig. 6.13, which looks superficially quite similar to the former, but, in
fact, is orthogonal to it. The schema in Fig. 6.13 depicts a notion of abstraction based
on the stepwise process of moving away from the sensory world. As we have seen
in Chap. 2, this idea of abstraction is widely shared among many investigations on
abstraction, and it is intuitive and reasonable. However, it is not practical, especially
Fig. 6.13 The knowledge spectrum. (Reprinted with permission from Pantazi et al. [412])
Fig. 6.14 Application of an operator that aggregates two objects, one lying on top of the other,
into a new object, called a tower. In the left-side configuration the operator can be applied in two
mutually exclusive ways, namely forming s from a and b, or forming s from b and c
Notice that a single Pg may generate two Pa ’s, each one being an abstraction of Pg according
to Definition 6.19, but only one is actually performed.
In the next chapter abstraction operators will be classified and described in detail.
6.5 Summary
The KRA model is based on the acknowledgement that solving a problem (or
performing a task) usually requires two sources of information: observations and
“theory”. The observations have originally a perceptive connotation, and are, hence,
not immediately exploitable: they need to be transformed into “data”, i.e., struc-
tured information usable by the theory. The link between the data and the theory
is ensured by a language, which allows both the theory and the data to be expressed
for communication purposes. There may be a complex interplay between data and
theory, especially regarding their mutual compatibility and the order in which they
are acquired, which biases the obtainable solutions. The model is not concerned with
this interplay, but assumes that all the information that is needed to solve a problem
(i.e., the “ground” description frame Γg ) has been acquired in some way. Instead, the
model is aimed at capturing the transformations that Γg undergoes under abstraction,
namely when the information contained in it is reduced.
A description frame Γ defines a space of possible configurations Ψ , i.e., a space
of descriptions that can be applied to systems. Γ does not refer to any concrete
Fig. 6.15 The Rubik’s cube can be described in terms of the 26 small component cubes, which give
rise to the description frame Γ . Each arrangement of the cubes generates a specific configuration ψ;
the configuration set, Ψ , is very large. A configuration is a complete description of the positions
of the small cubes, so that it is unique. If Rubik’s cube is observed only partially, for instance by
looking only at one face, the observation corresponds to many configurations, each one obtained
by completing the invisible faces of the cube in a different way; in this case we have a P -Set P ,
which is a set of configurations. The query Q can be represented by a particular configuration to
be reached starting from an initial one [A color version of this figure is reported in Fig. H.12 of
Appendix H]
system, but only establishes what are the elements usable to describe one. When
an actual system is observed, the signals captured on it are collected in a P-Set P.
Abstraction is defined on Γ , and then it is uniformly applied to all the potentially
observed systems. The relations between the various elements involved in modeling
abstraction with KRA are illustrated in Fig. 6.15.
The KRA model allows reformulation to be distinguished from abstraction; in
fact, some transformations reduce the amount of information provided by a descrip-
tion, and some only change the form in which information is represented.
Abstraction is defined in terms of information reduction. This view of abstrac-
tion allows two configurations (descriptions) to be compared with respect to the
more-abstract-than relation, even though they may belong to different configuration
spaces. The information is not lost, but simply hidden or encapsulated.
An important aspect of the view of abstraction captured by KRA is that moving
across abstraction levels should be easy, in order to be able to try many abstractions,
when solving a problem. For this reason, all the hidden information is memorized
during the process of abstracting, so that it can be quickly retrieved.
Finally, only transformations generated by a precisely defined set of abstraction
operators are considered in the model. This is done to avoid the costly process of
checking the more-abstract-than relation on pairs of configurations.
Chapter 7
Abstraction Operators and Design Patterns
Abstraction operators can be subdivided into classes according to their basic func-
tioning. In particular, we consider four categories:
• Operators that mask information
– by hiding elements of a system description.
• Operators that make information less detailed
– by building equivalence classes of elements,
– by generating hierarchies of element descriptions,
– by combining existing elements into new ones.
The definitions given in this chapter concern the operator’s component ω that acts
on description frames.
In order to describe abstraction operators, we use the encapsulation approach
exploited in Abstract Data Types, by only providing formal definitions, and encap-
sulating implementation details inside the definitions. More precisely, we will use
Liskov and Guttag’s formalism [333] for Procedural Data Type (PDT), described in
Sect. 2.4 and reported (adapted to our approach) below:
begin NAME = proc ω
described as : % function and goal
requires : % identifies inputs
generates : % identifies outputs
method : meth[Pg , ω] (program that performs abstraction on Pg )
end NAME
The above schema describes, in an “abstract” way, what changes in the descrip-
tion frame, whereas the actual implementation on a perception P is realized by the
method meth.
Each instantiation of the PDT corresponds to a specific operator. Actually,
the procedural data type is nested, in that meth[Pg , ω] is in turn a PDT, as
will be described later on. In this chapter it is always assumed that oper-
ators of type ω take as input elements of a ground description frame Γg =
⟨ΓTYPE(g) , ΓO(g) , ΓA(g) , ΓF(g) , ΓR(g) ⟩ and give as output elements of an abstract descrip-
tion frame Γa = ⟨ΓTYPE(a) , ΓO(a) , ΓA(a) , ΓF(a) , ΓR(a) ⟩. Instead, meth[Pg , ω] describes
how ω has to be applied to any Pg to obtain Pa .
Operators of the first group hide an element of the description frame. The operator’s
generic name is ωh , and it may act on types, objects, attributes, functions or relations.
The class is described in the following PDT:
begin NAME = proc ωh
described as : Removing from view an element of Γg
requires : X (g) (set of involved elements)
y (element to hide)
generates : X (a) = X (g) − {y}
method : meth[Pg , ωh ]
end NAME
By instantiating X (g) and y, we obtain specific operators. In particular:
• ωhobj hides an object of a description frame,
• ωhtype hides a type of a description frame,
• ωhattr hides an attribute of a description frame,
• ωhfun hides a function of a description frame,
• ωhrel hides a relation of a description frame.
Among these operators, we will only detail the first and third ones.
If X (g) = ΓO(g) and y = o, the object with identifier o is no more part of the set
of objects that can be observed in a system. For the sake of notational simplicity, we
define:
ωhobj (o) =def ωh (ΓO(g) , o)
and we obtain:
ΓO(a) = ΓO(g) − {o}
The method meth[Pg , ωhobj (o)], applied to the observed description Pg of a system,
removes from view the object o, if it is actually observed, as well as all its attribute
values and its occurrences in the covers of functions and relations. An example is
reported in Fig. 7.1. “Removing from view” means that ωhobj (o) replaces in Pg every
occurrence of o by UN.
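A sketch of this behavior (with a hypothetical dictionary encoding of Pg chosen only for illustration):

```python
import copy

# Sketch of meth[Pg, omega_hobj(o)]: replace every occurrence of the
# object o in the P-Set by "UN" (objects, attribute rows, relation covers).
def hide_object(pg, o):
    pa = copy.deepcopy(pg)
    hide = lambda x: "UN" if x == o else x
    pa["objects"] = [hide(x) for x in pa["objects"]]
    for rows in pa["attributes"].values():
        for row in rows:
            row["ID"] = hide(row["ID"])
    for name, cover in pa["relations"].items():
        pa["relations"][name] = [tuple(hide(x) for x in t) for t in cover]
    return pa
```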
The reported operator is the most basic one. It is easy to think of extending it to
more complex situations, for instance to hiding a set of objects that satisfy a given
formula, for example to hide all “red objects”.
Fig. 7.1 Example of application of the method meth[Pg , ωhobj (o)]. In the right-hand picture
object o is hidden behind a cloud of smoke. Its shape, color, and position are hidden as well
If X (g) = ΓA(g) and y = (Am , Λm ), the attribute Am is no longer observable. We define:
ωhattr (Am , Λm ) =def ωh (ΓA(g) , (Am , Λm ))
Then:
ΓA(a) = ΓA(g) − {(Am , Λm )}
The second group of operators of this class hides a value taken on by a variable. Its
generic PDT is the following:
begin NAME = proc ωhval
described as : Removing from view an element of the domain of
a variable.
requires : X (g) (set of involved elements)
(x, Λx ) (variable and its domain)
v (value to hide)
generates : Λx(a) = Λx − {v}
method : meth[Pg , ωhval ]
end NAME
Fig. 7.2 Example of method meth[Pg , ωhattr (Am , Λm )]. The attribute Am = Color is hidden from
the left picture, giving a gray-level picture (right). Each pixel shows a value of the light intensity, but
this intensity is no longer distributed over the R, G, B channels [A color version of the figure is reported in
Fig. H.10 of Appendix H]
By instantiating X (g) , (x, Λx ), and v, we obtain four specific operators. In particular,
if X (g) = ΓA(g) , (x, Λx ) = (Am , Λm ), and v = vi ∈ Λm , then the operator
ωhattrval ((Am , Λm ), vi ) =def ωh (ΓA(g) , (Am , Λm ), vi )
hides the value vi from the domain of attribute Am .
Fig. 7.3 Example of application of the method meth [Pg , ωhattrval ((Color, ΛColor ),
turquoise)]. The value turquoise is hidden from the left picture; a less colorful picture
is obtained (right), where objects of color turquoise become transparent (UN) [A color version
of this figure is reported in Fig. H.11 of Appendix H]
The third group of operators of this class hides an argument of a function or a relation.
Its generic PDT is the following:
begin NAME = proc ωharg
described as : Removing from view an argument of a function or
a relation.
requires : X (g) (set of involved elements)
y (element to be modified)
x (argument to be hidden)
generates : X (a)
method : meth[Pg , ωharg ]
end NAME
By instantiating X (g) , y, and x, we obtain specific operators:
• ωhfunarg hides an argument of a function,
• ωhrelarg hides an argument of a relation.
We only detail here the first one of these operators.
If X (g) = ΓR(g) , y = Rk , and x = xj , then the operator
ωhrelarg (Rk , xj ) =def ωh (ΓR(g) , Rk , xj )
reduces the arity of relation Rk by hiding its argument xj . If the arity of Rk is tk , then
an abstract relation Rk(a) , with arity tk − 1, is created. Moreover:
ΓR(a) = ΓR(g) − {Rk } ∪ {Rk(a) }
Method meth[Pg , ωhrelarg (Rk , xj )] acts on the cover RCOV (Rk ) of Rk , replacing in
each tuple the argument in the j-th position with UN.
As an example, let us consider a description frame Γg such that Rontop (x1 , x2 ) ∈
ΓR(g) , with x1 , x2 ∈ ΓO(g) . We want to hide the first argument, obtaining thus Rontop(a) (x2 ).
Again, meth[Pg , ωhrelarg (Rk , xj )] provides rules for constructing RCOV (Rontop(a) ).
For instance:
∀σ ≡ (o1 , o2 ) ∈ RCOV (Rontop ) : Add σ (a) ≡ (o2 ) to RCOV (Rontop(a) )
In Example 6.3 the cover of Rontop(a) will be {b, d}. With this kind of abstraction we
still know that both b and d have some objects on top of them, but we do not know
any more which one.
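In code, the construction of the abstract cover amounts to a projection of the tuples; the sketch below (ours) drops the hidden column, which yields the same cover as replacing it with UN:

```python
# Sketch of meth[Pg, omega_hrelarg(R_k, x_j)]: project the cover of a
# relation on all arguments except the j-th one (0-based index).
def reduce_arity(cover, j):
    return {t[:j] + t[j + 1:] for t in cover}

ontop = {("a", "b"), ("c", "d")}      # a on top of b, c on top of d
print(reduce_arity(ontop, 0))         # {('b',), ('d',)}
```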
This operator is the same as the arity reduction one, defined by Ghidini and
Giunchiglia [203], and the propositionalization operator defined by Plaisted [419].
Some operators may merge pieces of information, reducing the level of detail of a sys-
tem description. More specifically, these operators group description elements into
equivalence classes. The equivalence classes are formed by defining a formula ϕeq ,
which the elements of the same class must satisfy (intensional specification) or sim-
ply by enumerating the elements y1 , . . . , ys that belong to the class (extensional
specification). The extensional definition is a special case of the intensional one,
when ϕeq is the disjunction of Kronecker delta functions.
When a set of elements {a1 , . . . , ak } is equated by building up an equivalence
class, the class can be defined in two ways: either it is denoted with a generic name
such as, for instance, [a], or all the values in the set are equated to any one among
them, say, a1 . Abstraction operators building up equivalence classes must use the first
method, whereas those that use the second one are actually approximation operators,
and will be discussed in Sect. 7.6.
The abstraction operators building equivalence classes are partitioned into three
groups, according to the type of elements they act upon.
• ωeqelem builds equivalence classes of elements,
• ωeqval builds equivalence classes of values of elements,
• ωeqarg builds equivalence classes of arguments of elements.
Building equivalence classes has been a much studied abstraction, due to its sim-
plicity and large applicability. For instance, Roşu describes behavioral abstraction as
an extension of algebraic specification [455]: “two states are behaviorally equivalent
if and only if they appear to be the same under any visible experiment.”
This operator implements the partition of the domain objects into equivalence
classes, as defined by Hobbs [252], Imielinski [269], and Ghidini and Giunchiglia
[203].
In a recent paper Antonelli [20], starting from the abstraction principle (4.1),
defines an abstraction operator, which assigns an object—“number”—to the equiv-
alence classes generated by the equinumerosity relation, in such a way that a different
object is associated to each class.
The operators that we consider in this section build a single equivalence class out
of a number of elements. It is an immediate extension to define operators that build
several equivalence classes at the same time, using an equivalence relation.
The first group of operators builds equivalence classes of elements (objects, attributes,
functions, or relations) of a description frame. Their generic PDT is the following:
begin NAME = proc ωeqelem
described as : Making some elements indistinguishable
requires : X (g) (set of involved elements)
ϕeq (indistinguishability condition)
y (a) (name of the equivalence class)
generates : X (a)
Xeq (set of indistinguishable elements)
method : meth[Pg , ωeqelem ]
end NAME
In the above PDT ϕeq represents the condition stating the equivalence among a set
of elements x1 , . . . , xk ∈ X (g) . In a logical context it may be expressed as a logical
formula. Moreover, y (a) is the name of the class, given by the user. By applying
ϕeq to objects belonging to X (g) the set of equivalent (indistinguishable) elements is
computed, obtaining thus Xeq .
By instantiating X (g) and ϕeq , specific operators are obtained:
• ωeqobj builds equivalence classes of objects,
If X (g) = ΓO(g) and y (a) = o(a) , the operator defines the granularity of the
description. All tuples of objects (o1 , . . . , ok ) satisfying ϕeq are considered indis-
tinguishable. Then:
ωeqobj (ϕeq , o(a) ) =def ωeqelem (ΓO(g) , ϕeq , o(a) )
The method meth[Pg , ωeqobj (ϕeq , o(a) )] generates first the set ΓO,eq ; then, it
replaces each element of ΓO,eq by o(a) , obtaining:
ΓO(a) = ΓO(g) − ΓO,eq ∪ {o(a) }
As an example, let ϕeq (o) = “o ∈ ΓO,chair ”. Then, ΓO,eq = ΓO,chair , and all chairs are replaced by some abstract “schema” of
them, as illustrated in Fig. 7.4. In this case o(a) may have Color = UN and Use = UN.
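A sketch of the method (with a hypothetical predicate phi_eq passed as a Python function):

```python
# Sketch of meth[Pg, omega_eqobj(phi_eq, o_a)]: all objects satisfying
# phi_eq are removed and replaced by the single class name o_a.
def eq_objects(objects, phi_eq, o_a):
    gamma_o_eq = {o for o in objects if phi_eq(o)}
    return (objects - gamma_o_eq) | {o_a}

furniture = {"chair1", "chair2", "chair3", "table", "lamp"}
print(eq_objects(furniture, lambda o: o.startswith("chair"), "[chair]"))
# {'table', 'lamp', '[chair]'}
```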
Fig. 7.4 Example of application of meth[Pg , ωeqobj (ϕeq , o(a) )], where ϕeq (o) = “o ∈ ΓO,chair ”.
The different chairs (on the left) might be considered equivalent to each other, and the class
can be represented by an abstract schema o(a) of a chair (on the right)
If X (g) = ΓTYPE(g) and y (a) = t(a) , the operator makes all types satisfying ϕeq indis-
tinguishable. Then, type t(a) is applied to all objects in the equivalence class. We
define:
ωeqtype (ϕeq , t(a) ) =def ωeqelem (ΓTYPE(g) , ϕeq , t(a) )
The method meth[Pg , ωeqtype (ϕeq , t(a) )] generates first the set Xeq = ΓTYPE,eq of
indistinguishable types, and then it applies t(a) to the obtained class. All types in
ΓTYPE,eq become t(a) , obtaining:
ΓTYPE(a) = ΓTYPE(g) − ΓTYPE,eq ∪ {t(a) }
The method meth[Pg , ωeqtype (ϕeq , t(a) )] specifies what properties are to be assigned
to t(a) , considering the ones of the equated types. For instance, if the types in ΓTYPE,eq
have different sets of attributes, t(a) could have the intersection of these sets, or their
union, by setting some values to NA, depending on the choice of the user.
If X (g) = ΓA(g) , Y = (Am , Λm ), and Veq = Λm,eq ⊆ Λm , then the operator makes
indistinguishable a subset Λm,eq of the domain Λm of Am . We define:
ωeqattrval ((Am , Λm ), Λm,eq , v(a) ) =def ωeqval (ΓA(g) , (Am , Λm ), Λm,eq , v(a) )
We obtain an abstract attribute Am(a) such that Λm(a) = Λm − Λm,eq ∪ {v(a) }, and
ΓA(a) = ΓA(g) − {(Am , Λm )} ∪ {(Am(a) , Λm(a) )}
The third class of operators contains those that act on arguments of functions or
relations. Their generic PDT is the following one:
begin NAME = proc ωeqarg
described as : Making indistinguishable arguments of functions
or relations
requires : X (g) (set of involved elements)
Y (element to be modified)
Zeq (set of indistinguishable arguments)
z(a) (name of the equivalence class)
generates : X (a)
method : meth[Pg , ωeqarg ]
end NAME
In the above PDT we have assumed, for the sake of simplicity, that the set of indis-
tinguishable arguments is given extensionally, by enumeration; it is easy to extend
the case to indistinguishable arguments satisfying a given equivalence predicate or
formula. By instantiating X (g) , Y , and Zeq specific operators are obtained:
• ωeqfunarg makes indiscernible arguments of a function,
• ωeqrelarg makes indiscernible arguments of a relation.
As these operators have a reduced applicability, we do not give here their details.
Hierarchy generating operators replace some set of description elements with a more
general one, reducing thus the level of detail of a system description. More specif-
ically, these operators reduce the information in a description by generating hierar-
chies, in which the ground information in lower level nodes is replaced by higher
level ones (more generic and in smaller number). Objects, per se, cannot be orga-
nized into hierarchies, because they are just instances of types. Then, only “types” of
objects can form hierarchies. Moreover, function and relation arguments may only
have objects as values. Then, no operator is defined for hierarchies over argument
values of functions and relations. The generic PDT corresponding to this group of
operators is the following:
begin NAME = proc ωhier
described as : Replacing a set of description elements with a single,
more general one
requires : X (g) (set of involved elements)
Y (element to be modified, if any)
Ychild (set of elements to be replaced)
y (a) (name of the new node)
generates : X (a)
method : meth[Pg , ωhier ]
end NAME
The considered operator builds up one higher level node at a time. For generating
a complete hierarchy, the operator must be reapplied several times, or a composite
abstraction process must be defined. By instantiating X (g) , Y , Ychild , y (a) we obtain
specific operators:
• ωhiertype builds up a hierarchy of types,
• ωhierattr builds up a hierarchy of attributes,
• ωhierfun builds up a hierarchy of functions,
• ωhierrel builds up a hierarchy of relations,
• ωhierattrval builds up a hierarchy of attribute values,
• ωhierfuncodom builds up a hierarchy of values of a function co-domain.
The elements of the set Ychild are linked to y (a) via an is-a relation.
If X (g) = ΓTYPE(g) , Y = ∅, Ychild = ΓTYPE,child(g) , and y (a) = t(a) , then the operator
builds a type hierarchy, where a set of nodes, those contained in ΓTYPE,child(g) , are
replaced by t(a) . We define:
ωhiertype (ΓTYPE,child(g) , t(a) ) =def ωhier (ΓTYPE(g) , ΓTYPE,child(g) , t(a) )
and we obtain:
ΓTYPE(a) = ΓTYPE(g) − ΓTYPE,child(g) ∪ {t(a) }.
The original types are all hidden, because only the new one can now label the objects.
As an example, let us consider a set of types denoting specific polygons, such as
triangle, square, and rectangle. Each of these types can be replaced by the more general type polygon, thus
losing the information about the shape and the number of sides. The method
meth[Pg , ωhiertype (ΓTYPE,child(g) , t(a) )] specifies which attributes can still be asso-
ciated to polygons, and what to do with those that cannot. This operator typically
implements predicate mapping.
An operator that is similar to this one is ωeqtype . However, the latter makes a set
of types indistinguishable and interchangeable, without merging the corresponding
instances; simply, any instance of each of the types in the set can be labelled with
any other equivalent type. On the contrary, ωhiertype explicitly builds up hierarchies,
merging also instances. In addition, attributes of the new type can be defined differ-
ently by meth(Pg , ωhiertype ) and meth(Pg , ωeqtype ).
If X (g) = ΓA(g) , Y = (Am , Λm ), Ychild = Λm,child , and y (a) = v(a) , then the operator
builds up a hierarchical structure by replacing all values in Λm,child with the single,
more general value v(a) . We define:
ωhierattrval ((Am , Λm ), Λm,child , v(a) ) =def ωhier (ΓA(g) , (Am , Λm ), Λm,child , v(a) )
and we obtain:
Λm(a) = Λm − Λm,child ∪ {v(a) }
Then:
ΓA(a) = ΓA(g) − {(Am , Λm )} ∪ {(Am(a) , Λm(a) )}
As an example, let Color be an attribute that takes values in the palette ΛColor =
{lawn-green, light-green, dark-green, sea-green, olive-green,
white, yellow, blue, light-blue, aquamarine, cyan, magenta, red,
pink, orange, black}.
We can consider, as illustrated in Fig. 7.5, the set of values ΛColor,child = {lawn-
green, light-green, dark-green, sea-green, olive-green} and
replace them with v(a) = green. Notice that the operator ωhierattrval builds
up a new node for one set of old values at a time. Moreover, when the hierarchy is
climbed upon, the lower level nodes disappear.
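The climbing step itself is a simple value mapping; a sketch (ours, with the sets of the example):

```python
# Sketch of meth[Pg, omega_hierattrval(...)]: values in the child set are
# replaced by the more general parent value v_a; the others are untouched.
def climb(value, children, v_a):
    return v_a if value in children else value

greens = {"lawn-green", "light-green", "dark-green",
          "sea-green", "olive-green"}
print(climb("sea-green", greens, "green"))   # 'green'
print(climb("magenta", greens, "green"))     # 'magenta'
```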
Fig. 7.5 Example of the method meth[Pg , ωhierattrval ((Color, ΛColor ), ΛColor,child , green)],
where ΛColor and ΛColor,child are given in the text, and v(a) = green
This operator builds up a “collective” object of type t(a) out of a number of objects
of type t. Its PDT is the following:
begin NAME = proc ωcoll
described as : Building a “collective” object using objects
of the same type
requires : ΓTYPE(g)
t (original type)
t(a) (new type)
generates : ΓTYPE(a)
method : meth[Pg , ωcoll ]
end NAME
Fig. 7.6 Example of application of the method meth[Pg , ωcoll (tree, forest)]. A set of trees
(left) is abstracted into a forest (right) represented by an icon with four concentric levels. The other
trees are left unaltered
We define:
ωcoll (t, t(a) ) =def ωcoll (ΓTYPE(g) , t, t(a) )
We have then:
ΓTYPE(a) = ΓTYPE(g) ∪ {t(a) }
The original type is not hidden in the abstract space, because there may be objects
of that type which are not combined. The details of the abstraction are specified in
the method meth[Pg , ωcoll (t, t(a) )]. In particular, the method states what objects are to be
combined and what properties are to be associated to them, based on the properties
of the constituent objects. The combined objects are removed from view, so that their
attribute values are no longer accessible, nor are their occurrences in the covers of
functions and relations. The role of each member in the collection is the same.
An example of this abstraction operator is the definition of a type t(a) = forest
out of an ensemble of objects of type tree, as illustrated in Fig. 7.6. The original
objects, collected into the new one, are hidden.
The relation between the collected objects and the collective one is an individual-
of relation.
A related operator, ωaggr , builds an “aggregated” object of type t(a) out of objects
of different types t1 , . . . , ts . We have then:
ΓTYPE(a) = ΓTYPE(g) ∪ {t(a) }
The original types are not hidden in the abstract space, because there may be objects
of those types which are not combined. The details of the abstraction are specified by
the method meth[Pg , ωaggr ((t1 , . . . , ts ), t(a) )], which states what objects in a Pg
are to be aggregated, and what properties are to be associated to the new one, based
on the properties of the original objects. The constituent objects have different roles
or functions inside the aggregate, whose properties cannot be just the sum of those
of the components. Usually, the abstract type has emerging properties or functions.
The combined objects are removed from view, so that their attribute values are no
longer accessible, nor are their occurrences in the covers of functions and relations.
Method meth[Pg , ωaggr ] also describes the procedure (physical or logical) to be
used to aggregate the component objects.
As an example, we may build up a t(a) = computer starting from objects
of type body, monitor, mouse, and keyboard,1 or a t(a) = tennis-set
by (functionally) aggregating an object of type tennis-racket and one of type
tennis-ball. An example of an aggregation that uses a unique type of component
objects is a chain, formed by a set of rings. The aggregated objects are removed
from further consideration.
The relation between the component objects and the aggregate is a part-of relation.
This operator forms a group of objects that may not have any relation among each
other: it may be the case that we just want to put them together for some reason.
The grouped objects satisfy some condition ϕgroup , which can simply be an enumer-
ation of particular objects. Its PDT is the following:
begin NAME = proc ωgroup
described as : Building a group of heterogeneous objects
requires : ΓO(g) , ΓTYPE(g)
ϕgroup (condition for grouping)
group (new type)
G(a) (group’s name)
generates : ΓTYPE(a) , ΓO,group , ΓO(a)
method : meth[Pg , ωgroup ]
end NAME
We define:
ωgroup (ϕgroup , G(a) ) =def ωgroup (ΓO(g) , ΓTYPE(g) , ϕgroup , G(a) )
and we obtain:
ΓTYPE(a) = ΓTYPE(g) ∪ {group}
ΓO(a) = ΓO(g) − ΓO,group ∪ {G(a) }
A group simply has the generic type group. As an example, we may want to put
together all the pieces of furniture existing in a given office room. In this way, we
form a group-object G(a) = office-furniture of type group. Notice that
this operator is defined on objects, not on types. Hence, it is neither a collection, nor
an aggregate, nor a hierarchy. The relation between the component objects and the
group is a member-of relation.
This operator constructs a new description element starting from attributes, relations,
or functions. Depending on the input and output, different specific operators can be
defined. The PDT of the operator is the following:
begin NAME = proc ωconstr
described as : Constructing a new description element starting from
elements chosen among attributes, functions and relations
requires : ΓA(g) , ΓF(g) , ΓR(g) , y
Constr (function that builds up the new element)
generates : ΓA(a) , ΓF(a) , ΓR(a)
method : meth[Pg , ωconstr ]
end NAME
where:
Constr : ΓA(g) × ΓF(g) × ΓR(g) → ΓA(a) ∪ ΓF(a) ∪ ΓR(a)
y ∈ ΓA(a) ∪ ΓF(a) ∪ ΓR(a)
The corresponding meth[Pg , ωconstr (Constr, y)] states how a new description ele-
ment is built up and what its properties are.
An example of this operator is the combination of attributes to form a new attribute.
For instance, given an object x of type rectangle, let Long be a binary attribute,
which assumes value 1 if x is long and 0 otherwise, and Wide be a binary attribute,
which assumes value 1 if x is wide and 0 otherwise. Then, we can construct a new
attribute, Big (a) , defined as Big (a) = Long ∧ Wide. The attribute Big (a) is a binary
one, and assumes the value 1 only if x is both long and wide. As usual, the attributes
Long and Wide do not enter ΓA(a) , which will only contain (Big (a) , {0, 1}).
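A sketch of this particular instantiation of ωconstr (the table layout and function name are ours, assumed only for the example):

```python
# Sketch: Constr = logical AND of two binary attributes, producing the
# abstract attribute Big = Long AND Wide; Long and Wide are then dropped.
def construct_big(attr_rows):
    return [{"ID": r["ID"], "Big": r["Long"] & r["Wide"]} for r in attr_rows]

rows = [{"ID": "r1", "Long": 1, "Wide": 1},
        {"ID": "r2", "Long": 1, "Wide": 0}]
print(construct_big(rows))   # r1 is big, r2 is not
```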
In this section we briefly discuss the relation between approximation and abstraction.
The view that we present is far from being general, as it is suggested by pragmatic
issues. In our view, approximation occurs when some element of a system descrip-
tion is replaced on purpose with another one. The new element is, according to some
defined criterion, “simpler” than the original one. As abstraction also aims at sim-
plification, abstraction and approximation look similar, and it is a sensible question
whether or when approximation is also an abstraction. We have tried to answer the
question within the framework we propose, based on the notion of configuration and
information reduction, and we have come up with a distinction. This distinction has a
meaning only inside the KRA framework, and it may well be possible that different
conclusions could be drawn in other contexts.
Let us first consider approximation applied to a specific configuration ψ. If
ψ is a configuration belonging to a configuration space Ψ , any change of a value
v of a variable into v′ changes ψ into another configuration ψap . Intuitively, we
are willing to say that the approximation v′ ≈ v is indeed an abstraction if
the set COMP(ψap ) in Ψ contains ψ. However, this is never the case, because
COMP(ψ) ≡ ψ, COMP(ψap ) ≡ ψap , and ψ ≠ ψap , because they differ in the
values v and v′ for the variable of interest.
The notion can be extended to P-Sets. Let COMP(P) be the set of configurations
compatible with P. Modifying a variable in P lets P become a Pap . Again, if the
approximation v′ ≈ v should be an abstraction, then it must be COMP(P) ⊆
COMP(Pap ). As before, this is impossible, because, even though P may have some
values set to UN, approximation is always made on a variable that is different from
UN, otherwise it would be an addition of information, and not just a change, and,
then, certainly not an abstraction.
As a conclusion, we can say that approximation performed on a P-Set, i.e., on an
observed system description, is never an abstraction, per se, even though it generates
a possibly simpler description. The original and the approximated configurations are
incomparable with respect to our notion of abstraction as information reduction. On
the other hand, this is an intuitive result; in fact, modifying a value is not reducing
information but changing it.
As an example, let us suppose that a configuration consists of a single object,
a, with attributes Length = 1.3 m and Color = red. Let us approximate the real
number “1.3” with 1. The original configuration ψ = (a, obj, 1.3, red) becomes
now ψap = (a, obj, 1, red), and the two are incomparable with respect to the
information they provide.
Let now Q be a query and ANS(Q) be the set of obtainable answers in the orig-
inal space. For the abstract query, things are different, as approximation may lead
to a superset, to a subset, or to a partially or totally disjoint set of answers with
respect to ANS(Q). If we consider the example of the pendulum, reported in Fig. 5.10,
and the query is Q ≡ “Compute T ”, ANS(Q) contains the unique, correct value (5.1).
If we approximate the function sin θ by θ in the equation of motion, a new set ANS′ (Q)
containing the only solution (5.2) is obtained. Then, ANS(Q) ∩ ANS′ (Q) = ∅.
Let us see whether the notion of approximation might be extended to description
frames, and, if yes, what the effect would be. Applying an approximation to Γ
means that some element of the potential descriptions is systematically modified on
purpose in all possible observed systems. For instance, we could change any real
value v ∈ R into its floor, namely ⌊v⌋ ≈ v, or expand any function
f ∈ ΓF into a Taylor series and take only the terms of order 0 and 1 (linearization). In
so doing, approximation operators can be defined, in much the same way as for
abstraction operators. In particular, a description frame Γg can be transformed into an
approximate description frame Γap . A substantial difference between an abstraction
operator and an approximation one is that in abstraction all the information needed is
contained in Γg , as the user only provides names; for instance, in building a node in a
hierarchy, the nodes to be replaced are only selected by the user, but they already exist
in Γg . Moreover, the user provides just the name of the new node. In approximation,
the user introduces some new element; for instance, the linearized version of the
functions in ΓF are usually not already present in it.
In any case, at the level of Γg , approximation operators can be defined by spec-
ifying a procedure Prox, which describes what has to be replaced in Γg and how.
The effect is to build up an approximate description frame Γap . Knowledge of Prox
allows a process similar to the inversion of abstraction to be performed, and then it
allows the ground and approximate configuration spaces to be related.
If we consider the abstraction operators introduced in this chapter, we may see that
those generating equivalence classes of elements have their counterpart in approxi-
mation. In fact, approximation occurs when the representative of the class is one of
its elements, instead of a generic name. In this way all the class elements are replaced
by one of them, generating thus approximate configurations. Another way to per-
form approximation is the definition of a specific operator that replaces an element.
In the following we will consider this operator and two others, for the sake of exem-
plification. In order to distinguish approximation from abstraction, we will denote
approximation operators by the letter ρ.
The replacement operator is the fundamental one for approximation. It takes any
element of a description frame Γg and replaces it with another one. Its PDT is the
following:
begin NAME = proc ρrepl
  described as: Replacing a description element with another
  requires: X(g) (set of involved elements)
            y (element to be approximated)
            y(ap) (approximation)
  generates: X(ap)
  method: prox[Pg, ρrepl]
end NAME
By instantiating X(g) and y, different operators are obtained. The element to be replaced
can be an object, a type, a function, a relation, an argument, or a value. As an example,
let us consider the case of replacing a function.
Let X(g) = ΓF(g) and y = fh. Let moreover y(ap) = gh be the function that replaces fh. We can define ρreplfun(fh, gh), and we obtain:

ΓF(ap) = ΓF(g) − {fh} ∪ {gh}
This operator changes uniformly the function fh into gh whenever it occurs, in any
perception Pg .
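To make the mechanics concrete, here is a minimal Python sketch of such a replacement operator, over a dictionary-based encoding of a description frame; the encoding and the small-angle example are our own illustrative assumptions, not part of any KRA implementation.

import math

def rho_replfun(frame, name, g):
    """Approximation operator: replace the function `name` by `g` in a
    description frame, uniformly for every perception that uses it."""
    approx = dict(frame)                   # shallow copy of the frame
    functions = dict(approx["F"])          # Gamma_F encoded as {name: callable}
    if name not in functions:
        raise KeyError(f"{name} not in Gamma_F")
    functions[name] = g                    # f_h is replaced by g_h everywhere
    approx["F"] = functions
    return approx

# Example: replace sin(theta) by its linearization theta (small angles)
frame_g = {"F": {"Sin": math.sin}}
frame_ap = rho_replfun(frame_g, "Sin", lambda theta: theta)
print(frame_g["F"]["Sin"](0.3), frame_ap["F"]["Sin"](0.3))  # 0.2955... vs 0.3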
As we have discussed when dealing with abstraction operators, there are two ways
of handling classes of equivalent objects: either the class is denoted by a generic
name, which can be instantiated to any element of the class, or all elements of the
class are made equal to one of them; in the former case (“equation”) we have an
abstraction, whereas in the latter one (“identification”) we have an approximation.
The PDT corresponding to identification is the following:
Let X(g) = ΓO(g) and ϕid some condition on objects. Then, all tuples of objects (o1, . . . , ok) satisfying ϕid are considered indistinguishable, and are equated to y(a) ∈ {o1, . . . , ok}. We define:

ρidobj(ϕid) =def ρid(ΓO(g), ϕid)

The method prox[Pg, ρidobj(ϕid)] first generates the set ΓO,id; then, it replaces each element of ΓO,id by y(a) = o(a), where o(a) ∈ ΓO,id, obtaining:

ΓO(a) = ΓO(g) − ΓO,id ∪ {o(a)}

The element o(a) can be given by the user or selected in ΓO,id according to a given procedure.
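The identification mechanism can be sketched in Python as follows; the object names and the random selection policy are our own illustrative assumptions.

import random

def rho_idobj(objects, phi_id, pick=random.choice):
    """Approximation operator: all objects satisfying phi_id become
    indistinguishable and are equated to a single representative."""
    cls = [o for o in objects if phi_id(o)]      # the set Gamma_O,id
    if not cls:
        return set(objects), None
    rep = pick(cls)                              # o^(a), here chosen at random
    return (set(objects) - set(cls)) | {rep}, rep

furniture = ["chair1", "chair2", "folding_chair", "table"]
approx, rep = rho_idobj(furniture, lambda o: "chair" in o)
print(approx, "representative:", rep)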
Fig. 7.7 Example of application of method prox[Pg, ρidobj(ϕid)], where ϕid(o) = “o ∈ ΓO,chair”. The different chairs (on the left) might be considered equivalent to each other, and equal to one of them (on the right)
As an example, let us consider again the furniture of Fig. 7.4, and let us equate again all chairs. Whereas in Sect. 7.3.1.1 the class of chairs was represented by a generic schema of a chair (thus obtaining an abstraction), in this case all chairs will be equated to one of them, extracted randomly from the set of all chairs. Suppose that the extraction provided an instance of a folding chair. Then, all other chairs are considered equal to it, producing the approximation reported in Fig. 7.7.
Let us now go back to the discretization of real intervals. Let us consider the interval [0, 100), and let us divide it into 10 subintervals {[10k, 10(k + 1)) | 0 ≤ k ≤ 9}. Numbers falling inside one of the intervals are considered equivalent. As a representative of each subinterval we may take its middle point, i.e., 10(k + 0.5) for 0 ≤ k ≤ 9. Then, any value in a specific subinterval will be replaced by the interval’s middle point, obtaining an approximate value for each number in the bin. On the contrary, we remember that by assigning to each bin a linguistic value, an abstraction was obtained.
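The two treatments of the same binning can be contrasted in a few lines of Python; the function names and bin labels are ours, with the labels standing in for linguistic values.

def approximate(v):
    """Approximation: replace v in [0, 100) by the middle point of its bin."""
    k = int(v // 10)
    return 10 * (k + 0.5)              # e.g., 37.2 -> 35.0

def abstract(v):
    """Abstraction: replace v by a generic label denoting its bin."""
    k = int(v // 10)
    return f"bin_{k}"                  # e.g., 37.2 -> 'bin_3'

print(approximate(37.2), abstract(37.2))   # 35.0 bin_3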
7.7 Reformulation
For the sake of completeness, we add here a few words on reformulation. Considering a description frame Γ, and the configuration space Ψ associated to it, it is natural to extend the definition that we have given for abstraction and approximation, in terms of information content, to the case of reformulation.
In other words, P and its image under Π provide exactly the same information about
the system under analysis.
Unfortunately, it does not seem feasible to define reformulation operators that are at the same time generic and meaningful, as was the case for approximation and abstraction, because they depend too strongly on the context. However, Definition 7.1 itself allows us to say that, according to our view, reformulation is never an abstraction. Again, as in the case of approximation, the result of a reformulation may be “simpler” than the original one, so that simplicity is the common denominator of all three mechanisms.
Abstraction, approximation, and reformulation are three facets of knowledge rep-
resentation which are complementary, and often work in synergy, to allow complex
changes to be performed.
All operators introduced so far are summarized in Table 7.1. They are grouped according to the elements of the description frame they act upon, and the underlying abstraction mechanism. Even though they are quite numerous, we notice that several among them can be “technically” applied in the same way, exploiting synergies. For instance, equating values of a variable can be implemented with the same code for attributes, for argument values in functions and relations, and in function co-domains. Nevertheless, we have kept them separate, because they differ in meaning, and also in the impact they have on the Γ’s. In fact, their PDTs are different, but they share the same method.
As was said at the beginning, the listed operators are defined at the level of
description frames, because they correspond to changing the perception provided by
sensors that are used to analyze the world. A method corresponding to each operator
acts on specific P-Sets according to rules that guide the actual process of abstraction.
The list of operators introduced in this chapter is by no means intended to exhaust
the spectrum of abstractions that can be thought of. However, they are sufficient to
describe most of the abstractions proposed in the past in a unified way. Moreover,
they provide a guide for defining new ones, better suited to particular fields. The
complete list of currently available operators is reported in Appendix E.
Table 7.1 Summary of the elementary abstraction and approximation operators, classified according to the elements of the description frame they act upon and their mechanism

Operators | Elements | Arguments | Values
Hiding | ωhobj, ωhtype, ωhattr, ωhrel, ωhfun | ωhfunarg, ωhrelarg | ωhattrval, ωhfunargval, ωhfuncodom, ωhrelargval
Equating | ωeqobj, ωeqtype, ωeqattr, ωeqfun, ωeqrel | ωeqfunarg, ωeqrelarg | ωeqattrval, ωeqfunargval, ωeqfuncodom, ωeqrelargval
Building hierarchy | ωhierattr, ωhierfun, ωhierrel, ωhiertype | — | ωhierattrval, ωhierfuncodom
Combining | ωcoll, ωaggr, ωgroup | — | ωconstr
Approximation (replacing) | ρreplobj, ρrepltype, ρreplfun, ρreplrel | ρreplfunarg, ρreplrelarg | ρreplattrval, ρreplfunargval, ρreplrelargval, ρreplfuncodom
Approximation (identifying) | ρidobj, ρidtype, ρidattr, ρidfun, ρidrel | ρidfunarg, ρidrelarg | ρidattrval, ρidfunargval, ρidfuncodom, ρidrelargval
The abstraction operators introduced in this chapter are irreducible to simpler ones. However, using abstraction operators alone may not be effective in practice. In fact, if we want to apply several operators to a ground description frame, we must build up a hierarchy of more and more abstract spaces, each one obtained by the application of a single operator. For instance, if we would like to build up a hierarchy, we should apply ωhier as many times as the number of new nodes we want to add, thus creating a possibly long chain of spaces, very close to one another.
In such cases it would be more convenient to combine the operators into sets and/or chains, and apply them all at once. The combination of operators is an abstraction process, according to Definition 6.22. Clearly, not every composition of operators is allowed. In particular, the result of the whole process must be the same as the result obtained by applying all the composing operators in parallel or in sequence, one at a time.
Definition 7.2 (Parallel abstraction process) A parallel abstraction process Π =
{ω1 , . . . , ωi , . . . , ωr } is a set of r operators to be applied simultaneously. The process
Π is admissible iff any permutation of the r operators generates the same final Γa .
In other words, if Π is admissible, there exists a corresponding method M =
{meth[Pg , ω1 ], . . . , meth[Pg , ωi ], . . . , meth[Pg , ωr ]} such that, for any Pg ⊆ Ψg ,
contains the actual program body, which, for the sake of generality, will be described in pseudo-code.
In the following we will provide examples of methods for some operators. We start with one of the simplest operators, ωhobj(o), the one that hides object o from view, and was described in Sect. 7.2.1.1. Let us consider a P-Set describing an observed system S, namely Pg = ⟨Og, Ag, Fg, Rg⟩. In order to obtain the abstract description Pa = ⟨Oa, Aa, Fa, Ra⟩, we have to apply meth[Pg, ωhobj(o)], reported in Table 7.3.
The NAME slot simply contains meth[Pg, ωhobj(o)]. The method requires in input Pg and the object to hide, o, and provides in output Pa. In order for meth[Pg, ωhobj(o)] to be applied, object o must have been actually observed in the system S; hence, this condition appears in the field APPL-CONDITIONS. The field MEMORY is filled during the execution of the method. The code for the method, which is executed only if the application conditions are met, is reported in Table 7.4.
The method modifies in turn Og, Ag, Fg, Rg, and memorizes the changes. In order to obtain Oa, it simply deletes the object o from Og. For the attributes, it deletes from Ag the assignment of attribute values to o. Then, it looks at each function defined in Fg; hiding an object has the effect of transforming a function fh(g), with cover FCOV(fh(g)), into another one, whose cover no longer contains tuples containing o.² The new set Fa is the collection of all the modified functions.

² This is one among several possible choices. For instance, the tuples can be kept, and a value UN can replace object o.
Table 7.4 Pseudo-code for the method meth[Pg, ωhobj(o)]

METHOD meth[Pg, ωhobj(o)]
  Oa = Og − {o}
  ΔO(P) = {o}
  Let t be the type of o
  Aa = Ag − {(o, t, v1(t), . . . , vMt(t))}
  ΔA(P) = {(o, t, v1(t), . . . , vMt(t))}
  Fa = ∅
  forall fh(g) ∈ Fg with arity th do
    ΔF(P)(h) = ∅
    FCOV(fh(a)) = FCOV(fh(g))
    forall σ ∈ FCOV(fh(a)) do
      if o ∈ σ
        then FCOV(fh(a)) = FCOV(fh(a)) − {σ}
             ΔF(P)(h) = Append(ΔF(P)(h), σ)
      endif
    end
    Define fh(a) corresponding to FCOV(fh(a))
    Fa = Fa ∪ {fh(a)}
  end
  Ra = ∅
  forall Rk(g) ∈ Rg with arity tk do
    ΔR(P)(k) = ∅
    RCOV(Rk(a)) = RCOV(Rk(g))
    forall σ ∈ RCOV(Rk(a)) do
      if o ∈ σ
        then RCOV(Rk(a)) = RCOV(Rk(a)) − {σ}
             ΔR(P)(k) = Append(ΔR(P)(k), σ)
      endif
    end
    Define Rk(a) corresponding to RCOV(Rk(a))
    Ra = Ra ∪ {Rk(a)}
  end
In an analogous way, relation Rk(g) is transformed into a relation whose cover no longer contains tuples in which o occurs. Again, the new set Ra is the collection of all the modified relations. Notice that all the hidden information is stored in Δ(P):

Δ(P) = ΔO(P) ∪ ΔA(P) ∪ ⋃h=1…H ΔF(P)(h) ∪ ⋃k=1…K ΔR(P)(k)
It is immediate to see that, by applying meth[Pg, ωhobj(o)], the P-Set Pa is less informative than Pg, i.e., Pa ⊑ Pg. In fact, in Pa we do not know anything more
about object o. Then, any configuration in Pg specifying any value for the type and attributes of o is compatible with Pa. As tuples of objects are also hidden in some FCOV(fh(g)) and RCOV(Rk(g)), information about these tuples is no longer available either. In order to reconstruct Pg we have to apply the following operations:
Og = Oa ∪ ΔO(P) = Oa ∪ {o}
Ag = Aa ∪ ΔA(P)
Fg = {fh(g) | FCOV(fh(g)) = FCOV(fh(a)) ∪ ΔF(P)(h), 1 ≤ h ≤ H}
Rg = {Rk(g) | RCOV(Rk(g)) = RCOV(Rk(a)) ∪ ΔR(P)(k), 1 ≤ k ≤ K}
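A minimal Python sketch of this method and of its inversion follows, with sets and dictionaries standing in for the components of a P-Set; the encoding is our own, chosen only to show that hiding is non-destructive.

def hide_object(p_set, o):
    """A sketch of meth[Pg, omega_hobj(o)]: remove o from O and A, and drop
    every tuple containing o from the covers of functions and relations.
    Everything removed is memorized in delta, so that Pg can be rebuilt."""
    O, A, F, R = p_set
    delta = {"O": {o},
             "A": {t for t in A if t[0] == o},
             "F": {h: {s for s in cov if o in s} for h, cov in F.items()},
             "R": {k: {s for s in cov if o in s} for k, cov in R.items()}}
    Oa = O - {o}
    Aa = A - delta["A"]
    Fa = {h: cov - delta["F"][h] for h, cov in F.items()}
    Ra = {k: cov - delta["R"][k] for k, cov in R.items()}
    return (Oa, Aa, Fa, Ra), delta

def unhide(p_abs, delta):
    """Inverse process: merge the memorized delta back into the P-Set."""
    Oa, Aa, Fa, Ra = p_abs
    return (Oa | delta["O"], Aa | delta["A"],
            {h: cov | delta["F"][h] for h, cov in Fa.items()},
            {k: cov | delta["R"][k] for k, cov in Ra.items()})

# The hide-b example of this section, in miniature
Pg = ({"a", "b", "c", "d"},
      {("b", "square", "blue", "large")},
      {},
      {"ontop": {("a", "b"), ("c", "d")}, "leftof": {("b", "c"), ("b", "d")}})
Pa, delta = hide_object(Pg, "b")
assert unhide(Pa, delta) == Pg       # abstraction is non-destructive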
Opoint(a) = Opoint(g)
Osegment(a) = Osegment(g)
Ofigure(a) = Ofigure(g) − {b} = {a, c, d}
The two functions Radius and Center are not affected, because b occurs neither in their domain nor in their image. Then Fa = Fg. For what concerns relations, the abstract set Ra = {Rontop(a), Rleftof(a)} contains:

RCOV(Rontop(a)) = {(c, d)}
RCOV(Rleftof(a)) = {(a, c), (a, d)}
When the method meth[Pg, ωhobj(b)] has been applied to Pg, the hidden information can be found in Δ(P) = ΔO(P) ∪ ΔA(P) ∪ ΔF(P) ∪ ΔR(P), where:

ΔO(P) = {b}
ΔA(P) = {(b, square, blue, large)}
ΔF(P) = ∅
ΔR(P) = {ΔR(P)(Rontop), ΔR(P)(Rleftof)}, where:
ΔR(P)(Rontop) = {(a, b)} and ΔR(P)(Rleftof) = {(b, c), (b, d)}.
We will now describe the method for aggregating objects, which is one of the
most complex.
Table 7.5 Method meth[Pg, ωaggr((t1, . . . , ts), t(a))]

NAME             meth[Pg, ωaggr((t1, . . . , ts), t(a))]
INPUT            Pg, (Ot1, . . . , Ots), t(a), g : Ot1 × · · · × Ots → Ot(a)
OUTPUT           Pa, z, Rpartof ⊆ ∏i=1…s Oti × Ot(a)
APPL-CONDITIONS  ∃oi ∈ Oti (1 ≤ i ≤ s)
PARAMETERS       See Table 7.6
MEMORY           Δ(P), RCOV(Rpartof)
BODY             See Table 7.7
Table 7.6 Parameters of the method meth[Pg, ωaggr((t1, . . . , ts), t(a))]

α(x)  α(x) = (α1(x), . . . , αM(x))
      for m = 1, M do
        if αm(x) then Am(a)(z) = vj ∈ Λm ∪ {UN} ∪ {NA} endif
      end
β(x)  if β(x) then Transform Fg into Fa according to given rules endif
γ(x)  if γ(x) then Transform Rg into Ra according to given rules endif
The method meth[Pg, ωaggr((t1, . . . , ts), t(a))] takes as input Pg, the sets of objects of the types to be aggregated, and the new type to be generated. It also takes in input a function g : Ot1 × · · · × Ots → Ot(a), which tells how the new object is obtained from the old ones. The original objects are removed from Oa, whereas the new object is added. For this method, the field PARAMETERS is very important, because it contains the rules for the aggregation of the input objects; the relevant parameters are presented in Table 7.6. The rules of transformation must be provided by the user. The body of meth[Pg, ωaggr((t1, . . . , ts), t(a))], which is reported in Table 7.7, performs two separate tasks: hiding the information regarding the original objects {o1, . . . , os}, and transferring information from {o1, . . . , os} to the new object c.
While hiding information is easy, and can be done unambiguously once the objects to hide are given, the transfer of information from the components to the aggregated object requires the use of the rules specified in the PARAMETERS field. The transfer of information from the component objects to the composite one is not unconditional, because it might not always be meaningful. First of all, we must provide an abstraction function g that constructs, starting from the typed objects (o1, t1), . . . , (os, ts), the new object (c, t(a)) of the given type. Then, the parameters of the method include sets of conditions, α(o1, . . . , os), β(o1, . . . , os), and γ(o1, . . . , os), which tell whether the corresponding attributes, functions, or relations are applicable to the new object, and, if yes, how.
To clarify the working of the aggregation operator we introduce an example.
Example 7.2 Let us consider again the geometric scenario of Fig. 6.2. We want to
aggregate two objects which are one on top of another to form a new object of type
t(a) = tower.
Given the description frame Γg = ⟨ΓTYPE(g), ΓO(g), ΓA(g), ΓF(g), ΓR(g)⟩ of Example 6.2, we apply to it the operator ωaggr((figure, figure), tower). If we consider the scenario Pg of Fig. 6.2, we can apply to it the method meth[Pg, ωaggr((figure, figure), tower)]. The instantiation of this method is reported in Table 7.8, whereas the PARAMETERS field has the content reported in Table 7.9.
The function α generates the attribute values for the new object. Specifically, if the objects x1 and x2 have the same color, then the composite object will have the same color as well. If x1 and x2 do not have the same color, then the composite object assumes the color of the largest component. Obviously, this choice is one among the many that the user can make.
Table 7.7 Pseudo-code of method meth[Pg, ωaggr((t1, . . . , ts), t(a))]

METHOD meth[Pg, ωaggr((t1, . . . , ts), t(a))]
  Let Rpartof ⊆ ∏i=1…s Oti × Ot(a) be a new relation
  Let σ = (o1, . . . , os) with oi ∈ Oti (1 ≤ i ≤ s)
  Let B = {σ | ∀σ′, σ″ ∈ B : σ′ ∩ σ″ = ∅}
  Oa = Og, Aa = Ag, Fa = Fg, Ra = Rg
  ΔO(P) = ΔA(P) = ΔF(P) = ΔR(P) = ∅
  RCOV(Rpartof) = ∅
  forall σ ∈ B do
    Build up c = g(σ)
    forall oj ∈ σ do
      RCOV(Rpartof) = RCOV(Rpartof) ∪ {(oj, c)}
    end
    Oa = Oa − {o1, . . . , os} ∪ {c}
    ΔO(P) = (o1, . . . , os, c)
    Aa = Aa − {(oi, ti, v1(ti), . . . , vMti(ti)) | 1 ≤ i ≤ s}
    Aa = Aa ∪ {(c, t(a), v1, . . . , vM)},
      where vm (1 ≤ m ≤ M) is determined by the rules α(o1, . . . , os) specified in PARAMETERS
    ΔA(P) = ΔA(P) ∪ {(oi, ti, v1(ti), . . . , vMti(ti)) | 1 ≤ i ≤ s}
    forall fh ∈ ΓF do
      forall tuples τ ∈ FCOV(fh) such that at least one of the oi occurs in τ do
        FCOV(fh(a)) = FCOV(fh) − {τ}
        ΔF(P) = ΔF(P) ∪ {(fh, τ)}
      end
    end
    Transform some FCOV(fh) ∈ Fg into FCOV(fh(a)) according to the rules β(o1, . . . , os) and add them to Fa
    forall Rk ∈ ΓR do
      forall tuples τ ∈ RCOV(Rk) such that at least one of the oi occurs in τ do
        RCOV(Rk(a)) = RCOV(Rk) − {τ}
        ΔR(P) = ΔR(P) ∪ {(Rk, τ)}
      end
    end
    Transform some RCOV(Rk) ∈ Rg into RCOV(Rk(a)) according to the rules γ(o1, . . . , os) and add them to Ra
  end
  Δ(P) = ΔO(P) ∪ ΔA(P) ∪ ΔF(P) ∪ ΔR(P) ∪ RCOV(Rpartof)
For instance, the color of z could be set to UN or to NA. For Size, two objects generate a large object if at least one of them is large, or if both are of medium size. In all other cases the resulting object is of medium size. The attributes Shape and Length are no longer applicable to z.
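As an illustration, the α rules of Table 7.9 can be rendered in Python roughly as follows; the attribute dictionaries and the explicit size ordering are our own encoding of the example.

def alpha(x1, x2):
    """A sketch of the attribute rules for the tower aggregate z."""
    order = {"small": 0, "medium": 1, "large": 2}
    # Color: keep the common color, otherwise take the color of the larger part
    if x1["Color"] == x2["Color"]:
        color = x1["Color"]
    elif order[x1["Size"]] >= order[x2["Size"]]:
        color = x1["Color"]
    else:
        color = x2["Color"]
    # Size: large if one part is large, or if both parts are medium
    sizes = (x1["Size"], x2["Size"])
    size = "large" if "large" in sizes or sizes == ("medium", "medium") else "medium"
    return {"Color": color, "Size": size, "Shape": "NA", "Length": "NA"}

a = {"Color": "green", "Size": "small"}
b = {"Color": "blue", "Size": "large"}
print(alpha(a, b))   # {'Color': 'blue', 'Size': 'large', 'Shape': 'NA', 'Length': 'NA'}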
Table 7.8 Method meth[Pg, ωaggr((figure, figure), tower)]

NAME             meth[Pg, ωaggr((figure, figure), tower)]
INPUT            Pg, Ofigure, tower, g : Ofigure × Ofigure → Otower
                 g(x1, x2) = if [x1 ∈ Ofigure] ∧ [x2 ∈ Ofigure] ∧ [(x1, x2) ∈ RCOV(Rontop)] then z
OUTPUT           Pa, {c}, Rpartof ⊆ Ofigure × Otower
APPL-CONDITIONS  ∃o1, o2 ∈ Ofigure, with o1 ≠ o2 and (o1, o2) ∈ RCOV(Rontop)
PARAMETERS       See Table 7.9
MEMORY           Δ(P), RCOV(Rpartof)
BODY             See Table 7.7
Table 7.9 Parameters of the method meth[Pg, ωaggr((figure, figure), tower)]

α(x1, x2) ⇒  if [Color(x1) = v1] ∧ [Color(x2) = v2] ∧ [v1 = v2]
               then [Color(a)(z) = v1]
               else if [Size(x1) = v3] ∧ [Size(x2) = v4] ∧ [v3 ≥ v4]
                 then [Color(a)(z) = v1]
                 else [Color(a)(z) = v2]
               endif
             endif
             if [Size(x1) = v1] ∧ [Size(x2) = v2] ∧ [(v1 = large) ∨ (v2 = large)]
               then [Size(a)(z) = large]
               else if [v1 = medium] ∧ [v2 = medium]
                 then [Size(a)(z) = large]
                 else [Size(a)(z) = medium]
               endif
             endif
             Shape(a)(z) = NA
             Length(a)(z) = NA

β(x1, x2) ⇒  if [Shape(x1) = circle] ∧ [Center(c1, x1)] ∧ [Radius(y1, x1)]
               then Delete (c1, x1) from FCOV(Center(a))
                    Delete (y1, x1) from FCOV(Radius(a))
             if [Shape(x2) = circle] ∧ [Center(c2, x2)] ∧ [Radius(y2, x2)]
               then Delete (c2, x2) from FCOV(Center(a))
                    Delete (y2, x2) from FCOV(Radius(a))

γ(x1, x2) ⇒  if ∃u s.t. (u, x1) ∈ RCOV(Rontop) then (u, z) ∈ RCOV(Rontop(a))
             if ∃v s.t. (x2, v) ∈ RCOV(Rontop) then (z, v) ∈ RCOV(Rontop(a))
             if ∃u s.t. [(x1, u) ∈ RCOV(Rleftof) ∨ (x2, u) ∈ RCOV(Rleftof)] then (z, u) ∈ RCOV(Rleftof(a))
             if ∃v s.t. [(v, x1) ∈ RCOV(Rleftof) ∨ (v, x2) ∈ RCOV(Rleftof)] then (v, z) ∈ RCOV(Rleftof(a))
             forall u s.t. [(u, x1) ∈ RCOV(Rsideof)] ∨ [(u, x2) ∈ RCOV(Rsideof)] do
               Remove (u, x1) or (u, x2) from RCOV(Rsideof)
             end
Fig. 7.8 Application of method meth[Pg, ωaggr((figure, figure), tower)]. Objects a and b are aggregated to obtain object c1, and objects c and d are aggregated to obtain object c2. The color of c1 is blue, because b is larger than a, whereas the color of c2 is green. Both composite objects are large. The new object c1 is at the left of c2 [A color version can be found in Fig. H.13 of Appendix H]
Regarding functions, neither Center nor Radius is applicable to z; then, if one of the two objects is a circle, its center and radius disappear from the corresponding covers.
Regarding relations, if there is an object u which is on top of x1, then u is also on top of z. If there is an object v which is under x2, then z is on top of v. Moreover, if x1 or x2 is at the left of an object u, then z is at the left of u; if there is an object v which is at the left of both x1 and x2, then v is at the left of z as well. Finally, the relation Rsideof is not considered applicable to z, and hence all the original sides of x1 and x2 are hidden.
The application of the method to the ground scenario Pg generates the following Pa:
ΔO(P) = {(a, figure), (b, figure), (c, figure), (d, figure), (O, point), (OP, segment)}
ΔA(P) = {(a, green, triangle, small), (b, blue, square, large), (c, red, circle, medium), (d, green, rectangle, large), (OP, black, r)}
ΔF(P) = {FCOV(Center), FCOV(Radius)}
ΔR(P)(Rontop) = RCOV(Rontop)
ΔR(P)(Rleftof) = {(a, c), (a, d), (b, c), (b, d)}
ΔR(P)(Rsideof) = {(AB, b), (AC, b), (BD, b), (CD, b), (EG, d), (EF, d), (GH, d), (HF, d)}
RCOV(RPartOf) = {(a, c1), (b, c1), (c, c2), (d, c2)}
Table 7.10 Methods meth[Dg, δhobj], meth[Lg, λhobj], and meth[Tg, τhobj(o)]

NAME             meth[Dg, δhobj]   meth[Lg, λhobj]   meth[Tg, τhobj]
INPUT            Dg, o             Lg, o             Tg, o
OUTPUT           Da                La                Ta
APPL-CONDITIONS  σID=o(OBJ) ≠ ∅    o ∈ Cg            ∅
PARAMETERS       ∅                 ∅                 ∅
MEMORY           Δ(D)              Δ(L)              Δ(T)
BODY             See Table 7.11    Ca = Cg − {o}     ∀ϕ ∈ Tg s.t. o ∈ ϕ do Ta = Tg − {ϕ} end
has the same structure as the one reported in Table 7.2. For the sake of simplicity, only the methods and not the operators are described in the following, because the methods are the ones used in practice to perform the abstraction. For the operators δhobj(o), λhobj(o), and τhobj(o), the associated methods, meth[Dg, δhobj(o)], meth[Lg, λhobj(o)], and meth[Tg, τhobj(o)], are reported in the same Table 7.10.
The input to meth[Dg, δhobj(o)] is the database Dg, as well as the object o to hide. The application condition states that the object must be present in the table OBJ, so that a query to this table does not return an empty set. The body of the method is reported in Table 7.11.
The method meth[Dg, δhobj(o)] takes as input the database Dg and the object to be hidden, o, and outputs Da. No internal parameter is needed. When execution terminates, the hidden information is stored in Δ(D). The action of the method consists in removing from all tables in Dg all tuples containing o. The database Dg can simply be recovered from Da and Δ(D) using operations similar to the ones reported for Pg.
Table 7.11 Pseudo-code for the method meth[Dg, δhobj(o)]

METHOD meth[Dg, δhobj(o)]
  forall tables Tg ∈ Dg do
    Ta = Tg
    Δ(D)(Tg) = ∅
    forall σ ∈ Ta do
      if o occurs in σ
        then Ta = Ta − {σ}
             Δ(D)(Tg) = Append(Δ(D)(Tg), σ)
      endif
    end
  end
  Da = {Ta}
  Δ(D) = ⋃Tg∈Dg Δ(D)(Tg)
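A runnable Python counterpart of this method might look as follows, with a dictionary of row lists standing in for the database tables; the encoding is our own.

def delta_hobj(database, o):
    """A sketch of meth[Dg, delta_hobj(o)]: drop from every table the rows
    in which object o occurs, memorizing them in delta_D."""
    abstract_db, delta_D = {}, {}
    for name, rows in database.items():
        abstract_db[name] = [row for row in rows if o not in row]
        delta_D[name] = [row for row in rows if o in row]
    return abstract_db, delta_D

Dg = {"OBJ": [("a", "figure"), ("b", "figure")],
      "ONTOP": [("a", "b")]}
Da, delta_D = delta_hobj(Dg, "b")
print(Da)        # {'OBJ': [('a', 'figure')], 'ONTOP': []}
print(delta_D)   # {'OBJ': [('b', 'figure')], 'ONTOP': [('a', 'b')]}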
The method meth[Lg, λhobj(o)] works on the language Lg = ⟨Cg, X, O, Pg, Fg⟩, defined in Sect. 6.2. As already mentioned, we assume that the unique name of object o in the set Cg of constants is simply its identifier o. As Lg only provides names for attributes, functions, and relations, nothing changes in it except the removal of o from the set of constants, yielding Ca.
Regarding the theory, there may be two cases for operator τhobj(o): either the constant (object) o does not occur explicitly in any of the formulas in Tg, or it occurs in some of them. In the former case nothing happens, and Ta = Tg. In the latter case, we have to remove all formulas in which the constant occurs explicitly. Let, for instance, Tg contain the formula

and let Bob be the hidden constant (person). Hence, the above formula cannot be applied anymore, and it has to be hidden from Tg. The method meth[Tg, τhobj(o)] is reported in the BODY field of Table 7.10.
Again, the choice of hiding all formulas in Tg in which the object to be hidden occurs is one among many possible choices. It is up to the user to choose one, according to the nature of the query and the context. For instance, the constant Bob could have been replaced, in expression (7.1), by an existentially quantified variable. The choice is encoded in the rules α in the PARAMETERS field.
In order to clarify the operators defined above, we introduce an example.
Example 7.3 Let us consider the situation described earlier in this section, where we have hidden object b. From the P-Set described in Example 6.3, we have built up the database Dg described in Example 6.5. Dg consists of the tables OBJ, SEGMENT-ATTR, FIGURE-ATTR, RADIUS, CENTER, ONTOP, LEFTOF, and SIDEOF. By applying the selection operation σID=b(OBJ) to the table OBJ, we observe that the object is present, and then we can apply the operator. The abstract table OBJ(a) then becomes:
As the two functions Radius and Center are not affected, because b is neither a circle
nor a point, then:
For what concerns relations, object b occurs in all tables ONTOP, LEFTOF, and
SIDEOF. Then:
Ca = Cg − {b},
Pa = Pg ,
Fa = Fg .
Then:
La = Ca , X, Og , Pg , Fg
The hidden information can be found in Δ(L)(C) = {b}.
The method meth[Tg, τhobj(b)] does not modify the theory, because the theory does not explicitly mention the object b. Notice that when the functions Area and Contour-length are instantiated on the scenario in the more abstract space, they will simply not be applied to b, which is hidden.
tions. This situation has been addressed, in Software Engineering, with the notion of
Design Patterns. In this section we would like to propose Abstraction Patterns as an
analogue to design patterns, to be used when the same type of abstraction is required
in different domains and/or applications. In the next subsection a brief introduction
to the concept of design patterns is presented, for the sake of self-containedness.
When introduced by Gamma et al. [190], Design Patterns were meant to capture
the “intent behind a design by identifying objects, their collaborations, and the dis-
tribution of responsibilities. Design patterns play many roles in the object-oriented
development process: they provide a common vocabulary for design, they reduce
system complexity by naming and defining abstractions, they constitute a base of
experience for building reusable software, and they act as building blocks from which
more complex designs can be built ”. But Design Patterns are also motivated by the
fact that they can speed up the development process, and improve the quality of
developed software. Indeed, they provide general documented solutions to particular
representation problems but are not tied to a particular context or formalism. Finally,
patterns allow developers to communicate using well-known, well understood names
for software interactions. Common Design Patterns also benefit from the accumulated experience of their use over time, which makes them more robust than ad-hoc “creative” designs that reinvent solutions.
Design Patterns have become widely used and many books have specified how to
implement them in different programming languages such as JAVA, C++, or Ajax.
Beyond programming languages, there have also been attempts to codify design
patterns in particular domains as domain specific Design Patterns. Such attempts
include business model design, user interface design, secure design, Web design,
and so on. There is not a unique way to describe a pattern, but the notion of Design
Pattern Template is widely used to provide a coherent and systematic description
of its properties. Within the context of this book we are neither concerned by a
particular language nor by software engineering per se. The key idea we want to retain
from Design Patterns is that of building a documented list of abstraction operators
and algorithms that support their implementation, and of defining a template for a
common language to describe them.
According to Gamma [190], the use of Design Patterns can be a suitable conceptu-
alization framework to design effective systems, because it allows the experience of
many people to be reused to increase productivity and quality of results. The same
can be said for abstraction. In fact, designing a good abstraction for a given task may be difficult and is still a matter of art, and it would be very useful to exploit the past experience of several people. By analyzing a number of applications, abstraction patterns might emerge; they could act as a starting point for a new application, to be adapted to specific requirements. A Design Pattern has three components:
1. An abstract description of a class or object and its structure.
2. The issue addressed by the abstract structure, which determines the conditions of
pattern applicability.
3. The effects of the pattern’s application on the system’s architecture, which suggest its suitability.
As emphasized by Rising [454], there is now a community in Software Development, called the patterns community, formed around the questions of identifying and documenting design patterns. In the field of AI and in Knowledge Engineering, what corresponds to the pivotal role of Design in Software Development is the central notion of Knowledge Representation. By analogy to Software Development we have chosen to describe the abstraction operators as a kind of Abstraction Patterns (AP). Informally, an abstraction pattern corresponds to a generic type of abstraction and to its impact, but also to a concrete approach for making it operational.
More precisely, we will identify four components in an AP:
1. An abstract description of the operator.
2. The issue addressed by the operator, which determines the conditions of pattern
applicability.
3. The effects of the operator on the system’s performance, which suggest pattern suitability.
4. An operationalization of the operator.
Behind the introduction of abstraction patterns is the very idea of “abstraction” itself: a user looks first at the available patterns to identify the operator class that seems best suited to his/her problem, without bothering with the operator details. Then, after the choice is made, the actual operators are analyzed and tried.
To homogenize the description of these components for a generic Abstraction Pattern (AP) we will use the template, adapted from a Design Pattern Template [267], reported in Table 7.12. In this template fields can be added or removed as needed.
Making a parallel with the classification introduced in Sect. 7.1, we subdivide
abstraction patterns into groups, as reported in Table 7.13.
Table 7.13 Classification of abstraction patterns according to their effects, and to the elements of the description frame they act upon

Type of abstraction ↓ | Elements (objects, types, attributes, functions, relations) | Arguments (of a function or relation) | Values (of an attribute, of a function’s argument or co-domain, or of a relation’s argument)
Hiding | Hiding elements | Hiding arguments | Hiding values
Equating | Equating elements | Equating arguments | Equating values
Hierarchy building | Building a hierarchy of elements | Building a hierarchy of arguments | Building a hierarchy of values
In this section the abstraction patterns for hiding components of a description frame
are provided for the sake of illustration.
The abstraction pattern describing the act of hiding an element of a description
frame Γg aims at simplifying the description of a system by removing from view
an element, be it an attribute or a function or a relation. The corresponding generic
operator is ωhy , where y ∈ {obj, type, attr, fun, rel}.
The abstraction pattern for hiding an argument acts only on functions and relations,
and corresponds to the operators ωhyarg , where y ∈ {fun, rel}. Hiding an argument in
a function or relation reduces its arity. It is necessary to provide rules for computing
the cover of the abstract function/relation, because this is not usually automatically
determined.
The abstraction pattern concerned with hiding a value in a description frame corresponds to the operator ωhyval, where y ∈ {attr, funarg, relarg} (Table 7.14).
Other APs are given in Appendix F.
7.13 Summary
For approximation, let us apply the two operators ρrepl((X, [0, ∞)), (X′, N)) and ρrepl((Y, [0, ∞)), (Y′, N)), which replace the attributes X and Y, assuming real values, with the attributes X′ and Y′, assuming integer values. In particular, the corresponding methods meth[P0, ρrepl((X, [0, ∞)), (X′, N))] and meth[P0, ρrepl((Y, [0, ∞)), (Y′, N))] state that x′ = ⌊x⌋ and y′ = ⌊y⌋. The effect of these operators, to be applied simultaneously, is to round down all the real coordinates to the largest integers not greater than the coordinates themselves.
Finally, in order to exemplify reformulation, we change the coordinate system from the Cartesian pair (X, Y) to the polar coordinates (ρ, θ), with ρ ≥ 0 and 0 ≤ θ ≤ π/2. We have then:

ρ = √(x² + y²)
θ = arctan(y/x)
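Both transformations are easy to express in Python; the sketch below uses only the standard library, and the printed values can be checked against the worked example that follows.

import math

def floor_approx(x, y):
    """rho_repl on the coordinates: round down to the largest integers
    not greater than x and y."""
    return math.floor(x), math.floor(y)

def to_polar(x, y):
    """Reformulation: the same point in a different coordinate system."""
    return math.hypot(x, y), math.atan2(y, x)   # atan2 computes arctan(y/x) here

print(floor_approx(0.60, 1.13))         # (0, 1)
rho, theta = to_polar(0.60, 1.13)
print(round(rho, 2), round(theta, 2))   # 1.28 1.08, the first entry of Ar below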
Let us suppose now that we observe a set of 10 points in the plane. Then P0 = ⟨O0, A0, ∅, ∅⟩, with:

O0 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
A0 = {(1, point, 0.60, 1.13), (2, point, 1.34, 9.24), (3, point, 2.63, 8.56), (4, point, 6.05, 3.86), (5, point, 8.35, 9.05), (6, point, 9.80, 7.87), (7, point, 12.11, 3.03), (8, point, 14.29, 9.58), (9, point, 17.41, 5.89), (10, point, 19.11, 3.73)}
F0 = R0 = ∅
In Fig. 8.1 the three transformed P-Sets Pa, Pap, and Pr are reported. By applying meth[P0, ωhobj(ϕhide)] to P0, we obtain the following P-Set Pa:
Oa = {1, 3, 5, 7, 9, UN, . . . , UN}
Aa = {(1, point, 0.60, 1.13), (3, point, 2.63, 8.56), (5, point, 8.35, 9.05),
(7, point, 12.11, 3.03), (9, point, 17.41, 5.89),
(UN, point, UN, UN), . . . , (UN, point, UN, UN)}
Fa = Ra = ∅
The set COMP0 (Pa ) consists of all the configurations in which the UN’s in Oa and
Aa are replaced with precise values (with 2 decimal digits). COMP0 (Pa ) contains
P0 , and hence the transformation is indeed an abstraction. Notice that the UN values
are set here to denote the places where an abstraction took place, but they are ignored
when reasoning in the abstract space.
By applying the methods meth[P0, ρrepl((X, [0, ∞)), (X′, N))] and meth[P0, ρrepl((Y, [0, ∞)), (Y′, N))], the following P-Set Pap is obtained:
Oap = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Aap = {(1, point, 0, 1), (2, point, 1, 9), (3, point, 2, 8), (4, point, 6, 3), (5, point, 8, 9), (6, point, 9, 7), (7, point, 12, 3), (8, point, 14, 9), (9, point, 17, 5), (10, point, 19, 3)}
Fap = Rap = ∅
Fig. 8.1 Transformation of a P-Set P0 into Pa, via abstraction, into Pap, via approximation, and into Pr, via reformulation. In Pa all the points with even identifiers are hidden. In Pap all points have their coordinates approximated by their floor values. In Pr the points are in one-to-one correspondence with the points in P0
The set COMP0(Pap) ≡ Pap consists of a single configuration. On the other hand, P0 ∉ COMP0(Pap), and then COMP0(Pap) ∩ COMP0(P0) = ∅; hence the transformation is indeed an approximation.
Finally, by changing the coordinate system from the Cartesian to the polar one
(where angles are measured in radians), the following Pr is obtained:
Or = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Ar = {(1, point, 1.28, 1.08), (2, point, 9.34, 1.43), (3, point, 8.95, 1.27), (4, point, 7.18, 0.57), (5, point, 12.31, 0.83), (6, point, 12.57, 0.68), (7, point, 12.48, 0.25), (8, point, 17.20, 0.59), (9, point, 18.38, 0.33), (10, point, 19.47, 0.19)}
Fr = Rr = ∅
In this case we have COMP0 (Pr ) ≡ Pr and COMP0 (P0 ) ≡ P0 ; on the other hand
Pr is functionally related to P0 , and then the two sets COMP0 (Pr ) and COMP0 (P0 )
coincide, as it must be for a reformulation. In fact, from any pair (ρ, θ) a single point
(the original one) is recovered in the (X, Y ) plane.
As an example, let us consider hiding an object, which is one of the most complex hiding operations. If o is the identifier of the hidden object in ΓO(g), then we will have, for every Pg in which o occurs:

Og = {o1, o2, . . . , oN}
Oa = {o′1, . . . , o′N−1} ∪ {UN}

where UN stands for one of the identifiers in {o1, o2, . . . , oN}. Moreover:

Aa = Ag − {(o, t, v1(o), . . . , vM(o))} ∪ {(UN, UN, . . . , UN)}
Finally, in all the covers of functions and relations all occurrences of o are replaced with UN. Clearly, when reasoning in the abstract space, all the UN values are ignored (and the corresponding tuples as well), but the user is always aware that they denote something hidden. As the above derivation is valid for all Pg ⊆ Ψg, the operator ωhobj is an abstraction operator.
It is equally easy to show that operator ωhattrval is an abstraction operator. In fact,
it simply replaces some value in the domain Λm of an attribute Am with UN. This UN
stands for any value in Λm , including the “correct” one. Analogous reasoning can
be done when the value is hidden from the codomain of a function.
A little trickier is ωharg, which hides an argument in a function (or relation). In fact, all functions and relations have arguments that belong to ΓO(g); then, declaring an argument of a function (or relation) to be UN does not change the function (or relation), because the unknown argument can take on any value in ΓO(g). However, given a Pg and a function f(x1, . . . , xj, . . . , xt) (or relation R(x1, . . . , xj, . . . , xt)), the cover of f (or R) in Pg becomes less informative, as any observed value of argument xj is replaced by UN, thus introducing tuples that were not present in it. As this is true for any Pg, any function f or relation R, and any argument xj, operator ωharg is indeed an abstraction operator. Hiding an argument of a relation or function can be implemented, in a database, with the projection operator of relational algebra.
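A small Python sketch of this operator on the cover of a binary relation follows; the relation and the UN marker encoding are our own illustration.

UN = "UN"

def hide_argument(cover, j):
    """A sketch of omega_harg: every observed value of argument j is
    replaced by UN; in a database this amounts to a projection."""
    return {t[:j] + (UN,) + t[j + 1:] for t in cover}

leftof = {("a", "c"), ("a", "d"), ("b", "c")}
print(hide_argument(leftof, 1))   # {('a', 'UN'), ('b', 'UN')}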
Moving to the group of operators ωhier that create hierarchies, they hide a set of elements or values, and replace each one of them by a more abstract element or value. Let us consider the operator that builds up a more abstract type, namely ωhiertype(ΓTYPE,child(g), t(a)). The link between each element tj ∈ ΓTYPE,child(g) and t(a) is an is-a link. Any configuration in the abstract space, where t(a) occurs, corresponds to the set of configurations in which t(a) can be replaced by any one of the tj ∈ ΓTYPE,child(g). In other words, any abstract element corresponds to the set of ground elements from which it has been defined. Then, operators of this group are indeed abstraction operators.
Finally, let us consider the composition operators, which combine description elements to form new ones. As we have seen in Sect. 7.5, there are four operators in this group, namely ωcoll, ωaggr, ωgroup, and ωconstr. The first three act exclusively on objects.¹ Let us consider the operators one at a time.

¹ This is a choice that we have made for the sake of simplicity. It is easy to envisage, however, that combination can be defined on other descriptors or values.
Operator ωcoll(t, t(a)) builds up a collective object of type t(a), which is the ensemble of many objects of the same type t. In more detail, we have:

ΓTYPE(a) = ΓTYPE(g) ∪ {t(a)}
ΓO(a) = ΓO(g) ∪ ΓO,t(a)(g)
ΓA(a) = ΓA(g)
ΓF(a) = ΓF(g)
ΓR(a) = ΓR(g)
Notice that, at the level of the description frame, nothing changes except ΓTYPE(a) and ΓO(a). In fact, type t remains in ΓTYPE(a), because not all objects of type t necessarily enter a collective object. Even though the original identifiers in ΓO(g) could be used to denote the abstract objects, it may be convenient to introduce some specific identifiers for them. All ground attributes are still valid, and the functions and relations do not change, as such. The real difference between ground and abstract representations only appears when the corresponding methods are applied to Pg = ⟨Og, Ag, Fg, Rg⟩. For the sake of simplicity, we assume that just a single abstract object can be created from Pg. Denoting the collective object by c, we obtain, in this case:

Oa = Og − {o1, . . . , ok} ∪ {c} (k ≥ 2)
Aa = Ag − {(o1, t, v1(o1), . . . , vM(o1)), . . . , (ok, t, v1(ok), . . . , vM(ok))} ∪ {(c, t(a), v1(c), . . . , vM(c))}
In the above expressions {o1, . . . , ok} is the set of objects of type t entering the abstract object c of type t(a). Given that t(a) is a new type, the ground attributes may or may not be applicable to it; then, attribute values are set to UN, or NA, or to some specific value, depending on the specific meaning of t(a). Each original object oj (1 ≤ j ≤ k) is linked to the newly created one, c, via an individual-of relation.
Concerning functions, Fg contains the covers of the functions fh (1 ≤ h ≤ H) defined in ΓF(g). In the cover FCOVg(fh) all the tuples where at least one of the objects oj (1 ≤ j ≤ k) occurs are hidden. The same can be done for relations.
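The following Python sketch applies the method to a toy perception; the object names, the forest example, and the tuple encoding of attributes are our own assumptions.

def omega_coll(objects, attrs, members, c, t_abs):
    """A sketch of meth[Pg, omega_coll(t, t^(a))]: the objects in `members`
    are replaced by a single collective object c of the new type t_abs;
    the ground attribute values of c default to UN."""
    Oa = (objects - set(members)) | {c}
    Aa = {t for t in attrs if t[0] not in members}
    Aa |= {(c, t_abs, "UN", "UN")}               # attribute values unknown
    individual_of = {(o, c) for o in members}    # links kept for de-abstraction
    return Oa, Aa, individual_of

Og = {"t1", "t2", "t3", "lake"}
Ag = {("t1", "tree", "green", "tall"), ("t2", "tree", "green", "short"),
      ("t3", "tree", "brown", "tall"), ("lake", "lake", "blue", "NA")}
print(omega_coll(Og, Ag, ["t1", "t2", "t3"], "forest1", "forest"))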
According to the above considerations, we can conclude that ωcoll(t, t(a)) is indeed an abstraction operator. In order to see this, let us consider a Pg which includes a single configuration ψg. Also Pa includes a single configuration, ψa, namely the one obtained from Pg via ωcoll. Configuration ψa consists of the sets Oa and Aa, reported above, plus the sets Fa = {FCOV′(fh) | 1 ≤ h ≤ H} and Ra = {RCOV′(Rk) | 1 ≤ k ≤ K}, containing the abstracted covers of functions and relations. The generic cover FCOV′(fh) is the cover FCOV(fh), where some argument values have been replaced by UN. This new cover corresponds to the set of configurations, in the ground space, that can be obtained by replacing each UN with any admissible value, clearly including the “correct” one. The same can be said for RCOV′(Rk). The only critical point is
the introduction of the new collective object with identifier c. However, this does not increase the information, because c is functionally determined by the application of an abstraction function coll(o1, . . . , ok) to the set of objects {o1, . . . , ok}. Then, the abstract configuration ψa is compatible, in Ψg, with all the configurations that involve k-tuples of objects generating c.
Let us now consider operator ωaggr((t1, . . . , ts), t(a)), which generates an object of a new type by putting together objects of different types, on a perceptive or functional basis, and operator ωgroup(ϕgroup, G(a)), which groups objects according to a given (possibly extensional) criterion. By following the same reasoning as for ωcoll, it can be proved that ωaggr and ωgroup are abstraction operators as well.
The role of the last combination operator, namely ωconstr(Constr), is somewhat special. It is used to construct abstract descriptors, be they attributes or functions or relations, starting from ground ones. It is difficult to precisely analyze its behavior without defining the function Constr. What we can say, in general, is that, in order for the operator to be an abstraction, no new information has to be added in the abstract space, and the new descriptor’s values must be deducible from the ground ones. Given the codomain of the function

Constr : ΓA(g) × ΓF(g) × ΓR(g) → ΓA(g) ∪ ΓF(g) ∪ ΓR(g),

each value in it may be, in principle, obtained from more than one tuple in the domain. Each tuple corresponds to a ground configuration consistent with the value in the abstract one.
Apart from its use as a stand-alone operator, ωconstr can be used in connection with the other combination operators. In fact, when building up a new type, one may think of defining derived attributes from those of the constituents, without losing the property of obtaining an abstract space.² Then, the application of one of the operators ωcoll, ωaggr, or ωgroup can be followed by one or more applications of the operator ωconstr, obtaining a more complex abstraction process.
The methodology described in this section can be applied to all the operators introduced in Chap. 7 and Appendix E, which can all be proved to be abstraction operators.
In this section we analyze some of the approximation operators defined in Sect. 7.6 with respect to their information content. In Chap. 7 we defined as approximation operators those that identify sets of elements, and replace these elements with one of them, namely ρidelem, ρidval, and ρidarg.³ Moreover, we have defined a special replacement operator ρrepl.

² A conservative attitude would assign to all existing attributes an NA value for the new object.
³ We recall that operators that build up equivalence classes and denote them by generic names are abstraction operators, instead.
Let us consider, for instance, ρidobj(ϕid). This operator searches for k-tuples of objects satisfying ϕid, selects one of them, say o(a) = oj (1 ≤ j ≤ k), and sets the others equal to o(a). The way in which o(a) is chosen (for instance, randomly) is specified by meth[Pg, ρidobj(ϕid)]. In particular, given Pg = ⟨Og, Ag, Fg, Rg⟩, let us suppose, for the sake of simplicity, that there is just one tuple satisfying formula ϕid (the extension to multiple tuples is straightforward). Then, if ϕid(o1, . . . , ok) is true, and t is the type of the oj (1 ≤ j ≤ k), we have:

In the above expression we notice that o(a) has, of course, the same type as the equated objects. Moreover, each of the objects oj (1 ≤ j ≤ k) becomes o(a) and assumes the same attribute values as o(a). Finally, all occurrences of each of the oj (1 ≤ j ≤ k) in the covers of functions and relations are replaced by o(a).
As we can see, we are dealing with an approximation, because some objects and the corresponding attributes are all replaced with other ones, but not with an abstraction, because the ground configurations corresponding to an abstract one are not a superset of the original ones. Then Pg and Pa are incomparable with respect to their information content. From the syntactic point of view this approximation is also a simplification, because the number of distinct elements in Pa is reduced; nevertheless, this fact does not imply, in our definition, that Pa ⊑ Pg.
Analogous considerations hold for the other approximation operators that we have defined. Let us look, for instance, at ρidtype, which is one of the most useful and interesting approximation operators. Equating two types means that all objects, instances of these types, are declared to be of just one type, chosen among the two. For example, we can identify the types square and rectangle, and let them be represented by rectangle. With this approximation all instances of squares and rectangles, in any Pg, are declared of type rectangle, and then they will have the attributes of the selected type, possibly with some NA or UN values. As an example, let Color and Side be the attributes of type square, whereas those of type rectangle are Color, Height, and Width. Let moreover (a, square, blue, 20) be an instance of square, and (b, rectangle, yellow, 15, 10) be an instance of rectangle. If the only type considered is rectangle, then the description of object b is unchanged, whereas the one of object a becomes (a, rectangle, blue, 20, 20). Clearly, the method meth[Pg, ρidtype] specifies that Height(a) = Side, Width(a) = Side, and Color(a) = Color. On the other hand, if we equate the type square with circle, and the attributes of a circle are Color and Has-Center, then the new description of the square will be (a, circle, blue, NA). Notice that the objects a and b are kept distinct. In this case, as well, the operator is not an abstraction operator.
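This attribute remapping can be sketched in a few lines of Python; the tuple encoding of object descriptions is our own.

def rho_idtype_square_to_rect(desc):
    """A sketch of equating square with rectangle: squares are re-declared
    as rectangles, with Height and Width both taken from Side."""
    obj, type_, color, *rest = desc
    if type_ == "square":
        side = rest[0]
        return (obj, "rectangle", color, side, side)
    return desc

print(rho_idtype_square_to_rect(("a", "square", "blue", 20)))
# ('a', 'rectangle', 'blue', 20, 20)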
Given a query environment QE = ⟨Q, Γ, DS, T, L⟩, we have seen that the observations cannot usually be exploited as they are, because they consist of “instantaneous signals”, and then they must be rewritten into a format exploitable by the computational devices specified in T. The exploitable observations constitute the data structure DS. We will now prove that the transformation from Γ to DS (and hence, from P to D) is a reformulation, in the sense that it does not change the configuration space, and, hence, Γ and DS are equivalent from the point of view of information potential.⁴

⁴ This justifies the name of the model KRA as Knowledge Reformulation and Abstraction.
ψP = ⋃o∈O (o, t(o), vj1(t(o)), . . . , vjMt(t(o))) ∪ ⋃h=1…H FCOV(fh) ∪ ⋃k=1…K RCOV(Rk)

Moreover, each table in DS, associated to a function fh, contains exactly one row for each tuple occurring in FCOV(fh), and the same is true for each table corresponding to a relation Rk. Then:

ψP = ψD
If some information is missing, the same UN values occur at the same places in both P and D. Hence, if D is constructed from P using the algorithm BUILDATA(P), P and D generate the same set of compatible configurations, i.e.:

COMP(P) = COMP(D)

As the above reasoning holds for any P, the equivalence holds for Γ and DS as well.
As we explained in Chap. 6, in order to answer a query Q, a theory must usually
be provided and applied to D. Let ANSg (Q) be the set of answers to the query
obtained in the ground space, and let ANSa(Q) be the one obtained in the more abstract one. Following Giunchiglia and Walsh [214], we extend their classification of abstractions as theorem increasing, theorem decreasing, and theorem constant, in such a way that the classification can be applied also to contexts other than theorem proving.
Definition 8.2 (A∗-Abstraction) Given a query Q and a query environment QEg = ⟨Qg, Γg, DSg, Tg, Lg⟩, let ANSg(Q) be the set of answers obtained by applying theory Tg to DSg. Let QEa = ⟨Qa, Γa, DSa, Ta, La⟩ be the more abstract query environment, obtained by applying an operator ω to Γg. We say that Ω = (ω, δ, λ, τ) is:
• Answer Increasing (AI), if ANSa(Q) ⊃ ANSg(Q),
• Answer Decreasing (AD), if ANSa(Q) ⊂ ANSg(Q),
• Answer Constant (AC), if ANSa(Q) = ANSg(Q).
If the query is the proof of a theorem, the notion of an A∗-Abstraction becomes exactly Giunchiglia and Walsh’s definition of a T∗-Abstraction.
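For finite sets of answers the classification can be computed directly, as in the following Python sketch (our own encoding, assuming hashable answers).

def classify(ans_g, ans_a):
    """Classify an operator by comparing answer sets (Definition 8.2)."""
    if ans_a == ans_g:
        return "AC"   # answer constant
    if ans_a > ans_g:
        return "AI"   # answer increasing (proper superset)
    if ans_a < ans_g:
        return "AD"   # answer decreasing (proper subset)
    return "incomparable"

print(classify({1, 2}, {1, 2, 3}))   # AI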
In the KRA model of abstraction there are several interacting components. First of all, there is the relation between a description frame Γ and a P-Set P, the former delimiting the boundaries of what could be observed with a given set of sensors, the latter collecting actual measures performed on a concrete system. An abstraction operator ω applied to Γ constrains all the potential descriptions of any system observable with a given set of sensors Σ, whereas the method meth[Pg, ω], applied to Pg, implements, on a particular system, the changes specified by ω.
Another fundamental point we have highlighted is that abstraction acts on descrip-
tions of a system, and not on the system itself. Then, whatever its description (coherent
with a given Γ ), the system is always the same and the information about it can be
shown/hidden at will. In fact, abstraction is a non-destructive process, and the hidden
information is memorized to be retrieved later, if needed.
Finally, the presence of a query is crucial. In fact, there is no point in abstracting without any target, and the effectiveness of an abstraction is strictly linked to the way the query can be answered. Then, any abstraction theory should take the query into account. As we have already mentioned, in order to perform the task of answering a query, we need information coming from two sources, namely observations from the world, described in Γ, and a “theory” T, providing the tools to perform the task. These are the two primary components of QE, because the data structure DS and the language L are determined by these two. In fact, DS simply formalizes the content of Γ, as shown in the previous section, and L is simply a tool to express T, Q, and DS.
Given the query Q, we know that T and Γ should agree, in order to answer Q.
This means that the observations collected in any P, describable in Γ , should at
least include the information needed by T . There are basically three alternatives to
provide T and Γ :
• The theory T and the description frame Γ can be chosen independently. In this
case, to provide them is easier, but agreement is not guaranteed. If there is no other
way to proceed, it would be better to collect from the world as much information
as possible, and to gather a large theory, in such a way that the relevant parts can
be extracted later on. In this type of behavior there may be a waste of resources.
• The theory T is given first, biased by Q, and the observables will only include
aspects that are necessary to use T .
• The set of sensors is given first, and hence, Γ . The theory T shall then agree
with Γ .
In practice, the three alternatives may be combined. Nevertheless, it may happen that complete agreement is not reachable, and only approximate/partial answers can be given to Q. We may also notice that the process of selecting the observations and the theory is often a spiral: one may start from one of the two, and then adjust T and/or Γ in order to reach agreement gradually. Our model of abstraction is not concerned with the possible ways of letting T and Γ agree; what we are interested in is simply how the application of an abstraction operator affects agreement.
As an example, let us consider a concept learning task. We may provide, as
observations (Γ ), a set of learning examples described by continuous features. For
performing the task the algorithm ID3 [440] is available (T ). However, ID3 only
works on discrete features, so that T and Γ do not agree, because ID3 cannot be
used on the learning examples. This is a case of disagreement that can be solved in
various ways; for instance, by searching for another learning algorithm (changing
T ), or by searching for discrete features characterizing the examples (changing Γ ),
or by discretizing the available features (abstracting Γ ).
Going into some more detail, the observations (i.e., the “percepts” P) play, in our view, the primary role; in fact, the measures that can be collected on a system are often constrained by physical impossibility, unavailability, cost, risk, and so on. Then, in many cases, we are not free to collect all the information we want or need. Moreover, owing to the mere fact of existing, the sensory world is intrinsically coherent, and, then, the information arriving from the sensors is internally consistent. Clearly, this is true assuming that the sensors are working correctly, and that they are used in a proper way; if not, other issues beyond abstraction emerge, and we are not dealing with them. As we briefly discussed in Sect. 6.2, the “percepts” must be acquired and then memorized, to be available later on. The memorization is structured according to stimuli “similarity”, in such a way that stimuli arriving from the same sensor are kept together.
If DS is a database, the tables in it are originally anonymous; each one is semantically identified by the corresponding sensor and not by a “name”. However, the manipulation and the communication of the content of the tables require that they indeed receive names; these names are provided by the language L. The fact that the tables in DS do not have a name, per se, is an important aspect of the model. In fact, the table content corresponds to the outcome of some sensor, which does not depend
on the language used to describe it. For instance, let us consider the case where we
group together stimuli corresponding to the color of objects; the table reporting the
color of objects can be assigned the name colore, if one speaks Italian, or color
if one speaks English. In fact, the name of the table does not change the perception of
colors. The same observations can be done for the objects, which receive a posteriori
their name. In this view the language is ancillary to the perception via the database.
On the other hand, the “theory” needed to reason about S is, in some sense,
independent of the perception, because it states generic properties holding in any
S. When the theory is instantiated on an actual system S, it provides specific
information about it. From this point of view, even though the theory T could
be selected independently of P, a theory that mentions properties that cannot be
acquired from S is of no use. A theory may, of course, predict aspects of S not yet
known, as in the striking case of the Higgs boson [248]. Also, a "theory" may be the
starting point when generating worlds in virtual reality. However, in the problems
we have to face in everyday life, we need a theory that can be applied immediately.
Then, the perception and the theory must be compatible.
Concerning the language L, once its nature (logical, mathematical, …) has been
decided, its constituent elements are a consequence of the perception (through DS)
and the theory. L must be such that the theory can be expressed in it, and all the
tables in DS and constants in O receive their name.
Once the theory is added to QE, some inferences can be done, leading to the
potential definition of new tables, which, in turn, may offer new material for further
inferences. For economy reasons, the inferences are not done at definition time, nor
are new tables added to DS; both activities will be performed on demand, only if
and when needed.
Another issue that should be discussed is the bottom-up (abstraction) versus top-
down (concretion) use of any abstraction model (including KRA). By definition,
abstraction is a process that goes from more detailed representations to less detailed
ones. In this case, which is the one proposed in most previous models in Artificial
Intelligence (except planning), abstraction is constrained by the observations, and Pa
must comply with the ground truth provided by Pg , in the sense that, by de-abstracting
Pa , Pg should be obtained again.
However, one may think as well of a top-down process, where an original description
of a system is made more precise by adding details step by step. This is the
case of design, invention, creation, where there is no a priori ground truth. This is,
for instance, Floridi’s approach [176], discussed in the next section. The same idea
is behind Abstract Data Types (ADT) in Computer Science, where an ADT is first
introduced at its maximum abstraction level, and implementation details are added
later on. An interesting top-down approach is proposed by Schmidhuber [477], who
creates even complex pictures starting from a single large circumference and adding
smaller and smaller ones, as illustrated in Fig. 8.2.
We notice that the top-down abstraction process described above is different from
the one used, for example, in planning. Here there is a ground truth, Pg , consisting
of detailed states and actions. When Pg is abstracted, less detailed states and actions
are obtained, which allow a schematic plan to be formulated. When the abstract plan
Fig. 8.2 Schmidhuber proposes a top-down approach to create abstract pictures. Starting from a
single circumference, smaller and smaller circumferences are added. By keeping only parts of the
circumferences, even complex figures can be created (Reprinted with permission from Schmidhuber
[477])
is refined, the details to be included are the ones originally present in Pg . In the case
of a true top-down abstraction process, the result may be totally new.
Even though the KRA model is primarily intended for a bottom-up use, which
is also the most common one in Artificial Intelligence tasks, it can also be used
top-down, by inverting the abstraction operators defined in Chap. 7. An example will
be provided in Sect. 8.8.1.1, when discussing Floridi’s approach.
Another aspect of abstraction that deserves to be discussed emerges when com-
bination operators are applied, in particular those that generate a collective or aggre-
gated object, or that create a group. In such cases, combined objects and new ones
are related by special relations, namely individual-of for collection, part-of for
aggregation, and member-of for grouping.
As we have seen in the definition of the operators’ body (see Sect. 7.10.1), these
relations are neither in Pg nor in Pa , but, instead, they are created during the oper-
ator application, and are stored in the operator’s memory. In fact, each one of these
relations establishes a link between objects that belong to different spaces, situated
at different levels of abstraction. Then, we must use either the former or the latter.
This fact may appear strange at first sight, because we see at the same time both the
components and the whole. This is due to the fact that when we see some special
arrangement of objects (for instance the parts of the computer in Fig. 5.3), the past
experience tells us that their association or aggregation brings some conceptual or
computational advantage, reinforced each time we see it anew. Then, in those par-
ticular cases, we automatically know, on the basis of past learning, what is a suitable
abstraction, without searching for it. As a consequence, we (humans) are able to rea-
son moving quickly and seamlessly without apparent effort between two (or more)
abstraction spaces at the same time.
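As an illustration of how combined objects and the special relations linking them can be kept in the operator's memory, rather than in Pg or Pa, consider the following Python sketch; the encoding (plain lists and a dictionary) is hypothetical and only meant to show where the part-of links live.

def aggregate(ground_objects, parts, new_name, memory):
    # Replace `parts` by a single aggregate object `new_name`.
    abstract_objects = [o for o in ground_objects if o not in parts]
    abstract_objects.append(new_name)
    # The part-of links are stored in the operator memory, not in Pa,
    # because they connect objects across two abstraction levels.
    memory[new_name] = {"relation": "part-of", "components": list(parts)}
    return abstract_objects

memory = {}
ground = ["screen", "keyboard", "tower", "mouse", "desk"]
abstract = aggregate(ground, ["screen", "keyboard", "tower", "mouse"],
                     "computer", memory)
print(abstract)            # ['desk', 'computer']
print(memory["computer"])  # de-abstraction can recover the components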
In the abstraction literature it is well known that some types of abstraction may
generate inconsistencies in the abstract space, as discussed, for instance, by Plaisted
[419], Giunchiglia and Walsh [214], Zilles and Holte [587], and Tenenberg [526].
As will be shown in the following, this problem might not be as severe as it appears
at first sight. In fact, the consistency or inconsistency of the abstract space may or
may not be an issue, because the only important thing is whether the given query can
or cannot be answered in the abstract space. For instance, if we plan a trip by car from
one city to another, color and make of the car do not matter, whereas speed does.
Then, if there is an inconsistency about color or make, they can be ignored, as even
an inconsistent theory can be fruitfully used (avoiding checking for inconsistencies).
On the other hand, if a car has to be bought, color and make are relevant, and the
presence of an inconsistency may affect the results.
The reason why inconsistency arises in abstraction is that logical theories assume
that abstracting amounts to deducing all that can be deduced from the ground space
and the abstraction mapping. Actually, abstraction should not be primarily concerned
with deduction, and hence with theory correctness or completeness, but with usefulness.
In fact, the very idea of abstraction is deciding what information to keep and what
information to ignore. It is the user who has to decide how much he/she is ready to
bet, risking wrong results against obtaining useful ones at a reduced cost.
Then, in the abstract space we have to preserve what we decide to keep, not what
can be deduced.
Then, a crucial issue in abstraction is the ability to go back to the ground space,
when abstraction proves to be unsatisfactory, and either try another abstraction, or
give up abstracting altogether. From this perspective a very important issue is the ease
of going up and down across abstraction levels. This is the reason why we keep ready in
memory what has been hidden from one level to another, in order to facilitate coming
back to the ground space. The next example, provided by Tenenberg, and reported
in Example 4.15, illustrates the issue.
Example 8.2 (Tenenberg [526]) Let us go back to Example 4.15. In Tenenberg’s
formulation, all the information provided for reasoning is put together in a single
"theory". In order to handle the same example in KRA, we have to describe the
ground description frame Γg = ⟨ΓTYPE(g), ΓO(g), ΓA(g), ΓF(g), ΓR(g)⟩, where:

ΓTYPE(g) = {glass, bottle}
ΓO(g) = {a, b, . . .}
ΓA(g) = ΓF(g) = ΓR(g) = ∅
In the KRA model predicate mapping is realized by an operator that constructs a
node in a type hierarchy. Then, we apply to Γg the operator
ωhiertype({glass, bottle}, container),
which maps types glass and bottle to a new, more abstract type, container.
The result of this operator is the more abstract description frame
Γa = ⟨ΓTYPE(a), ΓO(a), ΓA(a), ΓF(a), ΓR(a)⟩, where:

ΓTYPE(a) = {container}
ΓO(a) = ΓO(g)
ΓA(a) = ΓF(a) = ΓR(a) = ∅
By using KRA’s typing facility, there is no need to explicitly say that a glass is
not a bottle, and vice-versa, which is the rule generating the inconsistency. The
description frame Γa is neither consistent nor inconsistent; simply, in Γa there is no
more distinction between glasses and bottles, and this has only effect on the possibility
to answer the query. In Γa , any question that requires glasses to be distinguished from
bottles cannot be answered anymore, because we simply ignore that there are bottles
and glasses in the world.
Tenenberg shows that an inconsistency arises when a bottle a is observed, and he
makes the following derivation:
bottle(a) ⇒ container(a)
bottle(a) ⇒ ¬glass(a)
glass(a) ⇒ container(a)
container(a) ⇒ ¬container(a)
In our model we have the following ground and abstract observations:
Og = {a}
Ag = {(a, bottle)}
Fg = Rg = ∅
and
Oa = {a}
Aa = {(a, container)}
Fa = Ra = ∅
and there is no inconsistency in the abstract space, but only a less detailed description.
In fact, in our model the derivation that generates the inconsistency is not allowed,
because it involves elements across two different abstraction levels.
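The effect of ωhiertype in this example can be made concrete with a small Python sketch; the dictionary encoding of the description frame is our own simplification, not the KRA data structures.

def hiertype(frame, children, parent):
    # Collapse the types in `children` into the new abstract type `parent`.
    abstract = dict(frame)
    abstract["TYPE"] = [t for t in frame["TYPE"] if t not in children]
    abstract["TYPE"].append(parent)
    # object-type assignments are rewritten accordingly
    abstract["A"] = {o: (parent if t in children else t)
                     for o, t in frame["A"].items()}
    return abstract

gamma_g = {"TYPE": ["glass", "bottle"], "A": {"a": "bottle"}}
gamma_a = hiertype(gamma_g, {"glass", "bottle"}, "container")
print(gamma_a)  # {'TYPE': ['container'], 'A': {'a': 'container'}}
# "Is a a bottle?" can no longer be answered in the abstract frame:
# the distinction has been hidden, not contradicted.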
A more complex example is also provided by Tenenberg (reported in Example 4.16),
and we show here how it can be handled in the KRA model.
Example 8.3 (Tenenberg [526]) Tenenberg provides another case of predicate map-
ping, in which, besides types, there are also attributes of objects, which we partition
into observable and deducible. The observable attributes (Made-of-glass, Graspable,
Open) are inserted in Γg, whereas the deducible ones (Movable, Breakable, Pourable)
are inserted in the theory Tg.5
5 There are actually alternative ways to represent this example in the KRA model. We have chosen
the one that gives the closest results to Tenenberg’s.
By using the 18 rules reported in Example 4.16, we can define the following
observation frame Γg :
ΓTYPE(g) = {glass, bottle, box}
ΓO(g) = {a, b, . . .}
ΓA(g) = {⟨Made-of-glass, {yes, no}⟩, ⟨Graspable, {yes, no}⟩, ⟨Open, {yes, no}⟩}
ΓA,bottle(g) = {⟨Made-of-glass, {yes}⟩, ⟨Graspable, {yes}⟩, ⟨Open, {yes, no}⟩}
ΓA,glass(g) = {⟨Made-of-glass, {yes}⟩, ⟨Graspable, {yes}⟩, ⟨Open, {yes}⟩}
ΓA,box(g) = {⟨Made-of-glass, {yes, no}⟩, ⟨Graspable, {yes}⟩, ⟨Open, {yes, no}⟩}
ΓF(g) = ΓR(g) = ∅
As we may see, the domains of the attributes reported in ΓA(g) are generic, and contain
all the possible values the corresponding attributes can take. On the contrary, some
of the attributes associated with specific types may have fixed values, common to all
instances of the type. The values of the attributes have been derived from rules (1–6).
We may note that rule (13) of Example 4.16 is not used, because it actually refers
to a less abstract description frame, where the basic types are milk-bottle and
wine-bottle, instead of bottle. Then, the frame called Γg is already a result
of a previous abstraction, where an operator creating a hierarchy has been applied.
As this rule is not used in the following, we may ignore this previous abstraction and
start from Γg .
Rules (7–12) are implicitly taken into account by the typing mechanism, and we
do not need to make them explicit. The same can be said for rule (17). The theory contains
rules (14–16), namely:
Tg = {graspable(x) ⇒ movable(x),
made-of-glass(x) ⇒ breakable(x),
open(x) ∧ breakable(x) ⇒ pourable(x)}
Let us apply now the abstraction operator that builds up a node in a type hierarchy,
starting from bottle and glass, i.e.,
ωhiertype ({glass, bottle}, glass-container).
The method meth⟨Pg, ωhiertype({bottle, glass}, glass-container)⟩
must specify what to do with the attributes of the original types. This is a choice
that the user has to make. For instance, he/she may decide to be conservative, and thus
selects, for the type glass-container, the following:
• only the attributes that are common to both bottle and glass,
• for each selected attribute, the smallest superset of the values appearing in both
types bottle and glass.
With this choice, the abstract description frame Γa becomes:
ΓTYPE(a) = {glass-container, box}
ΓO(a) = ΓO(g)
ΓA(a) = ΓA(g)
ΓA,glass-container(a) = {⟨Made-of-glass, {yes}⟩, ⟨Graspable, {yes}⟩, ⟨Open, {yes, no}⟩}
ΓA,box(a) = ΓA,box(g)
ΓF(a) = ΓR(a) = ∅
As both types bottle and glass have attributes Made-of-glass, Graspable, and
Open, all three attributes are associated with the type glass-container. Moreover,
both glasses and bottles are made of glass and are graspable, so that the attributes
Made-of-glass and Graspable have only the value yes. On the contrary, glasses are
open, but bottles may not be, so that attribute Open, in glass-container, may
take values in the whole domain {yes, no}. Moreover, the theory Ta is equal to Tg.
In Tenenberg’s example there is only one observation, namely open(a). In our
model this is represented by the following Pg :
Og = {a}
Ag = {(a, UN, UN, UN, yes)}
Fg = Rg = ∅
The values of a's attributes are reported in the same order in which they appear in
ΓA(g). More precisely, the first UN stands for the type, which is not specified, the
second and third UN stand for Made-of-glass = UN and Graspable = UN, whereas the
last yes tells that object a has been observed to be open.
The observation Pg consists of several configurations, all the ones where each UN
is replaced by any value in the attribute’s domain. In particular:
Pg = {ψ1 , ψ2 , ψ3 , ψ4 }
where:
ψ1 = (a, glass, yes, yes, yes)
ψ2 = (a, bottle, yes, yes, yes)
ψ3 = (a, box, yes, yes, yes)
ψ4 = (a, box, no, yes, yes)
By abstracting the possible configurations, we obtain:
ψ1(a) = (a, glass-container, yes, yes, yes)
ψ2(a) = (a, glass-container, yes, yes, yes)
ψ3(a) = (a, box, yes, yes, yes)
ψ4(a) = (a, box, no, yes, yes)
Configurations ψ1(a) and ψ2(a) collapse together, as it must be, having equated types
bottle and glass. Then, the abstract Pa contains:

ψ1(a) = (a, glass-container, yes, yes, yes)
ψ3(a) = (a, box, UN, yes, yes)
In the ground theory Tg the predicate open(a) is true, but also predicate graspable(a)
is true, because attribute Graspable is equal to yes in all Pg ’s configurations. As a
consequence, movable(a) and pourable(a) are also true.
As we may see, there are no inconsistencies in the abstract space, per se. There is
only a question of utility. For instance, consider the question Q1 = “pourable(a)?”.
The answer in the ground space is yes, as it is in the abstract one. Then the
performed abstraction is AC (Answer Constant). On the contrary, question Q2 =
“breakable(a)?”, cannot be answered in either space, because the information that
a is open and graspable is not sufficient to ascertain whether a is also breakable.
In the considered example, the types bottle and glass are almost the same,
except for the attribute Open. As this is exactly the attribute that is observed to be
true, configurations ψ1 and ψ2 become identical. Then, any question about a glass-
container involves equally a bottle and a glass.
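The expansion of UN values into configurations, and their collapse under the type mapping, can be reproduced with the following Python sketch; the attribute domains are transcribed from the frames above, while the encoding itself is our own.

from itertools import product

DOMAINS = {  # per-type attribute domains in the ground frame
    "glass":  {"made": ["yes"], "grasp": ["yes"], "open": ["yes"]},
    "bottle": {"made": ["yes"], "grasp": ["yes"], "open": ["yes", "no"]},
    "box":    {"made": ["yes", "no"], "grasp": ["yes"], "open": ["yes", "no"]},
}
MAP = {"glass": "glass-container", "bottle": "glass-container", "box": "box"}

def configurations(obj, observed_open):
    # All ground configurations compatible with the observation open(a).
    confs = []
    for t, dom in DOMAINS.items():
        for made, grasp, op in product(dom["made"], dom["grasp"], dom["open"]):
            if op == observed_open:
                confs.append((obj, t, made, grasp, op))
    return confs

ground = configurations("a", "yes")
abstract = {(o, MAP[t], m, g, op) for (o, t, m, g, op) in ground}
print(len(ground), len(abstract))  # 4 3: psi1 and psi2 collapse together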
The situation would be different if we assumed the Closed World Assumption. In
this case, as it is not said that bottle(x) ⇒ open(x) nor that box(x) ⇒ open(x), we
have to assume that Open = no for all instances of types bottle and box.
Then, we will have in Γg (all the rest being equal):

ΓA,bottle(g) = {⟨Made-of-glass, {yes}⟩, ⟨Graspable, {yes}⟩, ⟨Open, {no}⟩}
ΓA,glass(g) = {⟨Made-of-glass, {yes}⟩, ⟨Graspable, {yes}⟩, ⟨Open, {yes}⟩}
ΓA,box(g) = {⟨Made-of-glass, {yes, no}⟩, ⟨Graspable, {yes}⟩, ⟨Open, {no}⟩}
In the theory we can also derive movable(a) and pourable(a). By abstracting Γg ,
we obtain, in this case (all the rest being equal):
ΓTYPE(a) = {glass-container, box}
ΓA,glass-container(a) = {⟨Made-of-glass, {yes}⟩, ⟨Graspable, {yes}⟩, ⟨Open, {yes, no}⟩}
ΓA,box(a) = ΓA,box(g)
Then, Pg would correspond to the following configurations:
ψ1 = (a, glass, yes, yes, yes)
ψ2 = (a, bottle, yes, yes, no)
ψ3 = (a, box, yes, yes, no)
ψ4 = (a, box, no, yes, no)
Using the rules introduced at the beginning of the example, we obtain the following
Pa :
ψ1(a) = (a, glass-container, yes, yes, yes)
ψ2(a) = (a, glass-container, yes, yes, no)
ψ3(a) = (a, box, yes, yes, no)
ψ4(a) = (a, box, no, yes, no)
Again, Pa is not inconsistent, per se, but it simply contains less information than the
original one; in fact, it collapses into the following one:

ψ1(a) = (a, glass-container, yes, yes, UN)
ψ3(a) = (a, box, UN, yes, no)
Pa derives from a set of sensors that are no longer able to establish whether an object
is open or not, and hence any query involving attribute Open cannot be answered
anymore. In fact, even though query Q1 can be answered in the ground space, it
can no longer be answered in the abstract one. The same is true for Q2.
As a conclusion, we can say that it is neither worth checking the consistency of the
abstract space a priori, as Tenenberg suggests [525], nor using complex derivations,
as proposed by De Saeger and Shimojima [464]. Nayak and Levy [395] have suggested
a method of dealing with inconsistencies which has something in common
with our approach. It will be discussed later on in this chapter.
where:
ΓTYPE(g) = {greenlight}
ΓO(g) = {o, o′, o′′, . . .}
ΓA(g) = {⟨Wavelength, {green, yellow, [λred(1), λred(2)]}⟩}
When a particular system is described, the values of the instantiated variables are
inserted into the behavior Πg , which contains the actual observations. Then, Πg
corresponds to a Pg :
Pg = {o}, {(o, greenlight, x)}, ∅, ∅
where x is the wavelength of o. The set A = {(o, greenlight, x)} specifies the
measured value x of the variable Wavelength on object o.
Floridi’s method of abstraction involves generic relations among LoAs, but we are
only interested in those that can be clearly identified as abstraction in KRA’s sense.
For this reason, we only consider nested Gradients of Abstraction (GoA). Given two
LoAs L1 and L2 , let L1 = Γa and L2 = Γg be the more abstract and more concrete
LoAs, respectively. We recall that Floridi’s method proceeds from the abstract to the
concrete by progressively refining descriptions. In order to map this method to the
KRA Model, we have to invert its process. Moreover, even though not explicitly
stated, his goal in abstracting is to find behaviors, so that the query can be formulated
as follows:
Q = “Given a behavior Πa , is there a corresponding behavior Πg such that
a relation R between L1 and L2 is satisfied?”
In principle, relation R can be any relation, provided that for each Pa in the abstract
space there is at least one Pg in the concrete one. In particular, Floridi considers two
cases: in the first one, the range of values of a variable is refined, and, in the second
Table 8.1 Pseudo-code for meth⟨Pg, ωeqattrval((Wavelength, ΛWavelength), [λred(1), λred(2)], red)⟩

METHOD meth⟨Pg, ωeqattrval((Wavelength, ΛWavelength), [λred(1), λred(2)], red)⟩
  Oa = Og
  Fa = ∅
  Ra = ∅
  if Wavelength(o) = yellow or green
    then Aa = Ag
    else Aa = {(o, greenlight, red)}
  endif
  Δ(P) = Wavelength(o)
one, a variable is added going from the abstract to the concrete. Refining a variable’s
domain is described in Example 4.17, where the color of a greenlight is considered.
The abstract LoA is then:
Γa = L1 = ⟨{greenlight}, ΓO(a), {(Color, {red, yellow, green})}, ∅, ∅⟩
Then:

Γa = ωhattr(Y) = ⟨{obj}, ΓO, {(X, ΛX)}, ∅, ∅⟩

The corresponding method meth⟨Pg, ωhattr(Y)⟩ generates an abstract Pa for each
Pg. Both Tg and Ta are empty, and the construction of Da and La is straightforward.
As we have seen in Sect. 4.10, Hobbs partitions the set of objects in a domain into
equivalence classes, such that two objects belong to the same class iff they satisfy
a subset of the "relevant" predicates in the domain. Hobbs starts from a ground
theory Tg, which contains everything, namely the observations, the language, and the
theory. Data and observations are not distinguished from each other, and are both
expressed in logical form. If Sg is the set of ground objects, Pg the set of predicates,
R ⊆ Pg the subset of relevant predicates, and Sa the set of equivalence classes, then
Hobbs defines an abstraction function f : Sg → Sa such that
ΓTYPE(g) = {agent, block, table, location, event}
ΓO(g) = ΓO,agent(g) ∪ ΓO,block(g) ∪ ΓO,table(g) ∪ ΓO,location(g) ∪ ΓO,event(g)
ΓA,location(g) = {(X, R+), (Y, R+), (Z, R+)}
ΓA,agent(g) = ΓA,block(g) = ΓA,table(g) = ∅
ΓA,event(g) = {(te, R+), (T, 2^R+)}
ΓA(g) = ΓA,location(g) ∪ ΓA,event(g)
ΓF(g) = ∅
ΓR(g) = {ROn ⊆ ΓO,block(g) × (ΓO,block(g) ∪ {tab}) × R+}
In the above definitions ΓO,location(g) is the continuous set of points (locations ℓ) in
the Euclidean space, and X, Y, Z are the Cartesian coordinates of a location ℓ. Objects
of type agent, block, and table do not have attributes. Events are described
by their end time te and duration T . Relation ROn (x, y, t) states that block x is on
another block y or on the table at time t.
The theory contains the predicate move. If one tries to apply the above theory
to some real case, it is immediately clear that insufficient details are provided. For
instance, it is not clear how move works, namely whether it acts directly on
blocks ("move block b1 onto block b2") or on blocks through locations ("move block
b from location ℓ1 to location ℓ2").
Another point needing clarification is the relation between actions in the world and
events. As the only allowed action consists in moving a block, at each application of
the move predicate a corresponding event should be defined. Events must be created
on the spot, because a priori it is not known which moves will be made. How events
are created must be specified in Tg as well.
In the abstraction mapping all agents are indistinguishable (equivalence class EA)
except agent A, all blocks are indistinguishable (equivalence class EB) except those
on the table. For locations, only locations (xi , yi ) with xi , yi ∈ [0, . . . , 100] are kept
as different, whereas all other locations are collapsed into the same equivalence class
and labelled EL. Table tab does not change. Moreover, Hobbs defines a mapping
function κ, which maps the predicate move onto itself.
We will show now how the above transformations can be realized in the KRA
model. Regarding locations, two different operators are applied in sequence: the first
one makes points on the same vertical line (Z axis) indistinguishable, and, afterwards
the continuous values x and y are transformed into their floors. This second operation
can be interpreted as an approximation, which substitutes k for any value of the
attributes X or Y included in the interval [k, k + 1), for 0 ≤ k ≤ 99.
In the KRA model, the above transformations can be obtained with the set of
operators specified in the following:
• For events, the end time and the duration are equated, so that events become
instantaneous in the abstract space. This is achieved by operator
ωeqattr((te, R+), (T, R+)).
• For agents, operator ωeqobj(ϕeq(u), EA), with ϕeq(u) ≡ [u ∈ ΓO,agent(g)] ∧ [u ≠ A],
makes all agents different from A equivalent to each other.
• For blocks, only those that lie on the (unique) table maintain their identity, whereas
all the others are made equivalent. This is achieved by operator ωeqobj(ϕeq(u), EB),
with ϕeq(u) ≡ [u ∈ ΓO,block(g)] ∧ ¬∃te[(u, tab, te) ∈ RCOV(ROn)].
• For locations, operator ωeqobj(ϕeq(ℓ), ℓ0) makes all points on the same vertical
line indistinguishable, collapsing each location ℓ = (x, y, z) onto ℓ0 = (x, y, 0).
• Once the Z coordinate has been hidden, we want to reduce the surface of the
table (which is located at z = 0) to a grid corresponding to integer values of
X and Y. In this way, any value x or y is reduced to its floor, thus resulting in an
approximation of the true value. This approximation is performed by operator
ρidobj(ϕid(ℓ0), ℓ(a)), applied one time to X and one time to Y; the whole
transformation of locations is thus the process
ωeqobj(ϕeq(ℓ), ℓ0) ⊗ ρeqobj(ϕeq,X(ℓ0), ℓ(a)) ⊗ ρeqobj(ϕeq,Y(ℓ0), ℓ(a)).
Fig. 8.3 (a, b) Ground and abstract spaces corresponding to the scenario described in Example 8.4
Clearly the obtained abstract theory is simpler than the original one, as in Hobbs’
intentions, but its usefulness cannot be ascertained per se. In fact, if we consider the
starting scenario of Fig. 8.3a, and a query Q1, consisting of a state to be reached,
namely "Six blocks on the table", this query can be answered in 3003 different ways
(choosing 6 objects out of 14) in the ground space, whereas it cannot be solved in the abstract
one. On the contrary, the query Q2 "Four blocks on the table" can be solved in 1001
different ways (choosing 4 objects out of 14) in the ground space, and in 5 ways in
the abstract one. Then the applied abstraction process is AD (Answer Decreasing)
for both Q1 and Q2 .
The query we want to answer is Q = "Flights with no more than 1 stop between
New York and Seattle”. Using the rule in Tg , the set of answers to the query is
ANSQg (Q) = {(New York → Seattle),
(New York → San Francisco → Seattle)}.
If we now apply the grouping operator, we obtain:
ΓTYPE(a) = {state}
ΓO(a) = {California, Washington, NY State, North Carolina, Massachusetts}
ΓA(a) = ΓF(a) = ∅, ΓR(a) = {Rdirconnect ⊆ ΓO(a) × ΓO(a)}.

For the connections we have:

RCOV(Rdirconnect(a)) = {(NY State, California), (NY State, Washington),
(California, Washington), (Washington, Massachusetts),
(California, North Carolina)}
The content of Pa is translated into two tables in Da, namely OBJ(a) and
DIRCONNECT(a), the first containing the states and their type state, and the
second the direct flights between states. The language La contains the constants
Ca = {California, Washington, NY State, North Carolina, Massachusetts},
the predicates state(x), dirconnect(x, y) and connect(x, y). There
are no functions. Theory Ta contains the same rules (8.5) and (8.6), where x and y are
now of type state.
If we try to answer the query Q in the abstract space, we obtain the following
answer set:
ANSQa(Q) = {(NY State → Washington),
(NY State → California → Washington)}

Then, the applied abstraction is an AI (Answer Increasing) abstraction; in fact, the
answer set contains the correct direct connection (NY State → Washington),
which corresponds, in the ground space, to (New York → Seattle), and the additional
(NY State → California → Washington), which corresponds to
the existing connection (New York → San Francisco → Seattle) but also
to the non-existing connection (New York → Los Angeles → Seattle).
Clearly, if no flight exists between two states, no flight will exist between any two
cities in the state.
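The answer-increasing behavior of this grouping abstraction can be checked with a short Python sketch; the ground flight set below is assumed from the prose of the example (including a New York → Los Angeles flight with no onward leg to Seattle) and is not a verbatim transcription of the book's tables.

GROUND = {("New York", "Seattle"), ("New York", "San Francisco"),
          ("San Francisco", "Seattle"), ("New York", "Los Angeles")}
STATE = {"New York": "NY State", "Seattle": "Washington",
         "San Francisco": "California", "Los Angeles": "California"}

def at_most_one_stop(direct, src, dst):
    # All connections from src to dst with at most one stop.
    answers = [(src, dst)] if (src, dst) in direct else []
    for (a, b) in direct:
        if a == src and (b, dst) in direct:
            answers.append((src, b, dst))
    return answers

ABSTRACT = {(STATE[a], STATE[b]) for (a, b) in GROUND}
print(at_most_one_stop(GROUND, "New York", "Seattle"))
print(at_most_one_stop(ABSTRACT, "NY State", "Washington"))
# The abstract answer (NY State, California, Washington) covers both an
# existing and a non-existing ground connection: an AI abstraction.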
Among the operators defined on perception, we may also include those of fuzzy set
and rough set theories.
Fuzzy Sets
In the theory of fuzzy linguistic variables (see Sect. 4.5.3), the range U of a numerical
variable X is mapped onto a set T (L) of linguistic terms taken on by a corresponding
linguistic variable L. The association is done by a semantic rule M, specifying the
membership function μ(x). In the KRA model variable X can be either an attribute
or the codomain of a function. In the case of an attribute, we have ΓA(g) = {(X, ΛX)},
and the corresponding operator is one that discretizes U, i.e.:
ωeqattrval((X, ΛX), [xi, xj], Lij)
The corresponding method meth⟨Pg, ωeqattrval((X, ΛX), [xi, xj], Lij)⟩ will assign a
linguistic term Lij to the interval [xi , xj ] ⊆ U. By considering a set of such oper-
ators, the domain of X is transformed into a linguistic domain of attribute L. As a
consequence, the abstract set of attributes will be:
ΓA(a) = (ΓA(g) − {(X, ΛX)}) ∪ {(L, T(L))}
and the memory Δ(P ) will contain the semantic rule M.
Example 8.6 Let X be the attribute Age of a person, and let ΛAge = [0, 140] be
its domain. We can define a linguistic variable LAge with domain ΛLAge = {very-young,
young, middle-age, rather-old, old, very-old}. By
suitably defining fuzzy membership functions, we may say, for instance, that
Age = 18 and Age = 22 are both abstracted to LAge = very-young.
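A possible semantic rule M for this example is sketched below in Python; the triangular membership functions (and the restriction to two terms) are made up for illustration.

def mu_very_young(age):
    # Membership in very-young: 1 below 20, decreasing to 0 at 30.
    return max(0.0, min(1.0, (30 - age) / 10))

def mu_young(age):
    # Membership in young: triangular, peaking at 30.
    if 20 <= age <= 30:
        return (age - 20) / 10
    if 30 < age <= 45:
        return (45 - age) / 15
    return 0.0

def linguistic_term(age):
    # Abstract a numeric Age to the best-matching linguistic term.
    terms = {"very-young": mu_very_young(age), "young": mu_young(age)}
    return max(terms, key=terms.get)

print(linguistic_term(18), linguistic_term(22))  # both: very-young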
Rough Sets
Concerning rough sets, objects in a domain are made identical by an indiscernibility
relation, which produces a tessellation of the object space into equivalence regions.
As modeling rough sets raises interesting issues, we handle this case in some detail.
In fact, the use of rough sets involves both abstraction and approximation.
Let us consider, for the sake of exemplification, a simple case, where Γg contains
a single variable u of type t and with domain U. Let moreover concept be a type
denoting subsets of U. Let us consider a set A = {A1 , . . . , Am } of attributes, with the
associated set Λ = {Λ1 , . . . , Λm } of values. Then:
ΓTYPE(g) = {t, concept}
ΓO(g) = U
ΓA(g) = {(Ai, Λi) | 1 ≤ i ≤ m}
Let moreover Aind be the subset of A containing the attributes that make objects
indistinguishable (see Sect. 7.3.1.1). We apply to Γg the following abstraction
process Π, consisting of a chain of three operators:
Π = ωeqobj(ϕeq(Aind), [u]Aind) ⊗ ωhtype(t) ⊗ ωaggr((concept, concept), approx)

ylw = {[u]Aind ∈ ΓO,concept(a) | [u]Aind ⊆ y}
yup = {[u]Aind ∈ ΓO,concept(a) | [u]Aind ∩ y ≠ ∅}
This operator performs a tessellation of the plane. Suppose now that we observe a
Pg , consisting of all points in the upper-right quadrant of the plane, and a region c,
corresponding to the oval in Fig. 4.12. Then:
Oa,point = {p | X(p) ≥ 0, Y(p) ≥ 0}
Oa,region = {c}
By applying the process Π first, and the approximation operator afterward, we obtain
a final Pap :
ΓTYPE(ap) = {region, approx}
ΓO,region(ap) = ∪i,j∈N [r]ij
ΓO,approx(ap) = {(clw, cup)}
The concepts clw and cup are the lower and upper approximations, respectively, of
c, and they are reported in Fig. 4.12.
As a conclusion, the procedure of approximating a set (a "concept", in Pawlak's
terms [414]) with two other, less detailed sets involves both abstraction and
approximation.
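The lower and upper approximations defined above are easy to compute once the indiscernibility classes are given; the following Python sketch, with an explicitly enumerated toy universe, is only meant to make the two definitions operational.

def approximations(classes, concept):
    # Lower/upper approximations of `concept` w.r.t. equivalence `classes`.
    lower = [c for c in classes if c <= concept]   # classes inside y
    upper = [c for c in classes if c & concept]    # classes meeting y
    return lower, upper

# four indiscernibility classes tessellating a toy universe
classes = [{1, 2}, {3, 4}, {5, 6}, {7, 8}]
concept = {3, 4, 5}                                # the set y
lower, upper = approximations(classes, concept)
print(lower)   # [{3, 4}]          -> y_lw
print(upper)   # [{3, 4}, {5, 6}]  -> y_up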
In this section we consider the theories of abstraction defined primarily at the level
of models (or data structure, in our terminology). Figure 8.4 highlights the primary
operator used to define these types of abstractions, namely δ.
Fig. 8.4 Abstraction defined primarily at the data level. The other operators, λ and τ, are derived
from δ
By semantic operators we mean all those originated in the database field
(reviewed in Sect. 4.4), and the operators acting on (logical) models (reviewed in
Sect. 4.7).
In this section we consider the models proposed by Miles-Smith and Smith [371],
Goldstein and Storey [217], and Cross [118]. The abstraction operators proposed by
these authors can be grouped into five categories:
• Aggregation between objects of different types [118, 217, 371], among attributes
[217], or among entities and attributes [217]. In an aggregation a relation becomes
a new entity with a new meaning and possibly emerging properties. The basic
abstraction relation is the “part-of”. Aggregation is defined at the level of database
schema.
• Generalization of types into supertypes, forming an “is-a” hierarchy [118, 371].
Generalization is also called Inclusion by Goldstein and Storey [217]. The basic
aspect of generalization is inheritance, because a node transmits its properties to
its children. Generalization is defined at the level of database schema.
• Association of entities of the same type to form a new collective type [217]. The
collective type may have emerging properties. The basic relation in association is
“individual-of”. Association is defined at the level of database scheme as well.
• Classification is a particular case of Generalization, where an “instance-of”
relation is defined between instances of a type and the type itself [118]. In Classi-
fication, the type has properties which are common to all its instances.
• Grouping is another way of linking individuals to a class [118]. Unlike
Classification, which has an intensional basis, Grouping has an extensional one.
In fact, a group may be created simply out of the will of the user, and individuals
in the group do not need to share any property. In addition to simply collecting
together individuals in a subjective way, a group can also be created by defining
a predicate, and collecting together all individuals satisfying the predicate. The
type corresponding to the group is simply a name without a description, used as
a short-cut to denote a set of individuals. The relevant relation for grouping is
“member-of”.
The phase of data acquisition is implicitly assumed to have been done previously;
hence, there is no explicit notion of perception or observation. The language used to
manipulate the data is an SQL-like language. The "theory" consists of the set of
relational algebra operators, and the task to be solved is usually answering a query
expressed in SQL. In terms of the KRA model, the operators of interest are those of
kind δ (cf. Fig. 8.4).
In the database approach abstraction is mostly a reorganization of data, consisting
in creating new tables from existing ones. As the original tables are not hidden, the
global process is not actually an “abstraction” in the sense considered in this book,
because no information is hidden. However, using the view mechanism, the old tables
can be easily hidden.
Between the abstraction operations defined for databases and the abstraction operators
in KRA there is a two-way link; in fact, on the one hand, KRA operators can be
used to model database abstractions, but, on the other, the latter could be used as
methods of the δ operators themselves. Before exploring this correspondence, we
need to make an association between an Entity-Relationship (ER) model and KRA.
The schema of a database involves entities, which are objects or events or anything
that can be considered as stand-alone and uniquely identified, attributes, which are
associated with entities (or relations), and relations, which link entities to each
other. In terms of KRA, the database schema corresponds to DS. Let us look at
each operation in turn.
AGGREGATION—Aggregation among entities is a widespread operation. Aggregation
operates at the level of the database schema, by relating types of entities
rather than single entities. If the types to be aggregated are {E1, . . . , En}, and the
new type is E, Aggregation creates a new table scheme, where the Ei (1 ≤ i ≤ n)
become attributes of type E. This scheme is added to the database scheme DS, and
the corresponding populated table is added to D. The matching operator in KRA is
the aggregation operator of kind δ.
Beyond aggregating entities (objects), Goldstein and Storey [217] suggest aggre-
gation of attributes. For instance, let Street, Civic number, City, and Country be
attributes of a type person. We can build up with them a new, aggregate attribute
Address. In order to model this type of aggregation in KRA we use an analogous δ operator.
GENERALIZATION—The corresponding KRA operator is δhiertype(ΓTYPE,child, type(a)),
where ΓTYPE,child is the subset of types that are to be generalized, and type(a) is the
new type. In order to model G : (A1, . . . , An, C1, . . . , Cm) we have to select, in G, the
types C1, . . . , Cm that become children in a hierarchy, whose father is type(a) = G;
then, we can use
δhiertype ({C1 , . . . , Cm }, G)
for i = 1, m do
  Let Ai ⊆ A be the subset of attributes meaningful for type Ci
end
Assign to G the set of attributes AG = ∪i=1,...,m Ai
We have left this choice to the implementation level, because other choices could
also be made without changing the semantics of δhiertype.
As we have noticed for Aggregation, Generalization too is not an abstraction, in our
sense, if the original types are not hidden. In Miles-Smith and Smith's, and Goldstein
and Storey's approaches the new type (with the corresponding table for the generic
object) is simply added to Dg. As for Aggregation, meth⟨Dg, δhiertype⟩ also performs
the following operations:
the following operations:
• In table OBJ, all rows corresponding to objects of type Ci (1 ≤ i ≤ m) are hidden.
• All rows in all tables of Dg, where an instance of Ci (1 ≤ i ≤ m) occurs, must be
removed (hidden).
• All tables Ci-ATTR (1 ≤ i ≤ m) must be hidden in Da.
• If ΓO,i is the set of objects of type Ci in Dg, then ΓG(a) = ∪i=1,...,m ΓO,i.
The new table, containing the actually performed aggregations, is not added to Da,
because it contains items across the two levels of abstraction. On the contrary, it is
stored in Δ(D) as the is-a relation between aggregate and components.
In the abstract description frame Γa = ⟨ΓTYPE(a), ΓO(a), ΓA(a), ΓF(a), ΓR(a)⟩ we have:

ΓTYPE(a) = (ΓTYPE(g) − ∪i=1,...,m Ci) ∪ {G}
ΓO,G(g) = ∪i=1,...,m ΓO,i(g)
ΓO(a) = (ΓO(g) − ∪i=1,...,m ΓO,i(g)) ∪ ΓO,G(g)
ΓA(a) = ΓA(g)

For what concerns functions and relations, the set ΓO,G(a) replaces, in their domain
or codomain, any occurrence of one of the objects in the ΓO,i(g)'s.
The above discussion applies without changes to the Inclusion abstraction defined
by Goldstein and Storey [217].
GROUPING—The grouping operation corresponds in KRA to the grouping operator
ωgroup(ϕgroup, G). For this operation, considerations analogous to those made
for the preceding ones apply.
As described in Sect. 4.7.1, Nayak and Levy proposed a theory of abstraction based
on the models of a logical theory [396], motivated by the desire to solve the inconsistency
problem. In essence, they suggest manipulating the tables generated by an
interpretation of a ground logical theory, and then trying to find an "abstract" formula
that reflects the modification. Then, in KRA terms, they start with a δ operator, and
then move to λ (modifying the logical language), and to τ, as a side-effect of λ (see
Fig. 8.4).
As Nayak and Levy notice, complex manipulations can be easily done on the mod-
els, but finding the abstract theory that implements them may be difficult. To this aim,
the authors describe an automated procedure, Construct-Abstraction(Tg , N , V),
which constructs the abstract theory for the special case in which the abstract lan-
guage can be obtained from the ground one by dropping some predicates and adding
new ones. This type of abstraction includes predicate mapping, dropping arguments
in predicates, and taking the union or intersection of predicates. The procedure con-
sists in deriving the abstract theory from the ground one (Tg ), the set of rules (N )
defining the new predicates, and the set (P) of predicates to be dropped. For com-
plex theories this procedure may be computationally costly. A similar approach was
taken by Giordana et al. [208, 210] earlier on, when proposing a semantic theory of
abstraction for relational Machine Learning.
If we look more closely into the procedure Construct-Abstraction(Tg, N, V), we
notice that it actually realizes a syntactic abstraction, because it consists of
logical derivations at the level of the theory. In order to illustrate how Nayak and Levy's
semantic model can be represented in KRA, let us limit ourselves to predicate
mapping.
Given a model of a ground theory, let T1 and T2 be two tables, sharing the same
schema. T1 contains the set of objects of type t1 , and T2 those of type t2 . We may
construct a new table T = T1 ∪ T2 . T contains the set of objects of either type t1 or
type t2 . The two types can be expressed, in the ground language, as two predicates
t1 (x) and t2 (x), and the resulting formula, associated to t, is t(x) = t1 (x) ∨ t2 (x).
In terms of the KRA model, the whole process can be represented as follows.
Let Dg be a database, where the table OBJ = [ID, Type] contains the identifiers of
N objects, each one with its associated type. Let t1 and t2 be two types. The objects
belonging to these types can be extracted from OBJ by means of the relational algebra
operator of selection, namely:
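The selection formula itself is not reproduced here; as an illustration, the following Python sketch shows the δ-level manipulation just described (selection of the two type tables, their union, and the abstract predicate read off from the new table), with a table layout that is our own and not Nayak and Levy's.

OBJ = [("a", "t1"), ("b", "t2"), ("c", "t1"), ("d", "t3")]

def select(table, type_name):
    # Relational-algebra selection: rows of OBJ with the given type.
    return [row for row in table if row[1] == type_name]

T1, T2 = select(OBJ, "t1"), select(OBJ, "t2")
T = T1 + T2                 # T = T1 union T2 (disjoint here)

def t(x):
    # Abstract predicate induced by the new table: t(x) = t1(x) or t2(x).
    return any(row[0] == x for row in T)

print([x for x, _ in OBJ if t(x)])   # ['a', 'b', 'c']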
As described in Sect. 4.7.2, Ghidini and Giunchiglia [203] define a semantic abstrac-
tion that encompasses several of the operators defined in this book. For example, they
consider symbol abstraction, in which different ground constants (domain abstrac-
tion), or functions, or predicates (predicate mapping) collapse into a single one in
the abstract space. These abstractions can be modeled, in KRA, with the operators:
• ωeqobj ({c1 , . . . , cn }, c) for domain abstraction.
• ωeqfun ({f1 , . . . , fn }, f ) for function mapping.
• For predicates, more than one of KRA's operators can be used. For example,
ωhiertype ({t1 , . . . , tn }, t) can be used when predicates represent types, and they
are replaced by a more general type (usually this is a predicate mapping), or
ωhierrel ({R1 , . . . , Rn }, R), when predicates are associated to relations and not types,
or still ωeqrel ({R1 , . . . , Rn }, [R]) when an equivalence class of predicates is built
up.
Another kind of abstraction described by Ghidini and Giunchiglia is arity abstraction,
which reduces the number of arguments of a function or relation. In our model, we can
map arity reduction of a function to the operator ωhfunarg (fh , xj ), which hides argu-
ment xj of function fh , and arity reduction of a relation to the operator ωhrelarg (Rk , xj ),
which hides argument xj of relation Rk . For relations, if all arguments are hidden, the
propositionalization operator, defined by Plaisted [419], is obtained. In our model it
is possible to define a composite operator, namely ωhrelarg (Rk , {xj1 , . . . , xjk }), which
hides several arguments (in the limit, all) at the same time.
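Arity reduction is essentially a projection of the relation's tuples, as the following Python sketch shows; hiding all argument positions yields the propositional case, where only the fact that the relation holds somewhere survives.

def hide_args(tuples, positions):
    # Project away the argument positions listed in `positions`.
    keep = lambda t: tuple(v for i, v in enumerate(t) if i not in positions)
    return {keep(t) for t in tuples}

R = {("a", "b", 1), ("a", "c", 2)}   # a ternary relation R(x, y, t)
print(hide_args(R, {2}))             # binary: {('a', 'b'), ('a', 'c')}
print(hide_args(R, {0, 1, 2}))       # propositionalization: {()}, i.e. R holds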
Finally, the authors introduce a truth abstraction, which maps a set of predicates
to the truth symbol ⊤. In our model this abstraction corresponds to an abstraction
process consisting of a sequence of two operators: the first removes all arguments of
the predicate, and the second builds up a hierarchy with the father node equal to ⊤.
As in our model a predicate may correspond to different description elements, such
as types, attribute values, relations, etc., we consider here the case of a predicate
corresponding to a relation, which is the most common case. Then, the first operator
is ωhrelarg(Rk, {x1, . . . , xn}), which generates Rk(a), and the second is ωhierrel(Rk(a), ⊤).
Another way to look at this type of abstraction is to hide the arguments of Rk, and
then approximate it with ⊤.
The different kinds of abstraction described above are special cases of the atomic
abstractions introduced by Ghidini and Giunchiglia, mostly targeting theorem
proving. In fact, they are all TI-abstractions, and offer abstract simplified proofs,
whose structure may guide the search for proofs in the ground space. The properties
required by an atomic abstraction, reported in Sect. 4.7.2, are not meaningful in our
model, at least for the part that concerns observations. As we have said, our model
does not have the ambition to exhaust all the aspects of abstraction, and it is explicitly
not targeted to theorem proving. It is much better suited to domains where observa-
tions play the most important role, complemented by a theory specifically designed
to handle the observations.
Historically, models of abstraction were first proposed at the syntactic level,
as mappings between languages, with the works by Plaisted [419], Tenenberg [526],
Giunchiglia and Walsh [214], and, more recently, De Saeger and Shimojima [464].
These models have sound foundations in logic, but they fail to offer concrete tools
to perform abstraction in practice. In fact, to the best of our knowledge, none of
them went beyond a simple explicative example. In the following we will show, for
some of them, how they can be translated into abstraction operators in the KRA
model, making them applicable in practice.
Even though several theories of abstraction are defined as mappings between
complex formulas or sets of clauses, predicate mapping is one of the most investigated
abstractions, owing to its potential applicability. Given a predicate in the ground language,
its renaming in the abstract one is clearly not an abstraction if it is done in isolation,
because it simply corresponds to changing its name, and hence to a reformulation.
The interesting case is when two different predicates in the ground language are
renamed onto a unique one in the abstract language (Giunchiglia and Walsh's Predicate
Mapping and Plaisted's Renaming).
Plaisted points out the difficulty of generating abstractions in general, and he offers
a number of methods to concretely build up abstraction mappings that preserve
some required properties. Among these there are mappings between sets of clauses,
ground abstraction, propositional abstraction, changing signs to literals, permuting
and deleting arguments. One way to simplify the generation of abstractions between
clauses or sets of clauses is to reduce the abstraction to a mapping between literals.
The basic properties required from an abstraction between literals are reported in
Theorem 4.1. We want now to match Plaisted’s approach to the KRA model. Because
of Theorem 4.1 we limit ourselves to abstractions between literals. Actually, as
Plaisted's abstractions preserve negation (and instances), we can further reduce our
analysis to positive literals, i.e., predicates.
In the following, we consider a theory consisting of a set of clauses Tg = {C1(x),
. . . , Cn(x)}. The theory is expressed in a language Lg = ⟨Cg, Xg, Og, Fg, Pg⟩, where
Cg = {a1, a2, . . .}, Xg = {x1, x2, . . .}, Og is the set of standard logical connectives,
Fg = ∅, and Pg = {p1(x), . . . , pm(x)}. Using Pg and Cg we can build Herbrand's
universe. There is neither a notion of observation nor of data structure.

Given a clause C, let C(a) be its abstraction. If C = {ℓ1, . . . , ℓk}, by definition:

C(a) = τ(C) = {ℓj(a) | 1 ≤ j ≤ k}
The operator τ is then expressed in terms of operators on the predicates in Pg, which
we have called λ. In conclusion, in Plaisted's approach we can handle abstraction
between theories in terms of abstraction between languages. Let us consider some
of the proposed abstractions.
GROUND ABSTRACTION—This kind of abstraction replaces a predicate
p(x) with the set of all its groundings with the constants in Cg. This set can be
infinite. Without losing the essence of the abstraction, let us suppose that p is a
unary predicate p(x). We can thus define an operator λground(p, Cg), such that:
to hide the set {xj1, . . . , xjr−1} of arguments. This transformation is indeed an
abstraction according to our model.
CHANGING SIGN OF A LITERAL or PERMUTING ARGUMENTS—As
negating a literal or systematically changing the order of its arguments is uniquely
determined by the original literal, these abstractions too are actually reformulations,
according to our criteria.
As we may see, in Plaisted’s approach there is no need for observations or a database,
because the query to be answered is always a theorem to be proved. This task only
requires theory and language.
The most recent model aimed at capturing general properties of abstraction has been
proposed by De Saeger and Shimojima [464]. As described in Sect. 4.6.3, it uses the
notions of classification and infomorphism. Notwithstanding its elegance, the model
does not solve the problems that we have analyzed in this chapter, and, moreover,
it is hard to use it to model abstractions more complex than predicate mappings. One
interesting aspect of the model is that, by considering abstraction as a local logic on
a channel that connects two classifications, abstraction becomes a two-way process;
this may form the basis for achieving the flexibility in abstracting/de-abstracting that
we consider a fundamental aspect of any applicable abstraction theory.
Moreover, the model includes in a natural way both the syntactic and the semantic
aspects of abstraction. In terms of KRA, abstraction based on channel theory includes
the theory and language components of a query environment, in the syntactic part,
and the data structure in the semantic one. Possible observations do not play any role
in the model, currently, but the authors themselves acknowledge their importance
and believe that observations could be added as a further classification in the whole
schema.
The KRA model appears particularly well suited to describing systems that have a
strong experimental component (perception and observation). Often this kind of
system is investigated in non-logical contexts, where the primary source of information
is the world. One of the fields where this is true is Cognitive Science, where
abstraction plays an essential role, but is rarely formalized.
One of the researchers who explicitly acknowledges this role is Barsalou, who,
together with his co-workers, investigated in detail abstraction and its properties
in cognition [216]. They define three abstraction “operators”, namely selectivity,
blurring, and productivity. These operators are all defined on perception, namely on
our Γ , as the ones in KRA. The selectivity operator lets the attention concentrate
on particular aspects of a perception, and then it can be modeled with operators that
select features, namely of the kind ωh , the most common being ωhattr , which selects
attributes of objects. The blurring operation is mostly relevant in acoustic or visual
perception, and it consists in lowering the resolution of an image or sound, by making
it less detailed. This operation corresponds, in the KRA model, to more than one
operator; in particular blurring can be obtained by replacing groups of pixels (or of
sounds) within smaller regions, obtained, for instance, with the aggregation operator
ωaggr , or the operator ωeqobj , which forms equivalence classes among objects; even
ωhobj can be applied, because it hides sounds or pixels, realizing thus a sampling of
the input signal. Finally, productivity corresponds to our aggregation operator ωaggr ,
which generates new objects from parts.
Goldstone and Barsalou [216] have also introduced the object-to-variable binding
operation, which they label as abstraction. This operation might correspond to the
operator that builds up equivalence classes among objects. If {a1, . . . , an} is the
set of constants that are replaced by the variable x, the operator ωeqobj({a1, . . . , an}, x)
does the job. Replacing constants with a variable is a typical generalization operation,
which, in this case, also corresponds to an abstraction.
Operator ωhattr, which performs feature selection, is also at the basis of Schyns'
investigation [220]. In fact, he tried to ascertain which features the human eye focuses
on when looking at a face and trying to decide its gender and/or mood.
Behind the phenomenon of fast categorization of animals in natural scenes [132,
158, 211] there is likely a complex, but very rapid, process of abstraction, consisting
of a mix of feature selection (ωhattr), feature construction (ωconstr), and aggregation
(ωaggr), possibly wired in the brain. One of the components is certainly the removal
of colors (ωhattr(Color, ΛColor)), as this feature does not appear to influence the
performance in the task.
In spatial cognition, the KRA model allows the formation of spatial aggregates,
as proposed by Yip and Zhao [575], to be easily modeled. These aggregates are
equivalence classes of locations, built on adjacency relations. Then, we can use the
operator ωeqobj (ϕeq , region), where the objects are of type location, and ϕeq
involves spatial relations among them. Operators that change the granularity of a
representation, such as ωeqobj , are also able to model Euzenat’s notion of granularity
[154–156].
As Zeki affirms [580–582], abstraction is a crucial aspect of the whole brain activity.
In particular, perceptual constancy could be modeled with a composite process
of abstraction and approximation, consisting of several kinds of operators combined
together. For instance, feature selection (ωhattr ) may play a relevant role, but also
aggregation (ωaggr ), and some approximation operator that generates a schematic
view. For explaining a complex phenomenon such as perceptual constancy, perception
alone is likely not to be sufficient, and a theory is also needed. The same is true for
other cognitive aspects relevant to abstraction.
In processing images, a key role is played by the Gestalt theory, as was discussed
in Sect. 2.7, and abstraction is crucial within it. Particularly important are the
constructive operators ωcoll , ωaggr , ωgroup , which allow a matrix of pixels to become
a scenario with meaningful objects. If we consider the six grouping principles at the
base of the theory, we can make the following considerations:
• Operators ωaggr and ωgroup are useful to distinguish "objects" from the background,
because objects are often structured, whereas the background is formed by more
or less homogeneous regions.
• The principle of similarity can be implemented by operators of the kind ωeqelem
or ωeqval, because equivalence of content or of attribute values implies similarity of
objects.
• Proximity derives from spatial arrangements, and can be implemented by operators
of the kind ωgroup , where the condition for grouping involves spatial closeness.
Repetitive patterns may be discovered by abstracting either with ωaggr, as, for
instance, in Fig. 1.4, or with ωcoll, as, for instance, in Fig. 2.9, where a lot of leaves
are perceived as a uniform ground cover.
• Both closure and continuity might be explained with a process of acquisition of
abstract schemes (a "square", as in Fig. 2.10), which are then used to bias subsequent
perceptions. The scheme can be generated, for example, by an ωaggr operator, and
then reinforced by further observations. When a part of the scheme is observed, the
whole scheme is retrieved from memory and used to interpret the incoming sig-
nal. Moreover, this is in accordance with Biederman’s Principle of Componential
Recovery [59].
If we move from cognition in general to the more specialized field of vision, we can
safely say that every abstraction operator is useful. In fact, we see because we
abstract. With feature selection (ωhattr) we focus our attention on relevant aspects of
an image, with aggregation (ωaggr) meaningful objects are detected in a scene, and with
the identification of equivalence classes (ωeqelem or ωeqattrval) we find homogeneous
regions. Of particular importance in vision is the ability to move across several
levels of abstraction at the same time. The KRA model is particularly well suited
to this aim, because it keeps the relevant information separated at each level, yet
allows more than one level to be used at the same time.
Connected with the idea of multi-level image representation is Luebke
et al.'s Level of Detail (LOD) approach [348], which adapts the amount of
detail of an image to its distance from the observer or to its size: the farther away or the
smaller, the less detailed. Also in this case the KRA model can easily capture the
rendering process. First of all, the LOD approach has a preferential direction, from
the most detailed version of the image (the one acquired from the world) to the less
detailed ones, as happens in KRA. Then, one or more ωhide operators are applied
in sequence, obtaining more and more abstract image representations, while remembering
what details have been overlooked at each step. Actually, the operator could be
parametrized, so that the process of generating the sequence of images can be totally
automated.
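A parametrized hiding operator of this kind could look as follows in Python; the 1-D signal and the halving scheme are stand-ins for a real image pyramid, and the point of the sketch is only that each step stores its Δ, keeping de-abstraction possible.

def hide_detail(signal):
    # Keep every other sample; the hidden ones become this level's Delta.
    return signal[::2], signal[1::2]

def lod_pyramid(signal, levels):
    # Apply the hiding operator `levels` times, keeping all memories.
    memories = []
    for _ in range(levels):
        signal, delta = hide_detail(signal)
        memories.append(delta)
    return signal, memories

coarse, deltas = lod_pyramid(list(range(16)), levels=3)
print(coarse)   # [0, 8]: the least detailed representation
print(deltas)   # one Delta per level, enabling the way back down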
During the analysis of abstraction in different disciplines we have come across some
examples of processes, labelled as abstractions, which are interesting to discuss.
The first is offered by Leron [327], who states that the formula
ϕ(x, y) ≡ (x + y)² = x² + 2xy + y², with x and y natural numbers, is generalized,
but not abstracted, when its validity is extended from natural numbers to rational ones.
In our model, the alternatives can be modeled by two description frames Γ1 and Γ2, such that:
Fig. 8.5 Mandelbrot set, generated by the recursive equation Z = Z² + C, where C and Z are
complex numbers. For a fixed Imax neither the equation nor the picture is more abstract than the
other: they are exact reformulations of one another
ΓTYPE(1) = {integer-pair}        ΓTYPE(2) = {rational-pair}
ΓO(1) = {(i, j) ∈ N²}            ΓO(2) = {pairs (x, y) of rationals}
ΓA(1) = ΓF(1) = ∅                ΓA(2) = ΓF(2) = ∅
ΓR(1) = {Rϕ}                     ΓR(2) = {Rϕ(a)}

The cover RCOV(Rϕ) is the set of pairs of integers, whereas RCOV(Rϕ(a)) is the set
of pairs of rationals. Then, RCOV(Rϕ) ⊆ RCOV(Rϕ(a)), and the transformation from
Γ1 to Γ2 is indeed a generalization.
The set Ψ1 of configurations associated with Γ1 is the whole N², whereas the set
Ψ2 of configurations associated with Γ2 is the set of pairs of rational numbers. Then,
the transformation from Γ1 to Γ2 is not an abstraction, as Ψ2 contains more information
than Ψ1; actually it is the other way around, and Γ1 is an abstraction of Γ2, obtained
by hiding all points in ΓO(2) that do not have integer coordinates.
The second example discussed by the same author is that the description “all prime
numbers less than 20” is more abstract, but not more general, than “the numbers
2, 3, 5, 7, 11, 13, 17, 19”. It is immediate to see that, in our approach, the two
descriptions are reformulations of one another, and hence neither abstraction
nor generalization is involved.
A last interesting case to consider is the description of a fractal, such as the
Mandelbrot set, reported in Fig. 8.5, generated by the recursive equation Z = Z² + C,
where C and Z are complex numbers.
By considering the equation and the picture, one is tempted to say that the equation
is an abstraction of the figure. However, according to the amount of information they
convey, the equation and the picture are reformulations of one another. This is true
for each maximum number of iterations Imax allowed. Abstraction intervenes when
Imax is changed. In fact, by increasing Imax, more and more detailed pictures are
obtained. Then, abstraction corresponds to a decrease of Imax, as for each Imax the
set of generated points is a subset of those generated with a higher Imax.
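As a toy illustration of the role of Imax (a minimal sketch of ours), membership of a point C can be approximated by iterating the equation at most Imax times; lowering Imax coarsens the resulting picture:

```r
# Approximate membership of C in the Mandelbrot set with at most Imax
# iterations of Z <- Z^2 + C; once |Z| > 2 the point surely escapes.
in_mandelbrot <- function(C, Imax) {
  Z <- 0 + 0i
  for (i in seq_len(Imax)) {
    Z <- Z^2 + C
    if (Mod(Z) > 2) return(FALSE)   # escaped: C is outside the set
  }
  TRUE                              # not escaped within Imax iterations
}

C <- complex(real = -0.75, imaginary = 0.1)   # a point near the boundary
sapply(c(10, 100, 1000), function(I) in_mandelbrot(C, I))
```

A point classified as belonging to the set at a low Imax may be recognized as escaping at a higher one, which is exactly the sense in which decreasing Imax abstracts the picture.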
8.11 Summary

By specializing KRA’s operators, we have shown that they can implement all the operators
proposed so far.
The KRA model does not just mimic previously proposed abstraction theories,
but could also be used in other domains, to make abstraction operational where it
was only informally defined. The possibility of translating an operator into a program
could allow the exploration of the role of abstraction in several disciplines, typically
cognitive science. In fact, a systematic use of the operators, possibly inserted into a
wrapper approach, would allow the ones that best explain the experimental findings
to be discovered. This could be the case, for instance, in explaining why animals are
recognized so rapidly in pictures.
Chapter 9
Abstraction in Machine Learning
Around the same period, Zucker and Ganascia [591, 592] developed an approach to
FOL learning based on an abstraction and a reformulation of the learning examples.
Merging their previous approaches, Saitta and Zucker have proposed a
unifying model of abstraction, the KRA model, which moves the semantic view of
abstraction a step further toward observations and perception [468, 590]. The KRA
model has been applied to various domains, such as relational learning [208], car-
tographic generalization [393], robotic vision [470], complex networks [466, 471],
and model-based diagnosis [467]. The authors have also shown that their abstrac-
tion framework can alleviate the computational complexity of relational learning
originating from the phase transition emerging in the covering test in FOL learning
[209, 469].
Afterwards, abstraction has been addressed directly in learning languages [120], in
learning abstraction hierarchies [289], in abstracting data for decision tree induction
[306], in grouping objects by conceptual clustering [6], in relational learning [58,
164], in Reinforcement Learning [185], and in Support Vector Machines (SVM).
Apart from the works cited above, there are several subfields of Machine Learning
and Data Mining in which abstraction is largely present under other names, namely
the following:
• Data discretization—When continuous data are discretized, values in the same
interval are considered equivalent and replaced by a symbol [140, 292].
• Feature and Instance selection—Data collected for learning are often described
with a set of features, some of which are irrelevant to the goal. These features
increase the complexity of learning, and may also mask useful knowledge. The
literature on feature selection is immense [291]. A recent overview is provided
by Liu et al. [335]. Instance selection is a similar task, where only a subset of
the available data is kept [64, 334, 358]. Both feature and instance selection are
clearly related to the idea of abstraction as the process of focusing on the important
aspects of observations, forgetting the irrelevant ones.
• Constructive induction and Predicate invention—Constructive induction and Predicate
invention are techniques for introducing descriptors that are combinations of
the original ones. The goal is to obtain more meaningful features, able to facilitate
learning. There is a rich literature in this field, starting from the early paper by
Michalski and Stepp [365], followed by Muggleton [381], who described the system
DUCE, in which propositional formulas were grouped by means of operators
to form new ones, and by Rendell and co-workers [362, 357], who tried to provide
some principles for introducing new terms. An incremental, instance-based
approach was described by Aha [5], whereas Japkowicz and Hirsh [275] presented
a bootstrapping method starting from human-suggested features. Except for Michalski’s
proposal, all the others mentioned so far dealt with learning in propositional
settings. More recently, constructive induction moved to FOL learning, under the
name of predicate invention [172, 294, 354, 592]. Constructive induction is thus a
change of the representation language rather than a mere selection within it.
We will focus on the first and third types of learning to explore abstraction in
Machine Learning. In fact, most methods that are applicable in supervised learning
may also be applied to unsupervised learning. Addressing these two fundamentally
different types of learning will support exploring a wide range of abstractions. In both
cases the role of the representation is critical to the success of learning.
In supervised learning, the task relies on training examples of a function that are
given in an initial representation based on “raw” input variables.2 The “features”
used by the learning algorithms, which are constructed from the raw input variables,
1 Semi-supervised learning also learns a function that maps inputs to desired outputs, using both
labeled and unlabeled examples, the latter ones providing information about the distribution of the
observations.
2 Guyon et al. [229] suggest calling the raw input variables “variables”, and the variables
constructed from the input variables “features”. The distinction is necessary in the case of kernel
methods, for which features are not explicitly computed.
Fig. 9.1 Schemes of supervised and unsupervised learning from examples and observations
Fig. 9.2 Schematic representation of Reinforcement Learning. An agent receives rewards from the
environment it belongs to and that it perceives. The agent performs actions that may modify itself
or the environment
are deeply related to the complexity and success of learning. Abstracting the representation
of each example is a step that has a paramount impact on the complexity
and accuracy of supervised learning algorithms. Another component of the learning process
where abstraction can take place is the hypothesis space that is explored. There is
a classical simplification, often adopted in supervised learning, consisting in
representing examples in the same language used to describe the function to be learned;
it is called the “single representation trick”. Abstracting the representation of the
function to learn is thus often coupled with the abstraction of the examples.
In Reinforcement Learning (RL) there are no examples, but one or more agents
interacting with an environment. An agent performs actions that (most often) modify
its state, and, while it receives rewards from the environment, it builds value functions
that guide its search for the best action to take so as to maximize its cumulative
reward. As Dietterich argues, all basic Reinforcement Learning algorithms are “flat”
methods inasmuch as they treat the state space as one very large flat search space
[137]. The paths from a start state to a generic state may be very long, and their length
has a direct impact on the cost of learning, because information about future rewards
must be propagated backward along these paths. The representation of the states, of
the actions, and of the value functions plays therefore a key role in the complexity
and success of the learning task. To scale up Reinforcement Learning to real-life
problems there is thus a need to introduce various mechanisms for abstraction, either
by abstracting the state space (be it flat and/or factored) or the actions [185], or any
of the functions that apply to states or actions [137].
The first two broad classes of learning introduced above (supervised and unsupervised
learning), illustrated in Fig. 9.1, can be more precisely defined as follows:
• In supervised learning the goal is to learn a function f that maps a vector xj =
(vj,1, . . . , vj,n) of feature values Ai = vj,i into a set of values (be it Boolean,
discrete or continuous), by looking at several pairs (xj, f(xj)), called examples or
training data of the function. The quality of an approximation h of f is measured
by their differences on some testing data [238, 375, 475].
• In unsupervised learning there are only observations (xj), and the goal is to find
a “good” clustering of a given set of observations into groups (be it a partition,
a hierarchy, a pyramid, a lattice, …), so that objects of the same group are more
similar to each other than to objects of different groups. Different measures have
been proposed in the literature to address the problem of determining the optimal
number of clusters and the quality of the clustering [273].
As an illustrative supervised learning problem, we have chosen the task of deciding
whether an adult woman of Pima Indian heritage shows signs of diabetes according to
the World Health Organization criteria. A public database [179] illustrating this task,
namely the Pima Indians Diabetes Data Set, is often used to test Machine Learning
algorithms. This data set contains 768 observations with 8 real-valued features that have
hopefully self-explanatory names: “Number of times pregnant”, “Plasma glucose
concentration”, “Diastolic blood pressure (mm Hg)”, “Triceps skin fold thickness
(mm)”, “2-Hour serum insulin (mu U/ml)”, “Body mass index (weight/height²)”,
“Diabetes pedigree function” and “Age (years)”. The class variable is Boolean,
and expresses whether the considered Pima Indian woman shows signs of diabetes
or not.
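In the examples that follow we assume the Pima data loaded in an R session; a minimal way to do so (using the copy shipped with the mlbench package—an assumption of ours; the original UCI file could equally be read with read.csv) is:

```r
# Load the Pima Indians Diabetes data: 768 observations, 8 numeric
# features, plus the Boolean class 'diabetes' (neg/pos).
library(mlbench)
data(PimaIndiansDiabetes)
pima <- PimaIndiansDiabetes
str(pima)
table(pima$diabetes)
```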
As already mentioned, abstraction takes several forms in Machine Learning, which
are related to either features or instances. The main areas of research that include
abstraction in learning are discussed in the following.
The last four types of representation changes are grouped under the term Constructive
Induction, whose goal is to construct new objects, features, functions, relations
or predicates for the language used to describe the data and the hypothesis language.
In Machine Learning and Data Mining, data are often described by a large set of
features. In Metagenomics prediction tasks, for example, there might be up to several
million features for a set of hundreds of examples [98]; in many cases all but a
few are in fact irrelevant to the learning task. The presence of irrelevant features has
numerous drawbacks. First, these features increase the computational complexity.
Second, irrelevant features might spuriously correlate with other, more meaningful
features, and thus be preferred to the latter although not related to the task. Finally,
and more generally, they are known to increase the storage of data, deteriorate
performance, diminish the understandability of models, introduce a wide range of errors,
and reduce the robustness of algorithms [229, 475]. In this context, feature selection is
thus simply the process of hiding features from the initial representation of the
instances. The difficult part of the process is obviously to define the criterion for
keeping some features and discarding the others.
The literature on the task of feature selection, which goes back to the field of
Pattern Recognition, is very large, as the topic has been an active field of research
in Machine Learning, Data Mining and Statistics for many years [64, 229, 291, 294,
465, 537, 562]. It has been widely applied in many fields, such as bioinformatics
[160, 229, 270, 465], text mining [177, 276], text retrieval [224], and music classification
[409], to cite only a few.
In feature selection the data used can be either labeled, unlabeled or partially
labeled, leading to the development of supervised, unsupervised and semi-supervised
feature selection algorithms, respectively. We will briefly present the principles of
feature selection in the framework of supervised learning, our goal being to analyze
abstraction in Machine Learning rather than to exhaustively list all the methods that
have been used; to the latter aim excellent reviews already exist [64, 229, 291, 294,
465, 562]. In the supervised learning case, there is a class label associated with each
instance (see, for example, the column “Class” in Table 9.1). The feature selection
process can be an iterative one, and different information can be used to guide it,
including the class label.
Fig. 9.3 A view of a feature selection process distinguishing Filter, Wrapper and Embedded
approaches
Fig. 9.4 A taxonomy of feature selection techniques. For each feature selection type, we highlight
a set of characteristics which can guide the choice for a technique suited to the goals and resources
of practitioners in the field. (Reprinted with permission from Saeys et al. [465] with a few updates)
The overall feature selection process depends on the search strategy, the relevance index
or predictive power, and the assessment method, as described in Fig. 9.5.
Filter approaches select variables by ranking them with coefficients, based on
correlations between the features (or subsets thereof) and the class, on the usefulness of
a feature to differentiate neighboring instances with different classes, and so on.
They can be very efficient (whether individual variables or subsets of variables are ranked).
Wrapper methods require one or more predetermined learning algorithms, and use
their performance on the provided features to assess relevant subsets of them. Finally,
Embedded approaches incorporate feature selection as a part of the learning process,
and directly optimize a two-part objective function with a goodness-of-fit term and
a penalty for a large number of variables [229]. The lasso introduced by Tibshirani
[529], a shrinkage and selection method for linear regression, is a good example of
such embedded systems. Whatever approach is chosen to search for the best subset
of variables, tractability is an issue, as this problem is known to be NP-complete [12].
Many software packages are currently available to perform feature selection.
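As a package-free sketch of the Filter idea, the Pima features loaded earlier can be ranked by the absolute correlation between each feature and the 0/1 class; the specific scoring criterion is our illustrative choice, not a prescribed one:

```r
# Filter-style ranking: score each feature independently of any learner.
class01 <- as.numeric(pima$diabetes == "pos")
scores  <- sapply(pima[, -ncol(pima)], function(f) abs(cor(f, class01)))
sort(scores, decreasing = TRUE)   # keep, e.g., only the top-ranked features
```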
Fig. 9.5 The three principal approaches to feature selection: a Filters, b Wrappers, c Embedded
methods. The shades show the components used by each of the three approaches; cross-validation,
for example, is used by both the Embedded and Wrapper methods but not by Filters. (Reprinted
with permission from Guyon et al. [228])
Example 9.1. For the sake of illustration we present a very simple implementation
of a procedure that supports selecting one feature out of the initial set.3 The R code—
hopefully self-explanatory—is reported in Fig. 9.6. The results of applying this code
to the Pima dataset are reported in Fig. 9.7.
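Since the code itself appears only in the figure, the following sketch is merely consistent with the caption of Fig. 9.6: the feature with the greatest number of zero values is hidden (it is "insulin" on these data).

```r
# Hide the feature having the most zero values in the Pima data.
zeros_per_feature <- sapply(pima[, -ncol(pima)], function(f) sum(f == 0))
to_hide <- names(which.max(zeros_per_feature))
abstract_data <- pima[, setdiff(names(pima), to_hide)]
names(abstract_data)   # the hidden feature no longer appears
```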
In Machine Learning there is also a mirror-image approach to feature selection, in which,
given a set of features, each of them is added one at a time to the example description.
3 There exist various R packages that support a wide variety of feature selection methods (for
example the FSelector package, which provides functions for selecting attributes from a given
dataset: http://cran.r-project.org/web/packages/FSelector/index.html). Several approaches to feature
selection are also available in the WEKA [565] package (http://www.cs.waikato.ac.nz/ml/
weka/).
Fig. 9.6 Short R code for performing feature selection on the Pima Database. The feature that has
the greatest number of zero values (i.e., the feature “insulin”) is hidden
Fig. 9.7 Result of the execution of the R code of Fig. 9.6. In the Abstract Data, the “insulin” feature
is hidden from the ground data
This approach is not considered here in more detail: it is a kind of “inverse”
abstraction, or “concretion”. On the other hand, it corresponds to Floridi’s notion,
where, in hierarchical GoAs, observables are added rather than removed from one
layer to the next.
Instance selection is in some sense a dual task with respect to feature selection,
because instances rather than features are hidden [64, 80, 334, 358]. Just as some
attributes may be more relevant than others, some examples may likewise be more
relevant than others for the learning task [64]. Instance selection has been widely
studied in the field of outlier or anomaly detection [92, 253, 380, 484]. Here the
removed instances correspond to anomalous observations in the data. Outliers may
arise because of human error, instrument error, fraudulent behavior, or simply through
natural deviations in populations. Today, principled and systematic techniques are
used to detect and remove outliers.
Instance selection has also been studied in the field of Instance-Based Learning
(IBL), but it also finds application in Data Mining, where the number of observations
is potentially huge. It is also of critical importance in the fields of online learning [60,
560], learning from data streams [189] and, recently, learning with limited memory
[131]. The Forgetron, for example, is a family of kernel-based online classification
algorithms that restrict themselves to a predefined memory budget [131].
A first reason to reduce the number of examples required to learn, or to learn
with a predefined memory budget, is to reduce the computational cost of learning.
Another reason is related to the cost of labeling. Be it because it must be obtained
from experts, or because the technique to obtain labeled examples is itself expensive,4
reducing the number of examples required to learn is important. A third reason is to
increase the rate of learning by focusing attention on informative examples.5
As Blum and Langley note, one should distinguish between examples that are relevant
from the viewpoint of information, and ones that are relevant from the viewpoint
of the algorithm [64]. Most works emphasize the latter, though information-based
measures are sometimes used for this purpose. Instance selection approaches can be
classified, like feature selection methods, into Filter, Wrapper or Embedded.
Example 9.2. For illustration purposes, we present a very simple implementation of
a procedure that supports selecting one instance out of the initial training set. The R
code is presented in Fig. 9.8.6 The results of the run of the code reported in Fig. 9.8
are collected in Fig. 9.9.
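Again, as the code of Fig. 9.8 appears only in the figure, here is a sketch consistent with its caption, assuming the pima data frame loaded earlier: the hidden instance is the one with the greatest number of zero values in its description.

```r
# Hide the single instance with the most zero-valued features.
zeros_per_instance <- rowSums(pima[, -ncol(pima)] == 0)
worst <- which.max(zeros_per_instance)   # index of the hidden instance
abstract_data <- pima[-worst, ]          # the ground data minus one woman
```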
4 For example, the price for sequencing one individual genome to support personalized medicine.
5 That is, by selecting the most informative instances. Active learning, also called optimal
experimental design in Statistics, is a form of supervised Machine Learning in which the learning
algorithm is able to interactively make requests to obtain the desired outputs at new data points.
As a consequence, the number of examples needed to learn a concept can often be much lower
than the number required in normal supervised learning.
6 There exist R packages that support a wide variety of instance selection methods, such as, for
example, the “outliers” package, which provides functions for selecting out instances from a given
dataset: http://cran.r-project.org/web/packages/outliers/index.html
Fig. 9.8 Short R code for performing instance selection on the Pima Dataset. The hidden instance
is the one that has the greatest number of zero values in its description (i.e., the woman with Id427)
Fig. 9.9 Result of the execution of the R code of Fig. 9.8: the woman Id427 is hidden from the
GroundData
Fig. 9.11 Classification of discretization methods and several illustrative algorithms. (Reprinted
with permission from Liu et al. [335])
Determining the “best” number of clusters is a difficult problem, which is often cast as a
problem of model selection [273]. In supervised learning, the search for an optimal K is
usually guided by the quality of learning, operating on the wrapper principle defined
earlier, whereas in unsupervised learning several measures have been proposed to
identify an optimal number of clusters (BIC, AIC, MML, Gap statistics, ...) [273].
One of the effects of discretization for learning is that it reduces information,
thus offering a satisfactory trade-off between gain in tractability and loss in accuracy.
Many studies show that induction tasks benefit from discretization, be it in terms of
accuracy, time required for learning (including discretization), or understandability
of the learning result. The majority of discretization algorithms found in the literature
perform an iterative greedy heuristic search in the space of candidate discretizations,
using different types of scoring functions to evaluate their quality [298]. Finally, we
should mention that when the feature to discretize is either time or space (the term
aggregation is also used), dedicated approaches have been developed [8, 575].
Example 9.3. For illustration purposes, we present a very simple implementation of
a procedure that supports discretizing one feature of the initial representation of the
Pima Data. The R code is presented in Fig. 9.12.7
7 There exist packages in R that support a wide variety of discretization methods, such as, for
example, the “discretization” package, which provides functions for discretizing features: http://cran.r-project.
org/web/packages/discretization/index.html. Several approaches to feature discretization are also
available in the WEKA [565] package (http://www.cs.waikato.ac.nz/ml/weka/).
Fig. 9.12 Short R code for performing a discretization of the feature “Age”, from the Pima Dataset,
into two bins of the same width (here, 30 years). The results are reported in Fig. 9.13
Fig. 9.13 Result of the execution of the code of Fig. 9.12. The “Age” of the GroundData has two
possible values after discretization
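A minimal sketch consistent with the caption of Fig. 9.12, assuming pima as before, could read:

```r
# Discretize "age" into two equal-width bins (the ages span 60 years,
# so each bin is 30 years wide); a copy is used to keep 'pima' intact.
pima_d <- pima
rng <- range(pima_d$age)
pima_d$age <- cut(pima_d$age, breaks = c(rng[1], rng[1] + 30, rng[2]),
                  include.lowest = TRUE)
table(pima_d$age)   # the continuous ages collapse into two abstract values
```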
Constructive induction changes the representation of the data or the hypotheses [136, 141,
173, 286, 319, 381]. Whereas feature selection is a form of dimensionality reduction,
the purpose of constructive induction is to change the language of representation. It
might sound counter-intuitive to consider as an abstraction an approach that constructs
new features. However, we must first notice that the features do not really add “new”
information, because they are built using existing features, which are removed from
the description.
In this section we will focus on four types of such representation changes, and
describe the nature of the abstraction at stake in each:
• Feature construction
• Predicate invention
• Term abstraction
• Propositionalization
Feature construction is without a doubt the most widely used approach to constructive
induction. It can be applied to almost all representations used in Machine
Learning or Pattern Recognition. As opposed to feature construction, the last three
representation changes mentioned above apply mostly inside the frameworks of structural
learning, Inductive Logic Programming (ILP [384, 385]) or Statistical Relational
Learning (SRL), all three relying on First Order Logic representations.
Feature construction is a very popular way of generating features that are combinations
of the original ones, and that may be more appropriate, as they preserve crucial
information for the task while removing redundant features [141, 173, 286, 319,
381]. The term feature generation [324, 351, 409] is also used to characterize the
creation of new features.
Feature construction may be used as an approach to reduce dimensionality,
when it projects the data onto a lower-dimensional space. Among the many statistical
approaches used to construct features, Principal Component Analysis (PCA),
a particular case of the family of Karhunen-Loève transformations, is
without a doubt one of the most commonly used. Principal Component Analysis uses
an orthogonal transformation to convert a set of observations of possibly correlated
features into a set of linearly uncorrelated features, called principal components,
which “explain” most of the variance in the data. However, in general only a subset
of the most significant components is kept, thus obtaining a more abstract
representation of the original function.
Another approach that performs abstraction on numerical data is the use of the
Discrete Fourier Transform. Also in this case a function is expressed as a series of
trigonometric functions, each with an associated coefficient. If all the coefficients
are kept (there may be infinitely many), no abstraction occurs, but only
reformulation.
In Machine Learning, feature construction was first introduced by Michalski [366,
367], and then formally defined by Matheus and Rendell [357] as the application of
a set of constructive operators to a set of existing features; the result consists in the
construction of one or more new features intended for use in describing the target
concept to be learned. Similarly to feature selection, the building of abstract features
can take place before induction (as in Filter mode), after induction (as in Wrapper
mode) or during induction (as in Embedded mode). Feature construction has been
widely used in Machine Learning in all kinds of representations, from propositional
to relational learning: PLSO [449], DUCE [381], STAGGER [476], BACON [315],
Fringe [409], MIRO [141], KLUSTER [286], MIR-SAYU [122], Feat-KNN [232], to
cite only a few.
Wnek [566] offers a review of feature construction, distinguishing (a) Deductive
Constructive Induction (DCI), which takes place before induction, and (b) Hypothesis
Constructive Induction (HCI), which takes place after it. We add to this list the notion
of ECI (Embedded Constructive Induction), to account for constructive induction that
is embedded in the Machine Learning algorithm.
• In DCI (corresponding to a Filter approach) the space of possible new features is
combinatorial, and a priori knowledge must be used to choose the types of features
to construct (e.g., product fi ∗ fj, ratio fi/fj, Boolean formula M-of-N, . . .).
Constructed features may also use expert knowledge; for instance, Hirsh and Japkowicz
[250] presented a bootstrapping method starting from human-suggested
features. The main problem of DCI approaches is clearly the combinatorial explosion
of the possible features that may be generated.
• In HCI (corresponding to a Wrapper approach) the new features are built after a
learning process has taken place. Some typical approaches are listed below.
– FRINGE [410] is a decision tree-based feature construction algorithm. It
reduces the size of the learned decision tree by iteratively conjoining features at
the fringe of branches in the decision tree. It addresses the replication problem
in trees (i.e., many similar subtrees appearing in the tree) and provides more
and more compact decision trees [306, 418]. The features built this way correspond
to particular Boolean formulas over the initial or previously constructed
features. Continuous features can also be created.
– In Feat-KNN [232] the new features are density estimator functions, learnt
from the projection of the data onto the 2-D space formed by a pair of original
features. Only the best newly built features are then kept for subsequent learning.
– Pachet et al. [409] present a feature construction system designed to create audio
features for supervised classification tasks. They build analytical features (AFs)
based on raw audio signal processing features. AFs can express arbitrary signal
functions, and might not make obvious sense. MaxPos(FFT(Split(FFT(LpFilter(x,
MaxPos(FFT(x)))), 100, 0))) is, for instance, an AF that computes a
complex expression involving a low-pass filter, whose cutoff frequency is itself
computed as the maximum peak of the signal spectrum. AFs are evaluated
individually using a wrapper approach.
• In ECI (corresponding to an Embedded approach) the new features are built during
the learning process, as is the case with OC1, which learns oblique decision trees
[388]. Decision splits that are ratios of the initial features are considered during the
building of the tree. Motoda proposes an algorithm that extracts features from a
feed-forward neural network [379]. Indeed, feed-forward neural networks might be
considered as dynamically building features [542], because they construct layered
features from the input to the hidden layer(s) and further to the output layer.
Finally, let us mention that feature construction has some natural links with the field
of Scientific Discovery [145, 316, 543]. BACON [315], for example, is a program
that discovers relationships among real-valued features of instances in data using
two operators, multiply and divide.
In conclusion, we can say that, according to Definition 6.19, feature construction
is an abstraction only if the original features are removed from the “abstract” space.
Otherwise, it is a simple reformulation; in fact, the set of configurations remains
the same, as both the newly created feature and its components are visible in each
example.
In Fig. 9.14 a simple R code to perform the construction of two features is reported.
The results on the Pima dataset are reported in Fig. 9.15.
Fig. 9.14 Short R code for performing the construction of two features from the eight features of
the Pima Database
Fig. 9.15 Result of the execution of the R code of Fig. 9.14 (apart from the last three lines): the
two new features PC1 and PC2 are abstracted from the 8 initial ones
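In the same spirit as Figs. 9.14 and 9.15, a minimal sketch (our own) constructs PC1 and PC2 by Principal Component Analysis and removes the eight original features, so that the change is an abstraction in the sense of Definition 6.19, and not a mere reformulation:

```r
# Construct two abstract features from the eight initial ones via PCA.
pca <- prcomp(pima[, -ncol(pima)], center = TRUE, scale. = TRUE)
abstract_data <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2],
                            class = pima$diabetes)
head(abstract_data)   # the original features are no longer visible
```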
Predicate invention (PI) refers to the automatic introduction of new relations or predicates
directly in the hypothesis language [172, 302, 504, 505]. Extending the initial
set of predicates or relations (sometimes called the vocabulary) may be useful to
either speed up or ease the learning task. However, inventing relevant new predicates
is one of the hardest tasks in Machine Learning, because they are difficult to
evaluate, and there may potentially be so many of them [174]. As an example, let us
consider the predicates StepMotherOf and MotherOf. It might be useful to consider
a new predicate MomOf, true whenever StepMotherOf or MotherOf is true. This
predicate would account for the maternal relationship that holds for both mothers
and stepmothers. Formally:
∀x, y : MomOf(x, y) ← MotherOf(x, y) ∨ StepMotherOf(x, y)
Fig. 9.16 The V-operator inverting the resolution. On the right, a new clause stating that if Leia is
a daughter of X then X may be the parent of Leia is a possible result of the resolution inversion
8 There are also scheme-driven methods, which define new intermediate predicates as combinations
of known literals that match one of the schemes provided initially for useful literal combinations
[504].
Fig. 9.17 The W-operator of intra-construction. It consists here of two simultaneous V-operators
based on a factorization, where a new predicate that subsumes father and mother is invented (called
“new”, assuming that the “parent” concept had not been given before)
Such operator-driven methods, however, tend to over-generate predicates, many of which
are not useful. Predicates can also be invented by instantiating second-order templates
[488], or introduced to represent exceptions to learned rules [501]. Relational
predicate invention approaches suffer from a limited ability to handle noisy data.
As noted by Khan et al. [285], “... surprisingly little, if any, experimental evidence
exists to indicate that learners which employ PI perform better than those
which do not. On the face of it, there are good reasons to believe that since increasing
the learner’s vocabulary expands the hypothesis space, PI could degrade both the
learner’s predictive accuracy and learning speed”. But, as recently underlined by
Kok and Domingos [293], there are only a few systems able to invent new predicates,
and only weak or no results about the properties of their operators. The crucial problems
concerning the introduction of new predicates have not yet been satisfactorily
solved. Nevertheless, the need for predicate invention is undoubted.
In the past few years, the Statistical Relational Learning (SRL) community has
recognized the importance of combining the strengths of statistical learning and
relational learning, and developed several novel representations, as well as algorithms
to learn their parameters and structure [199]. However, the problem of statistical
predicate invention (SPI) has so far received little attention in the community. SPI
is the discovery of new concepts, properties and relations from data, expressed in
terms of the observable ones, using statistical techniques to guide the process, and
explicitly representing the uncertainty in the discovered predicates. These can in turn
be used as a basis for discovering new predicates, which is potentially much more
powerful than learning based on a fixed set of simple primitives. Essentially, all the
concepts used by humans can be viewed as invented predicates, with many levels
of discovery between them and the sensory percepts they are ultimately based on.
A recent proposal for statistical predicate invention has been put forward by Kok and Domingos [293].
Comments similar to those made for feature construction apply to predicate invention.
If the new predicate is meaningful for the task at hand, then learning will
occur with less computational effort and better results. Otherwise, a good concept
can be lost, because it may be masked by too many other predicates.
Fig. 9.18 a A learning example (a cart) with structural noise. b A new description of the cart,
obtained by hiding objects a and d, and merging objects h and m into p, as well as objects e, f, and
g into n. (Reprinted with permission from Giordana et al. [210])
Fig. 9.19 Example of two different term abstractions of a molecule: in the one on top, a unique
term abstracts the whole molecule, whereas in the one at the bottom several terms abstract pairs of
atoms
Term abstraction can lead to a significant speed-up. Finding such terms may be related to
approaches that attempt to detect common substructures, like the SUBDUE system [256].
In relational learning, examples may have internal parts (e.g., a carcinogenic molecule
may be described by its components and their relations [241, 287]), and a
possible abstraction aims at simplifying their representation by hiding part of the
complexity of their structure [416]. This aggregation hides the structural complexity of
objects, and as such it leads to a very significant speed-up in Machine Learning, at the
expense of using a simplified representation.
9.2.4.4 Propositionalization
Although some learning systems can directly operate on relations, the vast majority of
them operate only on attribute-value representations. Since the beginning of Machine
Learning there have been approaches to reformulating relational representations into
propositional ones. Such a reformulation requires first that new features be built up
(e.g., LINUS [318]), and then that the relational data be re-expressed in the new
representation. This process, originally called selective reformulation [318, 591,
592], was later called propositionalization [9, 72, 304, 578, 583].
The issue then becomes the translation of the relational learning task into a propositional
one, in such a way that efficient algorithms for propositional learning can be
applied [10, 303, 305, 592, 591]. Propositionalization involves identifying the terms
(also called morion [592]) that will be used as the individual objects for learning.
9 To simplify the treatment, we will not explicitly represent the starting state probability distribution.
Fig. 9.20 a A simple Markov decision problem used to introduce Reinforcement Learning, after
Dietterich [137]. A passenger is located in Y at (0,0), and wishes to go by taxi to location B at (3,0).
b The optimal value function V for the taxi agent in the case described in a
Fig. 9.21 The six primitive actions of the taxi agent in the taxi problem
The taxi agent receives positive rewards when it delivers the passenger at his/her destination,
and negative ones when it attempts to pick up a non-existent passenger or to put down
the passenger anywhere except at one of the four special spots [137].
In spite of several success stories of RL, in many cases tackling difficult tasks
using RL may be slow or infeasible, the difficulty usually resulting from the
combination of the size of the state space with the lack of an immediate reinforcement
signal. Thus, a significant amount of RL research is focused on improving the speed
of learning by using background knowledge, either to reuse solutions from similar
problems (using transfer or generalization techniques [524]) to bootstrap the value
of V π(s), or to abstract along the different dimensions of knowledge representation
in RL [137, 363, 426, 516]. To represent large MDPs, Boutilier et al. [75] were
precursors in proposing the use of factored models in planning. Factored MDPs are
a representation language that supports exploiting problem structure to represent
exponentially large MDPs in a compact way [226]. In a factored MDP, the set of
states is described via a set of random variables. There is in fact a wide spectrum
of abstractions that have been explored in the Reinforcement Learning and
planning literatures. Both positive and negative results are known [330]. There are
mainly four dimensions (summarized in Fig. 9.22), along which these representation
changes involving abstraction have been explored:
in the second case there are algorithms that search for regularities in the state space
(symmetry, equivalence, irrelevant features, etc.); in the third case a surrogate of
the value function V or Q is built up, using a classifier system (decision tree,
SVM, …) to learn a model of V or Q; and, similarly, in the fourth case a surrogate
of the policy function is built up, again using a classifier [322].
There is a large literature on factored MDPs that is relevant to this question [226].
• How to guarantee the convergence of algorithms in the abstracted state space. This
issue has prompted a lot of work, both theoretical and empirical [523].
Several strategies for state aggregation have been proposed; they are overviewed by Li
et al. [330] (see Fig. 9.23). Symmetry of the state space arises when states, actions or
a combination of both can be mapped to an equivalent reduced or factored MDP that
has fewer states and/or actions. An example of state symmetry is learning to exit similar
rooms that differ only in irrelevant properties. A more subtle kind of symmetry
arises when the state space can be partitioned into blocks such that the inter-block
transition probabilities and reward probabilities are constant for all actions.
Early work by Boutilier et al. [76] introduced a stochastic dynamic programming
algorithm that automatically builds aggregation trees in the abstract space to
create an abstract model, where states with the same transition and reward functions,
under a fixed policy, are grouped together.
Fig. 9.23 Different strategies for state aggregation in Reinforcement Learning. The column “MDP
given” states whether an MDP is given or not before learning. (Reprinted with permission from Li
et al. [330])
This principle has been formalized using the notion of bisimulation homogeneity
by Dean et al. [128]. The elimination of an irrelevant random variable in a state
description is an example of such homogeneity. Givan et al. [215] have proposed
an algorithm that generalizes Boutilier’s approach [76], based on iterative methods
for finding a bisimulation in the semantics of concurrent processes. This algorithm,
in which states with the same transition probability and reward function are
automatically aggregated, supports building the abstraction of an MDP in polynomial
time [215].
Andre and Russell [18] propose a state abstraction that maintains optimality
among all policies consistent with the partial program, which they call hierarchical
optimality. They have demonstrated that their approach, on variants of the taxi problem,
shows faster learning of better policies, and enables the transfer of learned skills
from one problem to another. Fitch et al. [171] consider using homomorphism as
an algebraic formalism for modeling abstraction in the framework of MDPs and semi-
MDPs [444]. They explore abstraction in the context of multi-agent systems, where
the state space also grows exponentially in the number of agents. They also investigate
several classes of abstractions specific to multi-agent RL; in these abstractions
agents act one at a time as far as learning is concerned, but they are assumed to be
able to execute actions jointly in the real world.
Li et al. [329] have proposed a framework to unify previous work on the subject
of abstraction in RL. They consider abstraction as a mapping φ between MDPs, and
distinguish abstractions from the finer to the coarser ones (a toy illustration follows
the list):
• φmodel gives the opportunity to recover essentially the entire model (e.g.,
bisimulation [215]);
• φQπ preserves the state-action value function for all policies;
• φQ∗ preserves the optimal state-action value function (e.g., stochastic dynamic
programming with factored representations [76], mentioned above);
• φa∗ preserves the optimal action and its value, and thus does not guarantee
learnability of the value function for suboptimal actions, but does allow for planning
(i.e., value iteration);
• φπ∗ attempts to preserve the optimal action, but optimal planning is generally lost,
although an optimal policy is still representable [277].
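As a toy illustration of such a mapping (all numbers below are hypothetical, and the corridor environment is our own), tabular Q-learning can be run directly on abstract states obtained by aggregating ground states through φ:

```r
# Ground states 1..12 of a corridor are aggregated into 4 abstract
# states; the Q-learning updates operate on phi(s) instead of s.
phi   <- function(s) ceiling(s / 3)
Q     <- matrix(0, nrow = 4, ncol = 2)   # abstract states x {left, right}
alpha <- 0.1; gamma <- 0.9; eps <- 0.1
step  <- function(s, a) {                # deterministic dynamics
  s2 <- max(1, min(12, s + (if (a == 2) 1 else -1)))
  list(s2 = s2, r = if (s2 == 12) 1 else 0)  # reward at the right end
}
s <- 1
for (t in 1:5000) {
  a   <- if (runif(1) < eps) sample(2, 1) else which.max(Q[phi(s), ])
  out <- step(s, a)
  Q[phi(s), a] <- Q[phi(s), a] +
    alpha * (out$r + gamma * max(Q[phi(out$s2), ]) - Q[phi(s), a])
  s <- if (out$s2 == 12) 1 else out$s2
}
round(Q, 2)   # values grow toward the rewarding end of the corridor
```

Here φ happens to preserve the optimal action (always move right), so learning on the abstract states still finds a good policy; an unlucky aggregation would not enjoy this property.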
Ponsen et al. [425] present an interesting survey that summarizes the most important
techniques available to achieve both generalization and abstraction in Reinforcement
Learning, and illustrate them with examples. They rely on the KRA model presented
in Chap. 6.
Fig. 9.24 a Example of function approximation, where the tabular state-action representation is
described using a decision tree or a neural network. b A decision tree dynamically abstracts the
state space. (Adapted from Pyeatt and Howe [434])
Fig. 9.25 A task directed graph for the Taxi problem. The leaves of this pyramid are primitive
actions (see Fig. 9.21). Root is the whole taxi task. The nodes represent individual subtasks that are
believed to be important for solving the overall task. Navigate(t), for example, is a subtask whose
goal is to move the taxi from its current location to one of the four target locations (indicated by the
formal parameter t). Get is a subtask whose goal is to move the taxi from its current location to
the passenger’s current location and pick up the passenger. Put is a subtask whose goal is to move
the taxi from the current location to the passenger’s destination location and drop off the passenger.
The directed links represent task dependencies. According to the pyramid, the Navigate(t) subtask
uses the four primitive actions North, South, East, and West. (Reprinted with permission from
Dietterich [137])
Several techniques have been developed for reducing the degree of suboptimality. The most
interesting of these involves using the hierarchical value function to construct a non-hierarchical
policy that is provably better than the hierarchical one [519].
Hengst et al. [246] developed an algorithm that discovers sub-tasks automatically.
They introduce two completion functions, which jointly decompose the value
function hierarchically, solving sub-problems simultaneously and reusing sub-tasks with
discounted value functions. The significance of this result is that the benefits of HRL
can be extended to discounted value functions and to continuous Reinforcement
Learning. Lasheng et al. [317] present the SVI algorithm, which uses a dynamic
Bayesian network model to construct an influence graph indicating relationships
between state variables. Their work is also related to state abstraction, as is most work
in HRL. SVI performs state abstraction for each subtask by ignoring irrelevant state
variables and lower-level subtasks. Experimental results show that the decomposition
of tasks introduced by SVI can significantly speed up the construction of a near-optimal
policy. The authors argue that this can be applied to a broad spectrum of complex
real-world problems, such as robotics, industrial manufacturing, and games.
The last type of abstraction we consider here is temporal abstraction, which has
been analyzed, in particular by Sutton et al. [519], within the framework of both
Reinforcement Learning and Markov Decision Processes. The main idea is to extend
the usual notion of action to include options, namely closed-loop policies for taking
actions over a period of time. Examples of options include picking up an object,
going to lunch, and traveling to a distant city, as well as primitive actions such as
muscle twitches and joint torques.
In previous works Sutton et al. [517, 518] used other terms, including “macro-actions”,
“behaviors”, “abstract actions”, and “sub-controllers”, for structures closely
related to options. The term “option” is meant as a generalization of “action”, which
is used formally only for primitive choices. It might at first seem inappropriate that
“option” does not connote a course of non-primitive action, but this is exactly the
authors’ intention. They showed that options enable temporally abstract knowledge
and actions to be included in the Reinforcement Learning framework in a natural
and general way. In particular, options may be used interchangeably with primitive
actions in planning methods, such as dynamic programming, and in learning methods,
such as Q-learning. Formally, a set of options defined over an MDP constitutes
a semi-Markov decision process. One of the tricks for treating temporal abstraction as
a minimal extension of the Reinforcement Learning framework is to build the theory
of options on the theory of semi-Markov decision processes (SMDPs, see Fig. 9.26).
Temporal abstraction provides the flexibility to greatly reduce computational
complexity, but can also have the opposite effect if used indiscriminately.
Representing knowledge flexibly at different levels of temporal abstraction has
the potential to greatly speedup planning and learning on large problems [317, 350].
Fig. 9.26 The state trajectory of an MDP is made up of small, discrete-time transitions, whereas
that of a SMDP comprises large, continuous-time transitions. Options enable an MDP trajectory to
be analyzed in either way [516]
Table 9.2 Several operators used in Machine Learning (focusing on Concept Learning and Reinforcement
Learning), classified according to the elements they act upon, and to the type of abstraction
performed

Operators            | Objects                                  | Features                                     | Predicates & Functions
Hiding               | Instance selection                       | Feature selection, factored state aggregation | Predicate selection
Equating             | Clustering, Macro-action, flat state aggregation | Feature discretization               | Value/Function approximation
Hierarchy Generation |                                          | Climbing hierarchy of features or values      | Climbing hierarchy of tasks
Aggregating          | Term construction, state space aggregation | Feature construction                        | Predicate invention
Let CL = {c1 , ..., cS } be a set of given “concepts” (classes), X the (possibly infinite)
set of instances of the classes, and L a language for representing hypotheses. The
set X contains the identifiers of the examples, whose description is provided by the
choice of their attributes. Let moreover LS (with cardinality N) be the learning set.
The discriminant learning task can be formulated as the following query:
Q = Given a language L, a set of “concepts” (classes), a criterion for evaluating
the quality of a hypothesis, and a learning set LS, find the hypothesis belonging to L
that correctly assigns classes to previously unseen examples.
The examples are to be observed, and it is the task designer who decides what features
(attributes) are to be measured on the examples, as well as the granularity of the attribute
values. Usually, neither functions nor relations are included in the observations. Once
the attributes and their domains have been selected, a description frame Γ can be defined for
representing the examples:
Then:
ψ = {xi | 1 ≤ i ≤ N}    (9.3)
From expression (9.3) we see that, with this formalization, one configuration corresponds
to a set of examples, as illustrated in Fig. 9.27.
If all examples are totally specified, i.e., there are no missing values, then the
P-Set P, containing all the observations necessary to answer Q, coincides with just
one configuration. If some examples have some missing values, then P corresponds
to the set of configurations consistent with it (see Definition 6.7).
In more detail, we have P = ⟨O, A, F, R⟩, where:
O = LS, with |O| = N
Then:
A = {xi | 1 ≤ i ≤ N}
V = {pm,j ≡ [Am = vj] | 1 ≤ m ≤ M, 1 ≤ j ≤ ℓm},
where ℓm = |Λm|.
The theory T contains, first of all, a learning algorithm LEARN. Then, we must
provide a criterion to compare candidate hypotheses, for instance the Information
Gain, and another criterion to stop the search in the hypothesis space. A hypothesis
is of the form:
ϕh(xi) ⇒ ci,
Solving Q consists in applying LEARN to LS, and searching for a ϕ∗ using the given
criteria for hypothesis comparison and for stopping. For the sake of illustration, let
us consider a simple example, taken from Quinlan [440].
Example 9.4. Suppose that we want to decide whether to play or not to play tennis
on a given day, based on the day’s weather. We define ΓTYPE = {example}, and,
for instance, ΓO = {1, 2, . . . , 365}, i.e., the days of a year. Each day is described
by the attributes:
ΓA = {(Outlook, {sunny, overcast, rain}),
(Humidity, {high, normal}),
(Temperature, {hot, mild, cool}), (Windy, {true, false})}
Γ = ⟨ΓTYPE, ΓO, ΓA, ∅, ∅⟩
All attributes are applicable to all examples. For learning, we consider a P = ⟨O, A,
∅, ∅⟩, including in O a learning set LS of N = 14 examples. Then, O = {x1, . . . , x14}.
The database D contains two tables, namely OBJ and ATTR, reported in Tables 9.3
and 9.4, respectively.11
The language L consists of decision trees. Each node of the tree has an attribute
associated with it, and the edges outgoing from the node are labelled with the values
11 As all objects have the same type, the type specification is superfluous, but we have kept it for
the sake of completeness.
taken on by that attribute. Each path from the root to a node ν represents a conjunctive
description ϕ, and then the set of examples verifying ϕ is “associated” with ν as well.
Examples of more than one class can verify ϕ, but leaf nodes contain examples of
just one class.12
The theory T includes the learning algorithm LEARN = ID3 [440], and the
information gain IG as a hypothesis evaluation criterion. The stop criterion states
that learning stops when the frontier to be expanded in the decision tree T consists of
only leaf nodes. The examples in O are assigned by a teacher to one of the two classes
contained in CL = {Yes, No}. The labeling by the teacher (class No to examples
x1 , . . . , x5 , and class Yes to examples x6 , . . . , x14 ) adds a column to table OBJ,
reporting the correct classification. The new table in D is reported in Table 9.5.
In order to answer the query Q, algorithm ID3 is run on LS, and the resulting “best”
decision tree is output.
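As an illustration, the rpart package can serve as a stand-in for ID3 (it grows classification trees with a different splitting criterion); the miniature weather table below is a hypothetical fragment shaped like the data described in the text, not Quinlan's original table:

```r
# Learn a small decision tree on weather attributes; each root-to-leaf
# path of the result is a conjunctive description, as discussed below.
library(rpart)
weather <- data.frame(
  Outlook     = factor(c("sunny", "sunny", "overcast", "rain",
                         "rain", "rain", "overcast")),
  Humidity    = factor(c("high", "high", "high", "high",
                         "normal", "normal", "normal")),
  Temperature = factor(c("hot", "hot", "hot", "mild",
                         "cool", "cool", "cool")),
  Windy       = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE),
  Class       = factor(c("No", "No", "Yes", "Yes", "Yes", "No", "Yes"))
)
tree <- rpart(Class ~ ., data = weather, method = "class",
              control = rpart.control(minsplit = 2, cp = 0))
print(tree)
```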
After having formalized the propositional learning problem inside the KRA model,
we address the task of feature selection. Feature selection, in this context, means
hiding one or more attributes, say Aj, from the description of the examples, by means
of the operator ωhattr.
By applying ωhattr, some subsets of examples in X collapse. In fact, all those
examples differing only in the value of attribute Aj become identical.
Let us now consider a specific learning task.
As already mentioned, if LS contains two examples that differ only in the values
of attribute Aj, then the descriptions of those examples now coincide.
However, the examples are still distinguishable, owing to their unique identifiers. The
associated method shall specify whether duplicate examples are to be removed or not.
Let COMPg(Pg) be the set of configurations in Ψg consistent with Pg. If no
example has missing values, then COMPg(Pg) contains a unique configuration
Pg ≡ ψg, otherwise |COMPg(Pg)| > 1. An abstract example xi(a) corresponds
to ℓj = |Λj| ground examples.
In relational learning, the representation requires the addition, to each elementary
object o, of two pieces of information, namely the values of two
special attributes: Example and Class. Attribute Example specifies which example o
belongs to, and Class specifies which class the example (and, hence, also object o) is
an instance of (see the illustrative example in Table 9.1). These two attributes are added
as columns to the table OBJ in D. Depending on the learning algorithm, the standard
format of the database D, as it is built up in the KRA model, may or may not be used.
Then, data in D might be reformulated, without changing the information content they
provide, into another format. One commonly used transformation is a reformulation
of the data into ground logical formulas, to be used by an ILP algorithm. Another
way is to use relational algebra operators to regroup the data into a different set of
tables, one for each example. Relational learning is one of the cases where multiple
formulations of the same data can be present in D, to be used according to the nature
of the theory.
Finally, in the theory T we have to provide a means to evaluate learning and,
as usual, a stopping criterion. In relational learning the language for representing
hypotheses is a subset of First Order Logic, so that hypotheses have variables that
must be bound to objects. Remember that an example is here a composite object. In
order to see whether an example satisfies a formula, a restricted form of deduction,
called θ-subsumption, is frequently used [384, 423].
Definition 9.1. {θ-Subsumption in DATALOG} Let h(x1, x2, ..., xn) be a First
Order Logic formula, with variables xi (1 ≤ i ≤ n), and e the description of an
example. We say that h subsumes e if there exists a substitution θ for the variables
in h which makes h a subset of e.
The θ-subsumption relation offers a means not only to perform the covering test between
a hypothesis h and an example e, but also to test the more-general-than relation.
An informal but intuitive way of testing whether a hypothesis h covers a learning
example e is to consider each atomic predicate in h as a test to be performed on the
example e. This can be done by binding the variables in h to components of e, and then
ascertaining whether the selected bindings verify, in the example e, the predicates
appearing in h. The binding procedure must be repeated for every possible choice of
the objects in e. The procedure stops as soon as a binding satisfying h is found,
thus reporting true, or it stops reporting false after all possible bindings have
been tried. The procedure is called, as in the propositional case, “matching h to e”.
Matching h to e has the advantage that it avoids the need to translate the example
e into a ground logical formula, because examples normally come in tabular form.
In practice, several learners use this matching approach for testing coverage [52, 69,
439].
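To make the matching procedure concrete, the following is a minimal Python sketch of the brute-force coverage test just described: every binding of the variables in h to the constants of e is tried, and the search stops at the first binding that makes h a subset of e. The atom encoding (tuples of predicate name and arguments, with variables starting with an uppercase letter) is an illustrative assumption, not the book's notation.

```python
from itertools import product

def theta_subsumes(h, e):
    """Brute-force theta-subsumption: h subsumes e iff some substitution of
    e's constants for h's variables makes every atom of h appear in e.
    Atoms are tuples (predicate, arg1, ..., argn); variables are strings
    starting with an uppercase letter."""
    variables = sorted({a for atom in h for a in atom[1:] if a[0].isupper()})
    constants = sorted({a for atom in e for a in atom[1:]})
    for binding in product(constants, repeat=len(variables)):
        theta = dict(zip(variables, binding))
        grounded = {(atom[0],) + tuple(theta.get(a, a) for a in atom[1:])
                    for atom in h}
        if grounded <= e:        # h, after substitution, is a subset of e
            return True          # stop at the first satisfying binding
    return False

# Does "some object on top of a blue one" cover this scene?
h = {("ontop", "X", "Y"), ("color", "Y", "blue")}
e = {("ontop", "c1", "a1"), ("color", "a1", "blue"), ("girder", "c1")}
print(theta_subsumes(h, e))     # True, with X -> c1, Y -> a1
```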
For the sake of exemplification, let us consider an example.
Example 9.5. Let Γ = ⟨Γ_TYPE, Γ_O, Γ_A, Γ_F, Γ_R⟩ be a description frame, where:
Γ_A = {(Shape, {triangle, rectangle}),
       (Color, {yellow, green, blue, red})}
Γ_F = ∅
Γ_R = {R_ontop, R_adjacent}
O = O_girder ∪ O_support
O_girder = {c1, c2, c4}
O_support = {a1, a2, a3, a4, b1, b2, b3}
A = {(a1, support, rectangle, blue),
     (a2, support, rectangle, yellow),
     (a3, support, rectangle, blue),
     (a4, support, rectangle, blue),
     (b1, support, rectangle, blue),
     (b2, support, rectangle, red),
     (b3, support, rectangle, yellow),
     (c1, girder, rectangle, red),
     (c2, girder, triangle, green),
     (c4, girder, rectangle, yellow)}
R = {RCOV(R_ontop), RCOV(R_adjacent)}
RCOV(R_ontop) = {(c1, a1), (c1, b1), (c2, a2), (c2, b2), (c4, a4)}
RCOV(R_adjacent) = {(a3, b3)}
The encoding of P in the database generates the set of tables reported in Fig. 9.29.
From the theory we also know the distribution of the objects among the examples
e1, e2, e3, e4, as described in Fig. 9.28. Moreover, we have two classes, i.e., CL =
{Arch, NoArch}, and the teacher tells us that e1 and e2 are arches, whereas e3 and
e4 are not. Then, the OBJ table in D is modified as in Fig. 9.30.
The language L is a DATALOG language L = ⟨C, X, O, F, P⟩, where:
C = {a1, a2, a3, a4, b1, b2, b3, b4, c1, c2, c3, c4, rectangle, . . . ,
     small, . . . , yellow, . . .}
P = {girder(x), support(x), shape(x, rectangle), shape(x, triangle),
     color(x, red), color(x, blue), color(x, yellow), color(x, green),
     ontop(x, y), adjacent(x, y), arch(x), noarch(x), example(x)}
Fig. 9.31 The 10-trains original East-West challenge, after Michalski [367]
Γ_TYPE^(g) = {engine, car, load}
Moreover:
Γ_O,engine^(g) = {g1, g2, . . .}
Γ_O,car^(g) = {c_{i,j} | i, j ≥ 1}
Γ_O,load^(g) = {ℓ_{i,j,k} | i, j, k ≥ 1}
and
Γ_O^(g) = Γ_O,engine^(g) ∪ Γ_O,car^(g) ∪ Γ_O,load^(g)
Cars and loads have attributes associated with them, whereas engines do not:
Γ_A,car^(g) = {(Cshape, Λ_Cshape), (Clength, {long, short}),
               (Cwall, {single, double}), (Cwheels, {2, 3})},
where Λ_Cshape is the value set of the car shape attribute, and
Γ_A^(g) = Γ_A,car^(g) ∪ Γ_A,load^(g)
No function is considered, so that Γ_F^(g) = ∅. Finally, Γ_R^(g) = {R_Infrontof, R_Inside}
contains the considered relations between pairs of elementary objects. More precisely:
R_Infrontof ⊆ (Γ_O,engine^(g) ∪ Γ_O,car^(g)) × Γ_O,car^(g)
R_Inside ⊆ Γ_O,load^(g) × Γ_O,car^(g)
Og,car = {c1,1 , c1,2 , c1,3 , c1,4 , c2,1 , c2,2 , c2,3 , c3,1 , c3,2 , c3,3 , c4,1 , c4,2 , c4,3 ,
c4,4 , c5,1 , c5,2 , c5,3 , c6,1 , c6,2 , c7,1 , c7,2 , c7,3 , c8,1 , c8,2 , c9,1 , c9,2 ,
c9,3 , c9,4 , c10,1 , c10,2 }
O_g,load = {ℓ_{1,1,1}, ℓ_{1,1,2}, ℓ_{1,1,3}, ℓ_{1,2,1}, . . . , ℓ_{10,1,1}, ℓ_{10,2,1}, ℓ_{10,2,2}}
The name c_{i,j} denotes the jth car in train i, counted starting from the one directly
connected to the engine. Similarly, ℓ_{i,j,k} denotes the kth load (from the engine to the
rear of the train) in car c_{i,j}. Notice that the meaning of the indices is intended only
for the reader's convenience.
The set A_g contains the specification of the attributes for each object. The covers
of the two relations are:
RCOV(R_Infrontof) = {(g1, c_{1,1}), (c_{1,1}, c_{1,2}), (c_{1,2}, c_{1,3}), . . . , (g10, c_{10,1}),
                     (c_{10,1}, c_{10,2})}
RCOV(R_Inside) = {(ℓ_{1,1,1}, c_{1,1}), (ℓ_{1,1,2}, c_{1,1}), (ℓ_{1,1,3}, c_{1,1}), . . . ,
                  (ℓ_{10,2,1}, c_{10,2}), (ℓ_{10,2,2}, c_{10,2})}
Fig. 9.32 Tables in the Dg of Michalski’s “trains” problem, referring to the objects and their
attributes
The theory Tg contains the learning algorithm LEARN, and the criteria for stopping
the search and for evaluating hypotheses. Moreover, the query Q specifies that there
are two classes, namely CL = {East, West} and the teacher labels all elementary
objects with respect to the classes and the examples.13
The database Dg contains the tables reported in Figs. 9.32 and 9.33, where OBJ
has already incorporated the information provided by the teacher, referring to the
query.
In order to apply LEARN it is often more convenient to reformulate the content of
D_g in such a way as to group together all the information referring to a single example.
It is sufficient to make a selection on table OBJ on the basis of the condition
“Example = e_i”, and then to select from the other tables the rows corresponding to
the IDs of the objects extracted from OBJ. As an example, we report in Fig. 9.34 the
reformulation of example e6.
Fig. 9.34 Reformulation of the database D_g in such a way that information regarding a single
example (train 6) is grouped
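The regrouping just described can be sketched with standard relational operations; the following toy Python/SQLite fragment is a minimal illustration. The concrete schema (columns ID, Type, Example, Class, and the car-attribute table) is an assumption made for the example, not the actual layout of D_g.

```python
import sqlite3

# A toy OBJ table (ID, Type, Example, Class) plus one attribute table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE OBJ (ID TEXT, Type TEXT, Example TEXT, Class TEXT)")
db.execute("CREATE TABLE carATTR (ID TEXT, Clength TEXT, Cwall TEXT)")
db.executemany("INSERT INTO OBJ VALUES (?,?,?,?)",
               [("g6", "engine", "e6", "West"),
                ("c6_1", "car", "e6", "West"),
                ("c6_2", "car", "e6", "West"),
                ("g7", "engine", "e7", "East")])
db.executemany("INSERT INTO carATTR VALUES (?,?,?)",
               [("c6_1", "long", "single"), ("c6_2", "short", "double")])

# Select the rows of OBJ with Example = 'e6', then join the attribute
# table on the extracted IDs: all information about example e6 is grouped.
rows = db.execute("""SELECT OBJ.ID, OBJ.Type, carATTR.Clength, carATTR.Cwall
                     FROM OBJ LEFT JOIN carATTR ON OBJ.ID = carATTR.ID
                     WHERE OBJ.Example = 'e6'""").fetchall()
for r in rows:
    print(r)
```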
The ground language we consider (denoted L_g) is a DATALOG language L_g =
⟨C_g, X, O, F_g, P_g⟩, where:
C_g = O_g
F_g = ∅
P_g = {engine(x), car(x), load(x)} ∪ {example(x, y) | x ∈ O_g, y ∈ {e1, . . . , e10}}
      ∪ {class(x, z) | x ∈ O_g, z ∈ CL} ∪ {cshape(x, v) | x ∈ O_g,car, v ∈ Λ_Cshape}
      ∪ {clength(x, v) | x ∈ O_g,car, v ∈ Λ_Clength}
      ∪ {cwall(x, v) | x ∈ O_g,car, v ∈ Λ_Cwall}
      ∪ {cwheels(x, v) | x ∈ O_g,car, v ∈ Λ_Cwheels}
      ∪ {lshape(x, v) | x ∈ O_g,load, v ∈ Λ_Lshape}
We recall that throughout the book we have used the convention of giving the same
names to the objects in O and the constants in L.
By using LEARN, several sets of rules distinguishing trains going East from trains
going West can be found. Let us consider the following ones:
Γ_TYPE^(a) = Γ_TYPE^(g) ∪ {loadedcar}
Γ_O^(a) = Γ_O^(g) ∪ Γ_O,loadedcar^(a)
Γ_R^(a) = Γ_R^(g) − {R_Infrontof^(g)} ∪ {R_Infrontof^(a)}
Relation R_Infrontof^(a) is defined as follows:
The hierarchy operator can be applied, in this case, independently of the aggregation
operator. It only affects Γ_A^(a), changing the value of the car's attribute Cshape.
Now we will show how the construction operator can be applied to generate a new
attribute for objects of type loadedcar, which does not add new information, as it
can be derived from the ground space. The corresponding operator is ω_constr(Count),
where:
Count: Γ_O,car^(g) × (Γ_O,load^(g))^k → Γ_A^(a)
Count takes as input a loaded car and its loads, and counts how many loads there
are. If the number of loads is 3, the car is declared Heavy. Then we define a new
attribute for cars, namely (Heavy, {true, false}), associated with objects of type
loadedcar.
The operator ω_constr(Count) can be applied independently of both the aggregation
and the hierarchy-building operators. It can easily be applied with a relational algebra
operation in D_g.
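A minimal sketch of how ω_constr(Count) could be computed is given below; the data layout (a map from each loadedcar identifier to the identifiers of its loads) and the function names are illustrative assumptions.

```python
def count_constructor(loads_of, heavy_threshold=3):
    """Sketch of the attribute-construction operator omega_constr(Count):
    derive, for each loadedcar, the new attribute Heavy from the number of
    loads it aggregates (3 loads make a car Heavy, as in the text)."""
    heavy = {}
    for car, loads in loads_of.items():
        heavy[car] = (len(loads) >= heavy_threshold)  # Heavy iff 3 loads
    return heavy

loads_of = {"lc1_1": ["l1_1_1", "l1_1_2", "l1_1_3"], "lc1_2": ["l1_2_1"]}
print(count_constructor(loads_of))  # {'lc1_1': True, 'lc1_2': False}
```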
The application of the three above-mentioned operators generates three abstract
description frames. In order not to multiply these, we define a parallel/sequential
abstraction process Π. If we call Γ_a^(1) the description frame generated by ω_aggr,
Γ_a^(2) the one generated by ω_hierattrval, and Γ_a the final one obtained by Π, we
have the following relations with respect to the relative abstraction levels:
Fig. 9.35 Database Da corresponding to the perception Pa obtained by applying the abstraction
process Π
By analyzing the above rules, we may notice that aggregation has reduced the
complexity of the description without affecting the quality of the classification rules. The
hierarchy-building operator has simplified rule r1, but has negatively affected rule r3;
in fact, by replacing the raggedtop value with closedtop, rule r3 also covers
trains 3 and 5, which are bound East. Rule r3 then becomes more complex, as it was
necessary to add that there is also a U-shaped or trap-shaped car in the train.
As a last step, we would like to add a new level of abstraction, where only the trains
as single entities are present. We have to apply again an aggregation operator, namely
ω_aggr((engine, loadedcar^k), train). The aggregation rule is the following:
f(y, x_1, . . . , x_k) = If [y ∈ Γ_O,engine^(g)] ∧ [x_1, . . . , x_k ∈ Γ_O,loadedcar^(a)] ∧
                        [(y, x_1), (x_1, x_2), . . . , (x_{k−1}, x_k) ∈ RCOV(R_Infrontof^(a))]
                        Then the tuple (y, x_1, . . . , x_k) is aggregated into a new object t of type train.
We have to decide what attributes (if any) are to be transferred to the trains. Only the
length, called Tlength, is applicable. None of the relations is applicable anymore. We
then obtain a new description frame Γ_a′, more abstract than Γ_a, which contains:
Γ_O^(a′) = Γ_O,train^(a′)
Γ_A^(a′) = {(Tlength, {long, short})}
Γ_F^(a′) = ∅
Γ_R^(a′) = ∅
The value long of the attribute Tlength is assigned to a train if it has 3 or more
loaded cars; otherwise the train is short. In this abstract space the trains clearly
cannot be distinguished anymore; in fact, even though it is true that all trains going East
are long, two of those going West are long as well. We have thus removed too much
information to still be able to answer our question, i.e., to learn to distinguish the
two sets of trains.
The associated optimal value function is denoted V*(s), and it is the unique solution
of the Bellman equation:
V*(s) = max_a Σ_{s′} P(s′|s, a) [R(s′|s, a) + γ V*(s′)]   (9.5)
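For illustration, the following sketch solves Eq. (9.5) by plain value iteration over a finite MDP given as explicit arrays; it is a generic textbook procedure, shown here under assumed data structures, not a method prescribed by the text.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Iterate the Bellman optimality operator until convergence.
    P[a][s, s2] = transition probability, R[a][s, s2] = reward."""
    n_actions, n_states = len(P), P[0].shape[0]
    V = np.zeros(n_states)
    while True:
        # Expected one-step return plus discounted value, for each action.
        Q = np.array([(P[a] * (R[a] + gamma * V)).sum(axis=1)
                      for a in range(n_actions)])
        V_new = Q.max(axis=0)          # greedy choice over actions
        if np.abs(V_new - V).max() < tol:
            return V_new
        V = V_new

# Tiny 2-state, 2-action example (illustrative numbers).
P = [np.array([[0.9, 0.1], [0.0, 1.0]]), np.array([[0.2, 0.8], [0.5, 0.5]])]
R = [np.zeros((2, 2)), np.array([[0.0, 1.0], [0.0, 0.0]])]
print(value_iteration(P, R))
```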
Let us now model this query Q in the KRA model. The observable states S of the
environment are objects of type state. The observable actions A are objects of type
action. The parameter γ is a constant of type R⁺. States are to be observed, and it
is the task designer who decides what features are to be measured in each state and
their value sets. Each state is thus described by a set of attributes and corresponding
values (A_m, Λ_m) (1 ≤ m ≤ M). The probability distribution P(s′|s, a) is represented
by a function whose domain is S × A × S and whose co-domain is [0, 1]. The reward R
is a function that has domain S × A, but its co-domain is given by the designer. Let
us add to the description frame two other functions, i.e., the current policy π and
a value function V^π(s) (the function Q^π(s, a) could have been chosen instead). No
relations are used.
Once the attributes of the states, their values, and the actions are all selected, a
description frame Γ = ⟨Γ_TYPE, Γ_O, Γ_A, Γ_F, Γ_R⟩ can be defined, where14:
Γ_TYPE = {state, action, real},
Γ_O,state = S, Γ_O,action = A, Γ_O = S ∪ A ∪ R,
Γ_A = Γ_A,state = {(A_m, Λ_m) | 1 ≤ m ≤ M}
Γ_F = {V^π: S → R, π: S → A, P: S × A × S → [0, 1], R: S × A → R},
Γ_R = ∅.
14 In principle, attributes for the actions could also be envisaged. They can be added if needed.
ψ = {(s_i, state, v_i^(1), v_i^(2), . . . , v_i^(M)) | 1 ≤ i ≤ |S|} ∪
    {(a_j, action) | 1 ≤ j ≤ |A|} ∪ FCOV(V^π) ∪ FCOV(π) ∪ FCOV(P) ∪ FCOV(R)   (9.6)
where FCOV(V^π) contains pairs of the form (s_i, V_i^π), FCOV(π) contains pairs of
the form (s_i, π_i = a_j), FCOV(P) contains 4-tuples of the form (s_i, a_j, s_k, p_{ijk}), and
FCOV(R) contains triplets of the form (s_i, a_j, r_{ij}), with:
• s_i is the identifier of a state,
• v_i^(m) ∈ Λ_m (1 ≤ m ≤ M) is the value of attribute A_m of state s_i,
• V_i^π is the value function in state s_i,
• π_i is the value of the policy in state s_i, i.e., π_i = π(s_i) = a_j,
• p_{i,j,k} is the probability of obtaining state s_k by applying action a_j in state s_i,
• r_{i,j} is the reward received by the agent choosing action a_j in state s_i.
The theory T contains the discount parameter γ ∈ R, the Bellman equation (9.5),
and an algorithm Algo that chooses the action to apply at each step. In addition,
T must provide a criterion to stop Algo. As a learning algorithm we can consider
Q-learning; after state s has been observed, action a has been chosen, a reward r has
been gathered, and the next state s′ has been observed as well, Q-learning performs
the following update:
Q(s, a) ← Q(s, a) + α_t [r + γ max_{a′} Q(s′, a′) − Q(s, a)]   (9.7)
where α_t is a learning rate parameter. For choosing the action we may consider a
classic ε-greedy policy, which chooses a random action with probability ε instead of
trying to choose the “best” action in terms of the highest value of the successor states
(chosen with probability 1 − ε). Equation (9.7) and the parameters α_t and ε are to be
added to the theory.
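A compact sketch of the tabular Q-learning loop with the ε-greedy policy just described follows; the toy corridor environment and its reset/step interface are assumptions made so the fragment is self-contained.

```python
import random
from collections import defaultdict

class ChainEnv:
    """A tiny 5-state corridor: moving 'right' eventually reaches the goal
    (reward +10); every step costs -1. Purely illustrative."""
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        self.s = min(self.s + 1, 4) if a == "right" else max(self.s - 1, 0)
        done = (self.s == 4)
        return self.s, (10.0 if done else -1.0), done

def q_learning(env, actions, alpha=0.1, gamma=0.9, eps=0.1, episodes=300):
    Q = defaultdict(float)                      # Q[(state, action)]
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:           # explore w.p. eps
                a = random.choice(actions)
            else:                               # exploit w.p. 1 - eps
                a = max(actions, key=lambda x: Q[(s, x)])
            s2, r, done = env.step(a)
            target = r + (0.0 if done else
                          gamma * max(Q[(s2, x)] for x in actions))
            Q[(s, a)] += alpha * (target - Q[(s, a)])   # update (9.7)
            s = s2
    return Q

Q = q_learning(ChainEnv(), ["left", "right"])
print(max(["left", "right"], key=lambda a: Q[(0, a)]))   # 'right'
```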
With this formalization, one configuration corresponds to a complete description
of the knowledge of the agent at time t. As opposed to the modeling of propositional
concept learning, Reinforcement Learning cannot easily be modeled as an inference
from a set of known facts. As a matter of fact, in Reinforcement Learning the agent
both explores the world and learns from it at the same time.
Example 9.6. For the sake of illustration, let us consider the simple Taxi example
(taken from [137], and described in Fig. 9.20), where a taxi has to navigate a 5-by-5
grid world, picking up a passenger and delivering him/her to a destination. There are
six primitive actions in this domain: (a) four navigation actions that move the taxi one
square North, South, East, or West, (b) a Pickup action (only possible if
the taxi is at the passenger’s location), and (c) a Putdown action (only possible if
a passenger is in the taxi at his/her destination). The agent receives a reward of −1
for each action, and a final +20 for successfully delivering the passenger to his/her
destination. There is a reward of −10 if the taxi attempts to execute the Putdown
or Pickup actions illegally. If a navigation action would cause the taxi to hit a wall,
the action is a no-op, and there is only the usual reward of −1. The six primitive
actions are considered deterministic for the sake of simplicity. The query is to find a
policy that maximizes the total reward per episode.
First of all, we have objects of two types, namely state and action.
Then, we have to define the structure of the states and their attributes. A state is
a triple ((i, j), ℓ1, ℓ2) containing the location of the taxi, (i, j), the initial location
of the passenger, ℓ1, and the location ℓ2 of the final destination. We can define the
following attributes for a state: TaxiLocation, whose values are the 25 cells (i, j) of
the grid; PsgLocation, with 5 possible values (the four special locations of the grid
plus inTaxi); and PsgDestination, with 4 possible values (the four special locations).
Considering the attributes, there are in total N = 500 states, because there are 25
values for TaxiLocation, 5 values for PsgLocation, and 4 values for PsgDestination.
Actions may have applicability constraints associated with them, which we can model
as the values of an attribute Constr. Hence:
Γ_A,action = {(Constr, Λ_Constr)},
where Λ_Constr is the set of given constraints. The theory contains the values of the
parameters γ and α.
The six primitive actions are considered deterministic for the sake of simplicity,
and thus the transition probability function P takes values in {0, 1}. The set of
functions is then:
Γ_F = {V^π: S → R, π: S → A, P: S × A × S → {0, 1}, R: S × A → R}
Given the description frame built up as above, we consider now a specific RL task
for the Taxi problem, namely a P-Set P. As the world does not change from one
problem instance to another, a problem instance is specified by the initial location
ℓ1 of the passenger and his/her destination, ℓ2. Let, in our case, ℓ1 = Y and ℓ2 = B.
Then, P = ⟨O, A, F, R⟩ contains:
O_state = {o}
O_action = {North, South, East, West, Pickup, Putdown}
Notice that P contains a single, not completely specified state o, which corresponds
to 125 observable states {s_i | 1 ≤ i ≤ 125}, because the position of the taxi is not
observed. However, the algorithm Algo, given in the theory, may use non-observed
states, because the position of the passenger may change to inTaxi or to B.
For A we simply have:
A = {(o, state, (UN, UN), Y, B)}
Finally, R = ∅, and the covers of all functions are given by the designer. The observed information is
stored in a database D, where the table OBJ contains both the identifiers of the states
and those of the actions, the tables state-ATTR and action-ATTR contain the
attributes of the states and of the actions, respectively, and there is one table for each
function cover. The theory contains the parameters α = 0.1 and γ = 0.9.
There are several existing algorithms in RL that provide good solutions to large MDPs.
One of their limitations is that in most cases they consider S as a single “flat” search
space [137]. These methods have been successfully applied in several domains, such
as game playing, elevator control, and job-shop scheduling. Nevertheless, in order to
scale to more complex tasks, which have large state spaces and a complex structure,
abstraction mechanisms are required.
In Sect. 9.3 we have briefly introduced the four dimensions along which abstrac-
tion has been explored in the field of Reinforcement Learning: State aggregation,
15 This formulation is equivalent to saying that the probability is equal to δ_{a,North}, because we are in
the deterministic case.
Γ_TYPE^(g) = {state, action, real},
Γ_O,state^(g) = S, Γ_O,action^(g) = A, Γ_O^(g) = S ∪ A ∪ R,
Γ_A^(g) = Γ_A,state^(g) = {(A_m, Λ_m) | 1 ≤ m ≤ M}
Γ_F^(g) = {V^π: S → R, π: S → A, P: S × A × S → [0, 1], R: S × A → R},
Γ_R^(g) = ∅.
By applying the operator to several ground states, specified in the set Γ_O,state,aggr,
a new one is created. A more abstract description frame is obtained, where:
Γ_TYPE^(a) = {state, action, real, newstate},
Γ_O,state^(a) = S − Γ_O,state,aggr, Γ_O,action^(a) = A, Γ_O,newstate^(a) = {c, c1, · · ·}
Γ_O^(a) = Γ_O,state^(a) ∪ A ∪ R ∪ Γ_O,newstate^(a)
Γ_A,state^(a) = {(A_m^(a), Λ_m^(a)) | 1 ≤ m ≤ M}
16 As mentioned before, factored MDPs exploit problem structure to represent exponentially large
state spaces very compactly [76].
Fig. 9.36 a An abstracted state space (after [25]) with six states B1 to B6. If the taxi is in block B2, it
can go left to B1, right to B6 and down to B4. In B3, the taxi can only go up to B1. b A reformulation
of the abstract space that makes it similar to the ground formulation
All the details of the actual aggregation are defined in meth(P_g, ω_aggr). This
method determines what attributes are to be kept for the new aggregated states and
with what values, and describes how the abstract functions are to be computed.
For instance, suppose that a new state c is formed by aggregation of k original
states {s1, . . . , sk}. Then, the attribute TaxiLocation could be defined as the average
of the i's and j's of the component states. For the attribute PsgLocation, state c
could be labelled, for instance, R, if R is the passenger location in one of the states
{s1, . . . , sk}. The same can be done for PsgDestination. Actually, this abstraction
could also be realized by equating subsets of values in the domains of the attributes
characterizing the states.
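A toy sketch of such a method is given below; the dictionary-based data layout and the tie-breaking choices are illustrative assumptions, not the book's specification of meth(P_g, ω_aggr).

```python
def aggregate_states(states, attr_of):
    """Form one abstract state c from ground states s1..sk: TaxiLocation
    becomes the average of the component coordinates, and PsgLocation is
    labelled R if some component state has the passenger at R."""
    xs = [attr_of[s]["TaxiLocation"] for s in states]
    avg = (sum(i for i, _ in xs) / len(xs), sum(j for _, j in xs) / len(xs))
    locs = {attr_of[s]["PsgLocation"] for s in states}
    psg = "R" if "R" in locs else sorted(locs)[0]   # arbitrary tie-break
    return {"TaxiLocation": avg, "PsgLocation": psg}

attr_of = {"s1": {"TaxiLocation": (0, 0), "PsgLocation": "Y"},
           "s2": {"TaxiLocation": (1, 0), "PsgLocation": "R"}}
print(aggregate_states(["s1", "s2"], attr_of))
```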
Two instances of state abstraction are presented in Fig. 9.36 for the Taxi problem
of Example 9.6. A taxi can navigate between two points with or without a client in
a similar manner; then, all the states corresponding to the taxi with or without the
passenger can be considered equivalent. The same is true regarding the passenger
destination with respect to the “navigate” and “get a passenger” subtasks. In Fig. 9.36
the number k of abstract taxi locations is 6 instead of the 25 initial ones.
In the abstract representation, a passenger located in Y at (0,0) in the ground
space (see Fig. 9.20) is now in B3, and his/her destination (3,0) in the ground space is
now in state B4. A solution in the abstract space is for the taxi to go through state B3,
pick up the passenger, go through B1 and B2, and finally drop the passenger in B4.
9.5 Summary
Fig. 9.37 Three approaches to combining abstraction and learning. The idea has its roots in the Feature
Selection task, which uses the attribute-hiding operator, and can be extended to any set of abstraction
operators
The term “complex systems” does not have a precise, general definition, but it is agreed
that it applies to systems that have at least the following properties:
• They are composed of a large number of elements, interacting non-linearly with
each other.
• Their behavior cannot be determined from the behaviors of the individual components,
but emerges from their interactions as an ensemble.
Fig. 10.1 A bifurcation diagram for the logistic map: xn+1 = rxn (1 − xn ). The horizontal axis is
the r parameter, the vertical axis is the x variable. The map was iterated 1000 times. A series of
bifurcation points leads eventually to chaos
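For the reader who wants to reproduce a diagram like Fig. 10.1, a minimal sketch of the underlying computation follows: the logistic map is iterated until transients die out, and the surviving values are recorded for each r (the iteration counts are illustrative choices).

```python
def logistic_attractor(r, n_iter=1000, n_keep=100, x0=0.5):
    """Iterate the logistic map x_{n+1} = r * x_n * (1 - x_n) and return
    the values visited after transients die out; sweeping r and plotting
    the results yields a bifurcation diagram like Fig. 10.1."""
    x = x0
    for _ in range(n_iter - n_keep):     # discard the transient
        x = r * x * (1 - x)
    out = []
    for _ in range(n_keep):              # record the attractor
        x = r * x * (1 - x)
        out.append(x)
    return out

# At r = 3.2 the map has settled on a 2-cycle.
print(sorted(set(round(v, 6) for v in logistic_attractor(3.2))))
```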
1 The degree of a vertex, in an undirected graph, is the number of edges connected to the vertex. In
a directed graph it is the sum of the numbers of ingoing and outgoing edges.
Fig. 10.4 a A graph with 7 vertices, with no edges. b A graph with 7 vertices, fully connected, i.e.,
with 21 edges. Which one is more complex?
I_vd = Σ_{i=1}^{V} d_i log₂ d_i   (10.1)
OC(G) = Σ_{k=1}^{E} OC_k(G)   (10.2)
Instead of counting edges, the complexity of a graph can be related to paths. Rücker
and Rücker [445] have proposed a measure called “total walk count” (TWC). Denoting
by w_i(ℓ) a generic path on G of length ℓ, the TWC is defined by:
TWC(G) = Σ_{ℓ=1}^{V−1} Σ_i w_i(ℓ)   (10.3)
The number of walks of length ℓ is obtained from the ℓth power of the adjacency
matrix A. In fact, the entry a_rs^(ℓ) in A^(ℓ) is equal to 1 iff there is a path of length ℓ in
the graph.
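Equation (10.3) can be computed directly from powers of the adjacency matrix; the following is a minimal numpy-based sketch for small graphs (the path-graph example is illustrative).

```python
import numpy as np

def total_walk_count(A):
    """Total walk count (Eq. 10.3): sum, over all walk lengths 1..V-1, of
    the number of walks of each length. Entry (r, s) of A^l counts the
    walks of length l from vertex r to vertex s."""
    A = np.asarray(A)
    V = A.shape[0]
    twc, P = 0, np.eye(V, dtype=np.int64)
    for _ in range(V - 1):
        P = P @ A          # next power of the adjacency matrix
        twc += P.sum()     # add the number of walks of this length
    return int(twc)

# A path graph on 3 vertices: 1-2-3.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(total_walk_count(A))   # 10
```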
By observing that highly complex networks are characterized by a high degree of
vertex-vertex connectedness and a small vertex-vertex separation, it seems logical to
use both quantities in defining the network complexity. Given a node xi in a network
G, let d_i be its degree and λ_i its distance degree [66], which is computed by
λ_i = Σ_{j=1}^{V} d(i, j),
where d(i, j) is the distance between node i and node j. The complexity index B is
then given by:
B = Σ_{i=1}^{V} d_i/λ_i   (10.4)
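A small sketch of the computation of B follows, using the networkx library for the shortest-path distances; the star-graph example is an illustrative choice.

```python
import networkx as nx

def complexity_index_B(G):
    """Complexity index B (Eq. 10.4): sum over vertices of the degree d_i
    divided by the distance degree lambda_i (sum of shortest-path
    distances from i to all other vertices). G must be connected."""
    dist = dict(nx.all_pairs_shortest_path_length(G))
    return sum(G.degree(i) / sum(dist[i].values()) for i in G.nodes)

G = nx.star_graph(4)            # one hub connected to 4 leaves
print(complexity_index_B(G))    # hub contributes 4/4, each leaf 1/7
```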
Analyzing the definition of the above measures of graph complexity, one may notice
that all of them capture only very marginally the “intricacies” of the topological
structure, which is actually the aspect that makes a graph “complex”. As an example,
one would like to have a measure that distinguishes a star-shaped network from a
small-world network, or from a scale-free one with large hubs. An effort in this
direction has been made by Kahle et al. [278], who tried to link network complexity
to patterns of interactions among nodes.
The analysis of complex networks is difficult because of their very large size (number
of vertices) and the intricacy of their interconnections (number of edges and topological
structure). In order to discover organizations, structures, or behaviors in these networks
we need tools that allow their simplification, without losing the essence.
Abstraction comes here into play. As a matter of fact, there are few approaches that
explicitly mention abstraction as a help in investigating large networks, but there are many
that use it without saying so, for instance under the name of multi-scale or hierarchical
analysis.
A case of network abstraction is presented by Poudret et al. [427], who used
a topology-based approach to investigate the Golgi apparatus. A network of com-
partments describes the static and dynamic characteristics of the system, with spe-
cial focus on the interaction between adjacent compartments. The Golgi apparatus is an
organelle whose role includes the transport of proteins synthesized by the cell from
the endoplasmic reticulum to the plasma membrane. Its structure is not completely
known, and the authors have built up two abstract topological models (the “plate
stack model” and the “tower model”) of the apparatus (represented in Fig. 10.5) in
order to discriminate between two existing alternative hypotheses: one supposes that
vesicles play a major role in the excretion of proteins, whereas the other one suggests
the presence of a continuous membrane flow.
The abstract models ignore the geometry of the apparatus components, and focus
on their interactions to better capture their dynamics. The building of these abstract
models allowed the authors to show that only one of them (namely, the “tower model”)
is consistent with the experimental results.
An interesting approach, developed for the analysis of biological networks (but
actually more general), has been recently presented by Cheng and Hu [97]. They
consider a complex network as a system of interacting objects, from which an itera-
tive process extracts meaningful information at multiple granularities.
Fig. 10.5 Abstract models of the Golgi apparatus. a Plate stack model. b Tower model. Only the
“tower model” proved to be consistent with the experimental results. (Reprinted with permission
from Poudret et al. [427])
To make this possible, the authors developed a network analysis tool, called “Pyramabs”, which
transforms the original network into a pyramid, formed by a series of n superposed
layers. At the lowest level (level n) there is the original network. Then, modules
(subgraphs) are identified in the net and abstracted into single nodes, which are reported
at the immediately higher level. As this process is repeated, a pyramid is built up: at
each horizontal layer a network of (more and more abstract) interconnected modules
is located, whereas the relationship between layers i + 1 and i is constituted by the
link between a module at layer i + 1 and the corresponding abstract node at layer i.
An example of such a pyramid is reported in Fig. 10.6.
Fig. 10.6 Structure of the abstraction pyramid built up from a complex network. Each circle
represents a module. Vertical relationships and horizontal relationships are denoted by dashed lines
and solid lines, respectively. The thickness of a solid line increases with the importance of the
connection. The original network is at the bottom (Level 4). Higher-level networks are abstractions,
to a certain degree, of the next lower network. (Reprinted with permission from Cheng and Hu [97])
In order to generate the pyramid, two tasks must be executed by two modules,
namely the discovery and the organization modules. A top-down/bottom-up clustering
algorithm identifies modules in a top-down fashion, and constructs the hierarchy
bottom-up, producing an abstraction of the network with different granularities at
different levels in the hierarchy. Basically, the method consists of three phases: (1)
computing the proximity between nodes; (2) extracting the backbone (a spanning
tree) from the network, and partitioning the network based on that backbone; (3)
generating an abstract network. By iteratively applying the same procedures to each
newly generated abstract network, the pyramid is built up.
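The sketch below gives a rough, illustrative rendition of this pyramid-building loop. It is not the actual Pyramabs tool: a generic modularity heuristic from the networkx library stands in for the proximity/backbone phases described above.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def build_pyramid(G, levels=3):
    """Repeatedly detect modules and abstract each module into a single
    node of the next layer up, keeping only inter-module links."""
    pyramid = [G]
    for _ in range(levels - 1):
        G = pyramid[-1]
        if G.number_of_nodes() <= 2:
            break
        modules = list(greedy_modularity_communities(G))
        node_of = {v: i for i, mod in enumerate(modules) for v in mod}
        H = nx.Graph()
        H.add_nodes_from(range(len(modules)))
        for u, v in G.edges:                 # inter-module edges survive
            if node_of[u] != node_of[v]:
                H.add_edge(node_of[u], node_of[v])
        pyramid.append(H)
    return pyramid

sizes = [g.number_of_nodes() for g in build_pyramid(nx.karate_club_graph())]
print(sizes)    # e.g. [34, 3, 1]: smaller networks at higher layers
```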
Other multi-scale or hierarchical approaches are proposed by Lambiotte [310],
Arenas, Fernández and Gómez [23], Bang [32], Binder and Plazas [61], Dorat et al.
[139], Kurant and Thiran [308], Oliveira and Seok [407], Ravasz and Barabasi [443],
and Sales-Pardo et al. [474].
2 Extensive presentations of the main indices characterizing networks are provided by Newman
[399], Boccaletti et al. [65], Costa et al. [116], Emmert-Streib and Dehmer [148], and Dehmer and
Sivakumar [130].
Fig. 10.7 Scheme of the abstraction mechanism to compute network measures. Starting from a
ground interaction graph G, some measures must be obtained on it, such as, for instance, the centrality
of every node or their betweenness. Let X = f(G) be one such measure, obtained by means of a
procedure f. In order to compute X at a reduced cost, an abstract network G′ is generated, and the
corresponding abstract measure X′ is computed. Finally, X′ is re-transformed into the value of X
again, by applying a function h and obtaining X_a = h(X′). (Reprinted with permission from
Saitta et al. [466])
Many methods for detecting communities inside complex networks have been proposed, for
instance by Arenas et al. [23], Expert et al. [157], Girvan and Newman [212], Fortunato and
Castellano [178], Lancichinetti et al. [312], Leicht and Newman [325], Lozano et al. [346],
Newman [400], Zhang et al. [585], and Lancichinetti and Fortunato [311].
An interesting approach to reduce the size of a weighted (directed or undirected)
complex network, while preserving its modularity structure, is presented and/or
reviewed by Arenas et al. [22]. A comparative overview, in terms of sensitivity
and computational cost, of methods for detecting communities in networks is pre-
sented by Danon et al. [121], whereas Lancichinetti et al. [312] propose a benchmark
network for testing algorithms [313].
The notion of simplicity has played an important role in Philosophy since its early
times. As mentioned by Baker [29] and Bousquet [74], simplicity was already invoked
by Aristotle [24], in his Posterior Analytics, as a merit for demonstrations with as
few postulates as possible. Aristotle's propensity for simplicity stemmed from his
ontological belief that nature is essentially simple and parsimonious; hence, reasoning
should imitate these characteristics.
However, the most famous statement in favor of simplicity in sciences is Ockham's
razor, a principle attributed to the philosopher William of Ockham.3 In the light of
today's studies, Ockham's razor appears to be a myth, in the sense that its reported
formulation “Entia non sunt multiplicanda, praeter necessitatem”4 does not appear
in any of his works nor in other medieval philosophical treatises; on the contrary, it
was coined in 1639 by John Ponce of Cork. Nevertheless, Ockham did actually share
the preference for simplicity with Scotus and other fellow medieval philosophers;
indeed, he said that “Pluralitas non est ponenda sine necessitate”,5 even though he
did not invent the sentence himself.
Independently of the paternity of the principle, it is a fact that the putative Ock-
ham’s razor does play an important role in modern sciences. On the other hand, the
intuition that a simpler law has a greater probability of being true has never been
proved. As Pearl [415] puts it: “One must resign to the idea that no logical argument
can possibly connect simplicity with credibility”. A step ahead was taken by Karl
Popper [426], who connected the notion of simplicity with that of falsifiability.
For him there is no a priori reason to choose a simpler scientific law; on the contrary,
the “best” law is the one that imposes most constraints. For such a law, in fact, it
will be easier to find a counterexample, if the law is actually false. According to this
3 William of Ockham (c. 1288–c. 1348) was an English Franciscan and scholastic philosopher, who
is considered to be one of the greatest medieval thinkers.
4 “Entities must not be multiplied without need”.
5 “Plurality is not to be supposed without necessity”.
H* = ArgMax_{H_i∈H} Pr(H_i|D) = ArgMax_{H_i∈H} [Pr(D|H_i) Pr(H_i) / Pr(D)]   (10.5)
The rationale behind Bayes' formula is that we may have an a priori idea of the relative
probabilities of the various hypotheses being true, and this idea is represented by the
prior probability distribution Pr(Hi ). However, the examination of the experimental
data D modifies the prior belief, in order to take into account the observations, and
produces a posterior probability distribution Pr(Hi |D).
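A minimal sketch of hypothesis selection via Eq. (10.5) follows; since Pr(D) does not depend on the hypothesis, the maximization can be carried out over Pr(D|H_i) Pr(H_i) alone. The toy coin example is an illustrative assumption.

```python
def map_hypothesis(prior, likelihood, data):
    """Pick H* maximizing Pr(H|D) (Eq. 10.5). Pr(D) is constant across
    hypotheses, so comparing Pr(D|H) * Pr(H) suffices."""
    return max(prior, key=lambda h: likelihood(h, data) * prior[h])

# Toy example: two coin-bias hypotheses, data = 3 heads out of 3 tosses.
prior = {"fair": 0.7, "biased": 0.3}
lik = {"fair": 0.5 ** 3, "biased": 0.9 ** 3}
print(map_hypothesis(prior, lambda h, d: lik[h], None))  # 'biased'
```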
Going a bit further along this line of reasoning, Solomonoff [498, 499] proposed
a formal definition of simplicity in the framework of inductive inference. In
his view, all the knowledge available in a domain at time t can be written in the
form of a binary string. When new experiences take place, their results contribute to
lengthening the string. Then, at any given time, we may see the initial part of a
potentially infinite string, representing the “perfect” knowledge. The induction problem
is to predict the next bit by extrapolating the known part of the string using Bayes'
formula. Solomonoff's contribution consists in showing that there exists an optimal prior
probability distribution on the infinite strings, called the universal distribution, which
determines the best possible extrapolation. By considering the strings as generated
by a program running on a universal computer, the simplicity of the string is defined
as the length of the shortest program that outputs it. This definition was independently
proposed by Kolmogorov [295] and Chaitin [90], and we will formally introduce it
later on.
According to Feldman [162],“simple patterns are compelling”. As a matter of fact,
there is no doubt that humans are attracted by simple or regular patterns. Intuitively,
we tend to ascribe to randomness the generation of complicated patterns, whereas
regular ones are believed to have been generated on purpose, and hence to bear some
meaning. As, in general, objects and events in the world are neither totally random
nor totally regular, it would be interesting to have some criterion to test whether a set
of observations can be considered random or not. Feldman provides such a criterion
by defining a test for checking the null hypothesis: H0 ≡ “observations are random”.
More precisely, let x = {x1 , ..., xn } be a set of objects or observations, each one
described by m attributes A1, ..., Am. Each observation is defined by a conjunction
of m literals, i.e., statements of the form “A_i = v” or “A_i ≠ v”. The goal of the
test is to allow a set of regularities to be extracted from the observations, where by
“regularity” is meant a lawful relation satisfied by all n objects, with the syntax:
where φ_k is a regularity of degree k, and A_0 is some label. Let S(x) be the minimum
set of (non-redundant) regularities obtainable from x. The function |φ_k|, giving
the number of regularities of each degree k contained in S(x), is called the power
spectrum of x. A useful measure of the overall complexity of x is its total weighted
spectral power:
C(x) = Σ_{k=0}^{m−1} (k + 1)|φ_k|   (10.6)
The test for x’s simplicity consists in computing the probability distribution of C and
setting a critical region on it.
Along the same lines, Dessalles6 discusses Simplicity Theory, a “cognitive model
that states that humans are highly sensitive to any discrepancy in complexity, i.e., their
interest is aroused by any situation which appears too simple”. Simplicity Theory
links simplicity to unexpectedness, and its main claim is that “an event is unexpected
if it is simpler to describe than to generate”. Formally, U = Cw − C, where U is
the unexpectedness, Cw is the generation complexity, namely the size of the minimal
description of parameter values the “world” needs to generate the situation, and C
is the description complexity, namely the size of the minimal description that makes
the situation unique, i.e., Kolmogorov complexity.
The same idea of surprise underlies the work by Itti and Baldi [272]. These
authors claim that human attention is attracted by features that are “surprising”, and
that surprise is a general, information-theoretic concept. Then, they propose a quan-
tification of the surprise using a Bayesian definition. The background information
of an observer about a certain phenomenon is given by his/her prior probability
distribution over the current set M of explaining models. Acquiring data D allows
the observer to change the prior distribution P(M) (M ∈ M) into the posterior
distribution P(M|D) via Bayes' theorem. Then, surprise can be measured by the
distance between the prior and posterior distributions, via the Kullback–Leibler (KL)
divergence:
S(D, M) = KL(P(M)‖P(M|D)) = Σ_{M∈M} P(M|D) log₂ [P(M|D)/P(M)]   (10.7)
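Equation (10.7) is straightforward to compute for a discrete model set; a minimal sketch follows, with a toy two-model example.

```python
import math

def surprise(prior, posterior):
    """Bayesian surprise (Eq. 10.7): KL divergence between the posterior
    P(M|D) and the prior P(M) over a discrete set of models."""
    return sum(q * math.log2(q / prior[m])
               for m, q in posterior.items() if q > 0)

prior = {"m1": 0.5, "m2": 0.5}
posterior = {"m1": 0.9, "m2": 0.1}            # data strongly favored m1
print(round(surprise(prior, posterior), 4))   # ~0.531 bits
```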
In order to test their theory, the authors have performed some eye-tracking experi-
ments on subjects looking at videos. They have compared the measured tracks with
those that would have been predicted if people’s attention were guided by surprise,
entropy or salience. The results show that surprise appears to be a more effective
measure of visually interesting points.
The attempt to link surprise, unexpectedness, and information theory is not new.
It was already proposed by Watanabe [552] in the late 60’s. He linked the notion of
surprise with the information provided by an event in Shannon’s theory: if an event
e with probability p(e) close to 0 does actually occur, we would be very surprised;
6 http://perso.telecom-paristech.fr/~jld/Unexpectedness/
on the contrary, we would not be surprised at all by the occurrence of an event with
probability close to 1. The surprise S(e) is then equated to Shannon's information:
S(e) = −log₂ p(e).
The complexity of an entity (an object, a problem, an event, ...) is a difficult notion to
pin down, because it depends both on the nature of the entity and on the perspective
under which the entity is considered. Often, complexity is linked with information,
itself not an easier notion to define.
The approaches to quantify complexity include symbolic dynamics, information
and ergodic theory, thermodynamics, generalized dimensions and entropies, theory
of computation, logical depth and sophistication, forecasting measures, topological
exponents, and hierarchical scaling. All the above perspectives help us to understand
this rich, important, and yet elusive concept.
In January 2011 the Santa Fe Institute for complex sciences organized a
workshop on “Randomness, Structure, and Causality: Measures of complexity from
theory to applications”, whose goal was to answer (among others) the following
questions:
1. Are there fundamental measures of complexity that can be applied across
disciplines, or are measures of complexity necessarily tied to particular domains?
2. Are there universal mechanisms at work that lead to increases in complexity, or
does complexity arise for qualitatively different reasons in different settings?
3. Can an agreement be reached about the general properties that a measure of
complexity must have?
By exploring the talks at the workshop, one may notice that, even without addressing the
above questions explicitly, most of them provide an implicit negative answer to all
three. This goes to show that the definition of complexity is really a tough task,
especially concerning applicability to concrete systems.
As a further confirmation of the fluidity of the field, we may quote a recent
comment by Shalizi7 :
“Every few months seem to produce another paper proposing yet another measure
of complexity, generally a quantity which can’t be computed for anything you’d
actually care to know about, if at all. These quantities are almost never related to
any other variable, so they form no part of any theory telling us when or how things
get complex, and are usually just quantification for quantification’s own sweet sake.”
In the literature there are many definitions of complexity measures, which can be
roughly grouped into four main strands:
• Predictive information and excess entropy,
7 http://cscs.umich.edu/~crshalizi/notabene/complexity-measures.html
One of the first proposed measures of complexity was Algorithmic Complexity,
which was independently defined by Solomonoff [498, 499], Kolmogorov [295],
and Chaitin [90], but is almost universally named after Kolmogorov (see also Li and
Vitànyi's [331] book for an extensive treatment). Kolmogorov complexity K makes
use of a universal computer (a Turing machine) U, which has a language L in which
programs p are written. Programs output sequences of symbols in the vocabulary
(usually binary) of L. Let x^n be one such sequence. The Kolmogorov complexity8
of x^n is defined as follows:
K_U(x^n) = Min{|p| : U(p) = x^n}   (10.8)
Equation (10.8) states that the complexity of x^n is the length |p|, in bits, of the shortest
program that, when run on U, outputs x^n and then halts. Generalizing (10.8), we can
consider a generic object x, described by strings generated by programs p on U.
There are small variants of formulation (10.8), but in all of them the complexity
captured by the definition is a descriptional complexity, quantifying the effort needed
to identify object x from its description. If an object is highly regular, it is
easy to describe it precisely, whereas a long description is required when the object
is random, as exemplified in Fig. 10.8. In its essence, Kolmogorov complexity K(x)
captures the randomness in x.
As all programs on a universal computer V can be translated into programs on
the computer U by q, one of U's programs, the complexity K_V(x) will not exceed
K_U(x) plus the length of q. In other words:
K_V(x) ≤ K_U(x) + |q|
Even though |q| may be large, it does not depend on x, and then we can say that
Kolmogorov complexity is (almost) machine-independent. We may then omit the
subscript indicating the universal machine, and write simply K(x). Whatever the
machine, K(x) captures all the regularities in x's description.
Kolmogorov complexity provides Solomonoff’s universal distribution over
objects x belonging to a set X :
8 Notice that in the literature, this measure is often called Algorithmic Information Content (AIC).
Fig. 10.8 a Very regular pattern, consisting of 48 tiles. b Irregular pattern, although not really
random, consisting of 34 tiles. Even though the left pattern has more tiles, its Kolmogorov complexity
is much lower than that of the right pattern
Pr(x) = (1/C) 2^{−K(x)},   (10.10)
where:
C = Σ_{x∈X} 2^{−K(x)}   (10.11)
The universal distribution (10.10) has the remarkable property of being able to mimic
any computable probability distribution Q(x), namely:
Pr(x) ≥ A Q(x),
Fig. 10.9 Kolmogorov complexity captures the “regular” part of data, whereas the irregular part
(the “noise”) must be described separately. The regular part, i.e., the box, requires K(x) bits for its
description, whereas the sparse objects require log2 |X | bits, where x ∈ X
One can then use a two-part encoding [546]: one part describing the regularities (the model) in the
data, and the other describing the random part (noise), as schematically represented
in Fig. 10.9. The irregular part describes the objects that are not represented by
the regularities simply by enumeration inside the set X. Then, an object x will be
described in (K(x) + log₂ |X|) bits.
The two-part encoding of data is the starting point of Vitànyi's definition of
complexity [546], called “Meaningful Information”, which is only the part encoding the
regularities; in fact, Vitànyi claims that this is the only useful part, separated from
accidental information. In a more recent work Cilibrasi and Vitànyi [106] have
introduced a relative notion of complexity, i.e., the “Normalized Compression Distance”
(NCD), which evaluates the complexity distance NCD(x, y) between two objects x
and y. The measure NCD is again derived from Kolmogorov complexity K(x), and
can be approximated by the following formula:
NCD(x, y) = [K(x, y) − Min{K(x), K(y)}] / Max{K(x), K(y)}   (10.12)
In (10.12) K(x) (resp. K(y)) is the compression length of string x (resp. y), while
K(x, y) is the compression length of the concatenation of strings x and y. These
lengths are obtained from compressors like gzip.
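The approximation of NCD with a real compressor can be sketched in a few lines; the following uses gzip compressed lengths in place of the (uncomputable) Kolmogorov terms.

```python
import gzip

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance (Eq. 10.12), with gzip compressed
    lengths standing in for the Kolmogorov terms."""
    kx, ky = len(gzip.compress(x)), len(gzip.compress(y))
    kxy = len(gzip.compress(x + y))
    return (kxy - min(kx, ky)) / max(kx, ky)

x = b"abcabcabcabcabc" * 30
y = b"qrsqrtqruqrvqrw" * 30
print(round(ncd(x, x), 3), round(ncd(x, y), 3))  # near 0 vs. clearly larger
```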
The approach to complexity by Gell-Mann and Lloyd [195] follows Vitànyi’s
idea that complexity should only be related to the part of the description that encodes
regularities of an object. For this reason they define the effective complexity (EC) of
an entity as the length of a highly compressed description of its regularities. More-
over, they claim that the notion of complexity is context-dependent and subjective,
and that it depends on the description granularity and language, as well as on a
clear distinction between regularity and noise and between important and irrelevant
aspects. The authors justify their proposal by stating that EC is the definition that
most closely corresponds to what we mean by complexity in ordinary conversation
and in most scientific discourses.
Following Gell-Mann and Lloyd’s approach, Ay et al. [26] have proposed a defi-
nition of effective complexity in terms of algorithmic information theory. Then, they
have applied this notion to the study of discrete-time stochastic stationary (and, in
general, not computable) processes with binary state space, and they show that, under
not too strong conditions, long typical process realizations are effectively simple.
The NCD measure is the starting point for another relative measure of complex-
ity, the “statistical complexity” CS , proposed very recently by Emmert-Streib [147],
which provides a statistical quantification of the statement “x is similarly complex
as y”, where x and y are strings of symbols from a given alphabet A. Acknowledging
that “a commonly acknowledged, rigorous mathematical definition of the complexity
of an object is not available”, Emmert-Streib tries to summarize the conditions under
which a complexity measure is considered a good one:
1. The complexity of simple and random objects is less than the complexity of
complex objects.
2. The complexity of an object does not change if its size changes.
3. A complexity measure should quantify the uncertainty of the complexity value.
Whereas the first two had been formulated previously, the third one has been
added on purpose by Emmert-Streib. The novelty of this author’s approach consists
in the fact that he does not attribute complexity to a single object x, but rather to the
whole class of objects generated by the same underlying mechanism. The measure
CS is defined through the following procedure:
1. Let X be a process that generates values x, x , x , ... (denoted x ∼ X), and let
F̂X,X be the estimate of the empirical distribution of the normalized compression
distances between the x s from n1 samples, SX,X n1
={xi =NCD(x , x ) | x , x ∼
n1
X}i=1 .
2. Let Y be a process that generates values y, y , y , .... Let F̂X,Y be the estimate of
the empirical distribution of the normalized compression distances between the
x s and y s from n2 samples, SX,Y
n2
= {yi = NCD(x , y ) | x ∼ X, y ∼ Y }ni=1 2
,
from object x and y of size m from two different processes X and Y .
3. Compute T = Supx |F̂X,X − F̂X,Y | and p = Pr(T < t).
n1 n2
4. Define CS SX,X , SX,Y | X, Y , m, n1 , n2 .
The statistical complexity corresponds to the p-value of the underlying null hypothesis
H₀: F_{X,X} = F_{X,Y}.
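The four-step procedure can be sketched as follows, approximating step 3 with scipy's two-sample Kolmogorov–Smirnov test; the gzip-based NCD and the sample generators are illustrative assumptions.

```python
import gzip, random
from scipy.stats import ks_2samp

def ncd(x, y):
    kx, ky = len(gzip.compress(x)), len(gzip.compress(y))
    return (len(gzip.compress(x + y)) - min(kx, ky)) / max(kx, ky)

def statistic_complexity(gen_x, gen_y, n1=50, n2=50):
    """Steps 1-4 above: empirical NCD distributions within X and between
    X and Y, compared with a two-sample KS test; the p-value plays the
    role of C_S."""
    s_xx = [ncd(gen_x(), gen_x()) for _ in range(n1)]   # step 1
    s_xy = [ncd(gen_x(), gen_y()) for _ in range(n2)]   # step 2
    return ks_2samp(s_xx, s_xy).pvalue                  # steps 3-4

gen_x = lambda: bytes(random.choices(b"ab", k=256))        # process X
gen_y = lambda: bytes(random.choices(b"abcdefgh", k=256))  # process Y
print(statistic_complexity(gen_x, gen_y))   # small p: different complexity
```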
An interesting use of Kolmogorov complexity has been made by Schmidhuber
[477], who invoked simplicity as a means to capture the “essence” of depicted
objects (cartoon-like figures). For him the final design of a picture should have a low
Kolmogorov complexity, while still “looking right”. Schmidhuber generates figures
using only circles: any figure is composed of arcs of circles, as exemplified
in Fig. 8.2, and also in Fig. 10.10.
In order to make Kolmogorov complexity tractable, he selects a specific language
in which to write the programs encoding figures. As the circles are drawn in sequence
with decreasing radii, each circle has an integer associated with it, denoting
the order of drawing. Large circles are few and are coded by small numbers,
whereas small circles are many and are coded by larger numbers. For each arc in
the figure we need to specify the number c of the circle it belongs to, the start point
s and the end point e, and the line thickness w. Arcs are drawn clockwise from
s to e; point s (resp. e) can be specified by indicating the number of the circle
intersecting or touching c in s (resp. e) (plus an extra bit to discriminate between
two possible intersections). Thus, each pixel on an arc can be specified by a triple of
circle numbers, two bits for differentiating intersections, and a few more bits for the
line width.
Using very many very small circles, anything can be drawn. However, the
challenge is to come up with an acceptable drawing with only a few large circles, because
this would mean having captured the “essence” of the depicted object. Such
representations are difficult to obtain; Schmidhuber reports that he found it much easier to
obtain acceptable complex drawings than acceptable simple ones of given objects.
Making a step further, Schmidhuber uses Kolmogorov complexity also to
define “beauty”, assuming that a “beautiful” object is one that requires the minimum
effort to be processed by our internal knowledge representation mechanism, namely
by the shortest encoding program.
Lopez-Ruiz, Mancini, and Calbet [340] defined a complexity measure as the product
of a normalized entropy H* and a disequilibrium D:
C_LMC = H* · D,
where:
H* = −(1/log₂ N) Σ_{i=1}^{N} p_i log₂ p_i = H/log₂ N   and   D = Σ_{i=1}^{N} (p_i − 1/N)²
Notice that:
0 ≤ H* ≤ 1   and   0 ≤ D ≤ (N − 1)/N ≈ 1
This definition of complexity does satisfy the intuitive conditions mentioned above.
For a crystal, disequilibrium is large but the information stored is vanishingly small,
so C_LMC ≅ 0. On the other hand, H* is large for an ideal gas, but D is small,
so C_LMC ≅ 0 as well. Any other system will have an intermediate behavior and
therefore C > 0. A final remark is that C_LMC depends on the scale of the system
analysis; changing the scale, the value of C_LMC changes.
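The computation of C_LMC is elementary; the following sketch reproduces the crystal and ideal-gas limiting cases mentioned above.

```python
import math

def c_lmc(p):
    """Lopez-Ruiz-Mancini-Calbet complexity: C_LMC = H* x D, with H* the
    normalized Shannon entropy and D the disequilibrium of p."""
    n = len(p)
    h = -sum(pi * math.log2(pi) for pi in p if pi > 0) / math.log2(n)
    d = sum((pi - 1 / n) ** 2 for pi in p)
    return h * d

print(c_lmc([1.0, 0.0, 0.0, 0.0]))      # "crystal": H* = 0, so C = 0
print(c_lmc([0.25] * 4))                # "ideal gas": D = 0, so C = 0
print(c_lmc([0.5, 0.3, 0.1, 0.1]))      # intermediate case: C > 0
```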
Definition 10.1 The logical depth of a string x is D(x) = T(π*),
where π* is the shortest program generating x running on U, and T(π) is the time
taken by program π.
As it is possible that the machine U needs some string of data in order to compute
x, Definition (10.1) can be generalized to the following one:
Definition 10.2 Let x and w be any two strings, U a universal Turing machine, and s
a significance parameter. A string's depth relative to w at significance level s, denoted
D_s(x|w), is defined by
D_s(x|w) = Min{T(π) | π is an s-incompressible program relative to w and U(π, w) = x}.
According to Definition (10.2), x's depth relative to w is the minimum time required
to compute x from w by an s-incompressible program relative to w.
Lloyd and Pagels [338] take a view of complexity in physical systems similar to
Bennett's [49]. Given a system in a particular state, its complexity is not related
to the difficulty of describing the state, but rather to that of generating it. Then,
complexity is not a property of a state but of a process; it is a “measure of how hard
it is to put something together” [338].
More formally, let us consider a system in a macroscopic state d, and let σ1, ..., σn
be the set of trajectories, in the system's phase space, that lead to d. Let p_i be the
probability that the system has followed the ith trajectory. Then, the “depth” D of
the state d is
D = −k ln p_i ,
and the thermodynamic depth is
D_T = S̄ − S₀ = S̄ − k_B ln Ω₀ ,
Shiner et al. [487] start by observing that several of the complexity measures
proposed in the literature, even though interesting, are difficult to compute in practice
and, in addition, depend on the observation scale. On the contrary, they claim
that complexity should be easy to compute and be independent of the size of the
system under analysis.
Based on these considerations, Shiner et al. proposed a parameterized measure,
namely a simple measure of complexity Γ_αβ, which, by varying its two parameters α
and β, shows different types of behavior: complexity increasing with order, complexity
increasing with disorder, and complexity reaching a maximum in between order
and disorder. As required, the measure Γ_αβ is easy to compute and is independent
of the system's size.
In order to provide a precise definition of Γ_αβ, order and disorder are to be
introduced first. If n is the number of the system's states, and p_i (1 ≤ i ≤ n) the
probability that the system is in state i, then disorder is defined as:
Δ = S/S_max = −(k_B/S_max) Σ_{i=1}^{n} p_i ln p_i
Using the definitions of order and disorder, the simple complexity of disorder strength
α and order strength β is defined as:
Γ_αβ = Δ^α Ω^β   (10.13)
where, for non-equilibrium systems, the order is
Ω = (S_eq − S)/S_eq
In other words, Ω is a measure of the distance from equilibrium. Thus, for non-
equilibrium systems, the simple measure of complexity is a function of both the
“disorder” of the system and its distance from equilibrium.
Finally, the authors also show that, for the logistic map, Γ11 behaves like Grass-
berger’s effective complexity (see Sect. 10.3.7). Moreover, Γ11 is also related to
Lopez-Ruiz et al.’s normalized complexity [340].
10.3.6 Sophistication
Definition 10.3 H(σ) = Min{|p| + |D| : program p, run on input data D, outputs σ}.
This definition states that the complexity of σ is the sum of the size of the program
that generates it plus the size of the input data.
Definition 10.4 (c-Minimal Description) A description (p, D) of σ is c-minimal
if |p| + |D| ≤ H(σ) + c.
A description of σ is c-minimal if it does not exceed the minimal one by more than a
constant c.
Definition 10.5 (Sophistication) The c-sophistication of a finite string σ is
SOPH_c(σ) = Min{|p| : ∃ D such that (p, D) is a c-minimal description of σ}.
Koppel shows that there is a strict relation between sophistication and logical depth,
as stated by the following theorem.
Theorem 10.1 (Koppel [296]) SOPH(σ) is defined for all σ. Moreover, there exists
a c such that, for all σ, either SOPH(σ) = D(σ) = ∞ or [SOPH(σ) − D(σ)] < c.
Grassberger did not say how to find the maximally predictive models, nor how the
information required can be minimized. However, in Information Theory the data-processing
inequality says that, for any variables A and B, I[A, B] ≥ I[f(A), B]; in
other words, we cannot get more information out of data by processing it than was in
there to begin with. Since the state of the predictor is a function of the past, it follows
that I[X⁻, X⁺] ≥ I[f(X⁻), X⁺]. It could be assumed that, for optimal predictors,
the two informations are equal, i.e., the predictor's state is just as informative as the
original data. Moreover, for any variables A and B, it is the case that H[A] ≥ I[A, B],
namely no variable contains more information about another than it does about itself.
Then, for optimal models it is H[f(X⁻)] ≥ I[X⁻, X⁺]. The latter quantity is what
Grassberger calls Effective Measure Complexity (EMC), and it can be estimated
purely from data. This quantity, which is the mutual information between the past
and the future, has been rediscovered many times, in many contexts, and called
with various names (e.g., excess entropy, predictive information, and so on). Since
it quantifies the degree of statistical dependence between the past and the future, it
looks reasonable as a measure of complexity.
In a recent work Abdallah and Plumbley [3] have proposed yet another measure of
complexity, the “predictive information rate” (PIR), which is supposed to capture
some information that was not taken into account by previous measures, namely
temporal dependency. More precisely, let {..., X_{−1}, X₀, X₁, ...} be a bi-infinite stationary
sequence of random variables, taking values in a discrete set X. Let μ be a shift-invariant
probability measure, such that the probability distribution of any contiguous
block of N variables (X_{t+1}, ..., X_{t+N}) is independent of t. Then, the shift-invariant
block entropy function will be:
H(N) = H(X₁, ..., X_N) = −Σ_{x∈X^N} p_μ^N(x) log₂ p_μ^N(x),   (10.15)
where p_μ^N: X^N → [0, 1] is the unique probability mass function for any N consecutive
variables.
If we now consider the two contiguous sequences (X_{−N}, ..., X_{−1}) (the “past”) and
(X₀, ..., X_{M−1}) (the “future”), their mutual information can be expressed by
I(N, M) = H(N) + H(M) − H(N + M)
If both N and M tend to infinity, we obtain the excess entropy [119] or the effective
measure complexity E [221]:
E = lim_{N,M→∞} I(N, M)
On the other hand, for any given N, letting M go to infinity, Bialek et al.'s predictive
information I_pred [57] is obtained from:
I_pred(N) = lim_{M→∞} I(N, M)
Considering a time t, let X_t^← = (..., X_{t−2}, X_{t−1}) denote the variables before time t,
and X_t^→ = (X_{t+1}, X_{t+2}, ...) denote the variables after time t. The PIR I_t is defined as:
I_t = I(X_t; X_t^→ | X_t^←) = H(X_t | X_t^←) − H(X_t | X_t^←, X_t^→)   (10.20)
Equation (10.20) can be read as the average reduction in uncertainty about the future
on learning X_t, given the past. H(X_t | X_t^←) is the entropy rate h_μ, but H(X_t | X_t^←, X_t^→) is a
quantity not considered by other authors before. It is the conditional entropy of one
variable, given all the others in the sequence, future and past.
The PIR satisfies the condition of being low for both totally ordered and totally
disordered systems, as a complexity measure is required to be. Moreover, it captures
a different and non-trivial aspect of temporal dependency structure not previously
examined.
10.3.9 Self-Dissimilarity
A different approach to complexity is taken by Wolpert and Macready [568], who base their definition of complexity on experimental data. They start from the observation that complex systems, observed at different spatio-temporal scales, show unexpected patterns that cannot be predicted from one scale to another. Then, self-dissimilarity is a symptom (sufficient but not necessary) of complexity, and hence the parameter to be quantified. From this perspective, a fractal, which looks very complex to our eye, is actually a simple object, because it has a high degree of self-similarity.
Formally, let $\Omega_s$ be a set of spaces, indexed by the scales s. Given two scales $s_1$ and $s_2$, with $s_2 > s_1$, a set of mappings $\{\rho^{(i)}_{s_1 \leftarrow s_2}\}$, indexed by i, is defined: each mapping takes elements of $\Omega_{s_2}$ to elements of the smaller scale space $\Omega_{s_1}$. In Wolpert and Macready's approach scales do not refer to different levels of precision in a system measurement, but rather to the width of a masking window through which the system is observed. The index i denotes the location of the window.
Given a probability distribution $\pi_{s_2}$ over $\Omega_{s_2}$, a probability distribution $\pi^{(i)}_{s_1 \leftarrow s_2} = \rho^{(i)}_{s_1 \leftarrow s_2}(\pi_{s_2})$ over $\Omega_{s_1}$ is inferred for each mapping $\rho^{(i)}_{s_1 \leftarrow s_2}$. It is often convenient to summarize the measures from the different windows with their average, denoted by $\pi_{s_1 \leftarrow s_2} = \rho_{s_1 \leftarrow s_2}(\pi_{s_2})$. The idea behind the self-dissimilarity measure is to compare the probability structure at different scales. To this aim, the probabilities at scales $s_1$ and $s_2$ are both translated to a common scale $s_c$, such that $s_c \geq \max\{s_1, s_2\}$, and then compared. Comparison is made through a scalar-valued function $\Delta_s(Q_s, Q'_s)$ that measures a distance between probability distributions $Q_s$ and $Q'_s$ over a space $\Omega_s$. The function $\Delta_s(Q_s, Q'_s)$ can be defined according to the problem at hand; for instance it might be
the two sets $\Psi_g$ and $\Psi_a$, we obtain: $K(\psi_a) = \log_2 T_a < \log_2 T_g = K(\psi_g)$. In fact, we have $T_a < T_g$ by definition. Then, the Kolmogorov complexity of Ψ decreases when abstraction increases, as we would expect.
Clearly the above code is not effective, because Na and Ng cannot in general
be computed exactly. Moreover, this code does not say anything about the specific
ψg and ψa which are described, because Ig can be smaller, or larger, or equal to
Ia , depending on the order in which configurations are numbered inside Ψg and
Ψa . We consider then another code, which is more meaningful in our case. Let
$\Gamma_g = \langle \Gamma^{(g)}_{TYPE}, \Gamma^{(g)}_O, \Gamma^{(g)}_A, \Gamma^{(g)}_F, \Gamma^{(g)}_R \rangle$ be a description frame, and let $\psi_g$ be a configuration in the associated space $\Psi_g$. We can code the elements of $\Gamma_g$ by specifying the following set of parameters:
• Number of types, V. Specifying V requires $\log_2(V + 1)$ bits. List of types $\{t_1, \dots, t_V\}$. Specifying a type requires $\log_2(V + 1)$ bits for each object.
• Number of objects, N. Specifying N requires $\log_2(N + 1)$ bits. Objects are considered in a fixed order.
• Number of attributes, M. Specifying M requires $\log_2(M + 1)$ bits. For each attribute $A_m$ with domain $\Lambda_m \cup \{UN, NA\}$, the specification of a value for $A_m$ requires $\log_2(|\Lambda_m| + 2) = \log_2(\ell_m + 2)$ bits ($1 \leq m \leq M$).
• Number of functions, H. Specifying H requires $\log_2(H + 1)$ bits. Each function $f_h$ of arity $t_h$ has domain of cardinality $N^{t_h}$ and co-domain of cardinality $c_h$. Each tuple belonging to any $RCOV(f_h)$ needs $t_h \log_2(N + 1) + \log_2(c_h + 1)$ bits to be specified.
• Number of relations, K. Specifying K requires $\log_2(K + 1)$ bits. Each relation $R_k$ has domain of cardinality $N^{t_k}$. Each tuple belonging to any $RCOV(R_k)$ needs $t_k \log_2(N + 1)$ bits to be specified.
As a conclusion, given a configuration $\psi_g$, its specification requires:

$\cdots + \sum_{h=1}^{H} |FCOV(f_h)| \left(t_h \log_2 N + \log_2(c_h + 1)\right) + \sum_{k=1}^{K} |FCOV(R_k)| \, t_k \log_2 N$
As $N - 1 < N$, $|FCOV(f_h^{(a)})| \leq |FCOV(f_h)|$, and $|FCOV(R_k^{(a)})| \leq |FCOV(R_k)|$, we can conclude that, for each $\psi_g$ and $\psi_a$, $K(\psi_a) < K(\psi_g)$.
Similar conclusions can be drawn for Vitányi's meaningful information, Gell-
Mann and Lloyd’s effective complexity, and Koppel’s sophistication. In fact, the
first part of a two-part code for ψg is actually a program p on a Turing machine,
which describes regularities shared by many elements of Ψg , including ψg . But
ψg may contain features that are not covered by the regularities. Then, K(ψg ) =
|p| + Kirr (ψg ). Without entering into details, we observe that the same technique
used for Kolmogorov complexity can be adapted to this case, by applying it to both
the regular and the irregular part of the description.
Example 10.1 Let $\Gamma_g = \langle \Gamma^{(g)}_{TYPE}, \Gamma^{(g)}_O, \Gamma^{(g)}_A, \Gamma^{(g)}_F, \Gamma^{(g)}_R \rangle$ be a ground description frame, with:

$\Gamma^{(g)}_{TYPE}$ = {obj}
$\Gamma^{(g)}_O$ = {a, b, c}
$\Gamma^{(g)}_A$ = {(Color, {red, blue, green}), (Size, {small, medium, large}), (Shape, {square, circle})}
$\Gamma^{(g)}_F$ = ∅
$\Gamma^{(g)}_R$ = {$R_{On} \subseteq \Gamma^{(g)}_O \times \Gamma^{(g)}_O$}

In $\Gamma_g$ we have V = 1, N = 3, M = 3, $\ell_1 = 3$, $\ell_2 = 3$, $\ell_3 = 2$, H = 0, K = 1. Let $\psi_g$ be the following configuration:

$\psi_g$ = {(a, obj, red, small, circle), (b, obj, green, small, square), (c, obj, red, large, square)} ∪ {(a, b), (b, c)}
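To make the coding scheme concrete, the following minimal Python sketch (our own illustration, not part of the original example; all helper names are ours) computes the number of bits needed to specify $\psi_g$ according to the scheme above:

```python
import math

def bits(n):
    """Bits needed to select one element out of n alternatives."""
    return math.log2(n)

# Parameters of the description frame of Example 10.1
V, N = 1, 3                    # number of types and of objects
domain_sizes = [3, 3, 2]       # l_m = |Lambda_m| for Color, Size, Shape
t_R = 2                        # arity of the single relation R_On

def configuration_cost(n_relation_tuples):
    cost = N * bits(V + 1)                         # one type per object
    for l_m in domain_sizes:                       # one value per attribute and object,
        cost += N * bits(l_m + 2)                  # domains extended with {UN, NA}
    cost += n_relation_tuples * t_R * bits(N + 1)  # tuples in RCOV(R_On)
    return cost

# psi_g contains the relation tuples (a,b) and (b,c)
print(round(configuration_cost(2), 2))  # ~ 30.93 bits
```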
The analysis of Bennett's logical depth is more difficult. For simplicity, let us assume that, given a configuration ψ, its logical depth is the run time taken by the program p whose length is exactly its Kolmogorov complexity. Without specifying
the nature of the configurations, it is not possible to say whether pa will run faster
than pg . In fact, even though the abstract configurations are “simpler” (according to
Kolmogorov) than the ground ones, they may be more difficult to generate. Then,
nothing precise can be said, except that an abstract frame can be, according to Bennett,
either simpler or more complex than the ground one. This result comes from the fact
that the logical depth is a generative and not a descriptive measure of complexity.
In Eq. (10.22) the ψg ’s are disjoint, so that their probabilities can be summed up.
Notice that the sets COMPg (ψa ) are also disjoint for different ψa ’s, because each ψg
has a unique image in Ψa , for each abstraction process.
Let us consider now Lopez-Ruiz et al.'s normalized complexity $C_{LMC}$ [340], which is the product of the normalized entropy and the diversity of Ψ. In probabilistic contexts the comparison of complexities has to be done on configuration frames, because it makes no sense to consider single configurations. Then, the normalized complexity (10.13) of the ground space is:
where Na = |Ψa |. In order for the abstract configuration space to be simpler, it must
be:
$H^*(\Psi_a) \cdot D(\Psi_a) < H^*(\Psi_g) \cdot D(\Psi_g) \;\rightarrow\; \frac{H^*(\Psi_a)}{H^*(\Psi_g)} < \frac{D(\Psi_g)}{D(\Psi_a)}$   (10.23)
• $\Psi_g^{(0)}$, which contains those configurations in which no object has Size = medium; there are $m_0$ of them, and $|COMP_g(\psi_a)| = 1$. Each corresponding abstract configuration $\psi_a$ has probability:

$\pi_a(\psi_a) = \frac{1}{N_g}$

• $\Psi_g^{(1)}$, which contains those configurations in which a single object has Size = medium. Their number is $m_1 = 62208$. In this case $|COMP_g(\psi_a)| = 3$. Then, each corresponding abstract configuration $\psi_a$ has probability:

$\pi_a(\psi_a) = \frac{3}{N_g}$

• $\Psi_g^{(2)}$, which contains those configurations in which two objects have Size = medium ($m_2$ of them, with $|COMP_g(\psi_a)| = 9$). Each corresponding abstract configuration $\psi_a$ has probability:

$\pi_a(\psi_a) = \frac{9}{N_g}$

• $\Psi_g^{(3)}$, which contains those configurations in which all three objects have Size = medium ($m_3$ of them, with $|COMP_g(\psi_a)| = 27$). Each corresponding abstract configuration $\psi_a$ has probability:

$\pi_a(\psi_a) = \frac{27}{N_g}$

$H^*(\psi_a) = \frac{1}{\log_2 N_a}\left[\frac{m_0}{N_g}\log_2 N_g + \frac{m_1}{N_g}\log_2\frac{N_g}{3} + \frac{m_2}{N_g}\log_2\frac{N_g}{9} + \frac{m_3}{N_g}\log_2\frac{N_g}{27}\right] = 0.959$   (10.25)

$D(\psi_a) = m_0\left(\frac{1}{N_g} - \frac{1}{N_a}\right)^2 + \frac{m_1}{3}\left(\frac{3}{N_g} - \frac{1}{N_a}\right)^2 + \frac{m_2}{9}\left(\frac{9}{N_g} - \frac{1}{N_a}\right)^2 + \frac{m_3}{27}\left(\frac{27}{N_g} - \frac{1}{N_a}\right)^2 = 0.00001807$   (10.26)
Then:

$C_{LMC}(\Psi_g) = H^*(\Psi_g) \cdot D(\Psi_g) = 0, \qquad C_{LMC}(\Psi_a) = H^*(\Psi_a) \cdot D(\Psi_a) \approx 1.73 \cdot 10^{-5}$

As we may see, the abstract configuration space is slightly more complex than the ground one. The reason is that $\Psi_a$ is not very much diversified in terms of probability distribution.
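The computation in Eqs. (10.25) and (10.26) can be reproduced in a few lines of Python. In the sketch below (our own), the class sizes $m_0$, $m_2$ and $m_3$ do not appear in the surviving text and are inferred values, chosen so that the code reproduces the published figures 0.959 and 0.00001807:

```python
import math

Ng, Na = 139968, 73859     # sizes of the ground and abstract spaces
# (number of ground configurations, |COMP_g(psi_a)|) per class;
# m0, m2 and m3 below are inferred, only m1 = 62208 is given in the text
classes = [(41472, 1), (62208, 3), (31104, 9), (5184, 27)]

# Eq. (10.25): normalized entropy of the abstract space
H_star = sum(m / Ng * math.log2(Ng / c) for m, c in classes) / math.log2(Na)
# Eq. (10.26): diversity, summed over the m/c abstract configurations per class
D = sum(m / c * (c / Ng - 1 / Na) ** 2 for m, c in classes)

print(round(H_star, 3), D, H_star * D)   # 0.959, ~1.807e-05, ~1.73e-05
```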
The opposite result would have been obtained if a non-uniform probability distribution $\pi_g$ over $\Psi_g$ had generated a more uniform one over $\Psi_a$.
Let us consider now Shiner et al.’s simple complexity [487], reported in Sect. 10.3.5.
This measure is linked to the notion of order/disorder of the system under consider-
ation. Given a configuration space Ψ , its maximum entropy is reached when all con-
figurations have the same probabilities, i.e., Hmax (Ψ ) = log2 N. The actual entropy
of Ψ is given by
$H(\Psi) = -\sum_{\psi \in \Psi} \pi(\psi) \log_2 \pi(\psi)$

Then:

$\Delta(\Psi) = \frac{H(\Psi)}{H_{max}(\Psi)} = H^*(\Psi), \qquad \Omega(\Psi) = 1 - \Delta(\Psi)$
In Fig. 10.11 an example of the function $\Gamma_{\alpha\beta}(x)$ is reported. Let us consider a value $x_g = \frac{H(\Psi_g)}{\log_2 N_g}$, and let $x'_g$ be the other value of x corresponding to the same ordinate. Then, $\Gamma_{\alpha\beta}(x_a)$, where $x_a = \frac{H(\Psi_a)}{\log_2 N_a}$, will be lower than $\Gamma_{\alpha\beta}(x_g)$ if $x_a > x'_g$ or $x_a < x_g$. In other words it must be:
Example 10.3 Let us consider again the description frames of Example 10.2. In
this case Ng = 139968 and Na = 73859; then Hmax (Ψg ) = log2 Ng = 17.095
Fig. 10.11 Graph of the curve $\Gamma_{\alpha\beta}(x) = x^\alpha (1 - x)^\beta$ for the values α = 0.5 and β = 1.3. When α > 1 the curve has a horizontal tangent at x = 0. The x values (where $x = H/H_{max}$) are included in the interval [0, 1]. If $x_g$ is the value of x in the ground space, then the value $x_a$ in the abstract one should either be larger than $x'_g$ or be smaller than $x_g$
and Hmax (Ψa ) = log2 Na = 16.173. The normalized entropies of Ψg and Ψa were,
respectively, xg = H ∗ (Ψg ) = 1 and xa = H ∗ (Ψa ) = 0.959. The value of the function
Γαβ (x) for xg = 1 is Γαβ (1) = 0; then, in this special case, the abstract space Ψa
can only be more complex than the ground one. The reason is that Γαβ (x) only takes
into account the probability distribution over the states, and there is no distribution which has a greater entropy than the uniform one.
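As a quick check of this argument, one can evaluate $\Gamma_{\alpha\beta}$ at the two normalized entropies (a minimal sketch, using the α and β of Fig. 10.11):

```python
def gamma(x, alpha=0.5, beta=1.3):
    """Shiner-style order/disorder complexity Gamma(x) = x^alpha * (1-x)^beta."""
    return x**alpha * (1 - x)**beta

xg, xa = 1.0, 0.959        # normalized entropies of the ground and abstract spaces
print(gamma(xg))           # 0.0: the ground space sits at a zero of Gamma
print(gamma(xa))           # > 0: here the abstract space can only be "more complex"
```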
10.5 Summary
In conclusion, at the present state of the art, only the measures of simplicity based on Kolmogorov complexity are guaranteed to be co-variant with abstraction, as defined in our model. This is due to the fact that the notion of simplicity we are interested in is descriptive in nature, not probabilistic. This result can be extended to
other models of abstraction, because, as we have shown, they can be reconnected
to the KRA model. On the other hand, Kolmogorov complexity is hard to handle,
even in approximate versions, and appears far from the concreteness required by the
applications. However, this complexity measure could be a starting point for a more
operational definition of simplicity.
Chapter 11
Case Studies and Applications
discrepancy originates from a fault in the system. The task is then to uncover which
components are faulty, in order to account for the detected discrepancies.
Notwithstanding the increasing number of applications to real-world problems,
the routine use of MBD is still limited by the high computational complexity, residing
in the large number of hypothesized diagnoses. In order to alleviate this complexity
problem, abstraction has been often advocated as one of the most powerful remedies.
Typically, abstraction is used to construct a hierarchical representation of the
system, in terms of both structure and behavior. The pioneering work by Mozetič
[379] has established a connection between some of the computational theories of
abstraction proposed in other areas of Artificial Intelligence (for instance, [214]) and
a notion of abstraction that could be usefully exploited in the MBD field. Since then
novel proposals have been made to apply theories of abstraction in the context of the
MBD task (see, for instance, Friedrich [184], Console and Theseider Dupré [113],
Provan [432], Chittaro and Ranon [99], Torasso and Torta [531], Grastien and Torta
[222], and Saitta et al. [467]).
The approach described by Mozetič, and by some more recent authors [99, 432], mostly uses abstraction in order to focus the diagnostic process, and thus to improve its efficiency; in particular, the diagnosis of the system starts by considering the abstract level(s) and, whenever an (abstract) diagnosis is found, the detailed model is invoked in order to justify such diagnosis in terms of one or more detailed ones. Abstraction also makes it possible to return fewer and more concise abstract diagnoses when it is not possible to discriminate among detailed diagnoses. The works by Console and Theseider Dupré [113], Friedrich [184], and Grastien and Torta [222] accomplish this goal by including abstraction axioms in the Domain Theory and preferring diagnoses which are "as abstract as possible".
Recently, some authors have aimed at the automatic abstraction of the system
model (see, for instance, Sachenbacher and Struss [463], Torta and Torasso [532],
and Torasso and Torta [531]). If the available observables and/or their granularity
are too coarse to distinguish among two or more behavioral modes of a component,
or the distinction is not important for the considered system, a system model is auto-
matically generated where such behavioral modes are merged into an abstract one.
By using the abstract model for diagnosis there is no loss of (important) information, while the diagnostic process is more efficient and the returned diagnoses are fewer and more significant.
In MBD, abstraction takes the form of aggregating elementary system compo-
nents into an abstract component, with the effect of also merging combinations of
behaviors of the original components into a single abstract behavior of the abstract
component. Component aggregation can be modeled, inside the KRA framework,
through an aggregation operator, which automatically performs all the required trans-
formations, once the components to be aggregated are specified. These transforma-
tions change the original ground description of the system to be diagnosed into a
more abstract one involving macrocomponents, obtained by suitably aggregating
elementary components. In abstraction for MBD a key role is played by the notion
of indiscriminability, which refers to alternative observation patterns leading to the
same diagnosis. Indiscriminability corresponds, in the abstraction literature, to the
$DT \cup X \cup O \cup S \nvDash \bot$

$DT_\Sigma \cup X_\Sigma \cup S_\Sigma \cup P_\Sigma \vDash \bot \;\Leftrightarrow\; DT_\Sigma \cup X_\Sigma \cup S'_\Sigma \cup P_\Sigma \vDash \bot$
where PΣ is any set of ground atoms expressing measurements on the external ports
PΣ,ext of Σ.
According to Definition 11.5, two states $S_\Sigma$ and $S'_\Sigma$ of Σ are indiscriminable iff, given any set of values measured on the external ports of the subsystem, $S_\Sigma$ and $S'_\Sigma$ are either both consistent or both inconsistent with such measurements (under the constraints imposed by $DT_\Sigma$ and $X_\Sigma$).
Fig. 11.1 Fragment of a hydraulic system used as a running example. Two pipes P1 and P2 are connected to a valve V1 through a set of ports ($u_1, \dots, v'_2$). The valve can receive an external command s1
Fig. 11.2 Example of behavioral models of components. Fault PC denotes a partially clogged
pipe, BR denotes a broken pipe, whereas CL denotes a clogged pipe. Fault SO denotes a stuck open
valve, while SC denotes a stuck closed one
In the following we will use a running example to clarify the notions we introduce. In particular, we consider the fragment of a hydraulic system reported in Fig. 11.1. It includes two types of components, i.e., pipe (objects P1 and P2) and valve (object V1). Each component has two input and two output ports, where Δflow and Δpressure can be measured. Ports connect components with each other (as, for instance, $u_2$ and $u'_1$), or connect components to the environment (as, for instance, $u_1$ and $v'_2$).
In Fig. 11.2 the behavioral models for a generic valve and a generic pipe are
reported. The models are expressed in terms of qualitative deviations [510]; for
example, when the pipe is in behavioral mode pc (partially clogged), the qualitative
deviation Δfout of the flow at the end of the pipe is the same as the qualitative deviation
Δfin of the flow at the beginning of the pipe. For the pressure, instead, we have that
Δrin = Δrout ⊕ + where ⊕ is the addition in the sign algebra [510]. In other words,
the pressure is qualitatively increased at one end of the pipe with respect to the other
end.
Let us assume that pipe has one nominal behavior (ok) and three faulty modes
pc (partially clogged), cl (clogged) and br (broken), while the valve can be in the
ok mode (and, in this case, it behaves in a different way depending on the external
command s1 set to open or closed), in the so (stuck open) mode, or in the sc
(stuck closed) mode.
In order to formulate MBD within the KRA model, we must define the different entities needed for modeling the domain at the various abstraction levels. At the most detailed (ground) level we have:
$\Gamma^{(g)}_{TYPE}$ = {pipe, valve, port, control}
$\Gamma^{(g)}_{O,pipe}$ = {$P_1, P_2, \dots$},  $\Gamma^{(g)}_{O,valve}$ = {$V_1, V_2, \dots$}
$\Gamma^{(g)}_{O,port}$ = {$u_1, u_2, \dots$},  $\Gamma^{(g)}_{O,control}$ = {$s_1, s_2, \dots$}
$\Gamma^{(g)}_{A,pipe}$ = {(Status, {ok, pc, cl, br})}
$\Gamma^{(g)}_{A,valve}$ = {(Status, {ok, so, sc})}
$\Gamma^{(g)}_{A,port}$ = {(Observable, {yes, no}), (Direction, {in, out}), (Δr, {+, 0, −, NA}), (Δf, {+, 0, −, NA})}
$\Gamma^{(g)}_{A,control}$ = {(Status, {open, closed})}
The theory Tg contains two types of knowledge: an algebra over the qualitative
measurement values {+, 0, −}, and a set of rules specifying the component behaviors.
As an example of the first type of knowledge, the semantics of the qualitative sum
⊕ is given:
+ ⊕ + = +,  + ⊕ 0 = +,  + ⊕ − = {+, 0, −}
0 ⊕ 0 = 0,  0 ⊕ − = −,  − ⊕ − = −
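This qualitative sum admits a direct encoding; the following sketch (our own, not part of the book's implementation) returns a set of signs, so that the ambiguous case + ⊕ − mirrors the table above:

```python
def qsum(a, b):
    """Qualitative sum over {+, 0, -}; an ambiguous result is a set of signs."""
    if a == b:
        return {a}
    if '0' in (a, b):              # 0 is the neutral element
        return {a if b == '0' else b}
    return {'+', '0', '-'}         # opposite signs: the result is ambiguous

assert qsum('+', '+') == {'+'}
assert qsum('0', '-') == {'-'}
assert qsum('+', '-') == {'+', '0', '-'}
```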
The second type of knowledge contains, for instance, the table in Fig. 11.2. Using the algorithms BUILD-DATA and BUILD-LANG, reported in Sect. 6.3, a database structure $DB_g$ can be constructed, as well as a language $L_g$ to intensionally express $DB_g$ and the theory. The database schema $DB_g$ contains the OBJ table, specifying all objects and their type, the attribute tables PIPE-ATTR, VALVE-ATTR, PORT-ATTR, and CONTROL-ATTR, and a table for each of the relations $R_{isPortof}$, $R_{Connect}$, and $R_{Controls}$. As the construction of the tables is straightforward, we omit it.
The language $L_g = \langle C_g, X, O, P_g, F_g \rangle$ contains the set of constants corresponding to the identifiers of all objects of all types, and to the elements of the domains of the attributes. There are no functions, so that $F_g = \emptyset$.
For the sake of simplicity, we define in the theory a set of additional predicates, which
correspond to complex formulas, such as, for instance:
Let us consider now the specific hydraulic fragment $P_g$ of Fig. 11.1. In this fragment there are two pipes, P1 and P2, one valve, V1, a control device s1 acting on the valve, and several ports connecting pipes to the valve, and components of the fragment with the external world. The complete description of this system is contained in a database $D_g$, corresponding to the schema $DB_g$.
Using the KRA model, we apply operator ωaggr ({pipe, valve}, pv) to gener-
ate a new type of component, i.e., pv, starting from a pipe and a valve. Using
meth(Pg , ωaggr ) we first aggregate pipe P1 and valve V1 (see Fig. 11.3, left), obtain-
ing thus an abstract component PV1 , as described in the right part of Fig. 11.3. In PV1
the ports connecting P1 and V1 are hidden, whereas the others remain accessible; hid-
ing these ports aims at modeling a reduction of observability, and has the potential of
increasing the indiscriminability of the diagnoses of the subsystem involving P1 , V1 .
Fig. 11.3 New abstract component PV1 of type pv resulting from aggregating a pipe and a valve.
Ports u2 and u 1 are connected with each other and disappear, as well as ports v2 and v 1
370 11 Case Studies and Applications
The algorithm meth(Pg , ωaggr ) is applied to the input (P1 , V1 ) and generates
in output PV1 . It also computes the new attribute values of PV1 , according to the
rules inserted in its body by the user. The algorithm also modifies the cover of the
relations, and computes the abstract behaviors as well, starting from the information
reported in Fig. 11.2. Without entering into all the details of meth($P_g$, $\omega_{aggr}$), we can mention, among others, the following operations (a code sketch follows the list):
• P1 and V1 are removed from view, together with their attributes, and PV1 is
added, with attribute Status. For each value of Status(s1 ) and for each pair
(Status(P1 ), Status(V1 )) the value of Status(PV1 ) is computed according to the
rules for composing behaviors.
• Tuples containing any of the hidden ports are hidden in the covers of all relations.
The visible ports are connected to PV1 through relation RisPortof , and the control
device s1 is also associated to PV1 .
• In order to define the behavioral modes of a pv component, the instantiations of the
behavioral modes of the pipe and the valve are partitioned according to the indis-
criminability relation, and a new behavioral mode for pv is introduced for each
class of the partition (an algorithm for efficiently computing indiscriminability
classes is reported by Torta and Torasso [533]).
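The following Python fragment sketches how such an aggregation method could operate. It is a simplification under our own assumptions (components are plain dictionaries, connections are port pairs, and the indiscriminability test is supplied by the caller), not the actual KRA implementation:

```python
from itertools import product

def aggregate(c1, c2, connections, indiscriminable):
    """Sketch of meth(P_g, omega_aggr): aggregate two components into one."""
    # 1. Hide the ports that connect c1 and c2 with each other
    hidden = {(p, q) for (p, q) in connections
              if p in c1['ports'] and q in c2['ports']}
    hidden_ports = {p for pq in hidden for p in pq}
    visible = (set(c1['ports']) | set(c2['ports'])) - hidden_ports
    # 2. Partition the joint behavioral modes by indiscriminability
    classes = []
    for mode in product(c1['modes'], c2['modes']):
        for cl in classes:
            if indiscriminable(mode, cl[0]):
                cl.append(mode)
                break
        else:
            classes.append([mode])
    # 3. One abstract behavioral mode per equivalence class
    return {'ports': visible,
            'modes': [f'am{i}' for i, _ in enumerate(classes, 1)],
            'classes': classes}
```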
Once the behavioral modes of the abstract component have been identified, corre-
sponding predicates are introduced at the language level. In particular, assuming that
Status(s1 ) = open, we may name the new modes as Oam1 , . . . Oam4 . The modes
correspond to sets of instances of the subsystem components. As an example, we
provide the definition of mode Oam3 :
The above described abstraction process can be automated in the KRA model; starting from a pipe and a valve serially connected, in principle the abstract component could have 12 different behavioral modes. Actually, a large number of behavioral assignments to the components P1 and V1 collapse into the same abstract mode of the abstract component PV1; this is a strong indication that the abstraction is not only possible but really useful.
The synthesis of formulas describing the behaviors of an abstract component pv
can be performed automatically by taking into account the formulas describing the
behaviors of pipes and valves, and the method meth(Pg , ωaggr ). In particular, given
that behaviors are described with qualitative equations like the ones of Fig. 11.2, it
is possible to synthesize qualitative equations for the abstract level; for example,
formulas for the behaviors of a pv component (with exogenous command open) are
reported in Fig. 11.4. We recall that meth(Pg , ωaggr ) stores in Δ(P ) the modifications
done during abstraction.
normally given by the designer, there may be parameters in it, which lend themselves
to automated learning. The application of abstraction in the domain of Cartography,
which we present here, shows not only that performing abstraction in this domain
brings important improvements, but also that the methods associated to the operators
can be partially learned from examples, contributing thus substantially to achieve
those improvements.
In order to understand the role of abstraction in the Cartography domain, some
introduction is necessary. Automatically creating maps at different scales is an impor-
tant and still unsolved problem in Cartography [180]. The problem has been tackled
through a variety of approaches, including Artificial Intelligence (AI) techniques
[349, 553].
The creation of a map is a multi-step process, involving several intermediate
representations of the underlying geographic data. The first step is the acquisition
of a picture (cf. Fig. 11.5a), taken, usually, from an airplane or a satellite. From
this picture, an expert (a Photogrammetrist or a Stereoplotter) extracts a geographic
database (cf. Fig. 11.5b). This database contains the coordinates of all points and lines
that were identified by the stereoplotter on the picture. Moreover, s/he associates a
category (road, building, field, etc.) to the identified objects. Then, a third step consists
in defining, starting from the geographic database, the objects to be symbolized in
the map, e.g., which roads must be represented, their position, and their level of
detail (e.g., the sinuosity of the road). The obtained representation is a map (cf.
Fig. 11.5c). It is important to notice that the third step has to be repeated for each
desired scale, increasing the cost and time of producing maps at different levels of
detail and reducing flexibility. In fact, maintaining multiple databases is resource-
intensive, time consuming and cumbersome. Furthermore, the map may be completed
with various kinds of information corresponding to the type of thematic map desired
(geology, rainfall, population, tourism, history, and so on).
Fig. 11.6 a Part of a map at 1/25000 scale. b A 16-fold reduction of the map. c Cartographic
generalization of the map at the 1/100 000 scale. By comparing b and c the differences between
simply reducing and generalizing are clearly apparent. [A color version of this figure is reported in
Fig. H15 of Appendix H]
1 To avoid any confusion, we should mention here that the term generalization used in the field of
Cartography does not correspond to the same term used in Artificial Intelligence, but refers to the
process of generating a map while simplifying data.
Fig. 11.7 Correspondence between the KRA model and production steps of a map. The perception
of the world (the observations) corresponds to an aerial image of a zone of the surface. The stereo
plotting process generates a set of point coordinates, memorized in a geographical database. An
iconic language allows the numerical data containing in the databases to be transformed into the
symbols that appear in the map (rivers, roads, cities, buildings, and so on). A geographic theory,
which can be expressed in the defined language, allows the map to be targeted to a specific use and
interpreted
Fig. 11.8 Simplification and displacement of buildings during reduction of the map scale. In the upper-left corner there is the building as it appears in the initial representation (or, better, as it would have appeared had it been represented at the level of detail with which it is memorized in the GDB), and in the final representation (simple reduction). In the lower-left corner the object appears as it is after simplification (both in the original and in the final representation). In the right part of the figure a displacement of two buildings is described
Let us look at the two operations from the KRA point of view, with the aim of finding an abstraction operator implementing them. In the case of simplification, the effect is obtained by degrading the shape information in the GDB, whereas, in the displacement, by changing the location information. In both cases the understandability of the map, on the part of the reader, is improved with respect to what would be perceivable without transformation, even though part of the original information is lost (hidden).
Considering first the displacement operation, we can see that nothing changes in the buildings' shape representation from one scale to the other, except their relative positions. If a building's location is represented with the coordinates $(x_c, y_c)$ of its center of mass, in the ground map the distance between two buildings $b_1$ and $b_2$ is given by $d = \sqrt{(x_{c,1} - x_{c,2})^2 + (y_{c,1} - y_{c,2})^2}$. We take the occasion to observe
that, if the change of scale is realized by a proportional shrinking of all the linear
dimensions (as it happens in a photocopy) the translation from one scale to the other
would not involve any abstraction, but a simple reformulation. In fact, all the location
information in one map would be in a one-to-one correspondence with those in the
other, and no information is lost or hidden. Hence, the distance between the modified
buildings $b'_1$ and $b'_2$ would be

$d' = \sqrt{(x'_{c,1} - x'_{c,2})^2 + (y'_{c,1} - y'_{c,2})^2} = \sqrt{\alpha^2 (x_{c,1} - x_{c,2})^2 + \alpha^2 (y_{c,1} - y_{c,2})^2} = \alpha \cdot d$   (11.1)
On the contrary, the displacement operation changes, for instance, the new location of building $b'_1$ from $(\alpha x_{c,1}, \alpha y_{c,1})$ to $(x^{(ap)}_{c,1}, y^{(ap)}_{c,1})$, in such a way that

$d^{(ap)} = \sqrt{\left(x^{(ap)}_{c,1} - x'_{c,2}\right)^2 + \left(y^{(ap)}_{c,1} - y'_{c,2}\right)^2} > d'$   (11.2)
By recalling the discussion reported in Sect. 7.6, we observe that, according to our model, the displacement operation is a reformulation followed by an approximation. The reformulation changes all x into αx and all y into αy. If a building is an object of type building, and $X_c$ and $Y_c$ are two of its attributes with domain $\mathbb{R}$, then the reformulation constructs $b'$ with attributes $X'_c = \alpha X_c$ and $Y'_c = \alpha Y_c$. Afterward, the approximation process $\Pi_{ap} = \left\{\rho_{repl}\left((X'_c, \mathbb{R}), (X^{(ap)}_c, \mathbb{R})\right), \rho_{repl}\left((Y'_c, \mathbb{R}), (Y^{(ap)}_c, \mathbb{R})\right)\right\}$ generates $b^{(ap)}$ with attributes $X^{(ap)}_c$ and $Y^{(ap)}_c$, whose values are chosen in such a way that condition (11.2) holds.
In order to realize $\Pi_{ap}$, the methods meth$\left(P_g, \rho_{repl}\left((X'_c, \mathbb{R}), (X^{(ap)}_c, \mathbb{R})\right)\right)$ and meth$\left(P_g, \rho_{repl}\left((Y'_c, \mathbb{R}), (Y^{(ap)}_c, \mathbb{R})\right)\right)$ must be provided. These methods contain parameters; for instance, only buildings that are very close must be displaced, and only certain directions are useful to separate the buildings, and so on. These parameters can be learned from examples.
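A schematic rendition of the two steps (reformulation by scaling, then approximation by displacement) might look as follows. This is our own sketch: d_min is one of the learnable parameters mentioned above, and all names are hypothetical:

```python
import math

def scale(building, alpha):
    """Reformulation: shrink the center-of-mass coordinates by factor alpha."""
    x, y = building
    return (alpha * x, alpha * y)

def displace(b1, b2, d_min):
    """Approximation: push b1 away from b2 until their distance is at least d_min."""
    (x1, y1), (x2, y2) = b1, b2
    d = math.hypot(x1 - x2, y1 - y2)
    if d >= d_min or d == 0:
        return b1
    f = d_min / d                     # stretch the separation vector
    return (x2 + f * (x1 - x2), y2 + f * (y1 - y2))

b1, b2 = scale((120.0, 80.0), 0.25), scale((132.0, 80.0), 0.25)
print(displace(b1, b2, d_min=5.0))    # b1 moved so that d^(ap) > d'
```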
Moving to simplification, this operation is actually an abstraction, because some of the information regarding the perimeter of a building is hidden. In the KRA model, simplification can be modeled with more than one operator, depending on the way buildings are represented. For what follows, we refer to Fig. 11.9.
Suppose that a building is represented in the GDB as a sequence of points (specified by their Cartesian coordinates) belonging to the perimeter, such as (1, 2, ..., 9).
The theory shall contain an algorithm LINE that, given the coordinates of two points,
draws a straight line between them on the map. In this case, starting from the original
building (the leftmost in Fig. 11.9), we can obtain the final one (i.e., the rightmost),
by iteratively applying operator $\omega_{hobj}(j)$ to some point j in the perimeter: the choice of what point to hide at each step, and the criterion to stop, are provided in the associated meth(b, $\omega_{hobj}(j)$). For example, the final building in Fig. 11.9 can be obtained from the original one by means of the following combination of operator applications:

Fig. 11.9 Simplification of the perimeter of a building b into a building $b^{(a)}$. Given a sequence of points, the intermediate ones can be hidden by applying a $\omega_{hobj}$ operator. The method associated to $\omega_{hobj}$ contains an algorithm which draws a segment between the two extreme points
Notice that it is up to the user to say when to stop. For instance, hiding points could
have included a further step for hiding point 5, obtaining thus a quadrilateral.
Another (maybe more interesting) way of proceeding would be to define a domain-specific operator $\omega_{simplify}((p_1, \dots, p_k), (p_1, p_k))$ which, taking in input a sequence of k points, hides all the intermediate ones, leaving only the first, $p_1$, and the last one, $p_k$. This operator could be expressed as a "macro"-operator in terms of the elementary ones described above.
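A possible implementation of such an iterative point-hiding is sketched below (our own code, under our own assumptions: the least significant point is the one spanning the smallest triangle with its neighbors, and the stopping criterion is a simple point budget, itself one of the learnable parameters):

```python
def triangle_area(p, q, r):
    """Area of triangle pqr; a small area means q contributes little detail."""
    return abs((q[0]-p[0]) * (r[1]-p[1]) - (r[0]-p[0]) * (q[1]-p[1])) / 2

def omega_simplify(points, k_max):
    """Repeatedly hide the least significant intermediate point (one
    application of omega_hobj(j) per step) until at most k_max points
    remain; the extreme points are always kept."""
    pts = list(points)
    while len(pts) > k_max:
        j = min(range(1, len(pts) - 1),
                key=lambda i: triangle_area(pts[i-1], pts[i], pts[i+1]))
        del pts[j]
    return pts

perimeter = [(0,0), (4,0), (4,1), (5,1), (5,3), (2,3), (2,2), (0,2), (0,0)]
print(omega_simplify(perimeter, 5))
```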
In the cartographic domain there are other standard geometric transformations
[361], in addition to simplification and displacement:
• geometry enhancing (e.g., all angles of a building are drawn as right angles)
• deletion
• schematising (e.g., a group of objects is represented by a smaller set of objects,
but with the same repartition)
• caricaturing (e.g., an object or one of its parts is enlarged to give it more relevance)
• change of geometric type (e.g., a city, stored in the GDB with its exact surface
coverage, is represented as a point on the map).
Without describing the above operations in detail, we can say that all of them (like simplification and displacement) can be considered as combinations of abstractions, possibly followed by an approximation step.
In this section we briefly show how the parameters occurring in the methods of the above introduced operators can be automatically learned, with the goal of speeding up and improving the process of cartographic generalization.
The first idea is to use, for learning, the scheme of Fig. 11.10, where a transformation (an operator or set of operators, in our model) is learnt directly from the GDB. This is the approach taken by Werschlein and Weibel [557] and Weibel et al. [554], who propose to represent this function by means of a neural network. Even though interesting, this approach has the drawback of requiring a very large number of examples for learning, and is sensitive to the orientation of the object to be represented.
Fig. 11.10 Task of learning how to transform an element, i.e., learning the operator ω. The perimeter of the building on the right has six sides (therefore fewer than the twenty on the left) and is more regular

In order to overcome the above problems, we have used a two-step approach, where an abstract representation of the original object is first generated, by replacing the quantitative description provided by the GDB with a qualitative one. Then, operators are applied to this abstract description, and their parameters are automatically
learned. More precisely, abstract objects are no longer represented by a list of coordinates, but by a fixed number of measurements describing them (size, number of points, concavity). These measurements have been developed in the field of spatial analysis [360, 422]. The two phases are the following ones:
• Learning how to link the original, numerical object's attributes (such as Area, Length, ...) to abstract qualitative descriptors (such as the linguistic attribute Size), using abstraction operators (e.g., "Area < x" → "Size = small").
• Linking the new description to the operation to be performed on the object (e.g., "Size = small" and "Shape = very detailed" → Apply Simplification with parameters θ).
The abstraction step, involved in moving from a quantitative to a qualitative rep-
resentation of cartographic objects, can be modeled, in KRA, with operators similar
to those that generate linguistic variables in Fuzzy Sets, as described in Sect. 8.8.1.4.
Such operators establish an equivalence among subsets of values of attributes, and
label these subsets with linguistic terms, to be assumed by a predefined linguistic
variable. Each of these has the following form:
$\omega_{eqattrval}\left((X, \Lambda_X), [x_i, x_j], L_{ij}\right)$   (11.3)
where X is the original, numerical attribute, ΛX is its domain, [xi , xj ] is the interval of
values that are made indistinguishable, and Lij is the associated linguistic term, taken
on by a linguistic variable L (defined by the user). As the expert is usually unable
to reliably supply the interval [xi , xj ], then it is learned from a set of examples. For
instance, to describe a building an expert can say whether it is small, medium or big,
but s/he will not be able to say that a small building is one with an area smaller than
300 m². S/he is able to show examples of small buildings, but usually is unable to
provide a threshold on the surface measurement to characterize small buildings.
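A threshold such as the x in "Area < x → Size = small" can then be estimated from labeled examples with a one-dimensional split search. The sketch below is our own illustration, with made-up data; it simply picks the midpoint that best separates the expert's labels:

```python
def learn_threshold(examples):
    """examples: list of (area, label) pairs, label in {'small', 'big'};
    returns the midpoint split that misclassifies the fewest examples."""
    xs = sorted(examples)
    best = None
    for i in range(len(xs) - 1):
        t = (xs[i][0] + xs[i + 1][0]) / 2
        errors = sum((a < t) != (lab == 'small') for a, lab in xs)
        if best is None or errors < best[1]:
            best = (t, errors)
    return best[0]

data = [(90, 'small'), (120, 'small'), (135, 'small'),
        (140, 'big'), (210, 'big'), (400, 'big')]
print(learn_threshold(data))   # 137.5, cf. the 137 m^2 split of Fig. 11.15
```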
The above learning approach has been tested on a task of abstracting buildings. In order to collect the necessary data and knowledge, an expert in spatial analysis and algorithms was asked to define:
• A set of measures describing buildings, reported in Fig. 11.11. The algorithms for
computing these measures have been defined by Regnauld [446].
• A set of “operations” applicable to buildings, listed in Fig. 11.12.
Each “operation” of Fig. 11.12 has been translated into a corresponding abstraction
operator (or combination thereof). Afterwards, a Cartography expert was asked to
define a set of qualitative abstract descriptors for a given building that are somehow
related to the above-mentioned measures; these descriptors are reported in Fig. 11.13.
Then, the Cartography expert provided a set of 80 observations of buildings, and
he was first asked to describe each building with the defined qualitative descrip-
tors (e.g., this building has Shape = L-shape, Size = medium, or Contains = no-big-wings). The same set of buildings, each one abstracted by each opera-
tion, was presented to him, and he was asked to say, for each abstract building, if the
result was acceptable or not. Meanwhile the set of measures chosen by the expert was computed on each building. Two examples of buildings are shown in Fig. 11.14.

Fig. 11.14 Two of the eighty examples of buildings used in the Machine Learning test
Eighty examples were used first to learn how to link measures to abstract descriptors, and then to link abstract descriptors to the applicability of each operation.
Figure 11.15 shows a decision tree learned by C4.5 to determine the linguistic
values of the abstract feature Size from a given set of measures.
In addition to the positive qualitative evaluation of meaningfulness, given by the
expert, the abstract descriptions proved to be effective in improving the following
step of choosing the transformation, as well. Finally, the automatically learned rules
[Fragment of the learned tree: a test on an area measure below 137 m² yields Size = small; a test on a shape measure above 0.93 yields Size = big]
Fig. 11.15 Learned decision tree for determining the values of the qualitative attribute Size
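Trees of this kind can be reproduced with any standard learner. The following minimal sketch uses scikit-learn's CART implementation rather than C4.5, and the measures (Area, Elongation) and labels are hypothetical placeholders of ours:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical (measure) -> Size examples: [area_m2, elongation]
X = [[90, 0.2], [120, 0.5], [150, 0.95], [300, 0.4], [500, 0.96], [800, 0.5]]
y = ['small', 'small', 'medium', 'medium', 'big', 'big']

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=['Area', 'Elongation']))
```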
Fig. 11.16 Different road representation. The use of abstraction produced the best compromise
between readability and amount of details kept
The difficulty, in this basic formulation, is that, when the set of states Q grows large,
the number of parameters to estimate (A and B) rapidly becomes intractable. As
a matter of fact, in many applications to Pattern Recognition, an HMM λ may be
required to have a very large set of states. One possibility to address this problem is
to impose a structure upon the automaton, by a priori limiting the number of state
transitions and the possible symbol emissions. This corresponds to setting to 0 some
entries in matrices A or B (cf. [143]).
Another way to face the structural and computational complexity of the HMM is to use abstraction, searching for groups of states to aggregate, subject to some constraints. The result is an extension of the basic HMM, namely the Hierarchical HMM (HHMM), first proposed by Fine, Singer and Tishby [169]. The extension immediately follows from the property of regular languages of being closed under substitution, which allows a large finite state automaton to be transformed into a hierarchy of simpler ones. More specifically, an HHMM is a hierarchy where, numbering the hierarchy levels with ordinals increasing from the lowest towards the highest level, the observations generated in a state $q^k_i$ by a stochastic automaton at level k are sequences generated by an automaton at level k − 1. The emissions at the lowest levels are again single tokens as in the basic HMM. Moreover, no direct transition
may occur between the states of different automata in the hierarchy. As in HMMs, in every automaton the transitions from state to state are governed by a probability distribution A, and the probability of a state being the initial one is governed by a distribution π. The constraint is that there is only one state that can be the terminal one. Figure 11.17 shows an example of HHMM.
The major advantage provided by the hierarchical structure is a strong reduction of the number of parameters to estimate. In fact, automata at the same level in the hierarchy do not share interconnections: every interaction between them is governed by transitions at the higher levels. This means that for two automata $\lambda_{l,k}$, $\lambda_{m,k}$ at level k the probability of moving from the terminal state of $\lambda_{l,k}$ to one state of $\lambda_{m,k}$
processed extracting from each one the sequence which includes the most likely
hypotheses and is compatible with the given constraints. The default constraint is
that hypotheses must not overlap. Finally, every sequence is transformed again into
a string of symbols in order to be able to process it with standard local alignment
algorithms in the next step.
After the phases of abstraction and refinement, a hierarchical structure such as the one reported in Fig. 11.17 is obtained. As we can see, the original 8 states are abstracted into 4, thus reducing the number of parameters in the probability distributions to be learned. The methodology sketched above has been applied to a real-world problem of user profiling, and was able to handle ground HMMs with some thousands of states [188]. This result could not have been achieved without abstraction.
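The parameter saving can be made concrete with a back-of-the-envelope count (our own sketch; we assume a flat HMM with N states and an alphabet of size S has on the order of N² transition and N·S emission parameters):

```python
def flat_params(n_states, n_symbols):
    """Parameters of a flat HMM: the A and B matrices."""
    return n_states**2 + n_states * n_symbols

def hierarchical_params(groups, n_symbols):
    """groups: sizes of the lower-level sub-automata; the upper level has
    one state per group and no transitions between sub-automata."""
    lower = sum(k**2 + k * n_symbols for k in groups)
    upper = len(groups)**2          # transitions among abstract states
    return lower + upper

# e.g., 8 ground states abstracted into 4 groups of 2 (cf. Fig. 11.17)
print(flat_params(8, 10))                 # 144
print(hierarchical_params([2] * 4, 10))   # 112
```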
11.4 Summary
In this chapter we have briefly illustrated how the KRA model of abstraction has been used in practice in three nontrivial applications. In Model-Based Diagnosis (MBD) the model offers a number of advantages in capturing the process of abstracting the architecture of the diagnosed system. In particular, the use of generic aggregation operators allows different subsystems to be automatically abstracted, with the only requirement of specifying the parameters in their associated methods. In other words, the aggregation operators work independently of the nature of the aggregated components, only exploiting the components' topological arrangement. This is an important feature of the model, because it frees the user from the need to redesign the procedure for abstracting different types of components.
In recent years, novel approaches have been proposed to try to solve the inverse
problem, i.e., to exploit the degree of observability for defining useful abstraction;
moreover, other works have started to investigate the problem of automatic synthesis
of abstract models. While these approaches have developed some interesting solu-
tions to the problem of abstracting models for MBD, the obtained results, even though
encouraging, are still limited, especially with respect to domain-independence. The use of the KRA model, which offers a large set of already working and tunable operators, may help improve the results. The application of the model to Cartography
proved very useful, so that it was extensively used in various tasks related to map
production.
The application of the KRA model to the automated acquisition of the structure
and parameters of an HHMM is perhaps the most natural one, as the very structure
of an HMM spontaneously suggests state aggregation.
An indication of the flexibility and usefulness of the KRA model comes from the
fact that it has been used in real-world applications which are very different from one
another, yet exploiting the same set of operators. Another significant application of
the model to the problem of grounding symbols in a robotic vision system has been
described by Saitta and Zucker [470].
Chapter 12
Discussion
12.1 Analogy
The first topic that we find interesting to further investigate at this point is how abstraction has been linked with analogy. The word analogy derives from the Greek ἀναλογία, which means proportion. Analogy is a powerful mechanism in both human and animal reasoning [534], to the point that Hofstadter puts it at the very core
of cognition [263]. Differently from other reasoning mechanisms, such as induction
or abduction, analogy is a mapping from a particular to a particular. Oxford dictionary
defines analogy as “a comparison between one thing and another, typically for the
purpose of explanation or clarification”, or “a correspondence or partial similarity”,
or still “a thing which is comparable to something else in significant respects”. From
the above definitions it appears that analogy is somewhat equated to “similarity”, but
we argue that there is more to analogy than simple resemblance. We think, instead,
that analogy has its root in abstraction, as schematically illustrated in Fig. 12.1.
A characteristic of analogical reasoning is that it involves not only a holistic analy-
sis of the concerned entities, but also an account of their behaviors and relations inside
their contextual environment. Then, it is a more complex process than recognition
or classification, or extracting common features, or computing a similarity measure.
Just to give a feeling of the difference that we see, we report here an anecdote.
Jaime Carbonell, at Carnegie Mellon University (Pittsburgh, PA), one time told his
undergraduate students that an even number can be written as 2n, with n integer. To the
question of how an odd number could be defined, several students answered 3n. This
erroneous answer can be (arguably) explained by assuming that the students have
Fig. 12.1 Scheme of analogical reasoning. From a particular case or object, C1 , an abstract structure
is determined, which captures the essence of C1 . The same structure describes also another particular
case or object C2 , usually in another domain, to which some of the properties of C1 can be transferred
via the abstraction, once the association between C1 and C2 is done
where the antecedent A1 plays (represented by a colon ":") for the consequent B1 the same role that A2 plays for B2. In the example above we would write:
1 An approach to analogy based on proportions has been presented by Miclet et al. [370].
However, finding the right (useful) proportion is not easy, as usually the analogy
cannot be deduced from the definition of the terms in it. On the contrary, it is necessary
to recur to some abstract schema linking antecedents and consequents. In this case,
we must know that an animal, in order to stay alive and acting, needs food. In the
same way, a machine needs electric power in order to operate. On the other hand,
there is no apparent similarity between food and electric power, as there is (usually)
none between an animal and a machine. Even though analogy appears to be allowed
by some kind of abstraction, it is not identical to this last. Abstraction only acts as
a bridge through which information can be transferred, realizing thus the analogy.
Form this point of view, the notion of abstraction based on channel theory [464],
could be a good candidate to model analogical reasoning.
Investigation on analogy was quite active in the Middle Ages, especially in the domain of Law. Already in earlier times, the Roman law contemplated the analogia
legis, which allowed a concrete case (target), for which no explicit norm was given,
to be judged using an existing norm for a case (source) that shared with the target
the "ratio", i.e., a legal foundation. Something similar has also been present since ancient times in Islamic jurisprudence, where the word qiyās ("analogy") denotes the process allowing a new injunction to be derived from an old one (nass). According to this method, the ruling of the Quran and sunnah may be extended to a new problem provided that the precedent (asl) and the new problem (far) share the same operative or effective cause (illah). The illah is the specific set of circumstances that triggers a certain law into action. As an example, the prohibition of drinking alcohol can be extended to the prohibition of consuming cocaine.
One of the scholars who dealt with analogy in the Middle Ages is Thomas de Vio Cardinalis Cajetanus, who in 1498 published the treatise De Nominum Analogia, which is a semantic analysis of analogy [86]. Cajetanus introduces three types of analogy:
• Analogy of inequality,
• Analogy of attribution,
• Analogy of proportionality.
Over the three alternative modes of analogy Cajetanus clearly favors analogy of
proportionality; he is mostly interested in languages, and shows that analogy is a
fundamental aspect of natural languages.
Other authors, such as Francis Bacon and John Stuart Mill, base analogy on induction, and offer the following reasoning scheme to account for it:
Premises
x is A B C D
y is A B C
Conclusion
y is probably D
In more recent times theories of analogy have been formulated in Philosophy and
Artificial Intelligence. A pioneer in the field has been Gentner [197, 198], who also
supports the claim that analogy is more than simple similarity or property sharing
among objects. Then, she proposes a structure-mapping theory, which is based on the
idea that “… analogy is an assertion that a relational structure that normally applies
in one domain can be applied in another domain”. Gentner uses a conceptualization
of a domain in terms of objects, attributes and relations, similar to the description
frame Γ of our KRA model. She considers a base domain, B, and a target domain, T, and her account of analogy consists of the following steps:
• Objects in domain B are set in correspondence with objects of domain T.
• Predicates in B are carried over to T, using the preceding correspondence to guide predicate instantiation.
• Predicates corresponding to object attributes are discarded, whereas relations are preferentially kept.
• In order to choose which relations to keep, the Systematicity Principle is used, exploiting a set of second order predicates.
Gentner's proposal relies heavily on the ability to let the "correct" objects be mapped across domains.
It is instructive to look at the differences that Gentner sees between literal simi-
larity, analogy, and abstraction:
• A literal similarity is a comparison in which a large number of predicates is mapped
from B to T (relatively to those that are not mapped). Mapped predicates include
both attributes and relations.
• An analogy is a comparison in which most relational predicates (with few or no
object attributes) are mapped from B to T .
• An abstraction is a comparison in which B is an abstract relational structure, where
“objects” are not physical entities, but “generalized” ones. All predicates from B
are mapped to T .
As we can see, in Gentner’s original formulation abstraction and analogy have
the same status, i.e., a bridge between a base and a target domain. However, in a
recent paper Gentner and Smith [199] attributed to abstraction the role of “possible
outcome of structural alignment of the common relational pattern”, and investigated
the psychological foundations of analogical reasoning.
Another well known approach to analogy was proposed by Holyoak and Thagard
[261], as a continuation of older works by Holyoak and co-workers (e.g., [204]).
These authors were mostly interested in the role of analogy in problem solving,
where analogy was viewed as playing a central role in human reasoning. Holyoak
and Thagard described a multiconstraint approach to interpret analogies, in which
similarity, structural parallelism, and pragmatic factors interact. They also developed
simulation models of analogical mapping. Holyoak and his group are still interested
in analogy, and their approach evolved, in very recent years, toward a Bayesian
approach to relational transformation [347], and to the neural basis for analogy in
human reasoning [42].
In the field of qualitative physical systems, Forbus and coworkers proposed the system (simulator) SME for analogical reasoning [159]. Given a base and a target domain, SME first computes one or more mappings, each consisting of a correspondence between items in the two domains; then, it generates candidate inferences,
which are statements about the base that are hypothesized to hold in the target by
virtue of these correspondences. The original model evolved along the years, and
it has been updated and improved in recent papers [341], in order to be applied to
human geometrical reasoning, where it obtained good experimental results.
The SME system was criticized by Chalmers et al. [91], who advocated a view of
analogy as “high-level perception”, i.e., as the process of making sense of complex
input data. These authors propose a model of high-level perception and analogical
reasoning in which perceptual processing is integrated with analogical mapping.
Later on, analogy has also been the subject of Hofstadter’s book Fluid Concepts and
Creative Analogies [255].
Several other authors have contributed to the research on analogy. For instance, in
relation to Linguistics, Itkonen [271] distinguishes analogy as a process from analogy
as a structure, and establishes links between analogy and other cognitive operations,
such as generalization and metaphoric reasoning. Keane [405] proposes a three-phase
model of analogical reasoning, encompassing the phases of retrieval, mapping and
inference validation, in the context of designing creative machines. Aamodt and Plaza
[1] set analogy in the context of case-based reasoning, whereas Ramscar and Yarlett
[442] describe a new model, EMMA, exploiting an “environmental” approach, which
relies on co-occurrence information provided by Latent Semantic Analysis. Sowa
and Majumdar [500] investigate the relationships between logical and analogical
reasoning, and describe a highly efficient analogy engine that uses conceptual graphs
as knowledge representation; these authors make the interesting claim that “before
any subject can be formalized to the stage where logic can be applied to it, analogies
must be used to derive an abstract representation from a mass of irrelevant detail”.
Finally, Turney [536] sees analogy as a general mechanism that works behind a
broad range of cognitive phenomena, including finding synonyms, antonyms, and
associations. A recent review of the topic can be found in Besold’s thesis [55].
In studying analogy there are two orthogonal issues: one is how to use analogical
reasoning for explanation, and the other is to “invent” analogies for discovery or
creativity. The first, which is the almost universally investigated issue, consists in
recognizing that some mapping exists between a base and a target domain, and
then using it to derive new properties in the target one. Typically, suppose that we
recognize that there is a correspondence between Coulomb’s Law of interaction
between electric charges and Newton’s Law of interaction between masses; then, it
would be sufficient to solve the equations of motion in just one domain, and then
transfer the results to the other. In this case, analogy is used a posteriori, to explain why some phenomenon happens. The second issue, namely to start from just the base
domain, and to suggest a possible target one, is much more difficult and less studied.
has been at the core of the investigation on abstraction since the beginning. The
estimation of the saving in computation due to abstraction is problem-dependent,
and hence cannot be handled in general. We can examine some examples, in order
to provide a feeling of how the issue has been handled in the literature.
In the early 1980s researchers were trying to automatically create admissible heuris-
tics for the A* search using abstraction from a given (“ground”) state space Sg to an
“abstract” state space Sa = φ(Sg ). The idea was to map each ground state S ∈ Sg to
an abstract state φ(S), and to estimate the distance h(S) from S to a goal state G in Sg
using the exact distance between φ(S) and φ(G) in Sa . It has been proved [429] that
such an h(S) is an admissible heuristic if the distance between every pair of states
in Sg is not lower than the distance between the corresponding states in Sa . At the
time, embedding and homomorphism were the most common abstraction techniques
used in search; an embedding is an abstraction transformation that adds some edges
to Sg (for instance by defining a macro-operator), whereas a homomorphism is an
abstraction that groups together a set of ground states to generate a single abstract
one.
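The mechanism can be sketched in a few lines: blind (breadth-first) search in the abstract space yields exact abstract distances, which then serve as h(S) in the ground space. This is our own illustration for a homomorphic abstraction φ; the neighbor function of the abstract space is assumed to be supplied by the caller, and h is admissible as long as φ never increases distances (the condition proved in [429]):

```python
from collections import deque

def abstract_distances(goal_a, neighbors_a):
    """Exact distances to the abstract goal, by BFS in the abstract space."""
    dist = {goal_a: 0}
    frontier = deque([goal_a])
    while frontier:
        s = frontier.popleft()
        for t in neighbors_a(s):
            if t not in dist:
                dist[t] = dist[s] + 1
                frontier.append(t)
    return dist

def make_heuristic(phi, goal, neighbors_a):
    """Return h(S) = exact distance from phi(S) to phi(goal) in the abstract space."""
    dist = abstract_distances(phi(goal), neighbors_a)
    return lambda s: dist.get(phi(s), float('inf'))
```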
The goal of computing the heuristic function h(S) is to focus A*'s search; however, the cost of computing h(S) has to be included in the computation balance, and it may happen that the cost of obtaining h(S) outweighs its benefits in the search. An important result was obtained very early by Valtorta [544], who proved that if Sg is embedded in Sa and h(·) is computed by blindly searching Sa, then, using h(·), A* will expand every state that is expanded by blindly searching directly in Sg.
Several years later, Holte et al. called the number of nodes expanded when blindly
searching in a given space the “Valtorta’s Barrier” [260]. According to Valtorta’s
results, this barrier cannot be broken using any embedding transformation. Actually,
Holte et al. have generalized Valtorta’s results to any abstraction in the following
theorem.
Theorem 12.1 (Holte et al. [260]) Let Sg be a state space, (Start, Goal) a problem
in Sg , and φ : Sg → Sa an abstraction mapping. Let hφ (S) be computed by blindly
searching in Sa from φ(S) to φ(Goal). Finally, let S be any state necessarily expanded
when the problem (Start, Goal) is solved by blind search directly in Sg .
Then, if the problem is solved in Sg by A* search using hφ (.), it is true that:
• either S itself will be expanded,
• or φ(S) will be expanded.
Valtorta’s theorem is a special case of Holte et al.’s one, obtained when “embed-
dings” are considered; in fact, in this case φ(S) = S. Holte et al. showed that, using
homomorphic abstraction techniques, Valtorta’s Barrier can be broken in a large vari-
ety of search spaces. To speed up search, Holte et al. recursively generate a hierarchy
of state spaces; at each level, the state with the largest degree is grouped together
with its neighbors, within a certain distance, to form a single abstract state. This is
repeated until all states have been assigned to some abstract state. The top of the hier-
archy is the trivial search space, whereas the bottom is the original search space. The
authors applied this approach to A*, by showing that in several domains improve-
ments can be obtained, provided that two novel cashing techniques are added to A*,
and a suitable level of granularity is chosen for abstraction.
The results obtained by Holte et al. [260] did not exhaust the topic; rather, they started
a research activity that has evolved until recently [411], with the definition
of a novel type of abstraction, called multimapping abstraction, well suited to the
hierarchical version of the search algorithm IDA* (Iterative Deepening A*), which
eliminates the memory constraints of A* without sacrificing solution optimality.
Multimapping abstraction, which allows multiple heuristic values for a state to be
extracted from one abstract state space, consists in defining a function that maps a
state in the original state space to a set of states in the abstract space. Experimental
results show that multimapping abstraction is very effective in terms of both
memory usage and speed [260].
A task of Artificial Intelligence in which abstraction has been frequently and effectively
used to reduce complexity is the Constraint Satisfaction Problem (CSP). An
overview of the proposed approaches was provided by Holte and Choueiry [257].
Most of the research on abstraction in CSPs has focused on problem reformulation.
The most common abstraction involves the domains of the variables, and is
based on symmetry. When a symmetry is known in advance, it can be exploited
by the problem solver to avoid unnecessary exploration of equivalent solutions. A
set of values in the domain of a variable is said to be interchangeable if they all
produce identical results in the problem's solution. This notion of equivalence allows
a variable's domain to be partitioned into equivalence classes, where all the values in
a class are equivalent. This allows CSPs to be solved much more quickly because,
instead of considering all the different values in the domain, it is only necessary to
consider one representative for each class. This approach has been explored since the
beginning of the investigation on abstraction [104], and has been refined and extended
until now [103, 396, 521]. In particular, Choueiry and Davis [103] described the
Dynamic Bundling technique for efficiently discovering equivalence relations during
problem solving. It has been shown to yield multiple solutions to a CSP with
significantly less effort than is necessary to find a single solution.
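For instance, neighborhood interchangeability (two values are equivalent when they are consistent with exactly the same values of every neighboring variable) can be detected by a simple signature computation. A minimal Python sketch follows; it is ours, not the Dynamic Bundling algorithm of [103], and the toy constraint network at the bottom is invented.

def neighborhood_interchangeable(var, domain, neighbors, consistent):
    """Partition `domain` of `var` into classes of neighborhood-interchangeable
    values: a and b fall in one class iff, for every neighboring variable n,
    they are consistent with exactly the same values of n."""
    def signature(value):
        return tuple(
            frozenset(w for w in neighbors[n] if consistent(var, value, n, w))
            for n in sorted(neighbors)
        )
    classes = {}
    for v in domain:
        classes.setdefault(signature(v), []).append(v)
    return list(classes.values())

# Toy graph-coloring-like CSP: X constrained to differ from Y and Z.
neighbors = {"Y": {1, 2}, "Z": {2, 3}}
diff = lambda v1, a, v2, b: a != b
print(neighborhood_interchangeable("X", [1, 2, 3, 4, 5], neighbors, diff))
# [[1], [2], [3], [4, 5]]: values 4 and 5 are bundled into one class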
Another technique for abstracting variables in CSPs is aggregation [257]. For
instance, in a CSP reformulation of the graph coloring problem, nodes that are not
directly connected to one another but are connected to the same other nodes of
the graph can be given the same color in any solution to the problem. Thus, the
variables representing these nodes can be lumped together into a single variable. This
aggregation, which can be applied repeatedly, reduces the number of variables in the
CSP, consequently reducing the cost of finding a solution. This procedure will not
necessarily find all possible solutions, but it is guaranteed to find a solution to the
problem if one exists.
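A minimal sketch of this aggregation (ours, on an invented four-node graph; note that two nodes with identical neighbor sets are necessarily non-adjacent, so they can always share a color):

def aggregate_coloring_variables(adjacency):
    """Lump together nodes that have exactly the same neighbors: in any
    coloring they can share a color, so one variable can stand for all.
    `adjacency` maps node -> set of neighbors; returns representative -> group."""
    groups = {}
    for node, nbrs in adjacency.items():
        key = frozenset(nbrs)
        groups.setdefault(key, []).append(node)
    # keep one representative variable for each same-neighborhood group
    return {grp[0]: grp for grp in groups.values()}

# a and b are both connected to c and d only, so they can be merged (and so can c, d)
adjacency = {"a": {"c", "d"}, "b": {"c", "d"},
             "c": {"a", "b"}, "d": {"a", "b"}}
print(aggregate_coloring_variables(adjacency))
# {'a': ['a', 'b'], 'c': ['c', 'd']}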
Finally, a CSP can be made easier by hiding some of the constraints. If the abstract
problem is unsolvable, then the original problem is unsolvable too. On the other
hand, if a solution is found for the abstract problem, this solution can
be used to guide the search for a solution to the original problem. For continuous variables x
and y, a constraint C(x, y) can be represented by a 2-dimensional region in the (x, y)
plane. As this region can be very complex, C(x, y) could be replaced by the pair of
constraints C1 (x) and C2 (y), which are the projections of C(x, y) onto the axes x and
y, respectively. Again, if the abstract problem is unsolvable, so is the original one.
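A discrete toy version of this projection (ours; the book's example is stated for continuous variables, and the diagonal-band constraint below is invented):

def project_constraint(pairs):
    """Replace a binary constraint C(x, y), given as the set of allowed pairs,
    by its projections C1(x) and C2(y) on the two axes. The relaxed problem
    admits every solution of the original one, so unsolvability transfers back."""
    c1 = {x for x, _ in pairs}
    c2 = {y for _, y in pairs}
    return c1, c2

# allowed region of C(x, y): a diagonal band
C = {(x, y) for x in range(5) for y in range(5) if abs(x - y) <= 1}
C1, C2 = project_constraint(C)
print((4, 0) in C)        # False: forbidden by the original constraint...
print(4 in C1, 0 in C2)   # True True: ...but allowed by the relaxation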
Domain abstraction in CSPs has been investigated by Schrag and Miranker [479]
in an original context, namely the emergence of a phase transition between the
solvable and unsolvable phases [4, 196, 431, 495, 561, 570, 571]. Domain abstraction
is a sound but incomplete method with respect to unsatisfiability: if the abstract
problem is unsatisfiable, so is the original one, but the converse does not hold. Hence
it is informative only when both the abstract and the original problems are unsatisfiable,
the point being that the abstract problem is easier to solve.
Domain abstraction introduces a many-to-one mapping between values of the
variables, thus reducing the size of the variable domains from dg to da = dg /γ,
where γ is an integer divisor of dg . A sound and complete algorithm with
worst-case complexity O(d^n) will solve the abstract problem with a saving factor of
O(γ^n). Schrag and Miranker use the classical representation of a CSP as a
4-tuple (n, d, m, q), where n is the number of variables, d the domain size (assumed
to be equal for all variables), m the number of constraints, and q the number of
allowed tuples per constraint (assumed to be the same for all constraints). Each
CSP is then a point in a 4-dimensional space. Domain abstraction maps a point in
this space (the original problem (n, dg , m, qg )) to another one (the abstract problem
(n, da , m, qa )). Clearly, the abstraction process modifies d and, as a consequence,
q, but modifies neither n nor m. Whereas da is known, the effect of domain
abstraction on the tightness p2 = 1 − q/d² of the constraints is not; let Q be a random
variable that represents the number of tuples allowed by each constraint in the abstract
space.
By using a mean field approximation, Schrag and Miranker assume that Q is equal
to its mean value, i.e., Q = qa , and set out to predict both qa and the location of the
new problem in the problem space. A good estimate of qa appears to be
qa = da² [1 − (1 − qg /dg²)^(γ²)]    (12.1)
Equation (12.1) shows that, as qg increases, qa reaches a plateau where all the constraints
are loose (they allow all possible tuples) and any problem instance is almost surely
satisfiable, thus making domain abstraction ineffective. The greater γ, the earlier
the plateau is reached. Domain abstraction is then effective only when the constraints
are very tight. It is natural to ask what is the maximum value of γ for which
domain abstraction remains effective. Quite surprisingly, the effectiveness
of abstraction appears to exhibit a phase transition itself, because there is a
critical value γcr separating effective from ineffective behaviors. Starting from a set
of problem instances (most of which are unsatisfiable) at a point (n, dg , m, qg ), effectiveness
is computed as the fraction of problem instances still unsatisfiable at point
(n, da , m, qa ). This fraction jumps from almost 1 to almost 0 at γcr .
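To get a feeling for Eq. (12.1), here is a quick numeric sketch (ours; the parameter values are invented):

def abstract_tightness(d_g, q_g, gamma):
    """Mean-field estimate (12.1) of the number of allowed tuples per
    constraint after dividing the domain size by gamma."""
    d_a = d_g // gamma
    q_a = d_a ** 2 * (1 - (1 - q_g / d_g ** 2) ** (gamma ** 2))
    return d_a, q_a

# very tight ground constraints: only 10 of 400 tuples allowed
print(abstract_tightness(d_g=20, q_g=10, gamma=2))   # ~(10, 9.6): still tight
# loose ground constraints: the abstract ones allow almost every tuple
print(abstract_tightness(d_g=20, q_g=200, gamma=2))  # ~(10, 93.8): the plateau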
The authors consider their findings mixed results, in the sense that, on the one
hand, very significant reductions in complexity can be observed and, on the other,
the applicability of the approach seems rather limited, given the high degree of
tightness required of the original problem. They explain the negative results by
the absence of any "structure" in the variables' domains. If some structure can be
found, for instance interchangeability (cf. above), a hierarchical arrangement, or
other kinds of symmetries, then abstraction might have a more positive impact than
the one predicted by their theory.
If we consider abstraction under all the names it has been used under in Machine Learning
(feature selection and construction, term construction, motif discovery, state
aggregation, …), it is not possible to investigate its effects on
computational complexity in general. What can be said is only that all researchers who used
those techniques reported some minor or major computational advantage.
In this section we therefore concentrate on a specific, even though quite
esoteric, issue, namely the relation between abstraction and the presence of a phase
transition in the problem of matching a hypothesis against an example in symbolic
learning [209, 469]. There are at least two reasons for this: firstly, it is a topic at the
frontier of research (and not yet well understood); secondly, it has strong links
with the issue discussed in the preceding section.
Matching a hypothesis ϕ(x1 , . . . , xn ), generated by a learner, against an example e
allows one to check whether the example verifies the hypothesis. In a logical context
for learning, a hypothesis is a formula in some First Order Logic language (usually a
DATALOG language), and an example consists of a set of relations, each one being
the extension of a predicate in the language. An example can be found in Fig. 12.2.
A matching problem can be represented by a 4-tuple (n, L, m, N), where n is the
number of variables in ϕ, L is the size of the common domain of the variables, m
is the number of predicates in ϕ, and N is the number of allowed tuples in each
relation. It is immediate to see that the matching problem is equivalent to a CSP
(n, d, m, q), as defined in the previous section. The matching problem, which is
a decision problem, shows a phase transition with respect to any pair of the four
parameters. We consider as control parameters of the transition m (characterizing
the size of the hypothesis) and L (characterizing the size of the example), whereas
n and N are fixed. By considering as order parameter the probability Psol that a
solution exists, the phase transition appears as a sharp drop of Psol in the plane (m, L)
Fig. 12.2 An example e, given by (a) the extensions of the relations on(X, Y) and left(X, Y), and (b) the tuples satisfying the conjunction left(X, Y), on(Y, Z)
Fig. 12.3 3-Dimensional plot of the probability of solution Psol for n = 10 and N = 100. Some
contour level plots, corresponding to Psol values in the range [0.85 ÷ 0.15], have been projected
onto the plane (m, L)
Fig. 12.4 FOIL’s “competence map”: the success and failure regions, for n = 4 and N = 100.
The phase transition region is indicated by the dashed curves, corresponding, respectively, to the
contour plots for Psol = 90 %, Psol = 50 %, and Psol = 10 %. The crosses to the left of the phase
transition represent learning problems that could easily be solved exactly (i.e., the target concept
was found). The crosses to the right of the phase transition line represent learning problems that
could be approximated (i.e., hypotheses with low prediction error were found, but different from the
target concept). Dots represent problems that could not be solved. Successful learning problems
were those that showed at least 80 % accuracy on the test set. Failed problems are those that reached
almost 100 % accuracy on the learning set, but behaved almost like random predictors on the test
set (around 50 % accuracy)
(see previous section). Extensive experimentation has shown that, in this plane,
there is a large region where learning problems could not be solved by any available
top-down, hypothesize-and-test relational learner [70], as represented in Fig. 12.4 for
FOIL [438].
A large region (a "blind spot") emerges, located across the phase transition, where
no learner succeeded. In this region, in the vast majority of cases the hypotheses
learned were very accurate (accuracy close to 100 %) on the learning set, but behaved
like random predictors (accuracy close to 50 %) on the test set. The threshold of 80 %
accuracy, which we chose to declare a learner successful, could have
been any value between 95 % and 60 % without making any significant difference
in the shape of the blind spot. The plot in Fig. 12.4 had n = 4, which is a very small
value. In fact, things become much worse with increasing n. Thus, the number of
variables in the hypothesis is a crucial parameter for the complexity of learning.
Given this situation, one may wonder whether abstraction could be a way out
of this impasse. We have tried three abstraction operators, namely domain abstraction,
arity reduction, and term construction. We point out that for learning it is not
necessary to revert to the original problem: if good hypotheses can be found in the
abstract space, they can be used directly, forgetting about the original space. For
instance, when the operator that hides attributes is applied (performing feature selection),
there is no need to reintroduce the hidden attributes. Then, what counts in
learning is either to move a learning problem from a position in which it is unsolvable
to one in which it is solvable, or to move it from a position where it is solvable, but has
high complexity, to a position where it is still solvable and, in addition, requires a
lower computational effort.
Domain Abstraction Operator
Let us consider the simplest domain abstraction operator ωeqobj ({a1 , a2 }, b), which
makes constants a1 and a2 indistinguishable, both denoted by b. The effect of apply-
ing ωeqobj is that, in each of the m relations contained in any training example, each
occurrence of either a1 or a2 is replaced by an occurrence of b. The language
of the hypothesis space does not change. With the application of ωeqobj we obtain
na = n, ma = m, La = L − 1, and Na = N if we agree to keep possibly duplicate
tuples in the relations, or Na < N otherwise. The point Pg , corresponding to the
learning problem in the ground space, jumps down vertically to Pa^(eqobj), located on
the horizontal line La = L − 1. At the same time, as (possibly) Na ≤ N, the phase
transition line (possibly) moves downwards, as described in Fig. 12.5a.
The effect of ωeqobj is different depending on the original position of the learning
problem (Pg ). If Pg is in the NO-region, moving downwards may let Pa enter the
blind spot, unless Na recedes sufficiently to let the blind spot move downward as well.
If Pg is on the lower border of the blind spot, Pa may fall outside of it, becoming a
solvable problem. However, by noticing that the downward jump in L is of a single
unit, it is more likely that this type of abstraction does not help in easing the learning
task, especially if Na < N. In summary, the application of ωeqobj is beneficial, from
Fig. 12.5 a Application of operator ωeqobj to Pg . The learning problem moves downwards, toward
regions of (possibly) greater difficulty. b Application of operator ωhargrel or ωhargfun to Pg . The
learning problem moves left, toward regions of (possibly) lower difficulty
both the complexity and the learnability points of view, when Pg is located on or
below the phase transition, whereas it may have no effect, or even be harmful, when
Pg is located above it.
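As an illustration, a minimal sketch (ours; the relation names and constants are invented) of ωeqobj acting on the relational representation of a training example:

def omega_eqobj(example, equated, new_constant):
    """Apply the domain abstraction operator that makes the constants in
    `equated` indistinguishable, all renamed to `new_constant`. An example is
    a dict: relation name -> set of tuples of constants."""
    rename = lambda c: new_constant if c in equated else c
    return {rel: {tuple(rename(c) for c in t) for t in tuples}
            for rel, tuples in example.items()}

example = {"on":   {("a1", "c"), ("a2", "d")},
           "left": {("c", "d")}}
abstract = omega_eqobj(example, equated={"a1", "a2"}, new_constant="b")
print(abstract)
# e.g. {'on': {('b', 'c'), ('b', 'd')}, 'left': {('c', 'd')}}

Since the relations are sets, tuples made identical by the renaming collapse automatically, which is the Na < N case discussed above.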
Arity Reduction Operator
As the exponential complexity of matching is mostly due to the number of variables,
trying to reduce the latter could be beneficial, in principle, provided that good
hypotheses can still be learned.
Let us consider the case in which we want to hide a variable in all functions
and predicates where it occurs. Then, we can apply a set of operators of the type
ωhargrel (Rk (x1 , . . . , xn ), xj ) (1 ≤ k ≤ K), or of the type ωhargfun (fh (x1 , . . . , xn ),
xj ) (1 ≤ h ≤ H). Each operator hides the column corresponding to xj in the covers
RCOV (Rk ) or FCOV (fh ). At the same time, the hypothesis language is modified,
because the predicate rk (x1 , . . . , xn ), corresponding to relation Rk , becomes
rk^(a) (x1 , . . . , xj−1 , xj+1 , . . . , xn ).
In the abstract space, the number of constants remains the same (La = L), while the
number of variables decreases by 1 (i.e., na = n − 1). The number of predicates most
likely decreases (i.e., ma ≤ m); in fact, hiding a variable in a binary predicate makes
the latter unary, so that it no longer contributes to the exponential
increase in the computational cost of matching. For this reason it no longer counts
in the value of m. Finally, Na ≤ N, because some tuples may collapse.
As a consequence of arity abstraction, the point Pg jumps horizontally to Pa ,
located on the line La = L, whereas the phase transition moves down because of the
decrease in n. The application of ωhargrel is most likely to be beneficial, from both
the complexity and the learnability points of view, when Pg is located on or below
the phase transition, whereas it may have no effect, or even be harmful, when Pg
is located above it, especially if it is at the right border of, but outside, the blind spot.
However, considering that the curves for different values of n are quite close to one
another, it may be the case that the abstract problem jumps to the easy region without
entering the blind spot.
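The column-hiding step itself is elementary; a sketch (ours, with an invented ternary relation) acting on a single cover:

def omega_hargrel(cover, hidden_position):
    """Hide one argument of a relation: delete the corresponding column from
    its cover RCOV(R). Tuples that become identical collapse (so Na <= N)."""
    return {t[:hidden_position] + t[hidden_position + 1:] for t in cover}

# ternary relation between a block, a support, and a color; hide the color
cover = {("a", "table", "red"), ("a", "table", "blue"), ("b", "a", "red")}
print(omega_hargrel(cover, hidden_position=2))
# {('a', 'table'), ('b', 'a')}: two tuples collapsed into one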
Term Construction Operator
The term construction operator ωaggr ({t1 , . . . , tk }, t^(a)) aggregates k objects
(x1 , . . . , xk ) of types t1 , . . . , tk into an object y of a new type t^(a). To build the term
y it is necessary to first find all the solutions of a smaller matching problem, and
to assign a new constant to each of the tuples (a1 , . . . , ak ) in this solution. For the
sake of simplicity, let us suppose that there is a single tuple (a1 , . . . , ak ) that can be
aggregated, and let b be its new identifier. All objects a1 , . . . , ak must disappear from
the examples. In addition, a value UN will replace any occurrence of the ai 's in any
function and relation. Then, na = n − k + 1 and La = L + 1. The value Na is likely to
decrease, and the value of m may also decrease (i.e., ma ≤ m). In the plane (m, L)
the point Pg moves leftward and upwards, which is most often beneficial, unless
Pg is located in the region corresponding to very low L and very large m values.
From the learnability point of view, the application of ωaggr may be particularly
beneficial when Pg is located at the upper border of, but inside, the blind spot; in this
case, problems that were unsolvable in the ground space may become solvable in the
abstract one.
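The following toy sketch (ours; a drastic simplification of the method detailed in Appendix G, with invented relation names) shows the two steps, namely a fresh constant for each aggregated tuple and the masking of the aggregated objects by UN:

def omega_aggr(example, pattern_solutions, new_type, placeholder="UN"):
    """Aggregate each tuple of objects returned by the matching step into one
    new object of type `new_type`; occurrences of the aggregated objects in
    the remaining relations are masked by `placeholder`."""
    new_objects, replaced = {}, {}
    for i, tup in enumerate(pattern_solutions):
        fresh = f"{new_type}_{i}"           # new constant for the aggregate
        new_objects[fresh] = tup
        replaced.update({c: placeholder for c in tup})
    abstract = {rel: {tuple(replaced.get(c, c) for c in t) for t in tuples}
                for rel, tuples in example.items()}
    return abstract, new_objects

# one solution of the sub-matching problem: car c1 with its load l1
example = {"inside": {("l1", "c1")}, "infront": {("c1", "c2")}}
abstract, objs = omega_aggr(example, [("c1", "l1")], new_type="loadedcar")
print(objs)      # {'loadedcar_0': ('c1', 'l1')}
print(abstract)  # e.g. {'inside': {('UN', 'UN')}, 'infront': {('UN', 'c2')}}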
12.3 Extensions of the KRA Model

The basic KRA model has been extended in two ways: the first extension keeps the
basic structure of the model unchanged, while adding a new facility for handling
types [514, 515, 550, 551]; the second extension, while maintaining the spirit of
the model, modifies its structure [244]. The two extensions are described in the
following subsections.
The model G-KRA has been recently proposed by Ouyang, Sun and Wang [514,
515, 550, 551] with the aim of increasing KRA's generality and degree of automation.
The extension consists in the addition of a Concept Ontology to improve knowledge
sharing and reuse, since ontologies are nowadays a widely used tool for conceptualizing
a domain.
The basic idea is that an agent A gathers a Primary Perception directly from the
world, whereas other agents can use this "ground" perception to build more
abstract ones using abstraction operators. The difference with respect to the original
KRA model is that there are several ontologies, with different levels of abstraction,
which specify which objects and which object properties can be observed at each level.
The authors introduce the notion of Ontology Class OC = (E, L, T ), where E =
(OBJ, ATT , FUNC, REL), and L and T are a language and a theory, respectively.
This formulation shows the following differences/similarities with respect to the
original KRA model:
• OC corresponds to the query environment QE, except that there is no explicit
query; moreover, the database, even though used in practice, is not part of the
conceptualization of the domain.
• E is KRA’s P-Set P.
• Each ontology class specifies the types, attributes, functions, and relations concerning
a given set of observable objects. In the KRA model this information is
specified globally in the description frame Γ .
• The ontologies improve knowledge sharing and reuse, because they do not change
across applications in a given domain.
Among all the ontology classes that can be considered, one is called Fundamental
Ontology Class (FOC), and is the one corresponding to the actual ground world. The
ground observations are collected into a database S. Any abstract world is described
by the tuple R = (Pa , S, La , Ta ), where Pa = δ(Pg , OCa ) is the abstract perception,
obtained from Pg by mapping objects in the FOC to objects in OCa . This process
can be repeated, by applying again δ to OCa , and obtaining a hierarchy of more and
more abstract descriptions.
Given the ground perception and the ontologies, the abstraction hierarchy can be
built up automatically. The authors have applied the model to Model-Based Diagnosis
problems, and have exemplified the functioning of G-KRA using the same hydraulic
system described in Sect. 11.1.1.
Fig. 12.6 KRA model modified by Hendriks [244]. The world provides the perception P , which
is memorized in a database Sg and then abstracted into a database Sa . Afterwards the
content of Sa is expressed in a well-defined language La , and a theory Ta is added to complete the
conceptualization of the domain
12.4 Summary
Even though abstraction is an interesting process per se, its links with other reasoning
mechanisms and with computational issues make its role in natural
and artificial systems even more central. In fact, abstraction may be the basis for analogical
and metaphorical reasoning and for creating caricatures and archetypes, as well as an
important tool for forming categories. Putting abstraction at the basis of analogy allows
the latter to be distinguished from similarity-based reasoning, because abstraction is
able to build bridges between superficially very different entities.
Virtually all views of abstraction share the idea that abstraction should bring some
advantage in terms of simplification. In computational problems this advantage can be
quantified as a reduction of the resources needed to solve a problem. The
effective gain cannot be estimated in general, because it depends on the problem at
hand. Some generic considerations can nevertheless be made.
As an example, let us consider an algorithm with complexity Cg = O(f (n)),
where n is the number of objects in a Pg ; hiding m objects generates a Pa in which
the same algorithm will run in Ca = O(f (n − m)). If f (n) is a linear function, then
the abstract complexity will still be linear (not a large gain indeed). But if
f (n) is exponential, the abstract complexity becomes O(e^(n−m)), i.e., there is an
exponential saving by a factor e^m. However, the cost of
abstracting must be taken into account as well. In order to estimate this cost, let us
consider the database Dg , where Pg is memorized. Hiding m objects in the OBJ table
and in the attribute tables Aj -ATTR (1 ≤ j ≤ M) has a complexity 2O(n). Hiding
the objects in each FCOV (fh ) (1 ≤ h ≤ H) and in each RCOV (Rk ) (1 ≤ k ≤ K),
respectively, has complexity

∑_{h=1}^{H} O(|FCOV (fh )|) + ∑_{k=1}^{K} O(|RCOV (Rk )|)    (12.2)

Abstraction is thus convenient only when:

Cg ≥ Ca + 2O(n) + ∑_{h=1}^{H} O(|FCOV (fh )|) + ∑_{k=1}^{K} O(|RCOV (Rk )|)    (12.3)

In the worst case, when |FCOV (fh )| = O(n²) for each h and |RCOV (Rk )| = O(n²)
for each k, this becomes:

Cg ≥ Ca + 2O(n) + (H + K)O(n²)    (12.4)
For exponential problems Eq. (12.4) is likely to be satisfied. Similar generic considerations
can be made for other operators, but a realistic computation can only be
performed once the problem and the algorithms to solve it are given.
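The balance in (12.4) can be made tangible with a toy computation (ours; the constants hidden by the O(·) notation are simply dropped, so the numbers are only indicative):

import math

def abstraction_pays_off(n, m, H, K):
    """Check inequality (12.4) for an exponential ground algorithm
    Cg = O(e^n): is e^n >= e^(n-m) + 2n + (H + K) n^2 (constants dropped)?"""
    c_g = math.exp(n)
    c_a = math.exp(n - m)
    overhead = 2 * n + (H + K) * n ** 2
    return c_g >= c_a + overhead

print(abstraction_pays_off(n=30, m=5, H=3, K=3))  # True: the gain dwarfs the overhead
print(abstraction_pays_off(n=5, m=1, H=3, K=3))   # False: the overhead dominates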
Even if the issue of saving (be it computational, cognitive, or other) arises in
most fields where abstraction applies, we have provided examples of fields where
abstraction not only plays a fundamental role, but also interacts with complex phenomena
such as the emergence of phase transitions in computation. In principle,
working in an abstract space looks very promising as a way to circumvent the negative effects
that these phenomena generate, but the effective use of abstraction toward this goal
is still at a preliminary stage.
The KRA model of abstraction that we have described lends itself to improvements
and extensions. One possible direction is to consider a stochastic environment,
where the application of an operator does not generate a deterministic abstract state,
but only a probability distribution over a subset of states. A brief mention of this
was made when considering probabilistic complexity measures in Chap. 10. The
extension proposed by Ouyang, Sun, and Wang adds to the description frame Γ an
ontology, which allows different types of objects to be abstracted in a controlled way,
and has been applied with success to problems of model-based diagnosis. Finally,
Hendriks modifies the structure of the model, keeping its essential ideas, by performing
abstraction before defining a language.
Chapter 13
Conclusion
The variety of roles played by abstraction in many fields of science, art, and life shows
how pervasive, multifaceted, and elusive it may be. Nonetheless, the investigation
and comparison of these roles also revealed a largely shared basic understanding
of the notion, which can be broadly synthesized as a mechanism for changes of
representation aimed at simplicity. Illustrative examples of such changes of representation
are countless, and it is clear, by now, that they are a double-edged sword: if
well chosen they may lead to a dramatic increase in problem solving performance,
but, if not, they may even be severely harmful.
All the preceding considerations, together with the strong task-dependency of
abstraction effectiveness, called for a model of abstraction capable of going
beyond the idea of logical mappings (be they semantic or syntactic ones), to become
closer to the world, where, in order to solve a problem, observations are to be made
and knowledge has to be elicited. Mappings between formal languages then become
ancillary with respect to the idea of abstraction acting as a set of (physical and
conceptual) "lenses" with different resolutions, used to take in the world.
The KRA model, which we propose, is limited in scope, because it is mainly
targeted at those applications where the interaction with the environment (be it natural
or artifactual) plays a primary role; but, for those applications, it offers "concrete"
tools to perform abstraction in practice, in the form of a library of readily available
abstraction operators. As we have shown, in this type of application some of the
problems encountered in logical definitions of abstraction (such as inconsistency
in the abstract space) lose much of their harm, because they come down to simply
acknowledging that abstraction, by hiding information, does not put us in a state of
contradiction, but of ignorance.
An important aspect of the view of abstraction captured by KRA is that moving
across abstraction levels should be easy, in order to support trying different abstrac-
tions, when solving a problem. For this reason, all the hidden information ought to
be memorized during the process of abstracting, so that it can be quickly retrieved.
Finally, only transformations generated by a precisely defined set of abstraction
operators are considered in the model. This is done to avoid the costly process of
checking the more-abstract-than relation on pairs of generic configurations.
In order to make the model widely applicable, many operators have been introduced,
covering both abstraction and approximation. At first glance, they may appear
complicated, especially if contrasted with the neat logical theories previously proposed.
However, if abstraction operators are to be routinely used in real-world
domains, they must cope with the wealth of details that this implies. In order to
facilitate the task of a user, we took a layered approach, borrowed from the theory of
Abstract Data Types: high level templates describe broad classes of operators, specifying
their general aspects, intended use, effects, and computational characteristics. A
user can then be directed toward the class of operators that most likely suits his/her
problem. Once the class is chosen, a specific operator is identified and applied. Again,
the information about the operator is embedded: first "what" the operator does is
described, and then "how" it does it. The "how" is a method, i.e., a program that
implements all the transformations that the operator has to perform. The set of introduced
operators is intended to formalize many abstractions that were, implicitly or
explicitly, already present in the literature of several fields.
The operators that we have implemented are only the central core of a possible
collection, because they are domain-independent and, hence, may not be as
effective as domain-dependent operators could be. Practitioners using abstraction in
various domains are welcome to add and share new operators.
Furthermore, following a precise characterization of both abstraction and approximation,
it was also possible to define some approximation operators. Even though
reformulation could also be precisely characterized, no operator has been proposed for it,
because reformulation may be a complex process, formalizable only contextually.
The grounded approach to abstraction that we have contributed to trace still leaves open
fundamental questions. The most important is how to link the task to be performed to
the kind of abstraction operator that is best suited to it. As we have seen, the structure
of the query may sometimes suggest what details can be overlooked and/or what
information can be synthesized, but, in general, the process of selecting a "good"
abstraction is still a matter of art. In some fields, such as human vision, our brain
has evolved extremely powerful abstraction techniques; in fact, we are continuously
abstracting sets of pixels into meaningful "objects" in such an effortless way that we
are not even aware of doing it. The investigation of human abstraction mechanisms
could be extremely useful for the design of artificial vision systems.
Regarding the problem of selecting a good abstraction, the definition of a wide
set of operators may ease (even though not solve) the problem. In fact, the introduction
of a single framework, in which very different operators are unified and treated
uniformly, allows a systematic and automatic search in the space of possible abstractions,
without requiring the user to design and implement different operators manually.
In other words, the approaches used in Machine Learning for
feature selection could be extended to include other types of abstraction operators in
the same loop; for instance, feature selection, construction, and discretization could
be tried inside the same search. This is made possible by the uniform representation of all
the abstraction operators.
Another relevant question concerns the study of the properties that abstraction
operators ought to preserve across abstraction spaces. For instance, in Machine Learning
it would be very useful to design generality-preserving abstraction operators. The
study of this topic is complicated by the fact that useful properties are operator- and
task-dependent, so that it is not possible to obtain general results. Fortunately,
the identification of the properties to be preserved has to be done just once.
An exciting direction of research also includes the automatic change of representation
by composing abstraction and reformulation operators.
However, the most challenging task is to learn “good” abstractions. Learning an
abstraction should not be reduced to the problem of searching for a good operator.
On the contrary, learning a good abstraction should imply that the found operator
(a) is useful for a number of different tasks (according to the principle of cognitive
economy), and (b) its subsequent applications should become automatic, as soon
as the applicability conditions are recognized, without any further search. A typical
example of learning an aggregation operator occurs in human vision: when a matrix
of pixels arrives at our retina, we “see” in it known objects without any conscious
search; this very effective image processing is the result of a possibly long process
of learning and evolution, in which different abstraction operators have been tried,
and those that proved to be useful in each new task have been reinforced.
Certainly, studying abstraction both per se and in applications is one of the most
challenging directions of research in Artificial Intelligence and Complex Systems.
Significant results in the field would not only allow more efficient artifacts and models
to be built, but would also bring a better understanding of human intelligence and common sense.
Appendix A
Concrete Art Manifesto
In 1930 the Dutch painter Theo van Doesburg (a pseudonym of Christian
Emil Marie Küpper) published the Manifesto for Concrete Art, advocating the total
freedom of art from the need to describe or represent natural objects or sentiments.
The Manifesto is reported in Fig. A.1.
The translation of the Manifesto is the following one:
BASIS OF CONCRETE PAINTING
We say:
1. Art is universal.
2. A work of art must be entirely conceived and shaped by the mind before its exe-
cution. It shall not receive anything of nature’s or sensuality’s or sentimentality’s
formal data. We want to exclude lyricism, drama, symbolism, and so on.
3. The painting must be entirely built up with purely plastic elements, namely sur-
faces and colors. A pictorial element does not have any meaning beyond “itself”;
as a consequence, a painting does not have any meaning other than “itself”.
4. The construction of a painting, as well as that of its elements, must be simple and
visually controllable.
5. The painting technique must be mechanic, i.e., exact, anti-impressionistic.
6. An effort toward absolute clarity is mandatory.
Carlsund, Doesburg, Hélion, Tutundjian and Wantz.
Appendix B
Cartographic Results for Roads

This appendix shows, in Fig. B.1, some more roads and their different representations
with and without abstraction. The representations result from:
• a direct symbolization (initial),
• the cartographic result produced by the hand-crafted expert system GALBE,
specifically developed to generalize roads [389, 391],
• the result produced by the set of rules obtained by learning without abstraction,
• the result produced by the set of rules obtained by combining learning and abstraction.
Fig. B.1 Different road generalization results, for different roads. The improvements brought by
abstraction are clearly visible
Appendix C
Relational Algebra
A relation (or table) T over the domains X1 , X2 , . . . , Xn is a subset of their Cartesian product:

T ⊆ X1 × X2 × · · · × Xn    (C.1)
PhD
ID   SURNAME  AGE
23   Smith    38
40   Adams    39
132  Ross     32

MANAGERS
ID   SURNAME  AGE
72   Adams    50
40   Adams    39
132  Ross     32

Fig. C.1 Given the tables corresponding to the relations R1 = PhD and R2 = MANAGERS, we
can construct the tables PhD ∪ MANAGERS, PhD ∩ MANAGERS, and PhD − MANAGERS

Union
Given two relations R1 and R2 of the same arity, the union R = R1 ∪ R2 is a
relation containing the tuples occurring in at least one of the relations R1 and R2 .

Intersection
Given two relations R1 and R2 of the same arity, the intersection R = R1 ∩ R2 is a
relation obtained by only keeping the tuples occurring in both relations R1 and R2 .
Set difference
Given two relations R1 and R2 of the same arity, the difference S = R1 − R2 is
obtained by eliminating from R1 the tuples that occur in R2 .
In Fig. C.1 examples of the Union, Intersection, and Set Difference operators are
reported.
Cartesian product
Let R1 and R2 be two relations of arity n and m, respectively. The Cartesian product
R = R1 ×R2 is a relation of arity n+m, whose tuples have been obtained by chaining
one tuple of R1 with one tuple of R2 in all possible ways.
Projection
Let R1 and R2 be two relations of arity n and m, respectively, with n > m; the relation
R2 will be called a projection of R1 if it can be generated by taking the distinct tuples
obtained by deleting a choice of (n − m) columns in R1 . The projection is formally
written as R2 = πi1 ,i2 ,...,im (R1 ), where i1 , i2 , . . . , im denote the columns of R1 that
are to be kept in R2 .
Selection
Let R be a n-ary relation. A selection S = σθ (R) is obtained by selecting all tuples
in R satisfying a condition θ stated as a logical formula, built up using the usual
connectives ∧, ∨, ¬, the arithmetic predicates <, >, =, ≤, ≥ and the values of the
tuple’s components.
PhD
ID   SURNAME  AGE
23   Smith    38
40   Adams    39
132  Ross     32

LOCATION
CITY     REGION
Rome     Lazio
Milan    Lombardia
Bergamo  Lombardia

Fig. C.2 Given the relations R1 = PhD and R2 = LOCATION, the Cartesian product of R1 and
R2 contains 9 tuples, obtained by concatenating each tuple in R1 with each tuple in R2 . Relation
Proj-PhD is the projection of relation PhD over the attributes SURNAME and AGE, i.e., Proj-
PhD = π_SURNAME,AGE (PhD). Finally, relation Sel-PhD is obtained by selection from PhD, and
contains the tuples that satisfy the condition AGE ≤ 38, i.e., Sel-PhD = σ_AGE≤38 (PhD)
FATHERHOOD
FATHER  CHILD
John    Ann
Stuart  Jeanne
Robert  Albert

R-FATHERHOOD
PARENT  CHILD
John    Ann
Stuart  Jeanne
Robert  Albert

Fig. C.3 Given the relation R = FATHERHOOD, we can rename attribute FATHER as PARENT ,
obtaining the new relation R-FATHERHOOD, i.e., R-FATHERHOOD = ρ_PARENT←FATHER (R)
In Fig. C.2 examples of the Cartesian product, Projection, and Selection operators
are reported.
Renaming
If R is a relation, then R(B ← A) is the same relation, where column A is now named
B. The renaming operation is denoted by R(B ← A) = ρB←A (R). In Fig. C.3 an
example of the Renaming operator is reported.
Natural-join
Let R and S be two relations of arity n and m, respectively, such that k columns
A1 , A2 , . . . , Ak in S have the same name as in R. The natural join Q = R ⋈ S is the
(n + m − k)-ary relation defined as:

π_{A1,A2,...,A(n+m−k)} σ_{R.A1=S.A1 ∧ R.A2=S.A2 ∧ ··· ∧ R.Ak=S.Ak} (R × S).
Fig. C.4 Given the two relations AFFILIATION and RESEARCH, their natural join is obtained by
considering all tuples that have the UNIVERSITY value in common
In other words, each tuple of Q is obtained by merging two tuples of R and S such
that the corresponding values of the shared columns are the same.
In Fig. C.4 an example of the Natural-Join operator is reported.
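These operators are straightforward to emulate. The following minimal Python sketch is ours, not part of the book: relations are sets of tuples, columns are addressed by position rather than by name (a simplification of the named-column algebra above), and the PhD data of Fig. C.1 are reused.

def selection(R, theta):
    """sigma_theta(R): keep the tuples satisfying the predicate theta."""
    return {t for t in R if theta(t)}

def projection(R, columns):
    """pi_columns(R): keep only the chosen columns (duplicates collapse)."""
    return {tuple(t[i] for i in columns) for t in R}

def natural_join(R, S, common):
    """R |x| S on the column pairs in `common` = [(i_R, i_S), ...]."""
    drop = {j for _, j in common}
    return {r + tuple(s[j] for j in range(len(s)) if j not in drop)
            for r in R for s in S
            if all(r[i] == s[j] for i, j in common)}

PhD = {(23, "Smith", 38), (40, "Adams", 39), (132, "Ross", 32)}
print(selection(PhD, lambda t: t[2] <= 38))   # Smith and Ross (Sel-PhD)
print(projection(PhD, (1, 2)))                # SURNAME, AGE pairs (Proj-PhD)
R = {("Smith", "Cambridge")}; S = {("Cambridge", "AI")}
print(natural_join(R, S, [(1, 0)]))           # {('Smith', 'Cambridge', 'AI')}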
Appendix D
Basic Notion of First Order Logics
In this appendix we recall the basic notions of First Order Logic (FOL),
in particular those that have been used in this book. Readers interested in a deeper
understanding of the topic can find excellent introductions in many textbooks (see,
for instance, [496, 545]).
First Order Logic (also known as First Order Predicate Calculus) is a language
used in Mathematics, Computer Science, and many other fields for describing formal
reasoning. It is an extension of Propositional Logic with the manipulation of variables.
The definition of a logical language has two parts, namely the syntax of the language
and its semantics.
D.1 Syntax
D.1.1 Formulas
Logical formulas are expressions built over the dictionary defined by the logical
and non-logical symbols. Well-formed formulas (wffs) are the ones with the syntax
recursively defined in the following. We must define terms and formulas.
Terms
• A constant is a term.
• A variable is a term.
• If f is a function symbol of arity n and t1 , . . . , tn are terms, f (t1 , t2 , . . . , tn ) is a
term.
Formulas
• If p is a predicate symbol of arity n, and t1 , t2 , . . . , tn are terms, then p(t1 , t2 , . . . , tn )
is an atomic formula.
• If ϕ1 and ϕ2 are formulas, (ϕ1 ∧ ϕ2 ), (ϕ1 ∨ ϕ2 ), (ϕ1 → ϕ2 ), are formulas.
• If ϕ is a formula, then ¬ϕ is a formula.
• If ϕ is a formula and x is a variable occurring in ϕ, then ∀x(ϕ) and ∃x(ϕ) are
formulas.
Only expressions that can be obtained by finitely many applications of the above
rules are formulas.
D.2 Semantics
1 With x/A, y/B, z/C we mean that the variables x, y, and z are replaced by the constant values A,
B, and C, respectively.
Fig. D.1 Semantics of logical connectives AND (∧), OR(∨), NOT (¬), Implication (→), and
BI-Implication (↔)
true iff the tuple (B, C) belongs to the table defining the binary relation Rp , associated
to predicate p.
The truth of complex formulas can be evaluated in a universe U by combining the
truth of the single atomic formulas according to the classical semantics of the logical
connectives (see Fig. D.1). For instance, the formula ϕ(x, y) = q(x/A)∧p(x/A, y/B)
is true iff A belongs to relation Rq and (A, B) belongs to relation Rp .
By referring to the truth tables reported in Fig. D.1, it is easy to prove that, among
the five connectives ∧, ∨, ¬, →, and ↔, only three of them are essential because
implication and bi-implication can be expressed as a combination of the others. For
instance, formula ϕ → ψ (ϕ implies ψ), is semantically equivalent to ¬ϕ ∨ ψ,
while formula ϕ ↔ ψ (ϕ implies ψ and ψ implies ϕ) is semantically equivalent to
(¬ϕ ∨ ψ) ∧ (¬ψ ∨ ϕ).
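As a quick sanity check (ours, not from the book), these semantic equivalences can be verified mechanically by enumerating all truth assignments; a minimal Python sketch:

from itertools import product

# truth tables of the connectives of Fig. D.1
AND  = lambda p, q: p and q
OR   = lambda p, q: p or q
NOT  = lambda p: not p
IMPL = {(True, True): True,  (True, False): False,
        (False, True): True, (False, False): True}
BIIM = {(p, q): IMPL[(p, q)] and IMPL[(q, p)]
        for p, q in product([True, False], repeat=2)}

# -> and <-> are definable from the other three connectives
for p, q in product([True, False], repeat=2):
    assert IMPL[(p, q)] == OR(NOT(p), q)
    assert BIIM[(p, q)] == AND(OR(NOT(p), q), OR(NOT(q), p))
print("ok: -> and <-> reduce to AND, OR, NOT")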
In wffs quantifiers can be nested arbitrarily. However, it can be proved that any wff
can be syntactically transformed, while preserving its semantics, in such a way that
all quantifiers occur only at the outermost level. This syntactic form
is called prenex form. Moreover, the existential quantifier can be eliminated by
introducing so-called Skolem functions.
The prenex form of a formula can be a universally quantified formula of the
type ∀x1 ,x2 ,...,xn . ϕ(x1 , x2 , ..., xn ), where ϕ is a formula with only free variables,
built up by means of the connectives ∧, ∨, ¬, and, possibly, → and ↔. Finally,
any formula built up through the connectives ∨, ∧, and ¬ can be represented in
Conjunctive Normal Form (CNF), i.e., as a conjunction of disjunctions of atoms
(literals). In particular, any FOL sentence can always be written as follows:

∀x1 , x2 , . . . , xn . (L11 ∨ · · · ∨ L1k1) ∧ (L21 ∨ · · · ∨ L2k2) ∧ · · · ∧ (Lm1 ∨ · · · ∨ Lmkm)    (D.1)

where Lij denotes a positive or negative literal, with any subset of the variables
x1 , x2 , ...., xn as arguments.
Form (D.1) is usually referred to as clausal form (the word clause denotes a
disjunction of literals), and is the one most widely used for representing knowledge
in Relational Machine Learning.
For the sake of simplicity, notation (D.1) is usually simplified as follows:
• Universal quantification is implicitly assumed, and the quantifier symbol is omit-
ted.
• Symbol ∧ denoting conjunction is replaced by “,” or implicitly assumed.
Horn clauses. A Horn clause is a clause with at most one positive literal. Horn clauses
are named after the logician Alfred Horn [262], who investigated the mathematical
properties of similar sentences in the non-clausal form of FOL. The general form of
a Horn clause is then:

¬L1 ∨ ¬L2 ∨ . . . ∨ ¬Lk−1 ∨ Lk ,    (D.2)

which is logically equivalent to the implication L1 ∧ L2 ∧ · · · ∧ Lk−1 → Lk .
Horn clauses play a basic role in Logic Programming [299] and are important for
Machine Learning [382]. A Horn clause with exactly one positive literal is called a
definite clause. A definite clause with no negative literals is also called a fact.
DATALOG. DATALOG is a subset of the Horn clause language designed for querying
databases. It imposes several further restrictions on the clausal form:
• It disallows complex terms as arguments of predicates: only constants and variables
can be terms of a predicate.
• Variables are range restricted, i.e., each variable in the conclusion of a clause must
also appear in a non-negated literal in the premise.
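To make these restrictions concrete, here is a minimal sketch (ours, not the book's) of naive bottom-up DATALOG evaluation in Python; the encoding of rules as (head, body) pairs and the ancestor example are illustrative assumptions:

def match_body(body, known, binding):
    """Enumerate all variable bindings satisfying every literal of the body."""
    if not body:
        yield binding
        return
    (pred, args), rest = body[0], body[1:]
    for fpred, consts in known:
        if fpred != pred or len(consts) != len(args):
            continue
        b = dict(binding)
        if all(b.setdefault(a, c) == c for a, c in zip(args, consts)):
            yield from match_body(rest, known, b)

def datalog_fixpoint(facts, rules):
    """Naive bottom-up evaluation of DATALOG rules. A rule is (head, body),
    where head = (pred, vars) and body = [(pred, vars), ...]; facts are pairs
    (pred, tuple-of-constants). Range restriction guarantees that every head
    variable gets bound by the body."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            derived = {(head[0], tuple(b[v] for v in head[1]))
                       for b in match_body(body, known, {})}
            if not derived <= known:
                known |= derived
                changed = True
    return known

facts = {("parent", ("ann", "bob")), ("parent", ("bob", "cri"))}
rules = [(("anc", ("X", "Y")), [("parent", ("X", "Y"))]),
         (("anc", ("X", "Y")), [("parent", ("X", "Z")), ("anc", ("Z", "Y"))])]
print(sorted(f for f in datalog_fixpoint(facts, rules) if f[0] == "anc"))
# [('anc', ('ann', 'bob')), ('anc', ('ann', 'cri')), ('anc', ('bob', 'cri'))]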
Appendix E
Abstraction Operators
All the operators that we have defined so far are summarized in Table E.1. They
are grouped according to the elements of the description frame they act upon, and
to their abstraction mechanism. Even though there is quite a large number of them,
several operators can be "technically" applied in the same way, exploiting synergies.
For instance, equating values of a variable can be implemented with the same code for
attributes, argument values in functions and relations, and in a function's co-domain.
Nevertheless, we have kept them separate, because they differ in meaning, and also
in the impact they have on the Γ 's.
As was said at the beginning, the listed operators are defined at the level of
description frames, because they correspond to abstracting the observations that
are obtained from the sensors used to analyze the world. To each one of them a
corresponding method is associated, which acts on specific P-Sets according to
rules that guide the actual process of abstraction.
Table E.1 Summary of the elementary abstraction and approximation operators, classified accord-
ing to the elements of the description frame they act upon and their mechanism
Operators           Elements                                          Arguments                     Values
Hiding              ωhobj, ωhtype, ωhattr, ωhfun, ωhrel               ωhfunarg, ωhrelarg            ωhattrval, ωhfunargval, ωhfuncodom, ωhrelargval
Equating            ωeqobj, ωeqtype, ωeqattr, ωeqfun, ωeqrel          ωeqfunarg, ωeqrelarg          ωeqattrval, ωeqfunargval, ωeqfuncodom, ωeqrelargval
Building hierarchy  ωhierattr, ωhierfun, ωhierrel, ωhiertype                                        ωhierattrval, ωhierfuncodom
Combining           ωcoll, ωaggr, ωgroup                                                            ωconstr
Approximating       ρrepl, ρidobj, ρidtype, ρidattr, ρidfun, ρidrel   ρrepl, ρidfunarg, ρidrelarg   ρrepl, ρidattrval, ρidfunargval, ρidfuncodom, ρidrelargval
If X^(g) = ΓTYPE^(g) and y = t, type t can no longer be observed in a system, and
objects that were previously of this type become of type obj. We define:

ωhtype (t) ≝ ωh (ΓTYPE^(g), t)

and we obtain:

ΓTYPE^(a) = ΓTYPE^(g) − {t} ∪ {obj}
If X^(g) = ΓF^(g), y = (fh , CD(fh )), v ∈ CD(fh ), then the operator

ωhfuncodom (fh , CD(fh ), v) ≝ ωh (ΓF^(g), (fh , CD(fh )), v)

removes the value v from the codomain of fh . Then an abstract function fh^(a) is created,
whose codomain is given by:

CD(fh^(a)) = CD(fh ) − {v}

and

ΓF^(a) = ΓF^(g) − {fh } ∪ {fh^(a)}
For instance, let us consider the function Price, with codomain CD(Price) =
{cheap, moderate, fair, costly, very-costly}; if we want to remove the
value very-costly, we have to specify, in the associated method,
what happens to those tuples in FCOV (fh^(a)) that contain v. One possibility is that
the value is turned into UN.
If X^(g) = ΓR^(g) and y^(a) = R^(a), the operator makes indistinguishable all relations R
satisfying ϕeq (R1 , . . . , Rk ). Let

ΓR,eq = {(R1 , . . . , Rk ) | ϕeq (R1 , . . . , Rk )}

be the set of indistinguishable relations. We define:

ωeqrel (ϕeq (R1 , . . . , Rk ), R^(a)) ≝ ωeqelem (ΓR^(g), ϕeq (R1 , . . . , Rk ), R^(a))

The operator ωeqrel (ϕeq (R1 , . . . , Rk ), R^(a)) first generates the set ΓR,eq , obtaining:

ΓR^(a) = ΓR^(g) − ΓR,eq ∪ {R^(a)}

It is the method metheqrel (Pg , ϕeq (R1 , . . . , Rk ), R^(a)) that specifies how the cover
of R^(a) has to be computed.
As an example, let us suppose that the set of relations to be made indistinguishable
is defined extensionally, as in the case of functions. For instance, let

ΓR,eq = {RIsMotherof , RIsStepMotherof }

where:

RIsMotherof ⊆ Γwomen^(g) × Γpeople^(g)
RIsStepMotherof ⊆ Γwomen^(g) × Γpeople^(g)

If we state the equivalence between the two relations, we may keep only R^(a) in place
of the two. Again, the method metheqrel (Pg , ϕeq (R1 , . . . , Rk ), R^(a)) shall specify how
the cover RCOV (R^(a)) must be computed.
If X^(g) = ΓF^(g), Y = (fh , CD(fh )), Veq ⊆ CD(fh ), then the operator equates the values
in Veq of the codomain of function fh , setting them all equal to v^(a). We define:

ωeqfuncodom (fh , CD(fh ), Veq , v^(a)) ≝ ωeqval (ΓF^(g), (fh , CD(fh )), Veq , v^(a))

Then:

ΓF^(a) = ΓF^(g) − {fh } ∪ {fh^(a)}

Method metheqfuncodom (Pg , (fh , CD(fh )), Veq , v^(a)) handles the cover of fh^(a) by
replacing in FCOV (fh^(a)) all occurrences of members of Veq with v^(a).
For the sake of exemplification, let us consider a gray-level picture, in which the
attribute Intensity of a pixel x can take on a value in the integer interval [0, 255]. Let
τ be a threshold, such that:

I^(a)(x) = 255 if I(x) > τ, and I^(a)(x) = I(x) otherwise.    (E.1)

In Eq. (E.1) all values greater than the threshold are considered equivalent. An example
is reported in Fig. E.1.
Fig. E.1 Example of method metheqfuncodom (Pg , (fh , CD(fh )), Veq , v^(a)). The picture on the left
is a 256-level gray picture. By a thresholding operation, all pixels whose intensity is greater than τ
are considered white
If X^(g) = ΓA^(g), Y = ∅, Ychild = ΓA,child^(g), and y^(a) = (A^(a), Λ^(a)), then the operator
works on an attribute hierarchy, where a set of nodes, those contained in ΓA,child^(g),
are replaced by (A^(a), Λ^(a)). We define:

ωhierattr (ΓA,child^(g), (A^(a), Λ^(a))) ≝ ωhier (ΓA^(g), ΓA,child^(g), (A^(a), Λ^(a)))

and we obtain:

ΓA^(a) = ΓA^(g) − ΓA,child^(g) ∪ {(A^(a), Λ^(a))}.

The method methhierattr (Pg , ΓA,child^(g), (A^(a), Λ^(a))) states how the values in Λ^(a)
must be derived from those in the domains of the attributes in ΓA,child^(g).
As an example, let us consider the attributes Length and Width. We introduce
the abstract attribute LinearSize^(a), such that Length is-a LinearSize^(a) and
Width is-a LinearSize^(a). We have then ΓA,child = {Length, Width} and A^(a) =
LinearSize^(a). The values of the attribute LinearSize^(a) are still to be defined; for instance,
we may assume that, for an object x, the value of LinearSize^(a)(x) is computed from
Length(x) and Width(x).
If X^(g) = ΓR^(g), Y = ∅, Ychild = ΓR,child^(g), and y^(a) = R^(a), then the operator works
on a relation hierarchy, where a set of nodes, those contained in ΓR,child^(g), are replaced
by R^(a). We define:

ωhierrel (ΓR,child^(g), R^(a)) ≝ ωhier (ΓR^(g), ΓR,child^(g), R^(a))

and we obtain:

ΓR^(a) = ΓR^(g) − ΓR,child^(g) ∪ {R^(a)}

The method methhierrel (Pg , ΓR,child^(g), R^(a)) states how the cover of R^(a) must be
computed starting from those of the relations in ΓR,child^(g).

As an example, let RHorizAdjacent ⊆ ΓO^(g) × ΓO^(g) and RVertAdjacent ⊆ ΓO^(g) × ΓO^(g)
be two relations over pairs of objects. The former is verified when two objects touch
each other horizontally, whereas the latter is verified when two objects touch each
other vertically. We introduce the abstract relation RAdjacent^(a) ⊆ ΓO × ΓO , which does
not distinguish the modality (horizontal or vertical) of the adjacency. In this case we
have ΓR,child = {RHorizAdjacent , RVertAdjacent } and the new relation R^(a) = RAdjacent^(a).
Operator ωhierrel^(Ψ) (Pg , ΓR,child^(g), R^(a)) will establish that, for instance:

RCOV (RAdjacent^(a)) = RCOV (RHorizAdjacent ) ∪ RCOV (RVertAdjacent )
If X^(g) = ΓTYPE^(g) and y^(a) = t^(a), the operator makes all types satisfying ϕid (t1 , . . . , tk )
indistinguishable. Then type t^(a) is applied to all objects in the equivalence class.
We define:

ωidtype (ϕid (t1 , . . . , tk ), t^(a)) ≝ ωidelem (ΓTYPE^(g), ϕid (t1 , . . . , tk ), t^(a))

The operator ωidtype (ϕid (t1 , . . . , tk ), t^(a)) first generates the set ΓTYPE,id of indistinguishable
types, and then it applies t^(a) to the obtained class. All types in ΓTYPE,id
become t^(a), obtaining:

ΓTYPE^(a) = ΓTYPE^(g) − ΓTYPE,id ∪ {t^(a)}
It is the method methidtype (Pg , ϕid (t1 , . . . , tk ), t^(a)) that specifies what properties
are to be assigned to t^(a), considering those of the equated types. For instance, if
the types in ΓTYPE,id have different sets of attributes, t^(a) could have the intersection
of these sets, or the union, by setting some values to NA, depending on the choice of
the user.
As an example, we can consider the types chair and armchair and we can
equate them to be both chair^(a).
If X^(g) = ΓA^(g), Y = (A, ΛA ), and Vid = ΛA,id ⊆ ΛA , then the operator makes
indistinguishable a subset ΛA,id of the domain ΛA of A. We define:

ωidattrval ((A, ΛA ), ΛA,id , v^(a)) ≝ ωidval (ΓA^(g), (A, ΛA ), ΛA,id , v^(a))

We obtain an approximate attribute A^(a) such that ΛA^(a) = ΛA − ΛA,id ∪ {v^(a)}, and

ΓA^(a) = ΓA^(g) − {(A, ΛA )} ∪ {(A^(a), ΛA^(a))}
For the sake of exemplification, let us consider an attribute, say Color, which
takes values in the set:
{white, yellow, olive-green, sea-green, lawn-green, red,
pink, light-green, dark-green, blue, light-blue, aquamarine,
orange, magenta, cyan, black}.
We might consider equivalent all the shades of green, and identify them with
v^(a) = sea-green. In this case, the true shade of green is no longer known (see
Fig. E.2).
As another important example, let us consider the discretization of real numbers.
Let us consider the interval [0, 100), and let us divide it into 10 subintervals
{[10k, 10(k + 1)) | 0 ≤ k ≤ 9}. Numbers falling inside one of the intervals are
all considered equal to the mean value 10(k + 0.5).
Fig. E.2 Application of method meth(Pg , ωeqattrval ((Color, ΛColor ), Vid , v^(a))) to the figure
on the left. Let Vid = {olive-green, sea-green, lawn-green, light-green,
dark-green}. Objects o1 , o2 , and o3 have color dark-green, lawn-green, and
sea-green, respectively. After equating all shades of green to sea-green, the color of all
three objects becomes sea-green. [A color version of the figure is reported in Fig. H.16 of
Appendix H]
In Chap. 7 the methods associated to some operators have been described. In this
section we give some additional examples, whereas the complete set of methods is
provided in the book’s companion site.
Let us consider the operators that hide an attribute, a function, or a relation,
i.e., ωhattr ((Am , Λm )), ωhfun (fh ), and ωhrel (Rk ). Hiding an attribute, a function, or
a relation are all instantiations of the same PDT introduced in Sect. 7.2.1, and so
we group them together in Table E.2, whereas their bodies are reported in Tables E.3,
E.4, and E.5, respectively.
Also at the description level the operators ωhattr ((Am , Λm )), ωhfun (fh ), and
ωhrel (Rk ) are similar; in fact, they simply hide the concerned element (attribute,
function, or relation) from the appropriate set, as was illustrated in Sect. 7.2.1.
But when we must apply them to a specific Pg , some complications may arise. Let us
look first at Table E.2.
Operator ωhattr ((Am , Λm )) hides the attribute Am from the set of available ones and,
as a consequence, meth(Pg , ωhattr (Am , Λm )) hides the value of that attribute in each
object in Pg . Hiding an attribute may cause the descriptions of some objects to become
identical. However, as each object has a unique identity, they remain distinguishable.
Since neither functions nor relations can have an attribute as an argument, removing
Am does not have any further effect. As for the hidden information, it is not necessary
to store all the tuples hidden in Ag , but only the value of Am for each object.
Table E.2 Summary of methods meth(Pg , ωhattr (Am , ΛM )), meth(Pg , ωhfun (fh )), and
meth(Pg , ωhrel (Rk ))
NAME meth(Pg , ωhattr ) meth(Pg , ωhfun ) meth(Pg , ωhrel )
INPUT Pg , (Am , Λm ) Pg , f h Pg , R k
OUTPUT Pa Pa Pa
APPL-CONDITIONS Am ∈ Ag fh ∈ F g Rk ∈ Rg
PARAMETERS ∅ ∅ ∅
MEMORY Δ(P ) Δ(P ) Δ(P )
BODY See Table E.3 See Table E.4 See Table E.5
Hiding a function is a simple operation per se, but it may have indirect effects
on the set of functions and relations. In fact, if the co-domain of fh is the set of
objects, there may be in ΓF^(g) or ΓR^(g) some function or relation that has fh as one of
its arguments. Then, by hiding fh , these arguments disappear, and new abstract functions
or relations, with one argument less, are to be defined, thus increasing the degree of
abstraction. Hiding a relation has no side-effects.
Appendix F
Abstraction Patterns
In this appendix two more abstraction patterns are described, for the sake
of illustration. The complete set, corresponding to the full set of operators listed in
Appendix E, can be found in the book's companion Web site.
In Table F.1 the pattern referring to hiding an argument of a function or relation
is reported.
Table F.2 AGGREGATION—Aggregation pattern that forms new objects starting from existing
ones

NAME            AGGREGATION
ALSO KNOWN      In Machine Learning the aggregation operator is known as
                "predicate invention", "predicate construction", or "term
                construction", whereas in Data Mining it is related to "motif
                discovery". In general, it is the basis of the "constructive
                induction" approach to learning. In Planning, Problem
                Solving, and Reinforcement Learning it includes "state
                aggregation" and "spatial and/or temporal aggregation".
GOAL            It aims at working, in any field, with "high-level" constructs
                in the description of data and in theories, in order to reduce
                the computational cost and increase the meaningfulness of
                the results.
TYPICAL         Finding regions and objects in the visual input; representing
APPLICATIONS    physical apparatuses at various levels of detail by introducing
                composite components.
IMPLEMENTATION  Implementing the grouping operator may require complex
ISSUES          algorithms, and the cost of aggregation has to be weighed
                against the advantages of using the abstract representation.
KNOWN USES      Even though not always under the name of abstraction,
                aggregation and feature construction are widely used in
                computer vision, the description of physical systems, Machine
                Learning, Data Mining, and Artificial Intelligence in general.
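As a concrete, if minimal, illustration of the pattern, the following Python sketch replaces pairs of component objects satisfying a grouping condition with a single composite object whose attributes are derived from the components'; all names are illustrative assumptions, not part of the pattern itself.

```python
from itertools import count

_ids = count(1)

# Toy aggregation operator: pairs (a, b) satisfying `condition` are replaced
# by one composite object of type `new_type`; `derive` computes the
# composite's attributes from those of its components.
def aggregate(objects, condition, derive, new_type):
    used, composites = set(), []
    for i, a in enumerate(objects):
        for j in range(i + 1, len(objects)):
            if i in used or j in used or not condition(a, objects[j]):
                continue
            b = objects[j]
            comp = {"id": f"c{next(_ids)}", "type": new_type,
                    "components": [a["id"], b["id"]], **derive(a, b)}
            composites.append(comp)
            used.update((i, j))
    return [o for k, o in enumerate(objects) if k not in used] + composites

blocks = [{"id": "a", "on_top_of": "b", "size": 2},
          {"id": "b", "on_top_of": None, "size": 5}]
towers = aggregate(blocks,
                   condition=lambda x, y: x["on_top_of"] == y["id"],
                   derive=lambda x, y: {"size": x["size"] + y["size"]},
                   new_type="tower")
# -> a single composite "tower" object replaces blocks a and b
```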
Appendix G
Abstraction of Michalski’s “Train” Problem
This appendix details the abstraction of Michalski's “train” problem introduced in
Chap. 9. In Table G.1 the method meth(Pg, ωaggr({car, load}, loadedcar))
is reported.
The parameters, which are listed in Table G.2, specify how objects are actually
aggregated and how attributes and relations change as a consequence.
Finally, Table G.3 describes the actual algorithm performing the aggregation
abstraction.
APPL-CONDITIONS  ∃ c ∈ Ocar
                 ∃ (distinct) ℓ1, . . . , ℓn ∈ Oload
                 c, ℓ1, . . . , ℓn are labelled with the same example
                 (ℓi, c) ∈ RCOV(RInside) (1 ≤ i ≤ n)
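These conditions translate directly into a small test; the sketch below assumes an illustrative data layout (sets of object identifiers, a set of (load, car) pairs for the cover of RInside, and a map from objects to example labels) and is not the pseudo-code of Table G.3.

```python
# Toy applicability test for aggregating a car and the loads inside it into
# one loadedcar object: the car c and the distinct loads l1, ..., ln must
# carry the same example label, with (li, c) in R_COV(R_Inside).
def applicable_groups(cars, loads, inside, example_of):
    groups = []
    for c in cars:
        ls = sorted(l for l in loads
                    if (l, c) in inside and example_of[l] == example_of[c])
        if ls:                        # at least one load inside car c
            groups.append((c, ls))    # candidate loadedcar aggregate
    return groups

cars = {"car1"}
loads = {"load1", "load2"}
inside = {("load1", "car1"), ("load2", "car1")}
example_of = {"car1": "e1", "load1": "e1", "load2": "e1"}
print(applicable_groups(cars, loads, inside, example_of))
# [('car1', ['load1', 'load2'])] -> aggregate into a loadedcar object
```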
Table G.2 Parameters of the method meth(Pg, ωaggr({car, load}, loadedcar))
Table G.3 Pseudo-code for the method meth(Pg, ωaggr({car, load}, loadedcar))
Appendix H
Color Figures

In this appendix, some of the figures appearing in the book are reported
with their original colors.
Fig. H.1 Vasilij Kandinsky, Composition VII, 1913. The Tretyakov Gallery, Moscow
Fig. H.4 Upper (pink + yellow regions) and lower (yellow region) approximations of a concept X
= Oval, defined as a region in the 2D plane
Fig. H.5 The Incas used quipus to memorize numbers. A quipu is a cord with knots that assume
position-dependent values; the example shows the complexity a quipu may reach. (Reprinted with
permission from Museo Larco, Pueblo Libre, Lima, Peru.)
Fig. H.6 a Picture of a poppy field. If we only have this picture, it is impossible to say whether
it is concrete or abstract. b The same picture in black and white. By comparison, the latter is less
informative than the colored one, because the information referring to the color has been removed;
hence, picture b is more abstract than picture a
Fig. H.7 A color picture has been transformed into a black and white one. If the color is to be
added again, there is no clue for performing this addition correctly, unless it is known how the
color was originally removed
Fig. H.8 Abstraction and generalization can be combined in every possible way. In the bottom-left
corner there is a picture of one of the authors, which is specific (only one instance) and concrete (all
the skin, hair, face, ... details are visible). In the bottom-right corner there is a version of the picture
which is specific (only one instance, as the person is still recognizable) and abstract (most details
of the appearance are hidden). In the top-left corner the chimpanzee–human last common ancestor
is represented with many physical details, thus making the picture still concrete; however, many
monkeys and hominids satisfy the same description, so that this is an example of a concrete but
general concept. Finally, in the top-right corner there is a representation of a human head according
to Marr [353] (see Fig. 2.13); the head is abstract (very few details of the appearance) and general
(any person could be an instance)
Fig. H.10 Example of method meth(Pg, ωhattr((Am, Λm))). The attribute Am = Color is hidden
from the left picture, giving a gray-level picture (right). Each pixel shows a value of the light intensity,
but the latter is no longer distributed over the R, G, B channels
Fig. H.11 Example of application of the method meth(Pg, ωhattrval((Color, ΛColor),
turquoise)). The value turquoise is hidden from the left picture; a less colorful picture is
obtained (right), where objects of color turquoise become transparent
Fig. H.12 The Rubik's cube can be described in terms of the 26 small component cubes, which give
rise to the description frame Γ. Each arrangement of the cubes generates a specific configuration
ψ; the configuration set, Ψ, is very large. A configuration is a complete description of the positions
of the small cubes, so that it is unique. If the Rubik's cube is observed only partially, for instance by
looking only at one face, the observation corresponds to many configurations, each one obtained
by completing the invisible faces of the cube in a different way; in this case we have a P-Set P,
which is a set of configurations. The query Q can be represented by a particular configuration to be
reached starting from an initial one
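The distinction drawn in this caption between a configuration and a P-Set can be made tangible with a toy example, a 1-D "cube" of three colored cells (an assumption made purely for illustration): a partial observation determines the set of all configurations compatible with it.

```python
from itertools import product

COLORS = ("R", "G", "B")
PSI = set(product(COLORS, repeat=3))     # the configuration set (27 elements)

# A P-Set: all configurations compatible with a partial observation,
# given as a map from cell index to observed color.
def p_set(observation):
    return {psi for psi in PSI
            if all(psi[i] == col for i, col in observation.items())}

assert len(p_set({0: "R"})) == 9                  # "R??" matches 9 configurations
assert len(p_set({0: "R", 1: "G", 2: "B"})) == 1  # a complete description is unique
```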
Fig. H.13 Application of method meth(Pg, ωaggr((figure, figure), tower)). Objects a and
b are aggregated to obtain object c1, and objects c and d are aggregated to obtain object c2. The
color of c1 is blue, because b is larger than a, whereas the color of c2 is green. Both composite
objects are large. The new object c1 is to the left of c2
Fig. H.14 Examples of four structured objects, used to learn the concept of an “arch”. Each
component has a shape (rectangle or triangle) and a color (blue, red, yellow, or green). They are
linked by two relations, namely Rontop and Radjacent
Fig. H.15 a Part of a map at the 1/25 000 scale. b A 16-fold reduction of the map. c Cartographic
generalization of the map at the 1/100 000 scale. By comparing b and c, the differences between
simply reducing and generalizing are clearly apparent
Fig. H.16 Application of method meth(Pg, ωeqattrval((Color, ΛColor), Vid, v(a))) to the figure
on the left. Let Vid = {olive-green, sea-green, lawn-green, light-green,
dark-green}. Objects o1, o2, and o3 have color dark-green, lawn-green, and
sea-green, respectively. After equating all shades of green to sea-green, the color of all
three considered objects becomes sea-green
References
39. A.G. Barto, S. Mahadevan, Recent advances in hierarchical Reinforcement Learning. Discrete
Event Dyn. Syst. 13, 41–77 (2003)
40. J. Barwise, J. Seligman, Information Flow: The Logic of Distributed Systems (Cambridge
University Press, New York, 1997)
41. M. Basseville, A. Benveniste, K.C. Chou, S.A. Golden, R. Nikoukhah, A.S. Willsky, Modeling
and estimation of multiresolution stochastic processes. IEEE Trans. Inf. Theor. 38, 766–784
(1992)
42. M. Bassok, K. Dunbar, K. Holyoak, Introduction to the special section on the neural substrate
of analogical reasoning and metaphor comprehension. J. Exp. Psychol. Learn. Mem. Cogn.
38, 261–263 (2012)
43. J. Bauer, I. Boneva, M. Kurbán, A. Rensink, A modal-logic based graph abstraction. Lect.
Notes Comput. Sci. 5214, 321–335 (2008)
44. K. Bayer, M. Michalowski, B. Choueiry, C. Knoblock, Reformulating constraint satisfac-
tion problems to improve scalability. in Proceedings of the 7th International Symposium on
Abstraction, Reformulation, and Approximation, (Whistler, Canada, 2007), pp. 64–79.
45. A. Belussi, C. Combi, G. Pozzani, Towards a formal framework for spatio-temporal granular-
ities. in Proceedings of the 15th International Symposium on Temporal Representation and
Reasoning, (Montreal, Canada, 2008), pp. 49–53.
46. P. Benjamin, M. Erraguntla, D. Delen, R. Mayer, Simulation modeling and multiple levels of
abstraction. in Proceedings of the Winter Simulation Conference, (Piscataway, New Jersey,
1998), pp. 391–398.
47. J. Benner, The Ancient Hebrew Lexicon of the Bible (Virtualbookworm, College Station, 2005)
48. C. Bennett, Dissipation, information, computational complexity and the definition of organiza-
tion, in Emerging Syntheses in Science, ed. by D. Pines (Redwood City, USA, Addison-Wesley,
1987), pp. 215–234
49. C. Bennett, Logical depth and physical complexity, in The Universal Turing Machine: A
Half-Century Survey, ed. by R. Herken (Oxford University Press, Oxford, 2011), pp. 227–257.
50. A. Berengolts, M. Lindenbaum, On the performance of connected components grouping. Int.
J. Comput. Vis. 41, 195–216 (2001)
51. A. Berengolts, M. Lindenbaum, On the distribution of saliency. in Proceedings of IEEE
Conference on Computer Vision and Pattern Recognition, (Washington, USA, 2004), pp.
543–549.
52. F. Bergadano, A. Giordana, L. Saitta, Machine Learning: An Integrated Framework and its
Application (Ellis Horwood, Chichester, 1991)
53. G. Berkeley, Of the Principles of Human Knowledge. (Aaron Rahmes for Jeremy Pepyat,
Skynner Row, 1710)
54. S. Bertz, W. Herndon, The similarity of graphs and molecules, in Artificial Intelligence Appli-
cations to Chemistry, ed. by T. Pierce, B. Hohne (ACS, USA, 1986), pp. 169–175
55. T. Besold, Computational Models of Analogy-Making: An Overview Analysis of Compu-
tational Approaches to Analogical Reasoning, Ph.D. thesis (University of Amsterdam, NL,
2011).
56. C. Bessiere, P.V. Hentenryck, To be or not to be ... a global constraint. in Proceedings of the 9th
International Conference on Principles and Practices of Constraint Programming, (Kinsale,
Ireland, 2003).
57. W. Bialek, I. Nemenman, N. Tishby, Predictability, complexity, and learning. Neural Comput.
13, 2409–2463 (2001)
58. M. Biba, S. Ferilli, T. Basile, N.D. Mauro, F. Esposito, Induction of abstraction operators
using unsupervised discretization of continuous attributes. in Proceedings of The International
Conference on Inductive Logic Programming, (Santiago de Compostela, Spain, 2006), pp.
22–24.
59. I. Biederman, Recognition-by-components: a theory of human image understanding. Psychol.
Rev. 94, 115–147 (1987)
60. A. Bifet, G. Holmes, R. Kirkby, B. Pfahringer, Moa: Massive online analysis. J. Mach. Learn.
Res. 99, 1601–1604 (2010)
61. P. Binder, J. Plazas, Multiscale analysis of complex systems. Phys. Rev. E 63, 065203(R)
(2001).
62. J. Bishop, Data Abstraction in Programming Languages, (Addison-Wesley, Reading, 1986).
63. S. Bistarelli, P. Codognet, F. Rossi, Abstracting soft constraints: Framework, properties, exam-
ples. Artif. Intell. 139, 175–211 (2002)
64. A. Blum, P. Langley, Selection of relevant features and examples in machine learning. Artif.
Intell. 97, 245–271 (1997)
65. S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D. Hwang, Complex networks: structure and
dynamics. Phys. Rep. 424, 175–308 (2006)
66. D. Bonchev, G. Buck, Quantitative measures of network complexity, in Complexity in Chem-
istry, Biology, and Ecology, ed. by D. Bonchev, D. Rouvray (Springer, USA, 2005), pp.
191–235
67. I. Boneva, A. Rensink, M. Durban, J. Bauer, Graph abstraction and abstract graph transfor-
mation. Technical report, Centre for Telematics and Information Technology, University of
Twente, Enschede, 2007
68. G. Booch, Object-Oriented Analysis and Design with Applications, (Addison-Wesley, Read-
ing, 2007).
69. M. Botta, A. Giordana, Smart+: A multi-strategy learning tool. in Proceedings of the 13th
International Joint Conference on Artificial Intelligence, (Chambéry, France, 1993), pp. 203–
207.
70. M. Botta, A. Giordana, L. Saitta, M. Sebag, Relational learning as search in a critical region.
J. Mach. Learn. Res. 4, 431–463 (2003)
71. P. Bottoni, L. Cinque, S. Levialdi, P. Musso, Matching the resolution level to salient image
features. Pattern Recogn. 31, 89–104 (1998)
72. I. Bournaud, M. Courtine, J.-D. Zucker, Propositionalization for clustering symbolic rela-
tional descriptions. in Proceedings of the 12th International Conference on Inductive Logic
Programming, (Szeged, Hungary, 2003), pp. 1–16.
73. E. Bourrel, V. Henn, Mixing micro and macro representations of traffic flow: a first theoretical
step. in Proceedings of the 9th Meeting of the Euro Working Group on, Transportation, pp.
10–13 (2002).
74. O. Bousquet, Apprentissage et simplicité. Diploma thesis, (Université de Paris Sud, Paris,
France, 1999), In French.
75. C. Boutilier, R. Dearden, M. Goldszmidt, Exploiting structure in policy construction. in Pro-
ceedings of the 14th International Joint Conference on Artificial Intelligence, (Montréal,
Canada, 1995), pp. 1104–1111.
76. C. Boutilier, R. Dearden, M. Goldszmidt, Stochastic dynamic programming with factored
representations. Artif. Intell. 121, 49–107 (2000)
77. J. Boyan, A. Moore, Generalization in reinforcement learning: safely approximating the value
function. Adv. Neural Inf. Process. Syst. 7, 369–376 (1995)
78. K. Brassel, R. Weibel, A review and conceptual framework of automated map generalization.
Int. J. Geogr. Inf. Syst. 2, 229–244 (1988)
79. N. Bredèche, Y. Chevaleyre, J.-D. Zucker, A. Drogoul, G. Sabah, A meta-learning approach
to ground symbols from visual percepts. Robot. Auton. Syst. 43, 149–162 (2003)
80. H. Brighton, C. Mellish, Advances in instance selection for instance-based learning algo-
rithms. Data Min. Knowl. Discov. 6, 153–172 (2002)
81. A. Brook, Approaches to abstraction: a commentary. Int. J. Educ. Res. 27, 77–88 (1997)
82. R. Brooks, Elephants don’t play chess. Robot. Auton. Syst. 6, 3–15 (1990)
83. R. Brooks, Intelligence without representation. Artif. Intell. 47, 139–159 (1991)
84. V. Bulitko, N. Sturtevant, J. Lu, T. Yau, Graph abstraction in real-time heuristic search. J.
Artif. Intell. Res. 30, 51–100 (2007)
85. N. Busch, I. Fruend, C. Herrmann, Electrophysiological evidence for different types of change
detection and change blindness. J. Cogn. Neurosci. 22, 1852–1869 (2010)
86. T.D.V. Cajetanus, De Nominum Analogia (1498) (Zammit, Rome, Italy, 1934)
87. T. Calders, R. Ng, J. Wijsen, Searching for dependencies at multiple abstraction levels. ACM
Trans. Database Syst. 27, 229–260 (2002)
88. G. Cantor, Contributions to the Founding of the Theory of Transfinite Numbers (Dover Pub-
lications, UK, 1915)
89. R. Cavendish, The Black Arts (Perigee Books, USA, 1967)
90. G. Chaitin, On the length of programs for computing finite binary sequences: statistical con-
siderations. J. ACM 16, 145–159 (1969)
91. D. Chalmers, R. French, D. Hofstadter, High-level perception, representation, and analogy: a
critique of Artificial Intelligence methodology. J. Exp. Theor. Artif. Intell. 4, 185–211 (1992)
92. V. Chandola, A. Banerjee, V. Kumar, Anomaly detection: a survey. ACM Comput. Survey 41,
1–58 (2009)
93. J. Charnley, S. Colton, I. Miguel, Automated reformulation of constraint satisfaction problems.
in Proceedings of the Automated Reasoning Workshop, (Bristol, UK, 2006), pp. 128–135.
94. A. Chella, M. Frixione, S. Gaglio, A cognitive architecture for artificial vision. Artif. Intell.
89, 73–111 (1997)
95. A. Chella, M. Frixione, S. Gaglio, Understanding dynamic scenes. Artif. Intell. 123, 89–132
(2000)
96. A. Chella, M. Frixione, S. Gaglio, Conceptual spaces for computer vision representations.
Artif. Intell. Rev. 16, 87–118 (2001)
97. C. Cheng, Y. Hu, Extracting the abstraction pyramid from complex networks. BMC Bioinform.
11, 411 (2010)
98. Y. Chevaleyre, F. Koriche, J.-D. Zucker, Learning linear classifiers with ternary weights
from metagenomic data. in Proceedings of the Conférence Francophone sur l’Apprentissage
Automatique, (Nancy, France, 2012), In French.
99. L. Chittaro, R. Ranon, Hierarchical model-based diagnosis based on structural abstraction.
Artif. Intell. 155, 147–182 (2004)
100. T. Chothia, D. Duggan, Abstractions for fault-tolerant global computing. Theor. Comput. Sci.
322, 567–613 (2004)
101. B. Choueiry, B. Faltings, R. Weigel, Abstraction by interchangeability in resource allocation.
in Proceedings of the 14th International Joint Conference on Artificial Intelligence, (Montreal,
Canada, 1995), pp. 1694–1701.
102. B. Choueiry, Y. Iwasaki, S. McIlraith, Towards a practical theory of reformulation for reason-
ing about physical systems. Artif. Intell. 162, 145–204 (2005)
103. B. Choueiry, A. Davis, Dynamic bundling: Less effort for more solutions. in Proceedings
of the 5th International Symposium on Abstraction, Reformulation and Approximation,
(Kananaskis, Alberta, Canada, 2002), pp. 64–82.
104. B. Choueiry, G. Noubir, On the computation of local interchangeability in discrete constraint
Satisfaction problems. in Proceedings of the 15th National Conference on Artificial Intelli-
gence, (Madison, USA, 1998), pp. 326–333.
105. J. Christensen, A hierarchical planner that creates its own hierarchies. in Proceedings of the
8th National Conference on Artificial Intelligence, (Boston, USA, 1990), pp. 1004–1009.
106. R. Cilibrasi, P. Vitànyi, Clustering by compression. IEEE Trans. Inform. Theor. 51, 1523–1545
(2005)
107. A. Cimatti, F. Giunchiglia, M. Roveri, Abstraction in planning via model checking. in Proceed-
ings of the 8th International Symposium on Abstraction, Reformulation, and Approximation,
(Asilomar, USA, 1998), pp. 37–41.
108. E. Clarke, B. Barton, Entropy and MDL discretization of continuous variables for Bayesian
belief networks. Int. J. Intell. Syst. 15, 61–92 (2000).
109. E. Codd, Further normalization of the data base relational model. in Courant Computer Science
Symposium 6: Data Base Systems, (Prentice-Hall, Englewood Cliff, 1971), pp. 33–64.
110. W. Cohen, Fast effective rule induction. in Proceedings of the 12th International Conference
on Machine Learning, (Lake Tahoe, USA, 1995), pp. 115–123.
111. T. Colburn, G. Shute, Abstraction in computer science. Minds Mach. 17, 169–184 (2007)
112. E. Colunga, L. Smith, The emergence of abstract ideas: evidence from networks and babies.
Phil. Trans. Roy. Soc. B 358, 1205–1214 (2003)
113. L. Console, D. Theseider-Dupré, Abductive reasoning with abstraction axioms. Lect. Notes
Comput. Sci. 810, 98–112 (1994)
114. S. Cook, The complexity of theorem proving procedures. in Proceedings of the 3rd Annual
ACM Symposium on Theory of Computing, (Shaker Heights, USA, 1971), pp. 151–158.
115. S. Coradeschi, A. Saffiotti, Anchoring symbols to sensor data: preliminary report. in Pro-
ceedings of the 17th National Conference on Artificial Intelligence, (Austin, USA, 2000), pp.
129–135.
116. L. Costa, F. Rodrigues, A. Cristino, Complex networks: the key to systems biology. Genet.
Mol. Biol. 31, 591–601 (2008)
117. P. Cousot, R. Cousot, Basic concepts of abstract interpretation. Build. Inf. Soc. 156, 359–366
(2004)
118. V. Cross, Defining fuzzy relationships in object models: Abstraction and interpretation. Fuzzy
Sets Syst. 140, 5–27 (2003)
119. J. Crutchfield, N. Packard, Symbolic dynamics of noisy chaos. Physica D 7, 201–223 (1983)
120. W. Daelemans, Abstraction considered harmful: lazy learning of language processing. in
Proceedings of 6th Belgian-Dutch Conference on Machine Learning, (Maastricht, NL, 1996),
pp. 3–12.
121. L. Danon, J. Duch, A. Diaz-Guilera, A. Artenas, Comparing community structure identifica-
tion. J. Stat. Mech. Theor. Exp. P09008 (2005).
122. J. Davis, V. Costa, S. Ray, D. Page, An integrated approach to feature invention and model
construction for drug activity prediction. in Proceedings of the 24th International Conference
on Machine Learning, (Corvallis, USA, 2007), pp. 217–224.
123. P. Davis, R. Hillestad, Families of models that cross levels of resolution: issues for design,
calibration and management. in Proceedings of the 25th Conference on Winter Simulation,
(Los Angeles, USA, 1993), pp. 1003–1012.
124. R. Davis, Diagnostic reasoning based on structure and behavior. Artif. Intell. 24, 347–410
(1984)
125. F. de Goes, S. Goldenstein, M. Desbrun, L. Velho, Exoskeleton: curve network abstraction
for 3D shapes. Comput. Graph. 35, 112–121 (2011)
126. J. de Kleer, B. Williams, Diagnosing multiple faults. Artif. Intell. 32, 97–130 (1987)
127. M. de Vries, Engineering science as a “discipline of the particular”? Types of generalization
in Engineering sciences, in Philosophy and Engineering: An Emerging Agenda, ed. by I. van
de Poel, D. Goldberg (Springer, 2010), pp. 83–93.
128. T. Dean, R. Givan, Model minimization in Markov decision processes. in Proceedings of the
National Conference on Artificial Intelligence, (Providence, USA, 1997), pp. 106–111.
129. D. DeCarlo, A. Santella, Stylization and abstraction of photographs. ACM Trans. Graph. 21,
769–776 (2002)
130. M. Dehmer, L. Sivakumar, Recent developments in quantitative graph theory: information
inequalities for networks. PLoS ONE 7, e31395 (2012)
131. O. Dekel, S. Shalev-Shwartz, Y. Singer, The Forgetron: A kernel-based perceptron on a fixed
budget, in Advances in Neural Information Processing Systems 18 (MIT Press, 2005), pp.
259–266.
132. A. Delorme, G. Richard, M. Fabre-Thorpe, Key visual features for rapid categorization of
animals in natural scenes. Front. Psychol. 1, 0021 (2010)
133. D. Dennett, The Intentional Stance (MIT Press, Cambridge, 1987)
134. K. Devlin, Why universities require computer science students to take math. Commun. ACM
46, 37–39 (2003)
135. T. Dietterich, R. Michalski, Inductive learning of structural description. Artif. Intell. 16, 257–
294 (1981)
136. T. Dietterich, R. Michalski, A comparative review of selected methods for learning from
examples, in Machine Learning: An Artificial Intelligence Approach, ed. by J. Carbonell, R.
Michalski, T. Mitchell (Tioga Publishing, Palo Alto, 1983).
137. T. Dietterich, An overview of MAXQ hierarchical reinforcement learning. Lect. Notes Com-
put. Sci. 1864, 26–44 (2000)
138. T. Dietterich, Machine Learning for sequential data: A review. in Proceedings of the Joint
IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition,
(London, UK, 2002), pp. 15–30.
139. R. Dorat, M. Latapy, B. Conein, N. Auray, Multi-level analysis of an interaction network
between individuals in a mailing-list. Ann. Telecommun. 62, 325–349 (2007)
140. J. Dougherty, R. Kohavi, M. Sahami, Supervised and unsupervised discretization of contin-
uous features. in Proceedings of the 12th International Conference on Machine Learning,
(Tahoe City, USA, 1995), pp. 194–202.
141. G. Drastal, G. Czako, S. Raatz, Induction in an abstraction space. in Proceedings of the 11th
International Joint Conference on Artificial Intelligence, (Detroit, USA, 1989), pp. 708–712.
142. E. Dubinsky, Reflective abstraction in advanced mathematical thinking, in Advanced Mathe-
matical Thinking, ed. by D. Tall (Kluwer, Dordrecht, 1991), pp. 95–123
143. R. Durbin, S. Eddy, A. Krogh, G. Mitchison, Biological Sequence Analysis (Cambridge Uni-
versity Press, Cambridge, 1998)
144. P. Duygulu, M. Bastan, Multimedia translation for linking visual data to semantics in videos.
Mach. Vis. Appl. 22, 99–115 (2011)
145. S. Dzeroski, P. Langley, L. Todorovski, Computational discovery of scientific knowledge.
Lect. Notes Artif. Intell. 4660, 1–14 (2007)
146. T. Ellman, Synthesis of abstraction hierarchies for constraint satisfaction by clustering approx-
imately equivalent objects. in Proceedings of the 10th International Conference on Machine
Learning, (Amherst, USA, 1993), pp. 104–111.
147. F. Emmert-Streib, Statistic complexity: combining Kolmogorov complexity with an ensemble
approach. PLoS ONE 5, e12256 (2010)
148. F. Emmert-Streib, M. Dehmer, Exploring statistical and population aspects of network com-
plexity. PlosOne 7, e34523 (2012)
149. H. Enderton, A Mathematical Introduction to Logic, (Academic Press, 1972).
150. E. Engbers, M. Lindenbaum, A. Smeulders, An information-based measure for grouping
quality. in Proceedings of the European Conference on Computer Vision, (Prague, Czech
Republic, 2004), pp. 392–404.
151. D. Ensley, A hands-on approach to proof and abstraction. ACM SIGCSE Bull. 41, 45–47
(2009)
152. S. Epstein, X. Li, Cluster graphs as abstractions for constraint satisfaction problems. in Pro-
ceedings of the 8th International Symposium on Abstraction, Reformulation, and Approxi-
mation, (Lake Arrowhead, USA, 2009), pp. 58–65.
153. J. Euzenat, On a purely taxonomic and descriptive meaning for classes. in Proceedings of the
IJCAI Workshop on Object-Based Representation Systems, (Chambéry, France, 1993), pp.
81–92.
154. J. Euzenat, Représentation des connaissances: De l’ Approximation à la Confrontation. Ph.D.
thesis, (Université Joseph Fourier, Grenoble, France, 1999).
155. J. Euzenat, Granularity in relational formalisms with application to time and space represen-
tation. Comput. Intell. 17, 703–737 (2001)
156. J. Euzenat, A. Montanari, Time granularity, in Handbook of Temporal Reasoning in Artificial
Intelligence, ed. by M. Fisher, D. Gabbay, L. Vila (Elsevier, Amsterdam, 2005), pp. 59–118
157. P. Expert, T. Evans, V. Blondel, R. Lambiotte, Beyond space for spatial networks. PNAS 108,
7663–7668 (2010)
158. M. Fabre-Thorpe, Visual categorization: accessing abstraction in non-human primates. Phil.
Trans. Roy. Soc. B 358, 1215–1223 (2003)
159. B. Falkenhainer, K. Forbus, D. Gentner, The structure-mapping engine: algorithm and exam-
ples. Artif. Intell. 41, 1–63 (1989)
160. J. Fan, R. Samworth, Y. Wu, Ultrahigh dimensional feature selection: Beyond the linear model.
J. Mach. Learn. Res. 10, 2013–2038 (2009)
161. A. Feil, J. Mestre, Change blindness as a means of studying expertise in Physics. J. Learn.
Sci. 19, 480–505 (2010)
162. J. Feldman, How surprising is a simple pattern? Quantifying “Eureka!”. Cognition 93, 199–
224 (2004)
163. A. Felner, N. Ofek, Combining perimeter search and pattern database abstractions. in Proceed-
ings of the 7th International Symposium on Abstraction, Reformulation and Approximation,
(Whistler, Canada, 2007), pp. 155–168.
164. S. Ferilli, T. Basile, N.D. Mauro, F. Esposito, On the learnability of abstraction theories from
observations for relational learning. in Proceedings of European Conference on Machine
Learning, (Porto, Portugal, 2005), pp. 120–132.
165. G. Ferrari, Vedi cosa intendo?, in Percezione, liguaggio, coscienza, Saggi di filosofia della
mente, ed. by M. Carenini, M. Matteuzzi (Quodlibet, Macerata, Italy, 1999), pp. 203–224. In
Italian.
166. P. Ferrari, Abstraction in mathematics. Phil. Trans. Roy. Soc. B 358, 1225–1230 (2003)
167. R. Fikes, N. Nilsson, Strips: a new approach to the application of theorem proving to problem
solving. Artif. Intell. 2, 189–208 (1971)
168. K. Fine, The Limit of Abstraction (Clarendon Press, Oxford, 2002)
169. S. Fine, Y. Singer, N. Tishby, The hierarchical hidden markov model: analysis and applications.
Mach. Learn. 32, 41–62 (1998)
170. K. Fisher, J. Mitchell, On the relationship between classes, objects, and data abstraction.
Theor. Pract. Object Syst. 4, 3–25 (1998)
171. R. Fitch, B. Hengst, D. Šuc, G. Calbert, J. Scholz, Structural abstraction experiments in
Reinforcement Learning. Lect. Notes Artif. Intell. 3809, 164–175 (2005)
172. P. Flach, Predicate invention in inductive data Engineering. in Proceedings of the European
Conference on Machine Learning, (Wien, Austria, 1993), pp. 83–94.
173. P. Flach, N. Lavrač, The role of feature construction in inductive rule learning. in Proceedings
of the ICML Workshop on Attribute-Value and Relational Learning: Crossing the Boundaries,
(Stanford, USA, 2000), pp. 1–11.
174. P. Flener, U. Schmid, Predicate invention, in Encyclopedia of Machine Learning, ed. by C.
Sammut, G.I. Webb (Springer, USA, 2010), pp. 537–544
175. L. Floridi, The method of levels of abstraction. Minds Mach. 18, 303–329 (2008)
176. L. Floridi, J. Sanders, The method of abstraction, in Yearbook of the Artificial, vol. 2, ed. by
M. Negrotti (Peter Lang AG, Germany, 2004), pp. 178–220
177. G. Forman, An extensive empirical study of feature selection metrics for text classification.
J. Mach. Learn. Res. pp. 1289–1305 (2003).
178. S. Fortunato, C. Castellano, Community structure in graphs. Networks 814, 42 (2007)
179. A. Frank, A. Asuncion, UCI Machine Learning Repository (University of California, Irvine, 2010)
180. A. Frank, An operational meta-model for handling multiple scales in agent-based simulations.
in Proceedings of the Dagstuhl Seminar, (Dagstuhl, Germany, 2012), pp. 1–6.
181. G. Frege, Rezension von E. Husserl: Philosophie der Arithmetik. Zeitschrift für Philosophie
und Philosophische Kritik 103, 313–332 (1894).
182. E. Freuder, Eliminating interchangeable values in constraint satisfaction problems. in Proceed-
ings of the 9th National Conference of the American Association for Artificial Intelligence,
(Anaheim, USA, 1991), pp. 227–233.
183. E. Freuder, D. Sabin, Interchangeability supports abstraction and reformulation for multi-
dimensional constraint satisfaction. in Proceedings of 13th National Conference of the Amer-
ican Association for Artificial Intelligence, (Portland, USA, 1996), pp. 191–196.
184. G. Friedrich, Theory diagnoses: a concise characterization of faulty systems. in Proceedings of
the 13th International Joint Conference on Artificial Intelligence, (Chambéry, France, 1993),
pp. 1466–1471.
185. L. Frommberger, Qualitative Spatial Abstraction in Reinforcement Learning, (Springer, 2010)
186. M. Gabbrielli, S. Martini, Data abstraction, in Programming Languages: Principles and Par-
adigms, ed. by A. Tucker, R. Noonan (Springer, Heidelberg, 2010), pp. 265–276
187. U. Galassi, M. Botta, A. Giordana, Hierarchical hidden markov models for user/process profile
learning. Fundam. Informaticae 78, 487–505 (2007)
188. U. Galassi, A. Giordana, L. Saitta, Structured hidden markov models: a general tool for
modeling agent behaviors, in Soft Computing Applications in Business, ed. by B. Prasad
(Springer, Heidelberg, 2008), pp. 273–292
189. J. Gama, R. Sebastião, P. Rodrigues, Issues in evaluation of stream learning algorithms. in
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, (New York, USA, 2009), pp. 329–338.
190. E. Gamma, R. Helm, R. Johnson, J. Vlissides, Design patterns: abstraction and reuse of
object-oriented design. Lect. Notes Comput. Sci. 707, 406–431 (1993)
191. E. Gamma, R. Helm, R. Johnson, J. Vlissides, Design Patterns: Elements of Reusable Object-
Oriented Software (Addison-Wesley Professional, Boston, 2005)
192. M. Garland, Multiresolution Modeling: Survey and Future Opportunities. Eurographics ’99-
State of the Art Reports, pp. 111–131 (1999).
193. P. Gärdenfors, Language and the evolution of cognition. in Lund University Cognitive Studies,
vol. 41. (Lund University Press, 1995).
194. P. Gärdenfors, Conceptual Spaces (MIT Press, Cambridge, 2004)
195. M. Gell-Mann, S. Lloyd, Effective complexity, in Nonextensive Entropy-Interdisciplinary
Applications, ed. by M. Gell-Mann, C. Tsallis (Oxford University Press, Oxford, 2003), pp.
387–398
196. I. Gent, T. Walsh, Phase transitions from real computational problems. in Proceedings of
the 8th International Symposium on Artificial Intelligence, (Monterrey, Mexico, 1995), pp.
356–364.
197. D. Gentner, Structure-mapping: a theoretical framework for analogy. Cogn. Sci. 7, 155–170
(1983)
198. D. Gentner, Analogical reasoning, Psychology, in Encyclopedia of Cognitive Science, ed. by
L. Nadel (Nature Publishing Group, London, 2003), pp. 106–112
199. D. Gentner, L. Smith, Analogical reasoning, in Encyclopedia of Human Behavior, ed. by V.S.
Ramachandran 2nd, edn. (Elsevier, Oxford, 2012), pp. 130–136
200. L. Getoor, B. Taskar, Introduction to Statistical Relational Learning (The MIT Press, Cam-
bridge, 2007)
201. C. Ghezzi, M. Jazayeri, D. Mandrioli, Fundamentals of Software Engineering, 2nd edn. (Pear-
son, NJ, 2003)
202. C. Ghidini, F. Giunchiglia, Local models semantics, or contextual reasoning = locality +
compatibility. Artif. Intell. 127, 221–259 (2001)
203. C. Ghidini, F. Giunchiglia, A semantic for abstraction. in Proceedings of 16th European Conf.
on Artificial Intelligence, (Valencia, Spain, 2004), pp. 338–342.
204. M. Gick, K. Holyoak, Analogical problem solving. Cogn. Psychol. 12, 306–355 (1980)
205. A. Gilpin, T. Sandholm, Lossless abstraction of imperfect information games. J. ACM 54,
1–32 (2007)
206. A. Giordana, G. Lobello, L. Saitta, Abstraction in propositional calculus. in Proceedings of
the Workshop on Knowledge Compilation and Speed Up Learning, (Amherst, USA, 1993),
pp. 56–64.
207. A. Giordana, G. Peretto, D. Roverso, L. Saitta, Abstraction: An alternative view of concept
acquisition, in Methodologies for Intelligent Systems, vol. 5, ed. by Z.W. Ras, M.L. Emrich
(Elsevier, New York, 1990), pp. 379–387
208. A. Giordana, L. Saitta, Abstraction: a general framework for learning. in Working Notes of the
AAAI Workshop on Automated Generation of Approximations and Abstractions, (Boston,
USA, 1990), pp. 245–256.
209. A. Giordana, L. Saitta, Phase transitions in relational learning. Mach. Learn. 41, 217–251
(2000)
210. A. Giordana, L. Saitta, D. Roverso, Abstracting concepts with inverse resolution. in Pro-
ceedings of the 8th International Machine Learning Workshop, (Evanston, USA, 1991), pp.
142–146.
287. R.D. King, A. Srinivasan, L. Dehaspe, Warmr: a data mining tool for chemical data. J. Comput.
Aided Mol. Des. 15, 173–181 (2001)
288. Y. Kinoshita, K. Nishizawa, An algebraic semantics of predicate abstraction for PML. Inform.
Media Technol. 5, 48–57 (2010)
289. C. Knoblock, Automatically generating abstractions for planning. Artif. Intell. 68, 243–302
(1994)
290. C. Knoblock, S. Minton, O. Etzioni, Integrating abstraction and explanation-based learning in
PRODIGY. in Proceedings of the 9th National Conference on Artificial Intelligence, (Menlo
Park, USA, 1991), pp. 541–546.
291. R. Kohavi, G. John, Wrappers for feature subset selection. Artif. Intell. 97, 273–324 (1997)
292. R. Kohavi, M. Sahami, Error-based and entropy-based discretization of continuous features.
in Proceedings of the 2nd Knowledge Discovery and Data Mining Conference, (Portland,
USA, 1996), pp. 114–119.
293. S. Kok, P. Domingos, Statistical predicate invention. in Proceedings of the 24th International
Conference on Machine Learning, (Corvallis, USA, 2007), pp. 433–440.
294. D. Koller, M. Sahami, Toward optimal feature selection. in Proceedings of the 13th Interna-
tional Conference on Machine Learning, (Bari, Italy, 1996), pp. 284–292.
295. A. Kolmogorov, Three approaches to the quantitative definition of information. Probl. Inf.
Trans. 1, 4–7 (1965)
296. M. Koppel, Complexity, depth, and sophistication. Complex Syst. 1, 1087–1091 (1987)
297. R. Korf, Toward a model of representation changes. Artif. Intell. 14, 41–78 (1980)
298. S. Kotsiantis, D. Kanellopoulos, Discretization techniques: a recent survey. GESTS Int. Trans.
Comput. Sci. Eng. 32, 47–58 (2006)
299. R. Kowalski, Logic for Problem-Solving (North-Holland Publising, Amsterdam, 1986)
300. O. Kozlova, O. Sigaud, C. Meyer, Texdyna: hierarchical Reinforcement Learning in factored
MDPs. Lect. Notes Artif. Intell. 6226, 489–500 (2010)
301. J. Kramer, Is abstraction the key to computing? Commun. ACM 50, 37–42 (2007)
302. S. Kramer, Predicate invention: A comprehensive view. Technical Report OFAI-TR-95-32,
Austrian Research Institute for Artificial Intelligence, (Vienna, 1995).
303. S. Kramer, N. Lavrač, P. Flach, Propositionalization approaches to relational data mining, in
Relational Data Mining, ed. by S. Dzeroski, N. Lavrač (Springer, Berlin, 2001), pp. 262–291
304. S. Kramer, B. Pfahringer, C. Helma, Stochastic propositionalization of non-determinate back-
ground knowledge. Lect. Notes Comput. Sci. 1446, 80–94 (1998)
305. M. Krogel, S. Rawles, F. Železný, P. Flach, N. Lavrač, S. Wrobel, Comparative evaluation of
approaches to propositionalization. in Proceedings of the 13th International Conference on
Inductive Logic Programming, (Szeged, Hungary, 2003), pp. 194–217.
306. Y. Kudoh, M. Haraguchi, Y. Okubo, Data abstractions for decision tree induction. Theor.
Comput. Sci. 292, 387–416 (2003)
307. P. Kuksa, Y. Qi, B. Bai, R. Collobert, J. Weston, V. Pavlovic, X. Ning, Semi-supervised
abstraction-augmented string kernel for multi-level bio-relation extraction. in Proceedings of
the European Conference on Machine Learning, (Barcelona, Spain, 2010), pp. 128–144.
308. M. Kurant, P. Thiran, Layered complex networks. Phys. Rev. Lett. 96, 138701 (2006)
309. R. López-Ruiz, Statistical complexity and Fisher-Shannon information: Applications, in
Statistical Complexity, ed. by K. Sen (Springer, New York, 2011), pp. 65–127
310. R. Lambiotte, Multi-scale modularity in complex networks. in Proceedings of the 8th Inter-
national Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Net-
works (Avignon, France, 2010), pp. 546–553.
311. A. Lancichinetti, S. Fortunato, Consensus clustering in complex networks. Sci. Rep. 2, 336–
342 (2012)
312. A. Lancichinetti, S. Fortunato, J. Kertesz, Detecting the overlapping and hierarchical com-
munity structure of complex networks. New J. Phys. 11, 033015 (2009)
313. A. Lancichinetti, S. Fortunato, F. Radicchi, Benchmark graphs for testing community detec-
tion algorithms. Phys. Rev. E 78, 046110 (2008)
314. T. Lang, Rules for robot draughtsmen. Geogr. Mag. 42, 50–51 (1969)
315. P. Langley, Scientific Discovery: Computational Explorations of the Creative Processes (MIT
Press, Cambridge, 1987)
316. P. Langley, The computer-aided discovery of scientific knowledge. Int. J. Hum-Comput. Stud.
53, 393–410 (2000)
317. Y. Lasheng, J. Zhongbin, L. Kang, Research on task decomposition and state abstraction in
Reinforcement Learning. Artif. Intell. Rev. 38, 119–127 (2012)
318. N. Lavrač, P. Flach, An extended transformation approach to Inductive Logic Programming.
ACM Trans. Comput. Log. 2, 458–494 (2001)
319. N. Lavrač, J. Fürnkranz, D. Gamberger, Explicit feature construction and manipulation for
covering rule learning algorithms, in Advances in Machine Learning I, ed. by J. Koronacki,
Z. Ras, S. Wierzchon (Springer, New York, 2010), pp. 121–146
320. N. Lavrač, D. Gamberger, P. Turney, A relevancy filter for constructive induction. IEEE Intell.
Syst. Their Appl. 13, 50–56 (1998)
321. H. Laycock, Notes to “Object”, in Stanford Encyclopedia of Philosophy (2010).
322. A. Lazaric, M. Ghavamzadeh, R. Munos, Analysis of a classification-based policy iteration
algorithm. in Proceedings of the 27th International Conference on Machine Learning (Haifa,
Israel, 2010), pp. 607–614.
323. H. Leather, E. Bonilla, M. O’Boyle, Automatic feature generation for Machine Learning based
optimizing compilation. in Proceedings of International Symposium on Code Generation and
Optimization (Seattle, 2009), pp. 81–91.
324. C. Lecoutre, Constraint Networks: Techniques and Algorithms (Wiley, 2009).
325. E. Leicht, M. Newman, Community structure in directed networks. Phys. Rev. Lett. 100,
118703 (2008)
326. U. Leron, Abstraction barriers in Mathematics and Computer Science. in Proceedings of 3rd
International Conference on Logo and Math Education (Montreal, Canada, 1987).
327. D. Levin, D. Simons, Failure to detect changes to attended objects in motion pictures. Psychon.
Bull. Rev. 4, 501–506 (1997)
328. A. Levy, Creating abstractions using relevance reasoning. in Proceedings of the 12th National
Conference on Artificial Intelligence (Seattle, 1994), pp. 588–594.
329. D. Lewis, On the Plurality of Worlds (Basil Blackwell, Oxford, 1986)
330. L. Li, T. Walsh, M.L. Littman, Towards a unified theory of state abstraction for MDPs. in
Proceedings of the 9th International Symposium on Artificial Intelligence and Mathematics
(Fort Lauderdale, 2010), pp. 531–539.
331. M. Li, P. Vitànyi, An Introduction to Kolmogorov Complexity and its Applications, 2nd edn.
(Springer, New York, 1997)
332. S. Li, M. Ying, Soft constraint abstraction based on semiring homomorphism. Theor. Comput.
Sci. 403, 192–201 (2008)
333. B. Liskov, J. Guttag, Abstraction and Specification in Program Development (MIT Press,
Cambridge, 1986)
334. H. Liu, H. Motoda, On issues of instance selection. Data Min. Knowl. Discov. 6, 115–130
(2002)
335. H. Liu, H. Motoda, R. Setiono, Z. Zhao, Feature selection: An ever evolving frontier in Data
Mining. J. Mach. Learn. Res. 10, 4–13 (2010)
336. H. Liu, R. Setiono, Feature selection via discretization. IEEE Trans. Knowl. Data Eng. 9,
642–645 (1997)
337. H. Liu, F. Hussain, C.L. Tan, M. Dash, Discretization: An enabling technique. Data Min.
Knowl. Discov. 6, 393–423 (2002)
338. S. Lloyd, H. Pagels, Complexity as thermodynamic depth. Ann. Phys. 188, 186–213 (1988)
339. J. Locke, Essay Concerning Human Understanding (Eliz Holt for Thomas Basset, London,
1690)
340. R. López-Ruiz, H. Mancini, X. Calbet, A statistical measure of complexity. Phys. Lett. A 209,
321–326 (1995)
341. A. Lovett, K. Forbus, Modeling multiple strategies for solving geometric analogy problems.
in Proceedings of the 34th Annual Conference of the Cognitive Science Society (Sapporo,
Japan, 2012).
415. J. Pearl, On the connection between the complexity and the credibility of inferred models.
Int. J. Gen. Syst. 4, 255–264 (1978)
416. C. Perlich, F. Provost, Distribution-based aggregation for relational learning with identifier
attributes. Mach. Learn. 62, 65–105 (2006)
417. J. Piaget, Genetic epistemology (Columbia University Press, New York, 1968)
418. S. Piramuthu, R.T. Sikora, Iterative feature construction for improving inductive learning
algorithms. Expert Syst. Appl. 36, 3401–3406 (2009)
419. D. Plaisted, Theorem proving with abstraction. Artif. Intell. 16, 47–108 (1981)
420. Plato, Πολιτεία (Republic), 7.514a, 380 BC
421. J. Platt, Prediction of isomeric differences in paraffin properties. J. Phys. Chem. 56, 328–336
(1952)
422. C. Plazanet, Enrichissement des bases de données géographiques : Analyse de la géométrie
des objets linéaires pour la généralisation cartographique (Application aux routes). Ph.D.
thesis, (University Marne-la-Vallée, France, 1996), In French.
423. G. Plotkin, A further note on inductive generalization. in Machine Intelligence, vol 6, (Edin-
burgh University Press, 1971).
424. G. Polya, How to Solve It: A New Aspect of Mathematical Methods (Princeton University
Press, Princeton, 1945)
425. M. Ponsen, M. Taylor, K. Tuyls, Abstraction and generalization in reinforcement learning: a
summary and framework. Lect. Notes Comput. Sci. 5924, 1–32 (2010)
426. K. Popper, The Logic of Scientific Discovery (Harper Torch, New York, 1968)
427. M. Poudret, A. Arnould, J. Comet, P.L. Gall, P. Meseure, F. Képès, Topology-based abstraction
of complex biological systems: application to the Golgi apparatus. Theor. Biosci. 127, 79–88
(2008)
428. W. Prenninger, A. Pretschner, Abstraction for model-based testing. Electron. Notes Theore.
Comput. Sci. 116, 59–71 (2005)
429. A. Prieditis, Machine discovery of admissible heuristics. Mach. Learn. 12, 117–142 (1993)
430. E. Prifti, J.D. Zucker, K. Clement, C. Henegar, Interactional and functional centrality in
transcriptional co-expression networks. Bioinform. 26(24), 3083–3089 (2010)
431. P. Prosser, An empirical study of phase transitions in constraint satisfaction problems. Artif.
Intell. 81, 81–109 (1996)
432. G. Provan, Hierarchical model-based diagnosis. in Proceedings of the 12th International Work-
shop on Principles of Diagnosis, (Murnau, Germany, 2001), pp. 167–174.
433. J. Provost, B.J. Kuipers, R. Miikkulainen, Developing navigation behavior through self-
organizing distinctive state abstraction. Connection Sci. 18, 159–172 (2006)
434. L. Pyeatt, A. Howe, Decision tree function approximation in Reinforcement Learning. in
Proceedings of the 3rd International Symposium on Adaptive Systems: Evolutionary Com-
putation and Probabilistic Graphical Models, (Havana, Cuba, 2001), pp. 70–77.
435. Z. Pylyshyn, What the mind’s eye tells the mind’s brain: a critique of mental imagery. Psychol.
Bull. 80, 1–24 (1973)
436. Z. Pylyshyn, Computation and Cognition: Toward a Foundation for Cognitive Science (MIT
Press, Cambridge, 1984)
437. W. Quine, Word and Object (MIT Press, Cambridge, 1960)
438. J. Quinlan, R. Cameron-Jones, Induction of logic programs: Foil and related systems. New
Gen. Comput. 13, 287–312 (1995)
439. J.R. Quinlan, R.M. Cameron-Jones, Foil: A midterm report. Lect. Notes Comput. Sci. 667,
3–20 (1993)
440. R. Quinlan, Induction of decision trees. Mach. Learn. 1, 81–106 (1986)
441. L. Rabiner, A tutorial on Hidden Markov Models and selected applications in speech recog-
nition. Proc. IEEE 77, 257–286 (1989)
442. M. Ramscar, D. Yarlett, Semantic grounding in models of analogy: an environmental approach.
Cogn. Sci. 27, 41–71 (2003)
443. E. Ravasz, A. Barabasi, Hierarchical organization in complex networks. Phys. Rev. E 67,
026112 (2003)
470. L. Saitta, J.-D. Zucker, A model of abstraction in visual perception. Int. J. Appl. Intell. 80,
134–155 (2001)
471. L. Saitta, J.-D. Zucker, Abstraction and complexity measures. Lect. Notes Comput. Sci. 4612,
375–390 (2007)
472. L. Saitta, C. Vrain, Abstracting Markov networks. in Proceedings of the AAAI Workshop on
Abstraction, Reformulation, and Approximation, (Atlanta, Georgia, USA, 2010).
473. L. Saitta (ed.), The abstraction paths. Special Issue of the Philos. Trans. Roy. Soc. B 358,
1435 (2003).
474. M. Sales-Pardo, R. Guimerà, A. Moreira, L.N. Amaral, Extracting the hierarchical organiza-
tion of complex systems. PNAS 104, 15224–15229 (2007)
475. C. Sammut (ed.), Encyclopedia of Machine Learning (Springer, New York, 2011)
476. J. Schlimmer, Learning and representation change. in Proceedings of the 6th National Con-
ference on Artificial Intelligence (1987), pp. 511–535.
477. J. Schmidhuber, Low-complexity art. J. Int. Soc. Arts Sci. Technol. 30, 97–103 (1997)
478. H. Schmidtke, W. Woo, A size-based qualitative approach to the representation of spatial gran-
ularity. in Proceedings of the 20th International Joint Conference on Artificial Intelligence,
(Bangalore, India, 2007), pp. 563–568.
479. R. Schrag, D. Miranker, Abstraction in the CSP phase transition boundary. in Proceedings of
the 4th International Symposium on Artificial Intelligence and Mathematics, (Ft. Lauderdale,
USA, 1995), pp. 126–133.
480. J. Seligman, From logic to probability. Lect. Notes Comput. Sci. 5363, 193–233 (2009)
481. R. Serna Oliver, I. Shcherbakov, G. Fohler, An operating system abstraction layer for portable
applications in wireless sensor networks. in Proceedings of the ACM Symposium on Applied
Computing, (Sierre, Switzerland, 2010), pp. 742–748.
482. A. Sfard, L. Linchevsky, The gains and pitfalls of reification-the case of alegbra. Educ. Stud.
Math. 26, 191–228 (1994)
483. C. Shannon, The mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423
(1948)
484. A. Sharpanskykh, Agent-based modeling and analysis of socio-technical systems. Cybern.
Syst. 42, 308–323 (2011)
485. S. Shekhar, C. Lu, P. Zhang, A unified approach to detecting spatial outliers. GeoInformatica
7, 139–166 (2003)
486. J. Shi, M. Littman, Abstraction methods for game theoretic poker. in Proceedings of the 2nd
International Conference on Computers and Games, (Hamamatsu, Japan, 2001), pp. 333–345.
487. J. Shiner, M. Davison, P. Landsberg, Simple measure of complexity. Phys. Rev. E 59, 1459–
1464 (1999)
488. G. Silverstein, M. Pazzani, Relational clichés: Constraining constructive induction during
relational learning. in Proceedings of the 8th International Workshop on Machine Learning,
(Evanston, USA, 1991), pp. 203–207.
489. G. Simmons, Shapes, Part Structure and Object Concepts, in Proceedings of the ECAI Work-
shop on Parts and Wholes: Conceptual Part-Whole Relationships and Formal Mereology
(Nederlands, Amsterdam, 1994)
490. H. Simon, The Sciences of the Artificial, 3rd edn. (MIT Press, Cambridge, 1999)
491. D. Simons, Current approaches to change blindness. Vis. Cogn. 7, 1–15 (2000)
492. D. Simons, C. Chabris, T. Schnur, Evidence for preserved representations in change blindness.
Conscious. Cogn. 11, 78–97 (2002)
493. Ö. Simsek, Workshop summary: abstraction in reinforcement learning. in Proceedings of the
International Conference on Machine Learning, (Montreal, Canada, 2009), p. 170.
494. M. Sizintsev, R. Wildes, Coarse-to-fine stereo vision with accurate 3D boundaries. Image Vis.
Comput. 28, 352–366 (2010)
495. B. Smith, M. Dyer, Locating the phase transition in binary constraint satisfaction problems.
Artif. Intell. 81, 155–181 (1996)
496. R.M. Smullyan, First-Order Logic (Dover Publications, Mineola, 1995)
497. N.N. Soja, S. Carey, E. Spelke, Ontological categories guide young children’s inductions of
word meaning: Object terms and substance terms. Cognition 38, 179–211 (1991)
498. R. Solomonoff, A formal theory of inductive inference-Part I. Inf. Contl. 7, 1–22 (1964)
499. R. Solomonoff, A formal theory of inductive inference-Part II. Inf. Contl. 7, 224–254 (1964)
500. J. Sowa, A. Majumdar, Conceptual structures for knowledge creation and communication. in
Proceedings of the International Conference on Conceptual Structures, (Dresden, Germany,
2012), pp. 17–24.
501. A. Srinivasan, S. Muggleton, M. Bain, Distinguishing exceptions from noise in non-monotonic
learning. in Proceedings of the 2nd International Workshop on Inductive Logic Programming,
(Tokyo, Japan, 1992), pp. 203–207.
502. S. Srivastava, N. Immerman, S. Zilberstein, Abstract planning with unknown object quantities
and properties. in Proceedings of the 8th Symposium on Abstraction, Reformulation, and
Approximation, (Lake Arrowhead, USA, 2009), pp. 143–150.
503. M. Stacey, C. McGregor, Temporal abstraction in intelligent clinical data analysis: a survey.
Artif. Intell. Med. 39, 1–24 (2007)
504. I. Stahl, Predicate invention in ILP-An overview. in Proceedings of the European Conference
on Machine Learning, (Vienna, Austria, 1993), pp. 311–322.
505. I. Stahl, The appropriateness of predicate invention as bias shift operation in ILP. Mach. Learn.
20, 95–117 (1995)
506. F. Staub, E. Stern, Abstract reasoning with mathematical constructs. Int. J. Educ. Res. 27,
63–75 (1997)
507. M. Stefik, Planning with constraints (MOLGEN: Part 1). Artif. Intell. 16, 111–139 (1981)
508. M. Stolle, D. Precup, Learning options in reinforcement learning. Lect. Notes Comput. Sci.
2371, 212–223 (2002)
509. J.V. Stone, Computer vision: What is the object? in Proceedings of the Artificial Intelligence
and Simulation of Behaviour Conference (Birmingham, UK, 1993), pp. 199–208
510. P. Struss, A. Malik, M. Sachenbacher, Qualitative modeling is the key to automated diagnosis.
in Proceedings of the 13st World Congress of the International Federation of Automatic
Control, (San Francisco, USA, 1996).
511. S. Stylianou, M.M. Fyrillas, Y. Chrysanthou, Scalable pedestrian simulation for virtual cities.
in Proceedings of the ACM Symposium on Virtual Reality software and technology, (New
York, USA, 2004), pp. 65–72.
512. D. Subramanian, A theory of justified reformulations. in Change of Representation and Induc-
tive Bias, ed. by P. Benjamin (Kluwer Academic Press, 1990), pp. 147–168.
513. D. Subramanian, R. Greiner, J. Pearl, The relevance of relevance (editorial). Artif. Intell. 97,
1–2 (1997)
514. S. Sun, N. Wang, Formalizing the multiple abstraction process within the G-KRA model
framework. in Proceedings of the International Conference on Intelligent Computing and
Integrated Systems, (Guilin, China, 2010), pp. 281–284.
515. S. Sun, N. Wang, D. Ouyang, General KRA abstraction model. J. Jilin Univ. 47, 537–542
(2009). In Chinese.
516. R. Sutton, D. Precup, S. Singh, Between MDPs and semi-MDPs: a framework for temporal
abstraction in reinforcement learning. Artif. Intell. 112, 181–211 (1999)
517. R. S. Sutton, E. J. Rafols, A. Koop, Temporal abstraction in temporal-difference networks. in
Proceedings of the NIPS-18, (Vancouver, Canada, 2006), pp. 1313–1320.
518. R. Sutton, Generalization in Reinforcement Learning: Successful examples using sparse
coarse coding. Advances in Neural Information Processing Systems, pp. 1038–1044 (1996).
519. R. Sutton, A. Barto, Reinforcement Learning (MIT Press, Cambridge, 1998)
520. R. Sutton, D. McAllester, S. Singh, Y. Mansour, Policy gradient methods for Reinforcement
Learning with function approximation. Adv. NIPS 12, 1057–1063 (2000)
521. A. Swearngin, B. Choueiry, E. Freuder, A reformulation strategy for multi-dimensional CSPs:
The case study of the SET game. in Proceedings of the 9th International Symposium on
Abstraction, Reformulation and Approximation, (Cardona, Spagna, 2011), pp. 107–116.
522. B. Sylvand, Une brève histoire du concept de “concept”. Ph.D. thesis, (Université La Sorbonne,
Paris, France, 2006), In French.
523. C. Szepesvári, Algorithms for Reinforcement Learning, (Morgan & Claypool, 2010).
524. M.E. Taylor, P. Stone, Transfer learning for reinforcement learning domains: a survey. J.
Mach. Learn. Res. 10, 1633–1685 (2009)
525. J. Tenenberg, Abstraction in Planning, Ph.D. thesis, (University of Rochester, USA, 1988).
526. J. Tenenberg, Preserving consistency across abstraction mappings. in Proceedings 10th Inter-
national Joint Conference on Artificial Intelligence (Milan, Italy, 1987), pp. 1011–1014.
527. B. ter Haar Romeny, Designing multi-scale medical image analysis algorithms. in Proceedings
of the International Conference on Pattern Recognition (Tutorial) (Istanbul, Turkey, 2010).
528. C. Thinus-Blanc, Animal Spatial Cognition (World Scientific Publishing, Singapore, 1996)
529. R. Tibshirani, Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B
(Methodological) 58, 267–288 (1996)
530. A. Tollner-Burngasser, M. Riley, W. Nelson, Individual and team susceptibility to change
blindness. Aviation Space Environ. Med. 81, 935–943 (2010)
531. P. Torasso, G. Torta, Automatic abstraction of time-varying system models for model based
diagnosis. Lect. Notes Artif. Intell. 3698, 176–190 (2005)
532. G. Torta, P. Torasso, Automatic abstraction in component-based diagnosis driven by system
observability. in Proceedings of the 18th International Joint Conference on Artificial Intelli-
gence, (Acapulco, Mexico, 2003), pp. 394–400.
533. G. Torta, P. Torasso, Qualitative domain abstractions for time-varying systems: an approach
based on reusable abstraction fragments. in Proceedings of the 17th International Workshop
on Principles of Diagnosis (Peñaranda de Duero, Spain, 2006), pp. 265–272.
534. V. Truppa, E.P. Mortari, D. Garofoli, S. Privitera, E. Visalberghi, Same/different concept
learning by Capuchin monkeys in matching-to-sample tasks. PLoS One 6, e23809 (2011)
535. J. Tsitsiklis, B. Van Roy, An analysis of temporal-difference learning with function approxi-
mation. IEEE Trans. Autom. Contl. 42, 674–690 (1997)
536. P. Turney, A uniform approach to analogies, synonyms, antonyms, and associations. in Pro-
ceedings of the International Conference on Computational Linguistics, vol. 1, (Manchester,
UK, 2008), pp. 905–912.
537. E. Tuv, A. Borisov, G. Runger, K. Torkkola, Feature selection with ensembles, artificial
variables, and redundancy elimination. J. Mach. Learn. Res. 10, 1341–1366 (2009)
538. B. Tversky, K. Hemenway, Objects, parts, and categories. J. Exp. Phychol. Gen. 113, 169–193
(1984)
539. J. Ullman, Principles of Databases (Computer Science, Baltimore, 1982)
540. J. Ullman, Implementation of logical query languages for databases. ACM Trans. Database
Syst. 10, 298–321 (1985)
541. S. Ullman, Visual routines. Cognition 18, 97–159 (1984)
542. P.E. Utgoff, D.J. Stracuzzi, Many-layered learning. Neural Comput. 14, 2497–2529 (2002)
543. R. Valdés-Pérez, Principles of human-computer collaboration for knowledge discovery in
science. Artif. Intell. 107, 335–346 (1999)
544. M. Valtorta, A result on the computational complexity of heuristic estimates for the A∗
algorithm. Inf. Sci. 34, 48–59 (1984)
545. D. van Dalen, Logic and Structure, 4th edn. (Springer, New York, 2004)
546. P. Vitányi, Meaningful information. in Proceedings of the 13th International Symposium on
Algorithms and Computation, (Vancouver, Canada, 2002), pp. 588–599.
547. D. Vo, A. Drogoul, J.-D. Zucker, An operational meta-model for handling multiple scales in
agent-based simulations. in Proceedings of the International Conference on Computing and
Communication Technologies, Research, Innovation, and Vision for the Future (Ho Chi Minh
City, Vietnam, 2012), pp. 1–6.
548. P. Vogt, The physical symbol grounding problem. Cogn. Syst. Res. 3, 429–457 (2002)
549. F. Wang, On the abstraction of conventional dynamic systems: from numerical analysis to
linguistic analysis. Inf. Sci. 171, 233–259 (2005)
574. Q. Yang, Intelligent Planning: A Decomposition and Abstraction Based Approach (Springer,
1997).
575. K. Yip, F. Zhao, Spatial aggregation: theory and applications. J. Artif. Intell. Res. 5, 1–26
(1996)
576. L. Zadeh, Fuzzy sets. Inf. Contl. 8, 338–353 (1965)
577. L. Zadeh, The concept of a linguistic variable and its application to approximate reasoning-I.
Inf. Sci. 8, 199–249 (1975)
578. M. Žáková, F. Železný, Exploiting term, predicate, and feature taxonomies in propositional-
ization and propositional rule learning. Lect. Notes Comput. Sci. 4701, 798–805 (2007)
579. S. Zeki, The visual image in mind and brain. Sci. Am. 267, 68–76 (1992)
580. S. Zeki, A Vision of the Brain (Blackwell, Oxford, 1993)
581. S. Zeki, Inner Vision (Oxford University Press, Oxford, 1999)
582. S. Zeki, Splendors and Miseries of the Brain (Wiley-Blackwell, Oxford, 2009)
583. F. Železný, N. Lavrač, Propositionalization-based relational subgroup discovery with RSD.
Mach. Learn. 62, 33–63 (2006)
584. C. Zeng, S. Arikawa, Applying inverse resolution to EFS language learning. in Proceedings
of the International Conference for Young Computer Scientists (Shanghai, China, 1999), pp.
480–487.
585. S. Zhang, X. Ning, X. Zhang, Graph kernels, hierarchical clustering, and network community
structure: experiments and comparative analysis. Eur. Phys. J. B 57, 67–74 (2007)
586. F. Zhou, S. Mahler, H. Toivonen, Review of network abstraction techniques. in Proceedings of
the ECML Workshop on Explorative Analytics of Information Networks (Bristol, UK, 2009).
587. S. Zilles, R. Holte, The computational complexity of avoiding spurious states in state space
abstraction. Artif. Intell. 174, 1072–1092 (2010)
588. R. Zimmer, Abstraction in art with implications for perception. Phil. Trans. Roy. Soc. B 358,
1285–1291 (2003)
589. L. Zuck, A. Pnueli, Model checking and abstraction to the aid of parameterized systems (a
survey). Comput. Lang. Syst. Struct. 30, 139–169 (2004)
590. J.-D. Zucker, A grounded theory of abstraction in artificial intelligence. Phil. Trans. Roy. Soc.
Lond. B 358, 1293–1309 (2003)
591. J.-D. Zucker, J.-G. Ganascia, Selective reformulation of examples in concept learning. in
Proceedings of the 11th International Conference on Machine Learning (New Brunswick,
USA, 1994), pp. 352–360.
592. J.-D. Zucker, J.-G. Ganascia, Changes of representation for efficient learning in structural
domains. in Proceedings of the 13th International Conference on Machine Learning (Bari,
Italy, 1996), pp. 543–551.
Index
D
de-abstracting, 236, 266
de-abstraction, 12
Deductive, 68
Deleting arguments, 264
Dennett, 18
Density, 59, 289
Design pattern, 8, 217–219
Desirable property, 57, 63
Detail. See Level of detail
Diagnostic, 35, 364–366, 371
Dietterich, 276, 295, 297, 301
Discrete, 20, 61, 143, 144, 170, 235, 277, 330, 345, 352, 365
Discretization, 60, 189, 201, 232, 273, 274, 278, 285, 286, 303, 326, 410
Discretize, 253, 274, 285, 286
Distance, 9, 32, 36, 37, 47, 66, 268, 333, 337, 340, 344, 345, 350, 353, 376, 386, 394, 395
Domain, 60, 364, 365, 396, 397, 400
Dorat, 335
Downward, 25, 57, 63, 400
Drastal, 273

E
Edge complexity, 332
Embedded, 279, 280, 283, 288, 289, 326
Emmert-Streib, 279, 280, 283, 288, 289, 326, 336, 345

F
Faltings, 59
Felner, 58
Ferrari, 22, 23, 43
Fialetti, 10
Fikes, 55
Filter, 279, 280, 283, 288, 289, 326
Fisher, 25
Flat, 303
Floridi, 4, 9, 16, 71–74, 76–78, 165, 236, 237, 244, 245, 247, 282
FOL, 51, 53, 155, 273–275
Forgetron, 283
Forward, 26, 34, 231, 247, 289, 293, 336, 368, 390
Fourier, 45, 288
Fractal, 45, 269, 353
Fragment, 332, 366, 369
Frame induction, 386
Francisco, 251, 252
Frege, 14–16, 20–22, 70
Frege-Russell, 22
Fringe, 288
Frixione, 42
Function approximation, 297, 299, 300, 324
Fuzzy sets, 55, 60, 253

G
Gabbrielli, 25
Gaglio, 42
L
Latent Semantic Analysis, 393
Laycock, 15, 16
Lecoutre, 59
Leicht, 338
Levy, 27, 50, 52, 53, 227, 244, 260–263
Lindenbaum, 43
LOD. See Level of detail, 45, 46, 268
Logical structure, 333, 334
Low complexity, 347, 351
Lower approximation
Lowry, 54
Lozano, 338

M
Macro-actions, 303, 324
Malevich, 29
Manifesto, 30
Mapping function, 67, 69, 248

O
Objective, 14, 25–27, 280
Object-to-variable binding, 33, 266
Odoardo, 10
Ofek, 58
Oliveira, 335
Compatibility, 164, 167, 175
Operators
    approximation, 179, 185, 198, 199, 221, 222, 227, 230–232, 249, 254, 255, 267, 409
    reformulation, 202, 232, 410
Options, 297, 301, 302

P
Pachet, 289
Pagels, 349
Paris, 18, 19, 52
Paskin, 62
Pawlak, 255
Simplification, 2, 8, 9, 14, 16, 17, 19, 22, 23, 26, 46, 71, 76, 165, 197, 231, 276, 334, 361, 373–378, 393, 404, 408
Simulation, 33, 46, 62, 299, 392
Skeleton, 42, 374
Sketch, 12, 40, 142, 387
Smith, 14, 27, 33, 79–84, 256–259, 392
Solving, 3, 5, 9, 17, 21, 52, 55, 59, 60, 63, 65, 175, 177, 221, 227, 270, 294, 371, 384, 392, 395, 407–409
Sophistication, 329, 341, 350, 351, 356
Sowa
Specialization, 13, 86, 168
Spectrum, 13, 202, 289, 302, 340
Srivastava, 58
Stepp, 274
Stern, 11, 20, 21
Stijl, 30
Stone, 43
STRATA, 54, 55
STRIPS, 55, 56
Struss, 364
Stuart, 391
Stylianou, 62
SUBDUE, 294
Subramanian, 55, 146
Subroutines, 301
Summarize, 32, 66, 176, 202, 221, 296, 299, 303, 345, 353, 407
Surrogate, 298
Sutton, 297, 301, 302
Symbol abstraction, 263
Syntactic, 18, 23, 50–53, 172, 231, 260, 264, 266, 408
Syntactic mapping

T
Tarskian
Tenenberg, 50–52, 57, 239, 240, 244, 264
Thagard, 392
Thinus-Blanc, 14, 36
Thiran, 335
Thomas, 391
Tibshirani, 280
Tractability, 217, 280, 286
Turney, 393

U
Ullman, 44
Upper approximation, 255
Upward, 57, 401
Utility, 23, 243, 374

V
Van, 154
Visual perception, 37, 46, 266
Vo, 33, 62
Vries, 17, 22

W
Wang, 55, 402
Watanabe, 340
Weigel, 59
Weight, 277, 338, 340, 394
Westbound, 312
Whistler, 29
Wiemer-Hastings, 16
Williams, 363
Wilson, 34
Wrapper, 279, 280, 283, 286, 288, 289, 326
Wright, 15, 22, 70, 71

Y
Ying, 60
Young, 34
Yves, 18

Z
Zambon
Zhang, 336, 338
Zoom, 14, 36, 336
Zooming. See Zoom
Zucker, 51, 274, 387